<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Summary-Expanded Entity Embeddings for Entity Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shahrzad Naseri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Foley</string-name>
          <email>jjfoley@smith.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Allan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brendan T. O'Connor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Information and Computer Sciences, University of Massachusetts Amherst</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Smith College</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Entity retrieval is an important part of any modern retrieval system and often satisfies user information needs directly. Word and entity embeddings are a promising opportunity for new improvements in retrieval, especially in the presence of vocabulary mismatch problems. We present an approach to entity embedding that leverages the summary of entity articles from Wikipedia in order to form a richer representation of entities. We present a brief evaluation using the DBpedia-Entity v2 dataset. Our evaluation shows that our new, summary-inspired representation provides improvements over both standard retrieval and pseudo-relevance feedback baselines as well as over a straightforward word-embedding model. We observe that this representation is particularly helpful for the verbose queries in the INEX-LD and QALD-2 subsets of our test collection.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recently, knowledge cards, conversational answers, and other focused responses to user queries have become possible for most search engines. Underlying most of these answers in search engine response pages is search based on knowledge graphs and the availability of rich information for named entities. In particular, named entities such as people, organizations, or concepts are often provided as the focused response to user queries. In a study of the Yahoo web search query logs, Pound et al. [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] showed that more than 50% of the queries target specific entities or lists of entities. Since their study, more entity-focused responses have appeared in major web search engines.
      </p>
      <p>Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors.</p>
      <p>Of course, rich knowledge bases play a key role in the use of entities in a search. Structured data published in knowledge bases such as DBpedia (1), Freebase (2), and YAGO (3) continue to grow in a variety of languages. In order to answer queries directly from such knowledge bases, the entity retrieval task has been defined: return a ranked list of entities relevant to the user's query. This task is typically approached by finding entities with a "meaning" that is similar to the query.</p>
      <p>
        Capturing that semantic ("meaning") similarity between vocabulary terms, pieces of text, and sentences has been a substantial problem in information retrieval and natural language processing (NLP), for which a wide variety of approaches have been introduced [
        <xref ref-type="bibr" rid="ref10 ref37">10, 37</xref>
        ]. Word embedding methods assign each term a low-dimensional (compared to the vocabulary size) vector, representing vocabulary terms by capturing co-occurrence information between terms via a likelihood approximation of terms appearing within a context window. Word2vec [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] and GloVe [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] are examples of widely used word embeddings, obtained from a neural network-based language model and a matrix factorization technique, respectively.
      </p>
      <p>
        There has been substantial work on defining embeddings not just for single words but for entities [
        <xref ref-type="bibr" rid="ref24 ref45 ref46 ref49 ref8">45, 49, 8, 46, 24</xref>
        ], but there is no clear baseline for ranking entities with such compressed semantic representations. In fact, when trying to re-use task-specific entity embeddings for retrieval tasks, results can be less than impressive: e.g., RDF2Vec [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] was designed for data mining and has been shown to under-perform simple retrieval baselines like BM25 on more specific tasks [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. Although fully-deep models that leverage entities exist [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ], often we do not have enough data to train supervised embeddings.
        (1) http://dbpedia.org (2) http://freebase.org (3) http://www.mpi-inf.mpg.de/yago-naga/yago/
      </p>
      <p>We propose a simple entity embedding model that focuses on representing an entity based on other entities crucial to its summary. Here, we use the entities that appear inside a DBpedia abstract. Since we use links present in the abstract, these entity mentions were effectively annotated by the human authors of those articles.</p>
      <p>
        In summary, we investigate the problem of entity retrieval, improving retrieval results using word and entity embeddings. We use the queries of the DBpedia-Entity (v2) dataset introduced by Hasibi et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to evaluate our EntityVec representation on its ability to directly rank entities. We demonstrate that this is an effective representation for use in entity ranking, one that provides gains beyond those provided by single-word embeddings and query expansion.
      </p>
      <p>The rest of this work is organized in the following manner: We provide some background on entity retrieval in Section 2. In Section 3 we present our approach in detail. We describe our experimental setup in Section 4, empirically validate our hypotheses in Section 5, and discuss conclusions in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>In this section, we first introduce some prior work in entity retrieval. Then we discuss the key ideas behind word embedding techniques, whose purpose is to capture the semantic similarity between vocabulary terms.</p>
      <p>
        Entities are useful for a diverse set of tasks including but not limited to academic search [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ], entity disambiguation [
        <xref ref-type="bibr" rid="ref49">49</xref>
        ], entity summarization [
        <xref ref-type="bibr" rid="ref15 ref16">16, 15</xref>
        ], and knowledge graph completion [
        <xref ref-type="bibr" rid="ref24 ref46">46, 24</xref>
        ]. We will focus our discussion on entity retrieval.
      </p>
      <sec id="sec-2-1">
        <title>Entity Retrieval</title>
        <p>
          Entity ranking is a task that focuses on retrieving entities in a knowledge base and presenting them in ranked order in response to a user's information need. This task was the focus of various benchmarking campaigns, including the INEX Entity Ranking track [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], the INEX Linked Data track [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ], the TREC Entity track [
          <xref ref-type="bibr" rid="ref3 ref41 ref6">41, 6, 3</xref>
          ], the Semantic Search Challenge [
          <xref ref-type="bibr" rid="ref17 ref7">7, 17</xref>
          ], and the Question Answering over Linked Data (QALD) challenge series [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. A common goal of all these campaigns was to address the user's need in an entity-specific way, instead of returning documents which might contain unnecessary information. However, the campaigns focused on different tasks such as list search [
          <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
          ], related entity finding [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ], and question answering [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. All of the datasets from those campaigns were combined into the DBpedia-Entity v1 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and v2 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] datasets.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Leveraging Knowledge Bases for Entity Retrieval</title>
        <p>
          Existing methods typically study the use of type information to improve entity retrieval accuracy [
          <xref ref-type="bibr" rid="ref2 ref21 ref4">4, 21, 2</xref>
          ]. Knowledge bases are typically represented as tuples of relations, often formatted in the Resource Description Framework (RDF) triple format. As a result, entities have rich fielded information, and fielded retrieval methods such as BM25F [
          <xref ref-type="bibr" rid="ref20 ref32 ref39">39, 32, 20</xref>
          ] and F-SDM [
          <xref ref-type="bibr" rid="ref48">48</xref>
          ] are especially helpful. Zhiltsov et al. in particular propose the use of name, attributes, categories, similar entities, and related entities as the fields for a fielded retrieval model [
          <xref ref-type="bibr" rid="ref48">48</xref>
          ].
        </p>
        <p>
          To take advantage of both structured and unstructured data, Schuhmacher et al. used a learning-to-rank approach which incorporates different features of both text and entities [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. Foley et al. expand on results for their dataset by exploring minimal knowledge-base features for use in learning-to-rank [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Both of these studies leverage crowd-sourced judgments of entity relevance for traditional TREC ad-hoc queries.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Entity Retrieval without a Knowledge Base</title>
        <p>
          There have also been efforts to answer entity queries that cannot be satisfied via information in knowledge bases, due to the various ways of referring to an entity in the query. In earlier work on expert finding, entities were defined by their locations in text [
          <xref ref-type="bibr" rid="ref1 ref33">1, 33</xref>
          ]. More recently, Hong et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] tried to enrich their knowledge base using linked web pages and queries from a query log. In addition, Graus et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] presented a dynamic representation for entities by collecting different representations from a variety of resources and combining them.
        </p>
        <p>In this work, we focus on entities that can be found in knowledge bases.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Neural and Embedding Approaches for Entity Retrieval</title>
        <p>As our primary direction of study for this work is toward an entity representation that improves retrieval, the most relevant efforts are those that leverage word or entity embeddings in their ranking tasks.</p>
        <p>
          Word embedding techniques learn a low-dimensional vector (compared to the vocabulary size) for each vocabulary term, such that similarity between the word vectors captures the semantic as well as the syntactic similarities between the corresponding words. Word embeddings are unsupervised learning methods, since they only need raw textual data without any labels. There are different methods to compute word embeddings. One of the most popular is using neural networks to predict words based on the context of a text. Mikolov et al. [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] introduced word2vec, which learns vector representations of words via a neural network with a single layer. Word2vec comes in two forms, CBOW and Skip-gram. CBOW tries to predict a word based on its context, i.e., neighboring words. Skip-gram tries to predict the context: given the word w, it tries to predict the probability of a word w' appearing in a fixed window around w. Another model for learning embedding vectors is based on matrix factorization, e.g., GloVe vectors [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. Although many variants of word embeddings exist, Skip-gram embeddings are quite efficient and not significantly different from other variants if tuned correctly [
          <xref ref-type="bibr" rid="ref23 ref27">27, 23</xref>
          ].
        </p>
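        <p>To make the two objectives concrete, the following sketch (illustrative only; the toy sentence is an assumption, not our training data) enumerates the (word, context) pairs on which the Skip-gram objective is trained for a fixed window:</p>

```python
# Illustrative sketch (not the authors' code): enumerate the
# (center word, context word) training pairs that the Skip-gram
# objective uses, for a fixed window size.
def skipgram_pairs(tokens, window=2):
    """Pair each center word with every word within `window`
    positions of it, in both directions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["harry", "potter", "is", "a", "series"], window=1)
# With window=1, each adjacent pair appears in both directions,
# e.g. ("harry", "potter") and ("potter", "harry").
```

        <p>CBOW inverts this setup, predicting the center word from the set of context words.</p>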
        <p>
          Xiong et al. propose a model for ad-hoc document retrieval that represents documents and queries in both text and entity spaces, leveraging entity embeddings in their approach [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ]. However, such deep models require a significant quantity of training data to learn effective models, and our approach uses far less supervision.
        </p>
        <p>
          Entity embeddings are also used for academic search [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ], for entity disambiguation [
          <xref ref-type="bibr" rid="ref49">49</xref>
          ], for question answering [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and for knowledge graph completion [
          <xref ref-type="bibr" rid="ref24 ref46">46, 24</xref>
          ]. The benchmark paper for TREC-CAR (Complex Answer Retrieval) determined that RDF2Vec entity embeddings [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ] are not as effective as BM25 for their entity-focused paragraph ranking task [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Our survey of related work suggests that opportunities to customize entity vectors for ranking remain relatively unexplored.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Embedding-Based Entity Retrieval</title>
      <p>
        Vocabulary mismatch is a long-standing problem in information retrieval. Previous work [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ] has proposed incorporating word embeddings to address this problem. In this paper, we investigate the effect of word embeddings in entity retrieval with the goal of mitigating vocabulary mismatch.
      </p>
      <p>
        Moreover, since in entity retrieval we retrieve entities instead of documents, and since most of the queries are entity-centric, we learn an embedding representation for entities and explore the effect of those embeddings on entity retrieval. We hypothesize that mapping the query to the entity space and comparing it with the retrieved entities will improve the retrieval results. In this section, we describe our approach to validating the hypothesis that incorporating word embeddings and entity embeddings enhances entity retrieval accuracy. We also discuss query expansion [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], an approach that also attempts to address the vocabulary gap by augmenting the query with additional related words.
      </p>
      <sec id="sec-3-1">
        <title>General Scheme of Retrieval</title>
        <p>
          Given a query, q, that targets a specific entity, our task is to return a ranked list of entities likely to be relevant. Each entity is represented by a short textual description; in our experiments, we used the short abstract of each entity available in DBpedia. A list of candidate entities is also retrieved using term-based retrieval models such as the query likelihood model [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], efficiently creating a large pool of candidate matches.
        </p>
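        <p>As a concrete illustration of this candidate-generation step, the following sketch scores documents with Dirichlet-smoothed query likelihood; the two toy "abstracts" are assumptions for illustration, not our index:</p>

```python
import math
from collections import Counter

# Hedged sketch of Dirichlet-smoothed query likelihood scoring, the kind
# of term-based model used to build the candidate pool.
def ql_score(query_terms, doc_terms, coll_counts, coll_len, mu=1000):
    """log P(q | d) under a Dirichlet-smoothed unigram language model."""
    tf = Counter(doc_terms)
    dlen = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_c = coll_counts.get(t, 0) / coll_len  # background probability
        if p_c == 0:
            continue  # a term unseen in the collection contributes nothing
        score += math.log((tf[t] + mu * p_c) / (dlen + mu))
    return score

docs = {
    "Harry_Potter": "harry potter fantasy novel series".split(),
    "Pasta":        "cooking recipes for pasta dishes".split(),
}
coll = Counter(t for d in docs.values() for t in d)
coll_len = sum(coll.values())
ranked = sorted(docs, reverse=True,
                key=lambda e: ql_score(["harry", "potter"], docs[e], coll, coll_len))
# "Harry_Potter" ranks above "Pasta" for the query "harry potter".
```

        <p>The Dirichlet smoothing parameter mu here corresponds to the smoothing parameter tuned in Section 4.3.</p>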
        <p>In our model, we try to enhance the accuracy of entity retrieval by representing queries and entities by their corresponding embedding vectors. We explore two methods to construct query and entity embedding vectors, which we refer to as the WordVec and EntityVec models.</p>
        <p>
          In the WordVec model, each query is represented by the average of the embedding vectors of the query's terms. Entities are represented in a similar way, by averaging over the embedding vectors of the terms in the entity's abstract. The GloVe [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] pre-trained word embeddings are used as the word vectors in the WordVec model.
        </p>
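        <p>A minimal sketch of this averaging scheme, with a tiny hypothetical 3-dimensional vector table standing in for the real pre-trained GloVe vectors:</p>

```python
import numpy as np

# Minimal sketch of the WordVec representation: a query (or an entity
# abstract) becomes the mean of its term vectors. The tiny vector table
# below is a hypothetical stand-in for real pre-trained GloVe vectors.
glove = {
    "harry":  np.array([0.9, 0.1, 0.0]),
    "potter": np.array([0.8, 0.2, 0.1]),
    "novel":  np.array([0.2, 0.9, 0.3]),
}

def average_embedding(terms, table):
    """Mean of the vectors of in-vocabulary terms; None if none are known."""
    vecs = [table[t] for t in terms if t in table]
    if not vecs:
        return None
    return np.mean(vecs, axis=0)

q_vec = average_embedding(["harry", "potter"], glove)  # [0.85, 0.15, 0.05]
```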
        <p>
          In the EntityVec model, an embedding vector for entities is learned with the Skip-gram model implemented in gensim [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ]. To learn this embedding, following the approach presented in [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], we replace the Wikipedia pages' hyperlinks (links referring to other pages, i.e., entities) with a placeholder representing the entity. Consider the following excerpt, where links to other Wikipedia articles (entities) are represented by italics:
        </p>
        <p>Harry Potter is a series of fantasy novels written by British author J. K. Rowling. The novels chronicle the life of a young wizard, Harry Potter, and his friends Hermione Granger and Ron Weasley, all of whom are students at Hogwarts School of Witchcraft and Wizardry.</p>
        <p>The excerpt will be replaced by:</p>
        <p>Harry Potter is a series of Fantasy_literature written by British author J._K._Rowling. The novels chronicle the life of a young Magician_(fantasy), Harry_Potter_(character), and his friends Hermione_Granger and Ron_Weasley, all of whom are students at Hogwarts.</p>
        <p>Here each link has been replaced by the corresponding article's title, with spaces replaced by underscores. Each entity in the original excerpt is now treated as a single "term", and an embedding is learned with the Skip-gram model.</p>
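        <p>The preprocessing step above can be sketched as follows; the [[Target|anchor]] link syntax and the sample text are illustrative assumptions, not our exact pipeline:</p>

```python
import re

# Hedged sketch of the hyperlink-replacement preprocessing. Each wiki link
# becomes a single placeholder token: the target article's title with
# spaces replaced by underscores.
def replace_links(wikitext):
    def to_token(match):
        return match.group(1).replace(" ", "_")
    # Matches [[Target title|anchor text]] as well as [[Target title]].
    return re.sub(r"\[\[([^|\]]+)(?:\|[^\]]+)?\]\]", to_token, wikitext)

text = "a young [[Magician (fantasy)|wizard]], [[Harry Potter (character)|Harry Potter]]"
print(replace_links(text))
# a young Magician_(fantasy), Harry_Potter_(character)
```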
        <p>As mentioned before, entities are represented by the abstract available in DBpedia. To incorporate this representation, the final embedding of a target entity is obtained by averaging the embedding vectors of the entities referred to in the abstract of the target entity.</p>
        <p>
          In the EntityVec model, queries are represented by the average of the embedding vectors of the entities in the query. The entities in the query are annotated using the TagMe [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] mention detection tool.
        </p>
        <p>For both WordVec and EntityVec, the similarity between the query and the document is calculated as the cosine similarity between their respective embedding vectors.</p>
        <p>The final entity retrieval score is obtained by linear interpolation of the baseline, WordVec, and EntityVec models.</p>
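        <p>The scoring described in the preceding paragraphs can be sketched as follows; the interpolation weights shown are illustrative placeholders, not our tuned values:</p>

```python
import math

# Sketch of the final scoring: cosine similarity in each embedding space,
# linearly interpolated with the base retrieval score.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def final_score(lm_score, wordvec_sim, entityvec_sim, w=(0.6, 0.2, 0.2)):
    # Linear interpolation of the baseline, WordVec, and EntityVec scores.
    return w[0] * lm_score + w[1] * wordvec_sim + w[2] * entityvec_sim

s = final_score(lm_score=0.5,
                wordvec_sim=cosine([1.0, 0.0], [1.0, 0.0]),
                entityvec_sim=cosine([1.0, 0.0], [0.0, 1.0]))
# 0.6 * 0.5 + 0.2 * 1.0 + 0.2 * 0.0 = 0.5
```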
        <p>Table 1 reports the learning corpora for the WordVec and EntityVec models. Moreover, we summarize the final embedding vectors for query and entity in Table 2.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Query Expansion</title>
        <p>In an intuitive sense, query and document embedding models address the vocabulary mismatch problem by expanding the representation. Therefore, it makes sense to compare our work to techniques in the query-expansion literature.</p>
        <p>
          Lavrenko and Croft introduce relevance modeling, an approach to query expansion that derives a probabilistic model of term importance from documents that receive high scores, given the initial query [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. They present a number of models, but the most utilized version is RM3, which is a mixture model between the top k expansion terms and the original query. Expansion terms t are given the following weights, derived from a set of pseudo-relevant documents D_Q for a query Q:
        </p>
        <p>w(t) = (1/Z) Σ_{d ∈ D_Q} P(d|Q) P(t|d)</p>
        <p>
          Terms that occur frequently in high-scoring documents, i.e., with high P(t|d) and P(d|Q), are given the most weight in the expansion. Z is merely a normalizer allowing the weights to be turned into a probability distribution over terms that occur in the pseudo-relevant document set D_Q. This baseline is often used for comparison in the entity-focused retrieval literature [
          <xref ref-type="bibr" rid="ref40 ref43 ref9">9, 40, 43</xref>
          ].
        </p>
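        <p>A small sketch of this relevance-model weighting; the toy documents and the uniform P(d|Q) are assumptions for illustration:</p>

```python
from collections import Counter

# Hedged sketch of the relevance-model weighting: w(t) is proportional
# to the sum over pseudo-relevant documents of P(d|Q) * P(t|d),
# normalized by Z so the weights form a probability distribution.
def rm_weights(pseudo_rel_docs, p_d_given_q):
    raw = Counter()
    for doc, p_d in zip(pseudo_rel_docs, p_d_given_q):
        tf = Counter(doc)
        dlen = len(doc)
        for t, c in tf.items():
            raw[t] += p_d * (c / dlen)  # P(d|Q) * P(t|d)
    z = sum(raw.values())               # normalizer Z
    return {t: w / z for t, w in raw.items()}

docs = [["harry", "potter", "novel"], ["harry", "wizard", "novel"]]
weights = rm_weights(docs, [0.5, 0.5])
# "harry" and "novel" each get weight 1/3; "potter" and "wizard" get 1/6.
```

        <p>RM3 then mixes the top k of these expansion terms with the original query terms.</p>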
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup</title>
      <p>In this section, we introduce our experimental setup, baselines, and evaluation metrics. Next, we report and discuss our results.</p>
      <sec id="sec-4-1">
        <title>Dataset</title>
        <p>
          Our experiments are conducted on the entity search test collection DBpedia-Entity v2 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. This dataset consists of queries gathered from seven previous competitions, with relevance judgments on entities from DBpedia version 2015-10.
        </p>
        <p>
          For word embeddings, we used the GloVe [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] pre-trained word embeddings with 300 dimensions. The word embeddings were trained on a 6 billion token collection (the 2014 Wikipedia dump plus Gigaword 5).
        </p>
        <p>To train the entity embeddings, we used the full article text of Wikipedia pages obtained from the DBpedia 2016-10 dump.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Data Processing</title>
        <p>Retrieval results were obtained using an index built from the abstracts of the entities.</p>
        <p>
          We used TagMe [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] as the mention detection tool for the entities in the queries. We used the Word2Vec implementation in gensim [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] for learning entity embeddings, i.e., EntityVec. As mentioned previously, to obtain EntityVec embeddings we followed the approach outlined by Ni et al. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] and replaced the outbound hyperlinks to Wikipedia pages with unique placeholder tokens. We learn embeddings for 3.0 million of the 4.8 million entities in Wikipedia.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Hyperparameter Settings</title>
        <p>The smoothing parameter of the language modeling approach is obtained by 2-fold cross-validation over the queries, chosen from the set {100, 500, 1000, 1500}. To tune the RM3 hyperparameters, i.e., the original query's weight and the number of expansion terms, we use 2-fold and 5-fold cross-validation. The original query's weight is varied from 0.1 to 0.9 in increments of 0.1, and the number of terms is varied from 10 to 90 in increments of 20. With the parameters tuned by 2-fold and 5-fold cross-validation, RM3 for short queries did not improve over the language model approach. We note that there were other parameter settings that did improve RM3 over the language model, but they were not discoverable in the 2-fold or 5-fold approaches. When we report RM3 results (Table 5), we report the results for 2-fold cross-validation.</p>
        <p>The parameters for learning the EntityVec embeddings are as follows: window size = 10, sub-sampling = 1e-3, cutoff min-count = 0. The learned embeddings have dimension 200 and are learned with the Skip-gram model.</p>
        <p>Mean Average Precision (MAP) over the top-ranked 1000 documents is selected as the main evaluation metric for retrieval effectiveness. Furthermore, we consider precision over the top 10 retrieved documents (P@10). Since we have graded relevance judgments, we also report nDCG@10. Statistically significant differences in performance are determined using the two-tailed paired t-test computed at a 95% confidence level based on the average precision per query.</p>
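        <p>For clarity, the reported metrics can be computed for a single query as in the following sketch (illustrative only, not the evaluation scripts we used):</p>

```python
import math

# Illustrative sketch of the three reported metrics for a single ranked
# list, given graded relevance judgments (qrels).
def precision_at_k(ranking, qrels, k=10):
    return sum(1 for d in ranking[:k] if qrels.get(d, 0) > 0) / k

def average_precision(ranking, qrels):
    hits, total = 0, 0.0
    for i, d in enumerate(ranking, start=1):
        if qrels.get(d, 0) > 0:
            hits += 1
            total += hits / i
    n_rel = sum(1 for g in qrels.values() if g > 0)
    return total / n_rel if n_rel else 0.0

def ndcg_at_k(ranking, qrels, k=10):
    dcg = sum(qrels.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranking[:k], start=1))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

qrels = {"e1": 2, "e3": 1}              # graded relevance judgments
run = ["e1", "e2", "e3", "e4"]          # system ranking for one query
ap = average_precision(run, qrels)      # (1/1 + 2/3) / 2 = 5/6
```

        <p>MAP is then the mean of the per-query average precision values.</p>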
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>
        In this section, we explore the results of our entity representation models atop two baselines. We look at both a standard unigram approach, language modeling (LM) [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], and an approach built on query expansion, relevance modeling (RM3).
      </p>
      <p>In Table 3, we present the results of our model on top of the LM baseline for the short and verbose query subsets as well as their union. We discuss the results of our models with respect to query length in Section 5.2. This table is the appropriate place to look for the overall results of our models, particularly the "All Queries" section. Both proposed methods outperform the baseline LM model, suggesting that there is value in both our EntityVec representation and in the more traditional WordVec query expansion. Combining the two methods yields even greater accuracy across all measures.</p>
      <p>In Table 4, we present the results of our different models atop LM using the traditional dataset subsets inside DBpedia-Entity v2. Since these datasets were originally constructed for different variations of the entity ranking task, we were curious whether their different query types would yield different results. We discuss the results in terms of the different styles of queries in Section 5.3.</p>
      <p>Finally, in Table 5, we examine our approaches on top of a baseline with query expansion built in. We discuss the results of our models on this expanded baseline in Section 5.4.</p>
      <p>In the result tables, relative improvements over the base retrieval models, i.e., LM and RM3, are shown as percentages to the right of the scores. Win/Tie/Loss shows the number of queries improved, unchanged, or hurt, respectively, compared with the base retrieval model under the MAP measure. The symbols †, ‡, and § indicate statistical significance over the base retrieval model, (base retrieval model)+WordVec, and (base retrieval model)+EntityVec, respectively. As mentioned earlier, we use two base retrieval models (LM and RM3). The best method for each metric is marked in bold.</p>
      <sec id="sec-5-1">
        <title>Entity Representations for Short and Verbose Queries</title>
        <p>We found that results were quite different for verbose queries (defined as queries longer than four terms) and short queries, so our tables are broken into three sections to reflect the overall dataset and these query-length subsets.</p>
        <p>Based on the results in Table 3, we can see that both WordVec and EntityVec improve verbose queries more than they improve short queries (particularly as measured by MAP). We speculate this could be because short queries are more prone to ambiguity, so better query representations are built from verbose queries, where the additional words provide disambiguation and thus better matching of related entities. Also, for the WordVec model, the embedding of a short query does not seem to improve matching significantly. It is also possible that some short queries are more specific, so the embedding (implicitly incorporating related words) is less important. Further analysis is needed to understand this behavior fully, but we recommend that systems that use entity representations consider using query length to select an appropriate model.</p>
        <p>If we now look at the win/tie/loss analysis for these queries at the far right of Table 3, we can see that there are many ties. This is a result of some queries lacking entities in their description: in the current version of our model, we cannot generate an entity representation if our entity linker (TagMe, in this case) does not identify any entities in the query, so each representation is identical. Even ignoring ties, we can see that there are more wins than losses, so our vector modeling approaches are helpful when entities are identified, and the magnitude of MAP improvements is higher for EntityVec than for WordVec, even though WordVec can be used for all queries and EntityVec only changes a subset.</p>
      <p>We further note that combining WordVec and EntityVec results in additional gains, indicating that the two methods are complementary, capturing different signals.</p>
      <p>When we investigate the effect of our entity vector models on different types of queries, we can see some more interesting results in Table 4. Since the queries are of such diverse types, it is not surprising to observe some variation. We see that the WordVec model does not show a significant improvement on the SemSearch-ES and QALD-2 results. Since SemSearch-ES queries are mostly ambiguous keyword queries, it is possible that the WordVec representations are not specific enough to be helpful.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Entity Representations and Query Expansion</title>
        <p>
          Finally, we evaluate the proposed methods in the pseudo-relevance feedback scenario. We choose RM3, a state-of-the-art PRF method that has been shown to perform well on various collections [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. Table 5 shows the results for the proposed methods and the RM3 baseline.
        </p>
        <p>We observe the same kind of improvements over the RM3 baseline with our WordVec and EntityVec models that we saw on top of our keyword-query baseline. This is notable because it shows that our embedding models are largely orthogonal to a state-of-the-art query expansion model, which is often pointed to as the source of improvement for embedding approaches.</p>
        <p>We note that on this dataset, the RM3 method actually lowers effectiveness for short queries compared to using LM alone. The WordVec and EntityVec models compensate somewhat for that reduction, but are not sufficient to recover all of the loss.</p>
        <p>In future work, we hope to analyze the relevant entities discovered by our embedding approaches that are not present in the RM3 baselines, in order to better understand where our improvements are coming from. For the EntityVec gains, we hypothesize that we have been able to encode critical information about the entity graph by modifying entity vectors to include their most important neighbors.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion And Future Work</title>
      <p>In this study, we expanded on traditional entity
embeddings by incorporating information from related
entities that are mentioned in their summary. We
demonstrated the efficacy of this model on a
popular entity-ranking collection in comparison to simpler
word2vec-style models and traditional retrieval
models. In our comparison to RM3, a pseudo-relevance
feedback query-expansion approach, we demonstrated
that the utility of our entity modeling is not limited to
query expansion; at the very least, it provides a useful and
novel method of query expansion in comparison to this
popular approach.</p>
      <p>In order to fully validate our model, we intend to
compare it to other unsupervised and semi-supervised
entity embedding representations. We hope to explore
more comparisons in future work, as well as more
variations of our entity embedding model.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>This work was supported in part by the Center for
Intelligent Information Retrieval and in part by NSF
grant #IIS-1617408. Any opinions, findings, and
conclusions or recommendations expressed in this material
are those of the authors and do not necessarily reflect
those of the sponsors.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          , and
          <string-name>
            <surname>M. De Rijke</surname>
          </string-name>
          .
          <article-title>Formal models for expert finding in enterprise corpora</article-title>
          .
          <source>In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <volume>43</volume>
          –
          <fpage>50</fpage>
          . ACM,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bron</surname>
          </string-name>
          , and
          <string-name>
            <surname>M. De Rijke</surname>
          </string-name>
          .
          <article-title>Query modeling for entity search based on terms, categories, and examples</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS)</source>
          ,
          <volume>29</volume>
          (
          <issue>4</issue>
          ):
          <fpage>22</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Carmel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Herzig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Roitman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran Duc</surname>
          </string-name>
          .
          <article-title>The first joint international workshop on entity-oriented and semantic search (JIWES)</article-title>
          .
          <source>ACM SIGIR Forum</source>
          , volume
          <volume>46</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Neumayer</surname>
          </string-name>
          .
          <article-title>Hierarchical target type identification for entity-oriented queries</article-title>
          .
          <source>In Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          , pages
          <volume>2391</volume>
          –
          <fpage>2394</fpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Neumayer</surname>
          </string-name>
          .
          <article-title>A test collection for entity search in dbpedia</article-title>
          .
          <source>In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <volume>737</volume>
          –
          <fpage>740</fpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. P. d.</given-names>
            <surname>Vries</surname>
          </string-name>
          .
          <article-title>Overview of the trec 2010 entity track</article-title>
          .
          <source>Technical report, Norwegian University of Science and Technology, Trondheim</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Blanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Halpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Herzig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pound</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Thompson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. T.</given-names>
            <surname>Duc</surname>
          </string-name>
          .
          <article-title>Entity search evaluation over structured web data</article-title>
          .
          <source>In Proceedings of the 1st international workshop on entity-oriented search (SIGIR 2011)</source>
          . ACM, New York,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          .
          <article-title>Open question answering with weakly supervised embedding models</article-title>
          .
          <source>In Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
          , pages
          <volume>165</volume>
          –
          <fpage>180</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dietz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Allan</surname>
          </string-name>
          .
          <article-title>Entity query feature expansion using knowledge base links</article-title>
          .
          <source>In Proceedings of the 37th international ACM SIGIR conference on Research &amp; development in information retrieval</source>
          , pages
          <volume>365</volume>
          –
          <fpage>374</fpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Deerwester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Furnas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Harshman</surname>
          </string-name>
          .
          <article-title>Indexing by latent semantic analysis</article-title>
          .
          <source>Journal of the American society for information science</source>
          ,
          <volume>41</volume>
          (
          <issue>6</issue>
          ):
          <fpage>391</fpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Iofciu</surname>
          </string-name>
          , and
          <string-name>
            <surname>A. P. De Vries</surname>
          </string-name>
          .
          <article-title>Overview of the inex 2009 entity ranking track</article-title>
          .
          <source>In International Workshop of the Initiative for the Evaluation of XML Retrieval</source>
          , pages
          <volume>254</volume>
          –
          <fpage>264</fpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferragina</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Scaiella</surname>
          </string-name>
          .
          <article-title>Fast and accurate annotation of short texts with wikipedia pages</article-title>
          .
          <source>IEEE software</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ):
          <volume>70</volume>
          –
          <fpage>75</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Foley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Allan</surname>
          </string-name>
          .
          <article-title>Improving entity ranking for keyword queries</article-title>
          .
          <source>In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management</source>
          , pages
          <year>2061</year>
          –
          <year>2064</year>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Graus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tsagkias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Weerkamp</surname>
          </string-name>
          , E. Meij, and M. de Rijke.
          <article-title>Dynamic collective entity representations for entity ranking</article-title>
          .
          <source>In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining</source>
          , pages
          <volume>595</volume>
          –
          <fpage>604</fpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gunaratna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thirunarayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          , and G. Cheng.
          <article-title>Gleaning types for literals in rdf triples with application to entity summarization</article-title>
          .
          <source>In International Semantic Web Conference</source>
          , pages
          <volume>85</volume>
          –
          <fpage>100</fpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gunaratna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thirunarayan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>FACES: Diversity-aware entity summarization using incremental hierarchical conceptual clustering</article-title>
          .
          <source>In AAAI</source>
          , pages
          <volume>116</volume>
          –
          <fpage>122</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Halpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Herzig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Blanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pound</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Thompon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. T.</given-names>
            <surname>Tran</surname>
          </string-name>
          .
          <article-title>Evaluating ad-hoc object retrieval</article-title>
          .
          <source>In IWEST@ ISWC</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hasibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nikolaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Bratsberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kotov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          .
          <article-title>Dbpedia-entity v2: A test collection for entity search</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <volume>1265</volume>
          –
          <fpage>1268</fpage>
          . ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Hakkani-Tur</surname>
          </string-name>
          .
          <article-title>Entity ranking for descriptive queries</article-title>
          .
          <source>In Spoken Language Technology Workshop (SLT)</source>
          ,
          <year>2014</year>
          IEEE, pages
          <volume>200</volume>
          –
          <fpage>205</fpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K. Y.</given-names>
            <surname>Itakura</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Clarke</surname>
          </string-name>
          .
          <article-title>A framework for bm25f-based xml retrieval</article-title>
          .
          <source>In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <volume>843</volume>
          –
          <fpage>844</fpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kaptein</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          .
          <article-title>Exploiting the category structure of wikipedia for entity ranking</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>194</volume>
          :
          <fpage>111</fpage>
          –
          <fpage>129</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Relevance based language models</article-title>
          .
          <source>In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <volume>120</volume>
          –
          <fpage>127</fpage>
          . ACM,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Dagan</surname>
          </string-name>
          .
          <article-title>Improving distributional similarity with lessons learned from word embeddings</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>3</volume>
          :
          <fpage>211</fpage>
          –
          <fpage>225</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>Learning entity and relation embeddings for knowledge graph completion</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>15</volume>
          , pages
          <fpage>2181</fpage>
          –
          <fpage>2187</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Unger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          .
          <article-title>Evaluating question answering over linked data</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>21</volume>
          :3–
          <fpage>13</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lv</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <article-title>A comparative study of methods for estimating query language models with pseudo feedback</article-title>
          .
          <source>In Proceedings of the 18th ACM conference on Information and knowledge management</source>
          , pages
          <year>1895</year>
          –
          <year>1898</year>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <volume>3111</volume>
          –
          <fpage>3119</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>F.</given-names>
            <surname>Nanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnusson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Dietz</surname>
          </string-name>
          .
          <article-title>Benchmark for complex answer retrieval</article-title>
          .
          <source>In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval</source>
          , pages
          <volume>293</volume>
          –
          <fpage>296</fpage>
          . ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sheinwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Cao</surname>
          </string-name>
          .
          <article-title>Semantic documents relatedness using concept graph representation</article-title>
          .
          <source>In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining</source>
          , pages
          <volume>635</volume>
          –
          <fpage>644</fpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
          –
          <fpage>1543</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Perez-Aguera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Arroyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Fresno</surname>
          </string-name>
          .
          <article-title>Using bm25f for semantic search</article-title>
          .
          <source>In Proceedings of the 3rd international semantic search workshop</source>
          , page 2. ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>D.</given-names>
            <surname>Petkova</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Hierarchical language models for expert finding in enterprise corpora</article-title>
          .
          <source>International Journal on Artificial Intelligence Tools</source>
          ,
          <volume>17</volume>
          (
          <issue>01</issue>
          ):
          <fpage>5</fpage>
          –
          <lpage>18</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Ponte</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>A language modeling approach to information retrieval</article-title>
          .
          <source>In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>275</fpage>
          –
          <lpage>281</lpage>
          . ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pound</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <article-title>Ad-hoc object retrieval in the web of data</article-title>
          .
          <source>In Proceedings of the 19th international conference on World wide web</source>
          , pages
          <fpage>771</fpage>
          –
          <lpage>780</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rehurek</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Sojka</surname>
          </string-name>
          .
          <article-title>Software Framework for Topic Modelling with Large Corpora</article-title>
          .
          <source>In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          , pages
          <fpage>45</fpage>
          –
          <lpage>50</lpage>
          , Valletta, Malta, May
          <year>2010</year>
          . ELRA. http://is.muni.cz/publication/884893/en.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnik</surname>
          </string-name>
          .
          <article-title>Using information content to evaluate semantic similarity in a taxonomy</article-title>
          .
          <source>arXiv preprint cmp-lg/9511007</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>RDF2Vec: RDF graph embeddings for data mining</article-title>
          .
          <source>In International Semantic Web Conference</source>
          , pages
          <fpage>498</fpage>
          –
          <lpage>514</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <article-title>Simple BM25 extension to multiple weighted fields</article-title>
          .
          <source>In Proceedings of the thirteenth ACM international conference on Information and knowledge management</source>
          , pages
          <fpage>42</fpage>
          –
          <lpage>49</lpage>
          . ACM,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schuhmacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dietz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          .
          <article-title>Ranking entities for web queries through text and knowledge</article-title>
          .
          <source>In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source>
          , pages
          <fpage>1461</fpage>
          –
          <lpage>1470</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>de Vries</surname>
          </string-name>
          .
          <article-title>Delft University at the TREC 2009 entity track: Ranking Wikipedia entities</article-title>
          .
          <source>Technical report, Delft University of Technology, Netherlands</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gurajada</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mishra</surname>
          </string-name>
          .
          <article-title>Overview of the INEX 2012 Linked Data Track</article-title>
          .
          <source>In CLEF (Online Working Notes/Labs/Workshop)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          .
          <article-title>EsdRank: Connecting query and documents through external semi-structured data</article-title>
          .
          <source>In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source>
          , pages
          <fpage>951</fpage>
          –
          <lpage>960</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Word-entity duet representations for document ranking</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>763</fpage>
          –
          <lpage>772</lpage>
          . ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Power</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          .
          <article-title>Explicit semantic ranking for academic search via knowledge graph embedding</article-title>
          .
          <source>In Proceedings of the 26th international conference on world wide web</source>
          , pages
          <fpage>1271</fpage>
          –
          <lpage>1279</lpage>
          . International World Wide Web Conferences Steering Committee,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          .
          <article-title>Embedding entities and relations for learning and inference in knowledge bases</article-title>
          .
          <source>arXiv preprint arXiv:1412.6575</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Embedding-based query language models</article-title>
          .
          <source>In Proceedings of the 2016 ACM international conference on the theory of information retrieval</source>
          , pages
          <fpage>147</fpage>
          –
          <lpage>156</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>N.</given-names>
            <surname>Zhiltsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kotov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Nikolaev</surname>
          </string-name>
          .
          <article-title>Fielded sequential dependence model for ad-hoc entity retrieval in the web of data</article-title>
          .
          <source>In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>253</fpage>
          –
          <lpage>262</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zwicklbauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Seifert</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Granitzer</surname>
          </string-name>
          .
          <article-title>Robust and collective entity disambiguation through semantic embeddings</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>425</fpage>
          –
          <lpage>434</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>