=Paper=
{{Paper
|id=Vol-2482/paper7
|storemode=property
|title=Exploring Summary-Expanded Entity Embeddings for Entity Retrieval
|pdfUrl=https://ceur-ws.org/Vol-2482/paper7.pdf
|volume=Vol-2482
|authors=Shahrzad Naseri,John Foley,James Allan,Brendan T. O’Connor
|dblpUrl=https://dblp.org/rec/conf/cikm/NaseriFAO18
}}
==Exploring Summary-Expanded Entity Embeddings for Entity Retrieval==
Shahrzad Naseri¹, John Foley², James Allan¹, Brendan T. O'Connor¹
¹ College of Information and Computer Sciences, University of Massachusetts Amherst ({shnaseri,allan,brenocon}@cs.umass.edu)
² Department of Computer Science, Smith College (jjfoley@smith.edu)

===Abstract===
Entity retrieval is an important part of any modern retrieval system and often satisfies user information needs directly. Word and entity embeddings are a promising opportunity for new improvements in retrieval, especially in the presence of vocabulary mismatch problems. We present an approach to entity embedding that leverages the summary of entity articles from Wikipedia in order to form a richer representation of entities. We present a brief evaluation using the DBpedia-Entity v2 dataset. Our evaluation shows that our new, summary-inspired representation provides improvements over both standard retrieval and pseudo-relevance feedback baselines as well as over a straightforward word-embedding model. We observe that this representation is particularly helpful for the verbose queries in the INEX-LD and QALD-2 subsets of our test collection.

===1 Introduction===
Recently, knowledge cards, conversational answers, and other focused responses to user queries have become possible for most search engines. Underlying most of these answers in search engine response pages is search based on knowledge graphs and the availability of rich information for named entities. In particular, named entities such as people, organizations, or concepts are often provided as the focused response to user queries. In a study of the Yahoo web search query logs, Pound et al. [35] showed that more than 50% of the queries target specific entities or lists of entities. Since their study, more entity-focused responses have appeared in major web search engines.

Of course, rich knowledge bases play a key role in the use of entities in search. Structured data published in knowledge bases such as DBpedia (http://dbpedia.org), Freebase (http://freebase.org), and YAGO (http://www.mpi-inf.mpg.de/yago-naga/yago/) continues to grow in a variety of languages. In order to answer queries directly from such knowledge bases, the entity retrieval task has been defined: return a ranked list of entities relevant to the user's query. This task is typically approached by finding entities with a "meaning" that is similar to the query.

Capturing that semantic ("meaning") similarity between vocabulary terms, pieces of text, and sentences has been a substantial problem in information retrieval and natural language processing (NLP), for which a wide variety of approaches have been introduced [10, 37]. Word embedding methods assign each vocabulary term a low-dimensional (compared to the vocabulary size) vector and represent terms by capturing co-occurrence information between them, using a likelihood approximation of the terms' appearance within a window context. Word2vec [28] and GloVe [31] are examples of widely used word embeddings, obtained with a neural network-based language model and a matrix factorization technique, respectively.

There has been substantial work on defining embeddings not just for single words but for entities [45, 49, 8, 46, 24], but there is no clear baseline for ranking entities with such compressed semantic representations. In fact, when trying to re-use task-specific entity embeddings for retrieval tasks, results can be less than impressive: e.g., RDF2Vec [38] was designed for data mining and has been shown to under-perform simple retrieval baselines like BM25 on more specific tasks [29].
Although fully-deep models that leverage entities exist [44], we often do not have enough data to train supervised embeddings.

We propose a simple entity embedding model that focuses on representing an entity based on other entities crucial to its summary. Here, we use the entities that appear inside a DBpedia abstract. Since we use links present in the abstract, these entity mentions were effectively annotated by the human authors of those articles.

In summary, we investigate the problem of entity retrieval and improve retrieval results using word and entity embeddings. We use the queries of the DBpedia-Entity v2 dataset introduced by Hasibi et al. [18] to evaluate our EntityVec representation on its ability to directly rank entities. We demonstrate that this is an effective representation for entity ranking, one that provides gains beyond those provided by single-word embeddings and query expansion.

The rest of this work is organized in the following manner: We provide some background on entity retrieval in Section 2. In Section 3 we present our approach in detail. In Sections 4 and 5 we describe our experimental setup and empirically validate our hypotheses, and we conclude in Section 6.

===2 Related Work===
In this section, we first introduce some prior work in entity retrieval. Then we discuss the key ideas behind the word embedding techniques whose purpose is to capture the semantic similarity between vocabulary terms.

Entities are useful for a diverse set of tasks including but not limited to academic search [45], entity disambiguation [49], entity summarization [16, 15], and knowledge graph completion [46, 24]. We will focus our discussion on entity retrieval.

====2.1 Entity Retrieval====
Entity ranking is a task that focuses on retrieving entities in a knowledge base and presenting them in ranked order in response to a user's information need. This task was the focus of various benchmarking campaigns including the INEX Entity Ranking track [11], the INEX Linked Data track [42], the TREC Entity track [41, 6, 3], the Semantic Search Challenge [7, 17], and the Question Answering over Linked Data (QALD) challenge series [25]. A common goal of all these campaigns was to address the user's need in an entity-specific way, instead of returning documents which might contain unnecessary information. However, the campaigns focused on different tasks such as list search [3, 11], related entity finding [41], and question answering [25]. All of the datasets from those campaigns were combined into the DBpedia-Entity v1 [5] and v2 [18] datasets.

=====2.1.1 Leveraging Knowledge Bases for Entity Retrieval=====
Existing methods typically study the use of type information to improve entity retrieval accuracy [4, 21, 2]. Knowledge bases are typically represented as tuples of relations, often formatted in the Resource Description Framework (RDF) triple format. As a result, entities have rich fielded information, and fielded retrieval methods such as BM25F [39, 32, 20] and F-SDM [48] are especially helpful. Zhiltsov et al. in particular propose the use of name, attribute, categories, similar entities, and related entities as the fields for a fielded retrieval model [48].

To take advantage of both structured and unstructured data, Schuhmacher et al. used a learning-to-rank approach which incorporates different features of both text and entities [40]. Foley et al. expand on results for their dataset by exploring minimal knowledge-base features for use in learning-to-rank [13]. Both of these studies leverage crowd-sourced judgments of entity relevance for traditional TREC ad-hoc queries.

=====2.1.2 Entity Retrieval without a Knowledge Base=====
There have also been efforts to answer entity queries that cannot be satisfied via information in knowledge bases, due to the various ways of addressing an entity in a query. In earlier work on expert finding, entities were defined by their locations in text [1, 33]. More recently, Hong et al. [19] tried to enrich their knowledge base using linked web pages and queries from a query log. In addition, Graus et al. [14] presented a dynamic representation for entities by collecting different representations from a variety of resources and combining them.
In this work, we focus on entities that can be found in knowledge bases.

====2.2 Neural and Embedding Approaches for Entity Retrieval====
As our primary direction of study for this work is toward an entity representation that improves retrieval, the most relevant efforts are those that leverage word or entity embeddings in their ranking tasks.

Word embedding techniques learn a low-dimensional vector (compared to the vocabulary size) for each vocabulary term, in which the similarity between the word vectors captures the semantic as well as the syntactic similarities between the corresponding words. Word embeddings are unsupervised learning methods since they only need raw textual data without any labels. There are different methods to compute word embeddings. One of the most popular is using neural networks to predict words based on the context of a text. Mikolov et al. [28] introduced word2vec, which learns vector representations of words via a neural network with a single layer. Word2vec is proposed in two variants, CBOW and Skip-gram. CBOW tries to predict the word based on the context, i.e., neighboring words. Skip-gram tries to predict the context: given the word w, it tries to predict the probability of word w′ being in a fixed window around w. Another model for learning embedding vectors is based on matrix factorization, e.g., GloVe vectors [31]. Although many variants of word embeddings exist, skip-gram embeddings are quite efficient and not significantly different from other variations if tuned correctly [27, 23].

Xiong et al. propose a model for ad-hoc document retrieval that represents documents and queries in both text and entity spaces, leveraging entity embeddings in their approach [44]. However, such deep models require a significant quantity of training data to learn effective models, and our approach uses far less supervision than this direction.

Entity embeddings are also used for academic search [45], for entity disambiguation [49], for question answering [8], and for knowledge graph completion [46, 24]. The benchmark paper for TREC-CAR (Complex Answer Retrieval) determined that RDF2Vec entity embeddings [38] are not as effective as BM25 for their entity-focused paragraph ranking task [29]. Our survey of related work suggests that opportunities to customize entity vectors for ranking remain relatively unexplored.
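To make the skip-gram objective described in Section 2.2 concrete, the following toy sketch (our illustration, not code from any of the cited papers) enumerates the (w, w′) training pairs drawn from a token sequence; the window size of 2 is chosen only for readability, whereas the models used later in this paper use a window of 10:

<syntaxhighlight lang="python">
def skipgram_pairs(tokens, window=2):
    """Enumerate (center word w, context word w') training pairs:
    every other token within the fixed window around each center
    word is treated as a context word to be predicted."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

# e.g. ('series', 'a'), ('series', 'of'), ('series', 'fantasy'), ...
print(skipgram_pairs("a series of fantasy novels".split()))
</syntaxhighlight>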
===3 Embedding-Based Entity Retrieval===
Vocabulary mismatch is a long-standing problem in information retrieval. Previous work [47] has proposed incorporating word embeddings to address this problem. In this paper, we investigate the effect of word embeddings in entity retrieval with the goal of mitigating vocabulary mismatch.

Moreover, since in entity retrieval we retrieve entities instead of documents, and since most of the queries are entity-centric, we learn an embedding representation for entities and explore the effect of those embeddings on entity retrieval. We hypothesize that mapping the query to the entity space and comparing it with the retrieved entities will improve the retrieval results. In this section, we describe our approach to validating the hypothesis that incorporating word embeddings and entity embeddings enhances entity retrieval accuracy. We also discuss query expansion [22], an approach that also attempts to address the vocabulary gap by augmenting the query with additional related words.

====3.1 General Scheme of Retrieval====
Given a query q that targets a specific entity, our task is to return a ranked list of entities likely to be relevant. Each entity is represented by a short textual description; in our experiments, for example, we used the short abstract of each entity available in DBpedia. A list of candidate entities is first retrieved using term-based retrieval models such as the query likelihood model [34], efficiently creating a large pool of candidate matches.

In our model, we try to enhance the accuracy of entity retrieval by representing queries and entities by their corresponding embedding vectors. We explore two methods to represent query and entity embedding vectors, which we refer to as the WordVec and EntityVec models.

In the WordVec model, each query is represented by the average of the embedding vectors of the query's terms. Entities are represented in a similar way, by averaging over the embedding vectors of the terms in the entity's abstract. The GloVe [31] pre-trained word embedding is used for the word embedding vectors in the WordVec model.

In the EntityVec model, an embedding vector for entities is learned based on the Skip-gram model implemented in gensim [36]. To learn this embedding, following the approach presented in [30], we replace the Wikipedia pages' hyperlinks (links referring to other pages, i.e., entities) with a placeholder representing the entity. Consider the following excerpt, where links to other Wikipedia articles (entities) are represented by italics:

: Harry Potter is a series of ''fantasy novels'' written by British author ''J. K. Rowling''. The novels chronicle the life of a young ''wizard'', ''Harry Potter'', and his friends ''Hermione Granger'' and ''Ron Weasley'', all of whom are students at ''Hogwarts School of Witchcraft and Wizardry''.

The excerpt will be replaced by:

: Harry Potter is a series of Fantasy_literature written by British author J._K._Rowling. The novels chronicle the life of a young Magician_(fantasy), Harry_Potter_(character), and his friends Hermione_Granger and Ron_Weasley, all of whom are students at Hogwarts.

where each link is replaced by the corresponding article's title and spaces are replaced by underscores. Now each entity in the original excerpt is considered a single "term", and an embedding is learned based on the Skip-gram model.
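As a minimal sketch of this preprocessing and training step (our reconstruction, not the authors' released code), the snippet below replaces <code>[[target|anchor]]</code>-style wiki links with underscore placeholder tokens and trains a gensim Skip-gram model with the hyperparameters reported in Section 4.3; the two toy articles stand in for the full Wikipedia corpus, and note that recent gensim releases name the dimensionality argument <code>vector_size</code> (older releases call it <code>size</code>):

<syntaxhighlight lang="python">
import re
from gensim.models import Word2Vec

LINK = re.compile(r"\[\[([^|\]]+)(?:\|[^\]]*)?\]\]")

def linkify(text):
    """Replace each [[target|anchor]] wiki link with a single placeholder
    token: the linked article's title, with spaces turned to underscores."""
    return LINK.sub(lambda m: m.group(1).strip().replace(" ", "_"), text)

# Toy two-"article" corpus; in the paper this is every Wikipedia article.
articles = [
    "Harry Potter is a series of [[Fantasy literature|fantasy novels]] "
    "written by British author [[J. K. Rowling]]",
    "the novels of [[J. K. Rowling]] chronicle a young "
    "[[Magician (fantasy)|wizard]] at [[Hogwarts]]",
]
corpus = [linkify(a).split() for a in articles]

# Skip-gram settings reported in Section 4.3: 200 dimensions, window 10,
# sub-sampling 1e-3, and no minimum-frequency cutoff.
model = Word2Vec(corpus, vector_size=200, window=10, sg=1,
                 sample=1e-3, min_count=0)

# Each linked entity is now a single "term" in the learned vocabulary.
print(model.wv.most_similar("J._K._Rowling"))
</syntaxhighlight>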
As mentioned before, entities are represented by their abstract available in DBpedia. To also consider this representation, the final embedding of a target entity is obtained by averaging over the embedding vectors of the referred entities appearing in the abstract of the target entity. In the EntityVec model, queries are represented by the average of the embedding vectors of the entities in the query. The entities in the query are annotated using the TagMe [12] mention detection tool.

For both WordVec and EntityVec, the similarity between the query and the entity is calculated by the cosine similarity between their respective embedding vectors. The final entity retrieval score is obtained by linear interpolation of the baseline, WordVec, and EntityVec models.

Table 1 reports the learning corpora for the WordVec and EntityVec models, and Table 2 summarizes the final embedding vectors for query and entity.

{| class="wikitable"
|+ Table 1: Learning corpora for the WordVec and EntityVec embedding vectors.
! Model !! Learning corpus
|-
| WordVec || Pre-trained GloVe word embeddings (6B tokens: Wikipedia 2014 + Gigaword 5)
|-
| EntityVec || Full text of Wikipedia articles, pre-processed according to Section 3.1
|}

{| class="wikitable"
|+ Table 2: Query and retrieved-entity representations for the WordVec and EntityVec models.
! Model !! Query !! Retrieved entity
|-
| WordVec || Average of the query terms' embedding vectors || Average of the embedding vectors of the terms in the entity's abstract
|-
| EntityVec || Average of the query entities' embedding vectors || Average of the embedding vectors of the entities referred to in the entity's abstract
|}
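To make the scoring concrete, here is a small sketch (ours, not the authors' code) of how the final score could be assembled under the scheme above; <code>word_vec</code> and <code>entity_vec</code> stand in for the GloVe and EntityVec lookups, <code>baseline_score</code> for the LM or RM3 score, and the interpolation weights <code>lambdas</code> are hypothetical tuning parameters not specified here:

<syntaxhighlight lang="python">
import numpy as np

def average_embedding(tokens, lookup, dim=200):
    """Mean of the embedding vectors of the tokens that have one."""
    vecs = [lookup[t] for t in tokens if t in lookup]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def score(query_terms, query_entities, abstract_terms, abstract_entities,
          baseline_score, word_vec, entity_vec, lambdas=(0.6, 0.2, 0.2)):
    """Linear interpolation of the baseline retrieval score with the
    WordVec and EntityVec cosine similarities (Section 3.1)."""
    wv = cosine(average_embedding(query_terms, word_vec),
                average_embedding(abstract_terms, word_vec))
    ev = cosine(average_embedding(query_entities, entity_vec),
                average_embedding(abstract_entities, entity_vec))
    l_base, l_wv, l_ev = lambdas
    return l_base * baseline_score + l_wv * wv + l_ev * ev
</syntaxhighlight>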
====3.2 Query Expansion====
In an intuitive sense, query and document embedding models address the vocabulary mismatch problem by virtue of expanding the representation. Therefore, it makes sense to compare our work to techniques in the query-expansion literature.

Lavrenko and Croft introduce relevance modeling, an approach to query expansion that derives a probabilistic model of term importance from documents that receive high scores given the initial query [22]. They present a number of models, but the most utilized version is RM3, which is a mixture model between the top k expansion terms and the original query. Expansion terms t are given the following weights, derived from a set of pseudo-relevant documents D_Q for a query Q:

: <math>w(t) = \frac{1}{Z} \sum_{d \in D_Q} P(d \mid Q)\, P(t \mid d)</math>

Terms that occur frequently (P(t|d)) in high-scoring documents (P(d|Q)) are given the most weight in the expansion. Z is merely a normalizer allowing the weights to be turned into a probability distribution over the terms that occur in the pseudo-relevant document set D_Q. This baseline is often used for comparison in the entity-focused retrieval literature [9, 40, 43].
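As a worked illustration of the weighting above (a sketch under our own assumptions, not the authors' implementation), suppose <code>retrieval_scores</code> maps each pseudo-relevant document to a normalized P(d|Q) and <code>term_prob</code> gives each document's language model P(t|d):

<syntaxhighlight lang="python">
from collections import defaultdict

def rm3_weights(retrieval_scores, term_prob, orig_query_weights,
                k_terms=50, orig_weight=0.5):
    """Relevance-model weights w(t) = (1/Z) * sum_d P(d|Q) * P(t|d),
    then mixed with the original query model, as in RM3."""
    w = defaultdict(float)
    for d, p_d in retrieval_scores.items():      # P(d|Q)
        for t, p_t in term_prob[d].items():      # P(t|d)
            w[t] += p_d * p_t
    z = sum(w.values()) or 1.0                   # the normalizer Z
    # Keep only the top-k expansion terms (normalized before truncation
    # here for simplicity; implementations vary on this detail).
    expansion = dict(sorted(((t, v / z) for t, v in w.items()),
                            key=lambda kv: -kv[1])[:k_terms])
    mixed = defaultdict(float)
    for t, p in orig_query_weights.items():
        mixed[t] += orig_weight * p
    for t, p in expansion.items():
        mixed[t] += (1 - orig_weight) * p
    return dict(mixed)
</syntaxhighlight>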
===4 Experimental Setup===
In this section, we introduce our experimental setup, baselines, and evaluation metrics. Then we report and discuss our results.

====4.1 Dataset====
Our experiments are conducted on the entity search test collection DBpedia-Entity v2 [18]. This dataset consists of queries gathered from seven previous competitions, with relevance judgments on entities from DBpedia version 2015-10.

For word embeddings, we used the GloVe [31] pre-trained word embeddings with 300 dimensions, extracted from a 6 billion token collection (the 2014 Wikipedia dump plus Gigaword 5). To train the entity embeddings, we used the full text of Wikipedia articles obtained from the DBpedia 2016-10 dump.

====4.2 Data Processing====
Retrieval results were obtained using an index built from the abstracts of the entities. We used TagMe [12] as the mention detection tool for the entities in the queries. We used the Word2Vec implementation in gensim [36] for learning the entity embeddings, i.e., EntityVec. As mentioned previously, to obtain EntityVec embeddings we followed the approach outlined by Ni et al. [30] and replaced the outbound hyperlinks to Wikipedia pages with a unique placeholder token. We learn embeddings for 3.0 million of the 4.8 million entities in Wikipedia.

====4.3 Hyperparameter Settings====
The µ parameter of the language modeling approach is obtained by 2-fold cross-validation over the queries and is chosen from the set {100, 500, 1000, 1500}. To tune the RM3 hyperparameters, i.e., the original query's weight and the number of expansion terms, we use 2-fold and 5-fold cross-validation. The original query weight is varied from 0.1 to 0.9 in increments of 0.1, and the number of expansion terms from 10 to 90 in increments of 20. With the parameters tuned by 2-fold and 5-fold cross-validation, RM3 for short queries did not improve over the language modeling approach. We note that there were other parameter settings that did improve RM3 over the language model, but they were not discoverable in the 2-fold or 5-fold approaches. When we report RM3 results (Table 5), we report the results for 2-fold cross-validation.

The parameters for learning the EntityVec embeddings are as follows: window size = 10, sub-sampling = 1e-3, min-count cutoff = 0. The learned embedding dimension is 200, and it is learned based on the Skip-gram model.

====4.4 Evaluation Metrics====
Mean Average Precision (MAP) over the top-ranked 1000 results is selected as the main evaluation metric of retrieval effectiveness. Furthermore, we consider precision of the top 10 retrieved results (P@10). Since we have graded relevance judgments, we also report nDCG@10. Statistically significant differences in performance are determined using the two-tailed paired t-test computed at a 95% confidence level based on the average precision per query.
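The grids described in Section 4.3 are small enough to sweep exhaustively. A sketch of the 2-fold search (our illustration), where <code>evaluate_map(params, queries)</code> is a hypothetical function that runs retrieval with the given parameters and returns MAP over a query fold:

<syntaxhighlight lang="python">
import itertools

MU_GRID = [100, 500, 1000, 1500]
ORIG_WEIGHT_GRID = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 .. 0.9
NUM_TERMS_GRID = list(range(10, 91, 20))                      # 10, 30, ..., 90

def two_fold_tune(queries, evaluate_map):
    """Pick parameters on one fold, evaluate on the other, average."""
    half = len(queries) // 2
    folds = [queries[:half], queries[half:]]
    test_scores = []
    for train, test in [(folds[0], folds[1]), (folds[1], folds[0])]:
        best = max(itertools.product(MU_GRID, ORIG_WEIGHT_GRID,
                                     NUM_TERMS_GRID),
                   key=lambda p: evaluate_map(p, train))
        test_scores.append(evaluate_map(best, test))
    return sum(test_scores) / 2.0
</syntaxhighlight>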
===5 Results===
In this section, we explore the results of our entity representation models atop two baselines. We look at both a standard unigram approach, language modeling (LM) [34], and an approach built on query expansion, relevance modeling (RM3).

In Table 3, we present the results of our models on top of the LM baseline for the short and verbose query subsets as well as their union. We discuss these results with respect to query length in Section 5.2. This table is the appropriate one for the overall results of our models, particularly the "All Queries" section. Both proposed methods outperform the baseline LM model, suggesting that there is value both in our EntityVec representation and in the more traditional WordVec query expansion. Combining the two methods yields even greater accuracy across all measures.

In Table 4, we present the results of our different models atop LM using the traditional dataset subsets inside DBpedia-Entity v2. Since these datasets were originally constructed for different variations of the entity ranking task, we were curious whether their different query types would yield different results. We discuss the results in terms of the different styles of queries in Section 5.3.

Finally, in Table 5, we examine our approaches on top of a baseline with query expansion built in. We discuss the results of our models on this expanded baseline in Section 5.4.

====5.1 Table Notation and Significance Testing====
In the result tables, relative improvements over the base retrieval models, i.e., LM and RM3, are shown as percentages to the right of the scores. Win/Tie/Loss shows the number of queries improved, unchanged, or hurt, respectively, compared with the base retrieval model using the MAP measure. †, ‡, and § indicate statistical significance over the base retrieval model, the base retrieval model + WordVec, and the base retrieval model + EntityVec, respectively. As mentioned earlier, we use two base retrieval models (LM and RM3). The best method for each metric is marked in bold.

====5.2 Entity Representations for Short and Verbose Queries====
We found that results were quite different for verbose queries (defined as queries longer than four terms) and short queries, so our tables are broken into three sections reflecting the overall dataset and these two query-length subsets.

Based on the results in Table 3, we can see that both WordVec and EntityVec improve verbose queries more than they improve short queries (particularly as measured by MAP). We speculate this could be because short queries are more prone to ambiguity, so better query representations are built from verbose queries, where the additional words provide disambiguation and thus better matching of related entities. For the WordVec model, the embedding of a short query does not seem to help matching significantly. It is also possible that some short queries are more specific, so the embedding (implicitly incorporating related words) is less important. Further analysis is needed to understand this behavior fully, but we recommend that systems using entity representations consider query length when selecting an appropriate model.

If we look at the win/tie/loss analysis at the far right of Table 3, we can see that there are many ties. This is a result of some queries lacking entities in their description: in the current version of our model, we cannot generate an entity representation if our entity linker (TagMe, in this case) does not identify any entities in the query, so the representations are identical. Even ignoring ties, there are more wins than losses, so our vector modeling approaches are helpful when entities are identified, and the magnitude of MAP improvements is higher for EntityVec than for WordVec, even though WordVec can be used for all queries and EntityVec only changes a subset.

We further note that combining WordVec and EntityVec results in additional gains, indicating that the two methods are complementary, capturing different aspects of entities that are each useful.

{| class="wikitable"
|+ Table 3: Effect of the WordVec and EntityVec models on top of the LM baseline for verbose queries, short queries, and their union. Notation explained in Section 5.1.
! colspan="5" | Verbose Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1609 || 0.1992 || 0.2261 || -
|-
| LM + WordVec || 0.1708† (+6.15%) || 0.2168† (+8.84%) || 0.2429† (+7.43%) || 171/14/77
|-
| LM + EntityVec || 0.1731† (+7.58%) || 0.2218† (+11.35%) || 0.2415† (+6.81%) || 162/28/72
|-
| LM + WordVec + EntityVec || '''0.1786†‡''' (+11%) || '''0.2328†‡§''' (+16.87%) || '''0.2554†‡§''' (+12.96%) || 189/16/57
|-
! colspan="5" | Short Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.2445 || 0.2922 || 0.3357 || -
|-
| LM + WordVec || 0.2498 (+2.17%) || 0.2956 (+1.16%) || 0.3417 (+1.79%) || 111/23/71
|-
| LM + EntityVec || 0.2532 (+3.56%) || 0.2985 (+2.16%) || 0.3454 (+2.89%) || 92/49/64
|-
| LM + WordVec + EntityVec || '''0.2635†‡§''' (+7.77%) || '''0.3034†‡''' (+3.83%) || '''0.3531†‡''' (+5.18%) || 135/20/50
|-
! colspan="5" | All Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1976 || 0.2400 || 0.2742 || -
|-
| LM + WordVec || 0.2055† (+3.99%) || 0.2514† (+4.75%) || 0.2863† (+4.41%) || 282/37/148
|-
| LM + EntityVec || 0.2083† (+5.41%) || 0.2555† (+6.45%) || 0.2871† (+4.70%) || 254/77/136
|-
| LM + WordVec + EntityVec || '''0.2159†‡§''' (+9.26%) || '''0.2638†‡§''' (+9.91%) || '''0.2983†‡§''' (+8.78%) || 324/36/107
|}

====5.3 Entity Representations for Different Query Sources====
When we investigate the effect of our entity vector models on different types of queries, we can see some more interesting results in Table 4. Since the queries are of such diverse types, it is not surprising to observe some variation. We see that the WordVec model does not show a significant improvement on the SemSearch-ES and QALD-2 results. Since SemSearch-ES queries are mostly ambiguous keyword queries, it is possible that the WordVec representations are not specific enough to be helpful.

{| class="wikitable"
|+ Table 4: Effect of the WordVec and EntityVec models on top of the LM baseline for different query types. Notation explained in Section 5.1.
! colspan="5" | SemSearch-ES
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.3188 || 0.2805 || 0.3901 || -
|-
| LM + WordVec || 0.3242 (+1.69%) || 0.2726 (-2.82%) || 0.3908 (+0.18%) || 42/27/44
|-
| LM + EntityVec || '''0.3365†''' (+5.55%) || 0.2832‡ (+0.96%) || '''0.4014''' (+2.9%) || 45/45/23
|-
| LM + WordVec + EntityVec || 0.3358 (+5.33%) || '''0.2867‡''' (+2.21%) || 0.3995 (+2.41%) || 57/15/41
|-
! colspan="5" | ListSearch
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1683 || 0.2800 || 0.2431 || -
|-
| LM + WordVec || 0.1724 (+2.44%) || 0.2878 (+2.79%) || 0.2493 (+2.55%) || 58/11/46
|-
| LM + EntityVec || 0.1854†‡ (+10.16%) || 0.2957 (+5.61%) || 0.2597† (+6.83%) || 75/8/32
|-
| LM + WordVec + EntityVec || '''0.1874†‡''' (+11.35%) || '''0.2991†''' (+6.82%) || '''0.2673†‡''' (+9.95%) || 76/5/34
|-
! colspan="5" | INEX-LD
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1593 || 0.2596 || 0.2800 || -
|-
| LM + WordVec || 0.1619 (+1.63%) || 0.2747 (+5.82%) || 0.2908 (+3.86%) || 54/5/40
|-
| LM + EntityVec || 0.1788†‡ (+7.85%) || 0.2859† (+10.13%) || 0.3077† (+9.89%) || 62/9/28
|-
| LM + WordVec + EntityVec || '''0.1837†‡§''' (+15.32%) || '''0.2949†‡''' (+13.6%) || '''0.3201†‡§''' (+14.32%) || 71/5/23
|-
! colspan="5" | QALD-2
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1554 || 0.1907 || 0.2224 || -
|-
| LM + WordVec || 0.1557 (+0.19%) || 0.1929 (+1.15%) || 0.2226 (+0.09%) || 62/30/48
|-
| LM + EntityVec || '''0.1653†‡''' (+6.17%) || '''0.2100†‡''' (+10.12%) || '''0.2338''' (+5.13%) || 94/18/28
|-
| LM + WordVec + EntityVec || '''0.1653†‡''' (+6.17%) || '''0.2100†‡''' (+10.12%) || '''0.2338''' (+5.13%) || 94/18/28
|}
====5.4 Entity Representations and Query Expansion====
Finally, we evaluate the proposed methods in the pseudo-relevance feedback scenario. We choose RM3, a state-of-the-art PRF method that has been shown to perform well across various collections [26]. Table 5 shows the results for the proposed methods and the RM3 baseline.

We observe the same kind of improvements over the RM3 baseline with our WordVec and EntityVec models that we saw on top of our keyword-query baseline. This is an interesting observation because it shows that our embedding models are largely orthogonal to a state-of-the-art query expansion model, which is often pointed to as the source of improvement for embedding approaches.

We note that on this dataset, RM3 actually lowers the effectiveness for short queries compared to using LM alone. The WordVec and EntityVec models compensate somewhat for that reduction, but are not sufficient to recover all of the loss.

In future work, we hope to analyze the relevant entities discovered by our embedding approaches that are not present in the RM3 baselines, in order to better understand where our improvements come from. For the EntityVec gains, we hypothesize that we have been able to encode critical information about the entity graph by modifying entity vectors to include their most important neighbors.

{| class="wikitable"
|+ Table 5: Effect of the WordVec and EntityVec models on top of the RM3 baseline for verbose queries, short queries, and their union. Notation explained in Section 5.1.
! colspan="5" | Verbose Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| RM3 || 0.1614 || 0.2103 || 0.2264 || -
|-
| RM3 + WordVec || 0.1714† (+6.2%) || 0.2286† (+8.7%) || 0.2459† (+8.61%) || 166/13/83
|-
| RM3 + EntityVec || 0.1759† (+8.98%) || 0.2233† (+6.18%) || 0.2435† (+7.55%) || 167/31/64
|-
| RM3 + WordVec + EntityVec || '''0.1810†‡§''' (+12.14%) || '''0.2298†§''' (+9.27%) || '''0.2508†§''' (+10.78%) || 185/15/62
|-
! colspan="5" | Short Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| RM3 || 0.2387 || 0.2902 || 0.3289 || -
|-
| RM3 + WordVec || 0.2465 (+3.27%) || 0.2941 (+1.34%) || 0.3369 (+2.43%) || 117/19/69
|-
| RM3 + EntityVec || 0.2524† (+5.74%) || 0.2976 (+2.55%) || 0.3397† (+3.28%) || 104/51/50
|-
| RM3 + WordVec + EntityVec || '''0.2546†‡''' (+6.66%) || '''0.3010†‡''' (+3.72%) || '''0.3461†‡''' (+5.23%) || 131/15/59
|-
! colspan="5" | All Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| RM3 || 0.1954 || 0.2454 || 0.2714 || -
|-
| RM3 + WordVec || 0.2044† (+4.60%) || 0.2574† (+4.88%) || 0.2859† (+5.34%) || 283/32/152
|-
| RM3 + EntityVec || 0.2095† (+7.21%) || 0.2559† (+4.27%) || 0.2857† (+5.26%) || 271/82/114
|-
| RM3 + WordVec + EntityVec || '''0.2133†‡§''' (+9.16%) || '''0.2610†§''' (+6.35%) || '''0.2926†‡§''' (+7.81%) || 316/30/121
|}

===6 Conclusion and Future Work===
In this study, we expanded on traditional entity embeddings by incorporating information from related entities that are mentioned in their summary. We demonstrated the efficacy of this model on a popular entity ranking collection in comparison to simpler word2vec-style models and traditional retrieval models. In our comparison to RM3, a pseudo-relevance feedback query-expansion approach, we demonstrated that the utility of our entity modeling is not limited to query expansion, or at least that it provides a useful and novel method of query expansion in comparison to this popular approach.

In order to fully validate our model, we intend to compare it to other unsupervised and semi-supervised entity embedding representations. We hope to explore more comparisons in future work, as well as more variations of our entity embedding model.

===Acknowledgement===
This work was supported in part by the Center for Intelligent Information Retrieval and in part by NSF grant #IIS-1617408. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.

===References===
[1] K. Balog, L. Azzopardi, and M. de Rijke. Formal models for expert finding in enterprise corpora. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 43–50. ACM, 2006.

[2] K. Balog, M. Bron, and M. de Rijke. Query modeling for entity search based on terms, categories, and examples. ACM Transactions on Information Systems (TOIS), 29(4):22, 2011.

[3] K. Balog, D. Carmel, A. P. de Vries, D. M. Herzig, P. Mika, H. Roitman, R. Schenkel, P. Serdyukov, and T. Tran Duc. The first joint international workshop on entity-oriented and semantic search (JIWES). ACM SIGIR Forum, volume 46, 2012.

[4] K. Balog and R. Neumayer. Hierarchical target type identification for entity-oriented queries. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 2391–2394. ACM, 2012.

[5] K. Balog and R. Neumayer. A test collection for entity search in DBpedia. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 737–740. ACM, 2013.

[6] K. Balog, P. Serdyukov, and A. P. de Vries. Overview of the TREC 2010 entity track. Technical report, Norwegian University of Science and Technology, Trondheim, 2010.

[7] R. Blanco, H. Halpin, D. M. Herzig, P. Mika, J. Pound, H. S. Thompson, and T. T. Duc. Entity search evaluation over structured web data. In Proceedings of the 1st International Workshop on Entity-Oriented Search (SIGIR 2011). ACM, New York, 2011.

[8] A. Bordes, J. Weston, and N. Usunier. Open question answering with weakly supervised embedding models. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 165–180. Springer, 2014.

[9] J. Dalton, L. Dietz, and J. Allan. Entity query feature expansion using knowledge base links. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 365–374. ACM, 2014.
[10] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391, 1990.

[11] G. Demartini, T. Iofciu, and A. P. de Vries. Overview of the INEX 2009 entity ranking track. In International Workshop of the Initiative for the Evaluation of XML Retrieval, pages 254–264. Springer, 2009.

[12] P. Ferragina and U. Scaiella. Fast and accurate annotation of short texts with Wikipedia pages. IEEE Software, 29(1):70–75, 2012.

[13] J. Foley, B. O'Connor, and J. Allan. Improving entity ranking for keyword queries. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 2061–2064. ACM, 2016.

[14] D. Graus, M. Tsagkias, W. Weerkamp, E. Meij, and M. de Rijke. Dynamic collective entity representations for entity ranking. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 595–604. ACM, 2016.

[15] K. Gunaratna, K. Thirunarayan, A. Sheth, and G. Cheng. Gleaning types for literals in RDF triples with application to entity summarization. In International Semantic Web Conference, pages 85–100. Springer, 2016.

[16] K. Gunaratna, K. Thirunarayan, and A. P. Sheth. FACES: Diversity-aware entity summarization using incremental hierarchical conceptual clustering. In AAAI, pages 116–122, 2015.

[17] H. Halpin, D. M. Herzig, P. Mika, R. Blanco, J. Pound, H. Thompson, and D. T. Tran. Evaluating ad-hoc object retrieval. In IWEST@ISWC, 2010.

[18] F. Hasibi, F. Nikolaev, C. Xiong, K. Balog, S. E. Bratsberg, A. Kotov, and J. Callan. DBpedia-Entity v2: A test collection for entity search. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1265–1268. ACM, 2017.

[19] K. Hong, P. Pei, Y.-Y. Wang, and D. Hakkani-Tür. Entity ranking for descriptive queries. In 2014 IEEE Spoken Language Technology Workshop (SLT), pages 200–205. IEEE, 2014.
[20] K. Y. Itakura and C. L. Clarke. A framework for BM25F-based XML retrieval. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 843–844. ACM, 2010.

[21] R. Kaptein and J. Kamps. Exploiting the category structure of Wikipedia for entity ranking. Artificial Intelligence, 194:111–129, 2013.

[22] V. Lavrenko and W. B. Croft. Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 120–127. ACM, 2001.

[23] O. Levy, Y. Goldberg, and I. Dagan. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225, 2015.

[24] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, volume 15, pages 2181–2187, 2015.

[25] V. Lopez, C. Unger, P. Cimiano, and E. Motta. Evaluating question answering over linked data. Web Semantics: Science, Services and Agents on the World Wide Web, 21:3–13, 2013.

[26] Y. Lv and C. Zhai. A comparative study of methods for estimating query language models with pseudo feedback. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 1895–1898. ACM, 2009.

[27] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[28] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.

[29] F. Nanni, B. Mitra, M. Magnusson, and L. Dietz. Benchmark for complex answer retrieval. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pages 293–296. ACM, 2017.

[30] Y. Ni, Q. K. Xu, F. Cao, Y. Mass, D. Sheinwald, H. J. Zhu, and S. S. Cao. Semantic documents relatedness using concept graph representation. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 635–644. ACM, 2016.

[31] J. Pennington, R. Socher, and C. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.

[32] J. R. Pérez-Agüera, J. Arroyo, J. Greenberg, J. P. Iglesias, and V. Fresno. Using BM25F for semantic search. In Proceedings of the 3rd International Semantic Search Workshop, page 2. ACM, 2010.

[33] D. Petkova and W. B. Croft. Hierarchical language models for expert finding in enterprise corpora. International Journal on Artificial Intelligence Tools, 17(01):5–18, 2008.

[34] J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275–281. ACM, 1998.

[35] J. Pound, P. Mika, and H. Zaragoza. Ad-hoc object retrieval in the web of data. In Proceedings of the 19th International Conference on World Wide Web, pages 771–780. ACM, 2010.
[36] R. Řehůřek and P. Sojka. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May 2010. ELRA. http://is.muni.cz/publication/884893/en.

[37] P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007, 1995.

[38] P. Ristoski and H. Paulheim. RDF2Vec: RDF graph embeddings for data mining. In International Semantic Web Conference, pages 498–514. Springer, 2016.

[39] S. Robertson, H. Zaragoza, and M. Taylor. Simple BM25 extension to multiple weighted fields. In Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pages 42–49. ACM, 2004.

[40] M. Schuhmacher, L. Dietz, and S. Paolo Ponzetto. Ranking entities for web queries through text and knowledge. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 1461–1470. ACM, 2015.

[41] P. Serdyukov and A. de Vries. Delft University at the TREC 2009 entity track: Ranking Wikipedia entities. Technical report, Delft University of Technology, Netherlands, 2009.

[42] Q. Wang, J. Kamps, G. R. Camps, M. Marx, A. Schuth, M. Theobald, S. Gurajada, and A. Mishra. Overview of the INEX 2012 linked data track. In CLEF (Online Working Notes/Labs/Workshop), 2012.

[43] C. Xiong and J. Callan. EsdRank: Connecting query and documents through external semi-structured data. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 951–960. ACM, 2015.

[44] C. Xiong, J. Callan, and T.-Y. Liu. Word-entity duet representations for document ranking. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 763–772. ACM, 2017.

[45] C. Xiong, R. Power, and J. Callan. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th International Conference on World Wide Web, pages 1271–1279. International World Wide Web Conferences Steering Committee, 2017.

[46] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014.

[47] H. Zamani and W. B. Croft. Embedding-based query language models. In Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pages 147–156. ACM, 2016.

[48] N. Zhiltsov, A. Kotov, and F. Nikolaev. Fielded sequential dependence model for ad-hoc entity retrieval in the web of data. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 253–262. ACM, 2015.

[49] S. Zwicklbauer, C. Seifert, and M. Granitzer. Robust and collective entity disambiguation through semantic embeddings. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 425–434. ACM, 2016.