=Paper=
{{Paper
|id=Vol-2482/paper7
|storemode=property
|title=Exploring Summary-Expanded Entity Embeddings for Entity Retrieval
|pdfUrl=https://ceur-ws.org/Vol-2482/paper7.pdf
|volume=Vol-2482
|authors=Shahrzad Naseri,John Foley,James Allan,Brendan T. O’Connor
|dblpUrl=https://dblp.org/rec/conf/cikm/NaseriFAO18
}}
==Exploring Summary-Expanded Entity Embeddings for Entity Retrieval==
Shahrzad Naseri¹, John Foley², James Allan¹, Brendan T. O'Connor¹
¹ College of Information and Computer Sciences, University of Massachusetts Amherst ({shnaseri,allan,brenocon}@cs.umass.edu)
² Department of Computer Science, Smith College (jjfoley@smith.edu)

===Abstract===
Entity retrieval is an important part of any modern retrieval system and often satisfies user information needs directly. Word and entity embeddings are a promising opportunity for new improvements in retrieval, especially in the presence of vocabulary mismatch problems. We present an approach to entity embedding that leverages the summary of entity articles from Wikipedia in order to form a richer representation of entities. We present a brief evaluation using the DBpedia-Entity v2 dataset. Our evaluation shows that our new, summary-inspired representation provides improvements over both standard retrieval and pseudo-relevance feedback baselines as well as over a straightforward word-embedding model. We observe that this representation is particularly helpful for the verbose queries in the INEX-LD and QALD-2 subsets of our test collection.

===1 Introduction===
Recently, knowledge cards, conversational answers, and other focused responses to user queries have become possible for most search engines. Underlying most of these answers in search engine response pages is search based on knowledge graphs and the availability of rich information for named entities. In particular, named entities such as people, organizations, or concepts are often provided as the focused response to user queries. In a study of the Yahoo web search query logs, Pound et al. [35] showed that more than 50% of the queries target specific entities or lists of entities. Since their study, more entity-focused responses have appeared in major web search engines.

Of course, rich knowledge bases play a key role in the use of entities in search. Structured data published in knowledge bases such as DBpedia (http://dbpedia.org), Freebase (http://freebase.org), and YAGO (http://www.mpi-inf.mpg.de/yago-naga/yago/) continues to grow in a variety of languages. In order to answer queries directly from such knowledge bases, the entity retrieval task has been defined: return a ranked list of entities relevant to the user's query. This task is typically approached by finding entities with a "meaning" that is similar to the query.

Capturing that semantic ("meaning") similarity between vocabulary terms, pieces of text, and sentences has been a substantial problem in information retrieval and natural language processing (NLP), for which a wide variety of approaches have been introduced [10, 37]. Word embedding methods assign each vocabulary term a low-dimensional (compared to the vocabulary size) vector and represent terms by capturing co-occurrence information between them, using a likelihood approximation of the terms' appearance within a window context. Word2vec [28] and GloVe [31] are examples of widely used word embeddings, obtained with a neural network-based language model and a matrix factorization technique, respectively.

There has been substantial work on defining embeddings not just for single words but for entities [45, 49, 8, 46, 24], but there is no clear baseline for ranking entities with such compressed semantic representations. In fact, when trying to re-use task-specific entity embeddings for retrieval tasks, results can be less than impressive: e.g., RDF2Vec [38] was designed for data mining and has been shown to under-perform simple retrieval baselines like BM25 on more specific tasks [29].
Although fully-deep models that leverage entities exist [44], we often do not have enough data to train supervised embeddings.

We propose a simple entity embedding model that focuses on representing an entity based on other entities crucial to its summary. Here, we use the entities that appear inside a DBpedia abstract. Since we use links present in the abstract, these entity mentions were effectively annotated by the human authors of those articles.

In summary, we investigate the problem of entity retrieval and improve retrieval results using word and entity embeddings. We use the queries of the DBpedia-Entity v2 dataset introduced by Hasibi et al. [18] to evaluate our EntityVec representation on its ability to directly rank entities. We demonstrate that this is an effective representation for entity ranking, one that provides gains beyond those provided by single-word embeddings and query expansion.

The rest of this work is organized in the following manner: We provide some background on entity retrieval in Section 2. In Section 3 we present our approach in detail. In Sections 4 and 5 we describe our experimental setup and empirically validate our hypotheses, and we conclude in Section 6.

===2 Related Work===
In this section, we first introduce some prior work in entity retrieval. Then we discuss the key ideas behind the word embedding techniques whose purpose is to capture the semantic similarity between vocabulary terms.

Entities are useful for a diverse set of tasks including but not limited to academic search [45], entity disambiguation [49], entity summarization [16, 15], and knowledge graph completion [46, 24]. We will focus our discussion on entity retrieval.

====2.1 Entity Retrieval====
Entity ranking is a task that focuses on retrieving entities in a knowledge base and presenting them in ranked order in response to a user's information need. This task was the focus of various benchmarking campaigns including the INEX Entity Ranking track [11], the INEX Linked Data track [42], the TREC Entity track [41, 6, 3], the Semantic Search Challenge [7, 17], and the Question Answering over Linked Data (QALD) challenge series [25]. A common goal of all these campaigns was to address the user's need in an entity-specific way, instead of returning documents which might contain unnecessary information. However, the campaigns focused on different tasks such as list search [3, 11], related entity finding [41], and question answering [25]. All of the datasets from those campaigns were combined into the DBpedia-Entity v1 [5] and v2 [18] datasets.

=====2.1.1 Leveraging Knowledge Bases for Entity Retrieval=====
Existing methods typically study the use of type information to improve entity retrieval accuracy [4, 21, 2]. Knowledge bases are typically represented as tuples of relations, often formatted in the Resource Description Framework (RDF) triple format. As a result, entities have rich fielded information, and fielded retrieval methods such as BM25F [39, 32, 20] and F-SDM [48] are especially helpful. Zhiltsov et al. in particular propose the use of name, attribute, categories, similar entities, and related entities as the fields for a fielded retrieval model [48].

To take advantage of both structured and unstructured data, Schuhmacher et al. used a learning-to-rank approach which incorporates different features of both text and entities [40]. Foley et al. expand on results for their dataset by exploring minimal knowledge-base features for use in learning-to-rank [13]. Both of these studies leverage crowd-sourced judgments of entity relevance for traditional TREC ad-hoc queries.

=====2.1.2 Entity Retrieval without a Knowledge Base=====
There have also been efforts to answer entity queries that cannot be satisfied via information in knowledge bases, due to the various ways of addressing an entity in a query. In earlier work on expert finding, entities were defined by their locations in text [1, 33]. More recently, Hong et al. [19] tried to enrich their knowledge base using linked web pages and queries from a query log. In addition, Graus et al. [14] presented a dynamic representation for entities by collecting different representations from a variety of resources and combining them.
In this work, we focus on entities that can be found in knowledge bases.

====2.2 Neural and Embedding Approaches for Entity Retrieval====
As our primary direction of study for this work is toward an entity representation that improves retrieval, the most relevant efforts are those that leverage word or entity embeddings in their ranking tasks.

Word embedding techniques learn a low-dimensional vector (compared to the vocabulary size) for each vocabulary term, in which the similarity between the word vectors captures the semantic as well as the syntactic similarities between the corresponding words. Word embeddings are unsupervised learning methods since they only need raw textual data without any labels. There are different methods to compute word embeddings. One of the most popular is using neural networks to predict words based on the context of a text. Mikolov et al. [28] introduced word2vec, which learns vector representations of words via a neural network with a single layer. Word2vec is proposed in two variants, CBOW and Skip-gram. CBOW tries to predict the word based on the context, i.e., neighboring words. Skip-gram tries to predict the context: given the word w, it tries to predict the probability of word w′ being in a fixed window around w. Another model for learning embedding vectors is based on matrix factorization, e.g., GloVe vectors [31]. Although many variants of word embeddings exist, skip-gram embeddings are quite efficient and not significantly different from other variations if tuned correctly [27, 23].

Xiong et al. propose a model for ad-hoc document retrieval that represents documents and queries in both text and entity spaces, leveraging entity embeddings in their approach [44]. However, such deep models require a significant quantity of training data to learn effective models, and our approach uses far less supervision than this direction.

Entity embeddings are also used for academic search [45], for entity disambiguation [49], for question answering [8], and for knowledge graph completion [46, 24]. The benchmark paper for TREC-CAR (Complex Answer Retrieval) determined that RDF2Vec entity embeddings [38] are not as effective as BM25 for their entity-focused paragraph ranking task [29]. Our survey of related work suggests that opportunities to customize entity vectors for ranking remain relatively unexplored.
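To make the skip-gram objective described in Section 2.2 concrete, the following toy sketch (our illustration, not code from any of the cited papers) enumerates the (w, w′) training pairs drawn from a token sequence; the window size of 2 is chosen only for readability, whereas the models used later in this paper use a window of 10:

<syntaxhighlight lang="python">
def skipgram_pairs(tokens, window=2):
    """Enumerate (center word w, context word w') training pairs:
    every other token within the fixed window around each center
    word is treated as a context word to be predicted."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

# e.g. ('series', 'a'), ('series', 'of'), ('series', 'fantasy'), ...
print(skipgram_pairs("a series of fantasy novels".split()))
</syntaxhighlight>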
===3 Embedding-Based Entity Retrieval===
Vocabulary mismatch is a long-standing problem in information retrieval. Previous work [47] has proposed incorporating word embeddings to address this problem. In this paper, we investigate the effect of word embeddings in entity retrieval with the goal of mitigating vocabulary mismatch.

Moreover, since in entity retrieval we retrieve entities instead of documents, and since most of the queries are entity-centric, we learn an embedding representation for entities and explore the effect of those embeddings on entity retrieval. We hypothesize that mapping the query to the entity space and comparing it with the retrieved entities will improve the retrieval results. In this section, we describe our approach to validating the hypothesis that incorporating word embeddings and entity embeddings enhances entity retrieval accuracy. We also discuss query expansion [22], an approach that also attempts to address the vocabulary gap by augmenting the query with additional related words.

====3.1 General Scheme of Retrieval====
Given a query q that targets a specific entity, our task is to return a ranked list of entities likely to be relevant. Each entity is represented by a short textual description; in our experiments, for example, we used the short abstract of each entity available in DBpedia. A list of candidate entities is first retrieved using term-based retrieval models such as the query likelihood model [34], efficiently creating a large pool of candidate matches.

In our model, we try to enhance the accuracy of entity retrieval by representing queries and entities by their corresponding embedding vectors. We explore two methods to represent query and entity embedding vectors, which we refer to as the WordVec and EntityVec models.

In the WordVec model, each query is represented by the average of the embedding vectors of the query's terms. Entities are represented in a similar way, by averaging over the embedding vectors of the terms in the entity's abstract. The GloVe [31] pre-trained word embedding is used for the word embedding vectors in the WordVec model.

In the EntityVec model, an embedding vector for entities is learned based on the Skip-gram model implemented in gensim [36]. To learn this embedding, following the approach presented in [30], we replace the Wikipedia pages' hyperlinks (links referring to other pages, i.e., entities) with a placeholder representing the entity. Consider the following excerpt, where links to other Wikipedia articles (entities) are represented by italics:

: Harry Potter is a series of ''fantasy novels'' written by British author ''J. K. Rowling''. The novels chronicle the life of a young ''wizard'', ''Harry Potter'', and his friends ''Hermione Granger'' and ''Ron Weasley'', all of whom are students at ''Hogwarts School of Witchcraft and Wizardry''.

The excerpt will be replaced by:

: Harry Potter is a series of Fantasy_literature written by British author J._K._Rowling. The novels chronicle the life of a young Magician_(fantasy), Harry_Potter_(character), and his friends Hermione_Granger and Ron_Weasley, all of whom are students at Hogwarts.

where each link is replaced by the corresponding article's title and spaces are replaced by underscores. Now each entity in the original excerpt is considered a single "term", and an embedding is learned based on the Skip-gram model.
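As a minimal sketch of this preprocessing and training step (our reconstruction, not the authors' released code), the snippet below replaces <code>[[target|anchor]]</code>-style wiki links with underscore placeholder tokens and trains a gensim Skip-gram model with the hyperparameters reported in Section 4.3; the two toy articles stand in for the full Wikipedia corpus, and note that recent gensim releases name the dimensionality argument <code>vector_size</code> (older releases call it <code>size</code>):

<syntaxhighlight lang="python">
import re
from gensim.models import Word2Vec

LINK = re.compile(r"\[\[([^|\]]+)(?:\|[^\]]*)?\]\]")

def linkify(text):
    """Replace each [[target|anchor]] wiki link with a single placeholder
    token: the linked article's title, with spaces turned to underscores."""
    return LINK.sub(lambda m: m.group(1).strip().replace(" ", "_"), text)

# Toy two-"article" corpus; in the paper this is every Wikipedia article.
articles = [
    "Harry Potter is a series of [[Fantasy literature|fantasy novels]] "
    "written by British author [[J. K. Rowling]]",
    "the novels of [[J. K. Rowling]] chronicle a young "
    "[[Magician (fantasy)|wizard]] at [[Hogwarts]]",
]
corpus = [linkify(a).split() for a in articles]

# Skip-gram settings reported in Section 4.3: 200 dimensions, window 10,
# sub-sampling 1e-3, and no minimum-frequency cutoff.
model = Word2Vec(corpus, vector_size=200, window=10, sg=1,
                 sample=1e-3, min_count=0)

# Each linked entity is now a single "term" in the learned vocabulary.
print(model.wv.most_similar("J._K._Rowling"))
</syntaxhighlight>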
As mentioned before, entities are represented by their abstract available in DBpedia. To also consider this representation, the final embedding of a target entity is obtained by averaging over the embedding vectors of the referred entities appearing in the abstract of the target entity. In the EntityVec model, queries are represented by the average of the embedding vectors of the entities in the query. The entities in the query are annotated using the TagMe [12] mention detection tool.

For both WordVec and EntityVec, the similarity between the query and the entity is calculated by the cosine similarity between their respective embedding vectors. The final entity retrieval score is obtained by linear interpolation of the baseline, WordVec, and EntityVec models.

Table 1 reports the learning corpora for the WordVec and EntityVec models, and Table 2 summarizes the final embedding vectors for query and entity.

{| class="wikitable"
|+ Table 1: Learning corpora for the WordVec and EntityVec embedding vectors.
! Model !! Learning corpus
|-
| WordVec || Pre-trained GloVe word embeddings (6B tokens: Wikipedia 2014 + Gigaword 5)
|-
| EntityVec || Full text of Wikipedia articles, pre-processed according to Section 3.1
|}

{| class="wikitable"
|+ Table 2: Query and retrieved-entity representations for the WordVec and EntityVec models.
! Model !! Query !! Retrieved entity
|-
| WordVec || Average of the query terms' embedding vectors || Average of the embedding vectors of the terms in the entity's abstract
|-
| EntityVec || Average of the query entities' embedding vectors || Average of the embedding vectors of the entities referred to in the entity's abstract
|}
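To make the scoring concrete, here is a small sketch (ours, not the authors' code) of how the final score could be assembled under the scheme above; <code>word_vec</code> and <code>entity_vec</code> stand in for the GloVe and EntityVec lookups, <code>baseline_score</code> for the LM or RM3 score, and the interpolation weights <code>lambdas</code> are hypothetical tuning parameters not specified here:

<syntaxhighlight lang="python">
import numpy as np

def average_embedding(tokens, lookup, dim=200):
    """Mean of the embedding vectors of the tokens that have one."""
    vecs = [lookup[t] for t in tokens if t in lookup]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def score(query_terms, query_entities, abstract_terms, abstract_entities,
          baseline_score, word_vec, entity_vec, lambdas=(0.6, 0.2, 0.2)):
    """Linear interpolation of the baseline retrieval score with the
    WordVec and EntityVec cosine similarities (Section 3.1)."""
    wv = cosine(average_embedding(query_terms, word_vec),
                average_embedding(abstract_terms, word_vec))
    ev = cosine(average_embedding(query_entities, entity_vec),
                average_embedding(abstract_entities, entity_vec))
    l_base, l_wv, l_ev = lambdas
    return l_base * baseline_score + l_wv * wv + l_ev * ev
</syntaxhighlight>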
====3.2 Query Expansion====
In an intuitive sense, query and document embedding models address the vocabulary mismatch problem by virtue of expanding the representation. Therefore, it makes sense to compare our work to techniques in the query-expansion literature.

Lavrenko and Croft introduce relevance modeling, an approach to query expansion that derives a probabilistic model of term importance from documents that receive high scores given the initial query [22]. They present a number of models, but the most utilized version is RM3, which is a mixture model between the top k expansion terms and the original query. Expansion terms t are given the following weights, derived from a set of pseudo-relevant documents D_Q for a query Q:

: <math>w(t) = \frac{1}{Z} \sum_{d \in D_Q} P(d \mid Q)\, P(t \mid d)</math>

Terms that occur frequently (P(t|d)) in high-scoring documents (P(d|Q)) are given the most weight in the expansion. Z is merely a normalizer allowing the weights to be turned into a probability distribution over the terms that occur in the pseudo-relevant document set D_Q. This baseline is often used for comparison in the entity-focused retrieval literature [9, 40, 43].
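As a worked illustration of the weighting above (a sketch under our own assumptions, not the authors' implementation), suppose <code>retrieval_scores</code> maps each pseudo-relevant document to a normalized P(d|Q) and <code>term_prob</code> gives each document's language model P(t|d):

<syntaxhighlight lang="python">
from collections import defaultdict

def rm3_weights(retrieval_scores, term_prob, orig_query_weights,
                k_terms=50, orig_weight=0.5):
    """Relevance-model weights w(t) = (1/Z) * sum_d P(d|Q) * P(t|d),
    then mixed with the original query model, as in RM3."""
    w = defaultdict(float)
    for d, p_d in retrieval_scores.items():      # P(d|Q)
        for t, p_t in term_prob[d].items():      # P(t|d)
            w[t] += p_d * p_t
    z = sum(w.values()) or 1.0                   # the normalizer Z
    # Keep only the top-k expansion terms (normalized before truncation
    # here for simplicity; implementations vary on this detail).
    expansion = dict(sorted(((t, v / z) for t, v in w.items()),
                            key=lambda kv: -kv[1])[:k_terms])
    mixed = defaultdict(float)
    for t, p in orig_query_weights.items():
        mixed[t] += orig_weight * p
    for t, p in expansion.items():
        mixed[t] += (1 - orig_weight) * p
    return dict(mixed)
</syntaxhighlight>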
===4 Experimental Setup===
In this section, we introduce our experimental setup, baselines, and evaluation metrics. Then we report and discuss our results.

====4.1 Dataset====
Our experiments are conducted on the entity search test collection DBpedia-Entity v2 [18]. This dataset consists of queries gathered from seven previous competitions, with relevance judgments on entities from DBpedia version 2015-10.

For word embeddings, we used the GloVe [31] pre-trained word embeddings with 300 dimensions, extracted from a 6 billion token collection (the 2014 Wikipedia dump plus Gigaword 5). To train the entity embeddings, we used the full text of Wikipedia articles obtained from the DBpedia 2016-10 dump.

====4.2 Data Processing====
Retrieval results were obtained using an index built from the abstracts of the entities. We used TagMe [12] as the mention detection tool for the entities in the queries. We used the Word2Vec implementation in gensim [36] for learning the entity embeddings, i.e., EntityVec. As mentioned previously, to obtain EntityVec embeddings we followed the approach outlined by Ni et al. [30] and replaced the outbound hyperlinks to Wikipedia pages with a unique placeholder token. We learn embeddings for 3.0 million of the 4.8 million entities in Wikipedia.

====4.3 Hyperparameter Settings====
The µ parameter of the language modeling approach is obtained by 2-fold cross-validation over the queries and is chosen from the set {100, 500, 1000, 1500}. To tune the RM3 hyperparameters, i.e., the original query's weight and the number of expansion terms, we use 2-fold and 5-fold cross-validation. The original query weight is varied from 0.1 to 0.9 in increments of 0.1, and the number of expansion terms from 10 to 90 in increments of 20. With the parameters tuned by 2-fold and 5-fold cross-validation, RM3 for short queries did not improve over the language modeling approach. We note that there were other parameter settings that did improve RM3 over the language model, but they were not discoverable in the 2-fold or 5-fold approaches. When we report RM3 results (Table 5), we report the results for 2-fold cross-validation.

The parameters for learning the EntityVec embeddings are as follows: window size = 10, sub-sampling = 1e-3, min-count cutoff = 0. The learned embedding dimension is 200, and it is learned based on the Skip-gram model.

====4.4 Evaluation Metrics====
Mean Average Precision (MAP) over the top-ranked 1000 results is selected as the main evaluation metric of retrieval effectiveness. Furthermore, we consider precision of the top 10 retrieved results (P@10). Since we have graded relevance judgments, we also report nDCG@10. Statistically significant differences in performance are determined using the two-tailed paired t-test computed at a 95% confidence level based on the average precision per query.
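The grids described in Section 4.3 are small enough to sweep exhaustively. A sketch of the 2-fold search (our illustration), where <code>evaluate_map(params, queries)</code> is a hypothetical function that runs retrieval with the given parameters and returns MAP over a query fold:

<syntaxhighlight lang="python">
import itertools

MU_GRID = [100, 500, 1000, 1500]
ORIG_WEIGHT_GRID = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 .. 0.9
NUM_TERMS_GRID = list(range(10, 91, 20))                      # 10, 30, ..., 90

def two_fold_tune(queries, evaluate_map):
    """Pick parameters on one fold, evaluate on the other, average."""
    half = len(queries) // 2
    folds = [queries[:half], queries[half:]]
    test_scores = []
    for train, test in [(folds[0], folds[1]), (folds[1], folds[0])]:
        best = max(itertools.product(MU_GRID, ORIG_WEIGHT_GRID,
                                     NUM_TERMS_GRID),
                   key=lambda p: evaluate_map(p, train))
        test_scores.append(evaluate_map(best, test))
    return sum(test_scores) / 2.0
</syntaxhighlight>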
===5 Results===
In this section, we explore the results of our entity representation models atop two baselines. We look at both a standard unigram approach, language modeling (LM) [34], and an approach built on query expansion, relevance modeling (RM3).

In Table 3, we present the results of our models on top of the LM baseline for the short and verbose query subsets as well as their union. We discuss these results with respect to query length in Section 5.2. This table is the appropriate one for the overall results of our models, particularly the "All Queries" section. Both proposed methods outperform the baseline LM model, suggesting that there is value both in our EntityVec representation and in the more traditional WordVec query expansion. Combining the two methods yields even greater accuracy across all measures.

In Table 4, we present the results of our different models atop LM using the traditional dataset subsets inside DBpedia-Entity v2. Since these datasets were originally constructed for different variations of the entity ranking task, we were curious whether their different query types would yield different results. We discuss the results in terms of the different styles of queries in Section 5.3.

Finally, in Table 5, we examine our approaches on top of a baseline with query expansion built in. We discuss the results of our models on this expanded baseline in Section 5.4.

====5.1 Table Notation and Significance Testing====
In the result tables, relative improvements over the base retrieval models, i.e., LM and RM3, are shown as percentages to the right of the scores. Win/Tie/Loss shows the number of queries improved, unchanged, or hurt, respectively, compared with the base retrieval model using the MAP measure. †, ‡, and § indicate statistical significance over the base retrieval model, the base retrieval model + WordVec, and the base retrieval model + EntityVec, respectively. As mentioned earlier, we use two base retrieval models (LM and RM3). The best method for each metric is marked in bold.

====5.2 Entity Representations for Short and Verbose Queries====
We found that results were quite different for verbose queries (defined as queries longer than four terms) and short queries, so our tables are broken into three sections reflecting the overall dataset and these two query-length subsets.

Based on the results in Table 3, we can see that both WordVec and EntityVec improve verbose queries more than they improve short queries (particularly as measured by MAP). We speculate this could be because short queries are more prone to ambiguity, so better query representations are built from verbose queries, where the additional words provide disambiguation and thus better matching of related entities. For the WordVec model, the embedding of a short query does not seem to help matching significantly. It is also possible that some short queries are more specific, so the embedding (implicitly incorporating related words) is less important. Further analysis is needed to understand this behavior fully, but we recommend that systems using entity representations consider query length when selecting an appropriate model.

If we look at the win/tie/loss analysis at the far right of Table 3, we can see that there are many ties. This is a result of some queries lacking entities in their description: in the current version of our model, we cannot generate an entity representation if our entity linker (TagMe, in this case) does not identify any entities in the query, so the representations are identical. Even ignoring ties, there are more wins than losses, so our vector modeling approaches are helpful when entities are identified, and the magnitude of MAP improvements is higher for EntityVec than for WordVec, even though WordVec can be used for all queries and EntityVec only changes a subset.

We further note that combining WordVec and EntityVec results in additional gains, indicating that the two methods are complementary, capturing different aspects of entities that are each useful.

{| class="wikitable"
|+ Table 3: Effect of the WordVec and EntityVec models on top of the LM baseline for verbose queries, short queries, and their union. Notation explained in Section 5.1.
! colspan="5" | Verbose Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1609 || 0.1992 || 0.2261 || -
|-
| LM + WordVec || 0.1708† (+6.15%) || 0.2168† (+8.84%) || 0.2429† (+7.43%) || 171/14/77
|-
| LM + EntityVec || 0.1731† (+7.58%) || 0.2218† (+11.35%) || 0.2415† (+6.81%) || 162/28/72
|-
| LM + WordVec + EntityVec || '''0.1786†‡''' (+11%) || '''0.2328†‡§''' (+16.87%) || '''0.2554†‡§''' (+12.96%) || 189/16/57
|-
! colspan="5" | Short Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.2445 || 0.2922 || 0.3357 || -
|-
| LM + WordVec || 0.2498 (+2.17%) || 0.2956 (+1.16%) || 0.3417 (+1.79%) || 111/23/71
|-
| LM + EntityVec || 0.2532 (+3.56%) || 0.2985 (+2.16%) || 0.3454 (+2.89%) || 92/49/64
|-
| LM + WordVec + EntityVec || '''0.2635†‡§''' (+7.77%) || '''0.3034†‡''' (+3.83%) || '''0.3531†‡''' (+5.18%) || 135/20/50
|-
! colspan="5" | All Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1976 || 0.2400 || 0.2742 || -
|-
| LM + WordVec || 0.2055† (+3.99%) || 0.2514† (+4.75%) || 0.2863† (+4.41%) || 282/37/148
|-
| LM + EntityVec || 0.2083† (+5.41%) || 0.2555† (+6.45%) || 0.2871† (+4.70%) || 254/77/136
|-
| LM + WordVec + EntityVec || '''0.2159†‡§''' (+9.26%) || '''0.2638†‡§''' (+9.91%) || '''0.2983†‡§''' (+8.78%) || 324/36/107
|}

====5.3 Entity Representations for Different Query Sources====
When we investigate the effect of our entity vector models on different types of queries, we can see some more interesting results in Table 4. Since the queries are of such diverse types, it is not surprising to observe some variation. We see that the WordVec model does not show a significant improvement on the SemSearch-ES and QALD-2 results. Since SemSearch-ES queries are mostly ambiguous keyword queries, it is possible that the WordVec representations are not specific enough to be helpful.

{| class="wikitable"
|+ Table 4: Effect of the WordVec and EntityVec models on top of the LM baseline for different query types. Notation explained in Section 5.1.
! colspan="5" | SemSearch-ES
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.3188 || 0.2805 || 0.3901 || -
|-
| LM + WordVec || 0.3242 (+1.69%) || 0.2726 (-2.82%) || 0.3908 (+0.18%) || 42/27/44
|-
| LM + EntityVec || '''0.3365†''' (+5.55%) || 0.2832‡ (+0.96%) || '''0.4014''' (+2.9%) || 45/45/23
|-
| LM + WordVec + EntityVec || 0.3358 (+5.33%) || '''0.2867‡''' (+2.21%) || 0.3995 (+2.41%) || 57/15/41
|-
! colspan="5" | ListSearch
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1683 || 0.2800 || 0.2431 || -
|-
| LM + WordVec || 0.1724 (+2.44%) || 0.2878 (+2.79%) || 0.2493 (+2.55%) || 58/11/46
|-
| LM + EntityVec || 0.1854†‡ (+10.16%) || 0.2957 (+5.61%) || 0.2597† (+6.83%) || 75/8/32
|-
| LM + WordVec + EntityVec || '''0.1874†‡''' (+11.35%) || '''0.2991†''' (+6.82%) || '''0.2673†‡''' (+9.95%) || 76/5/34
|-
! colspan="5" | INEX-LD
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1593 || 0.2596 || 0.2800 || -
|-
| LM + WordVec || 0.1619 (+1.63%) || 0.2747 (+5.82%) || 0.2908 (+3.86%) || 54/5/40
|-
| LM + EntityVec || 0.1788†‡ (+7.85%) || 0.2859† (+10.13%) || 0.3077† (+9.89%) || 62/9/28
|-
| LM + WordVec + EntityVec || '''0.1837†‡§''' (+15.32%) || '''0.2949†‡''' (+13.6%) || '''0.3201†‡§''' (+14.32%) || 71/5/23
|-
! colspan="5" | QALD-2
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| LM || 0.1554 || 0.1907 || 0.2224 || -
|-
| LM + WordVec || 0.1557 (+0.19%) || 0.1929 (+1.15%) || 0.2226 (+0.09%) || 62/30/48
|-
| LM + EntityVec || '''0.1653†‡''' (+6.17%) || '''0.2100†‡''' (+10.12%) || '''0.2338''' (+5.13%) || 94/18/28
|-
| LM + WordVec + EntityVec || '''0.1653†‡''' (+6.17%) || '''0.2100†‡''' (+10.12%) || '''0.2338''' (+5.13%) || 94/18/28
|}
====5.4 Entity Representations and Query Expansion====
Finally, we evaluate the proposed methods in the pseudo-relevance feedback scenario. We choose RM3, a state-of-the-art PRF method that has been shown to perform well across various collections [26]. Table 5 shows the results for the proposed methods and the RM3 baseline.

We observe the same kind of improvements over the RM3 baseline with our WordVec and EntityVec models that we saw on top of our keyword-query baseline. This is an interesting observation because it shows that our embedding models are largely orthogonal to a state-of-the-art query expansion model, which is often pointed to as the source of improvement for embedding approaches.

We note that on this dataset, RM3 actually lowers the effectiveness for short queries compared to using LM alone. The WordVec and EntityVec models compensate somewhat for that reduction, but are not sufficient to recover all of the loss.

In future work, we hope to analyze the relevant entities discovered by our embedding approaches that are not present in the RM3 baselines, in order to better understand where our improvements come from. For the EntityVec gains, we hypothesize that we have been able to encode critical information about the entity graph by modifying entity vectors to include their most important neighbors.

{| class="wikitable"
|+ Table 5: Effect of the WordVec and EntityVec models on top of the RM3 baseline for verbose queries, short queries, and their union. Notation explained in Section 5.1.
! colspan="5" | Verbose Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| RM3 || 0.1614 || 0.2103 || 0.2264 || -
|-
| RM3 + WordVec || 0.1714† (+6.2%) || 0.2286† (+8.7%) || 0.2459† (+8.61%) || 166/13/83
|-
| RM3 + EntityVec || 0.1759† (+8.98%) || 0.2233† (+6.18%) || 0.2435† (+7.55%) || 167/31/64
|-
| RM3 + WordVec + EntityVec || '''0.1810†‡§''' (+12.14%) || '''0.2298†§''' (+9.27%) || '''0.2508†§''' (+10.78%) || 185/15/62
|-
! colspan="5" | Short Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| RM3 || 0.2387 || 0.2902 || 0.3289 || -
|-
| RM3 + WordVec || 0.2465 (+3.27%) || 0.2941 (+1.34%) || 0.3369 (+2.43%) || 117/19/69
|-
| RM3 + EntityVec || 0.2524† (+5.74%) || 0.2976 (+2.55%) || 0.3397† (+3.28%) || 104/51/50
|-
| RM3 + WordVec + EntityVec || '''0.2546†‡''' (+6.66%) || '''0.3010†‡''' (+3.72%) || '''0.3461†‡''' (+5.23%) || 131/15/59
|-
! colspan="5" | All Queries
|-
! Method !! MAP@1000 !! P@10 !! nDCG@10 !! Win/Tie/Loss
|-
| RM3 || 0.1954 || 0.2454 || 0.2714 || -
|-
| RM3 + WordVec || 0.2044† (+4.60%) || 0.2574† (+4.88%) || 0.2859† (+5.34%) || 283/32/152
|-
| RM3 + EntityVec || 0.2095† (+7.21%) || 0.2559† (+4.27%) || 0.2857† (+5.26%) || 271/82/114
|-
| RM3 + WordVec + EntityVec || '''0.2133†‡§''' (+9.16%) || '''0.2610†§''' (+6.35%) || '''0.2926†‡§''' (+7.81%) || 316/30/121
|}

===6 Conclusion and Future Work===
In this study, we expanded on traditional entity embeddings by incorporating information from related entities that are mentioned in their summary. We demonstrated the efficacy of this model on a popular entity ranking collection in comparison to simpler word2vec-style models and traditional retrieval models. In our comparison to RM3, a pseudo-relevance feedback query-expansion approach, we demonstrated that the utility of our entity modeling is not limited to query expansion, or at least that it provides a useful and novel method of query expansion in comparison to this popular approach.

In order to fully validate our model, we intend to compare it to other unsupervised and semi-supervised entity embedding representations. We hope to explore more comparisons in future work, as well as more variations of our entity embedding model.

===Acknowledgement===
This work was supported in part by the Center for Intelligent Information Retrieval and in part by NSF grant #IIS-1617408. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.

===References===
[1] K. Balog, L. Azzopardi, and M. de Rijke. Formal models for expert finding in enterprise corpora. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 43–50. ACM, 2006.

[2] K. Balog, M. Bron, and M. de Rijke. Query modeling for entity search based on terms, categories, and examples. ACM Transactions on Information Systems (TOIS), 29(4):22, 2011.

[3] K. Balog, D. Carmel, A. P. de Vries, D. M. Herzig, P. Mika, H. Roitman, R. Schenkel, P. Serdyukov, and T. Tran Duc. The first joint international workshop on entity-oriented and semantic search (JIWES). ACM SIGIR Forum, volume 46, 2012.

[4] K. Balog and R. Neumayer. Hierarchical target type identification for entity-oriented queries. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 2391–2394. ACM, 2012.

[5] K. Balog and R. Neumayer. A test collection for entity search in DBpedia. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 737–740. ACM, 2013.

[6] K. Balog, P. Serdyukov, and A. P. de Vries. Overview of the TREC 2010 entity track. Technical report, Norwegian University of Science and Technology, Trondheim, 2010.

[7] R. Blanco, H. Halpin, D. M. Herzig, P. Mika, J. Pound, H. S. Thompson, and T. T. Duc. Entity search evaluation over structured web data. In Proceedings of the 1st International Workshop on Entity-Oriented Search (SIGIR 2011). ACM, New York, 2011.

[8] A. Bordes, J. Weston, and N. Usunier. Open question answering with weakly supervised embedding models. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 165–180. Springer, 2014.

[9] J. Dalton, L. Dietz, and J. Allan. Entity query feature expansion using knowledge base links. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 365–374. ACM, 2014.
[10] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391, 1990.

[11] G. Demartini, T. Iofciu, and A. P. de Vries. Overview of the INEX 2009 entity ranking track. In International Workshop of the Initiative for the Evaluation of XML Retrieval, pages 254–264. Springer, 2009.

[12] P. Ferragina and U. Scaiella. Fast and accurate annotation of short texts with Wikipedia pages. IEEE Software, 29(1):70–75, 2012.

[13] J. Foley, B. O'Connor, and J. Allan. Improving entity ranking for keyword queries. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 2061–2064. ACM, 2016.

[14] D. Graus, M. Tsagkias, W. Weerkamp, E. Meij, and M. de Rijke. Dynamic collective entity representations for entity ranking. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 595–604. ACM, 2016.

[15] K. Gunaratna, K. Thirunarayan, A. Sheth, and G. Cheng. Gleaning types for literals in RDF triples with application to entity summarization. In International Semantic Web Conference, pages 85–100. Springer, 2016.

[16] K. Gunaratna, K. Thirunarayan, and A. P. Sheth. FACES: Diversity-aware entity summarization using incremental hierarchical conceptual clustering. In AAAI, pages 116–122, 2015.

[17] H. Halpin, D. M. Herzig, P. Mika, R. Blanco, J. Pound, H. Thompson, and D. T. Tran. Evaluating ad-hoc object retrieval. In IWEST@ISWC, 2010.

[18] F. Hasibi, F. Nikolaev, C. Xiong, K. Balog, S. E. Bratsberg, A. Kotov, and J. Callan. DBpedia-Entity v2: A test collection for entity search. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1265–1268. ACM, 2017.

[19] K. Hong, P. Pei, Y.-Y. Wang, and D. Hakkani-Tür. Entity ranking for descriptive queries. In 2014 IEEE Spoken Language Technology Workshop (SLT), pages 200–205. IEEE, 2014.
[20] K. Y. Itakura and C. L. Clarke. A framework for BM25F-based XML retrieval. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 843–844. ACM, 2010.

[21] R. Kaptein and J. Kamps. Exploiting the category structure of Wikipedia for entity ranking. Artificial Intelligence, 194:111–129, 2013.

[22] V. Lavrenko and W. B. Croft. Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 120–127. ACM, 2001.

[23] O. Levy, Y. Goldberg, and I. Dagan. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225, 2015.

[24] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, volume 15, pages 2181–2187, 2015.

[25] V. Lopez, C. Unger, P. Cimiano, and E. Motta. Evaluating question answering over linked data. Web Semantics: Science, Services and Agents on the World Wide Web, 21:3–13, 2013.

[26] Y. Lv and C. Zhai. A comparative study of methods for estimating query language models with pseudo feedback. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 1895–1898. ACM, 2009.

[27] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[28] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.

[29] F. Nanni, B. Mitra, M. Magnusson, and L. Dietz. Benchmark for complex answer retrieval. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pages 293–296. ACM, 2017.

[30] Y. Ni, Q. K. Xu, F. Cao, Y. Mass, D. Sheinwald, H. J. Zhu, and S. S. Cao. Semantic documents relatedness using concept graph representation. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 635–644. ACM, 2016.

[31] J. Pennington, R. Socher, and C. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.

[32] J. R. Pérez-Agüera, J. Arroyo, J. Greenberg, J. P. Iglesias, and V. Fresno. Using BM25F for semantic search. In Proceedings of the 3rd International Semantic Search Workshop, page 2. ACM, 2010.

[33] D. Petkova and W. B. Croft. Hierarchical language models for expert finding in enterprise corpora. International Journal on Artificial Intelligence Tools, 17(01):5–18, 2008.

[34] J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275–281. ACM, 1998.

[35] J. Pound, P. Mika, and H. Zaragoza. Ad-hoc object retrieval in the web of data. In Proceedings of the 19th International Conference on World Wide Web, pages 771–780. ACM, 2010.
[36] R. Řehůřek and P. Sojka. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May 2010. ELRA. http://is.muni.cz/publication/884893/en.

[37] P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007, 1995.

[38] P. Ristoski and H. Paulheim. RDF2Vec: RDF graph embeddings for data mining. In International Semantic Web Conference, pages 498–514. Springer, 2016.

[39] S. Robertson, H. Zaragoza, and M. Taylor. Simple BM25 extension to multiple weighted fields. In Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pages 42–49. ACM, 2004.

[40] M. Schuhmacher, L. Dietz, and S. Paolo Ponzetto. Ranking entities for web queries through text and knowledge. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 1461–1470. ACM, 2015.

[41] P. Serdyukov and A. de Vries. Delft University at the TREC 2009 entity track: Ranking Wikipedia entities. Technical report, Delft University of Technology, Netherlands, 2009.

[42] Q. Wang, J. Kamps, G. R. Camps, M. Marx, A. Schuth, M. Theobald, S. Gurajada, and A. Mishra. Overview of the INEX 2012 linked data track. In CLEF (Online Working Notes/Labs/Workshop), 2012.

[43] C. Xiong and J. Callan. EsdRank: Connecting query and documents through external semi-structured data. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 951–960. ACM, 2015.

[44] C. Xiong, J. Callan, and T.-Y. Liu. Word-entity duet representations for document ranking. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 763–772. ACM, 2017.

[45] C. Xiong, R. Power, and J. Callan. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th International Conference on World Wide Web, pages 1271–1279. International World Wide Web Conferences Steering Committee, 2017.

[46] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014.

[47] H. Zamani and W. B. Croft. Embedding-based query language models. In Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pages 147–156. ACM, 2016.

[48] N. Zhiltsov, A. Kotov, and F. Nikolaev. Fielded sequential dependence model for ad-hoc entity retrieval in the web of data. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 253–262. ACM, 2015.

[49] S. Zwicklbauer, C. Seifert, and M. Granitzer. Robust and collective entity disambiguation through semantic embeddings. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 425–434. ACM, 2016.