<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Insights into Entity Recommendation in Web Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nitish Aggarwal</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Mika</string-name>
          <email>pmika@yahoo-inc.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roi Blanco</string-name>
          <email>roi@yahoo-inc.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Buitelaar</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>User engagement is a fundamental goal for search engines. Recommendations of entities that are related to the user's original search query can increase engagement by raising interest in these entities and thereby extending the user's search session. Related entity recommendations have thus become a standard feature of the interfaces of modern search engines. These systems typically combine a large number of individual signals (features) extracted from the content and interaction logs of a variety of sources. Such studies, however, do not reveal the contribution of individual features, their importance and interaction, or the quality of the sources. In this work, we measure the performance of entity recommendation features individually and by combining them based on a novel dataset of 4.5K search queries and their related entities, which have been evaluated by human assessors.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        With the advent of large knowledge bases like DBpedia [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], YAGO [13] and
Freebase [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], search engines have started recommending entities related to web
search queries. Pound et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] reported that around 50% of web search queries
pivot around a single entity and can be linked to an entity in a knowledge
base. Consequently, the task of entity recommendation in the context of web
search can be defined as finding the entities related to the entity appearing in a
web search query. An intuitive approach is to retrieve all entities explicitly
linked to a given entity in the knowledge base. However, most popular entities
have more than 1,000 directly connected entities, and knowledge bases mainly
cover specific types of relations. For instance, "Tom Cruise" and "Brad Pitt"
are not directly connected by any relation in the DBpedia graph, yet they can be
considered related to each other. Therefore, to build an entity recommendation
system, there is a need to find related entities beyond the explicit relations
defined in knowledge bases. Further, these related entities require a ranking
method to select the most related ones. (This work was done while the first
author was visiting Yahoo! Research Labs, Barcelona.)
      </p>
      <p>
        Blanco et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] described the Spark system for related entity
recommendation and suggested that such recommendations are successful at extending users'
search sessions. Microsoft also published a similar system [14] that performs
personalized entity recommendation by analyzing user click-through logs. In this
paper, we focus on exploring the different features in an entity recommendation
system and investigate their effectiveness. Yahoo's entity recommendation
system "Spark" utilizes more than 100 different features providing evidence of
the relevance of an entity. The final relevance scores are calculated by combining
the different features using a state-of-the-art learning-to-rank approach. Although
Blanco et al. presented some experimentation with the Spark system, in
particular by reporting on the importance of the top 10 features and the evaluation
metrics on different types of entities, further experimentation is required to
investigate the impact of individual features and their different combinations. The
features used in Spark can be divided into five types: co-occurrence-based features,
linear combinations of co-occurrence-based features, graph-based, popularity-based,
and type-based features. Co-occurrence-based features make use of four different
data sources: query terms, user-specific query sessions, Flickr tags, and tweets.
In this paper, we explore the impact of the features used in the Spark system by
combining them based on their types and data sources. In order to investigate
the quality of the different data sources, we focus extensively on co-occurrence-based
features. Not all of the data sources used to calculate co-occurrence-based features
are publicly accessible; for instance, only major search engines have
datasets like query terms and query sessions. Therefore, we measure the
performance of a system that has only co-occurrence-based features extracted from
Wikipedia. Data sources like query terms, Flickr tags, and tweets can only
capture the presence of an entity. However, Wikipedia articles are long enough
to obtain the associative weight of an entity with a Wikipedia article, which
provides an opportunity to build a distributional semantic model (DSM) [
        <xref ref-type="bibr" rid="ref1 ref10 ref4">1, 4, 10</xref>
        ]
over Wikipedia concepts. Therefore, in addition to co-occurrence-based features
that consider only presence, we also explore a DSM-based feature built
over Wikipedia. We evaluate the performance of adding the Wikipedia-based
features to the current Spark system, which will be referred to as Spark+Wiki in
the rest of the paper.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Entity recommendation system</title>
      <p>This section provides a detailed overview of the Spark system. Section 2.1
describes the construction of Yahoo's knowledge graph, part of which is used to
obtain the potential entity candidates. Section 2.2 explains the different types of
features and how they are extracted from different data sources. Spark and
Spark+Wiki combine the values obtained from the different features using a
learning-to-rank approach, which is explained in Section 2.3.</p>
      <p>
        2.1 Yahoo knowledge graph
In order to retrieve a ranked list of entities, the system requires a list of
potential entity candidates that can be considered related to the given entity.
These candidates can be obtained from existing knowledge bases like DBpedia or
YAGO. However, such knowledge bases may not cover all the relations
that can be defined between related entities. For instance, "Tom Cruise" can
be considered highly related to "Brad Pitt", but they are not connected by any
relation in the DBpedia graph. Therefore, Spark uses an entity graph extracted from
different structured and unstructured data sources, including public data sources
such as DBpedia and Freebase. It also uses a manually constructed ontology
that defines the types of an entity extracted from different resources. In order to
extend the coverage of the relations defined in the entity graph, it performs
information extraction over various unstructured data sources in different domains
like movies, music, TV shows, and sports. The subset of the entity graph used in
Spark covers entity types in media, sports, and geography, and consisted of over
3.5M entities and 1.4B relations at the time of our experiments (see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for more detail).
      </p>
      <p>2.2 Feature extraction
Spark uses more than 100 different features. These features are divided into
five categories: co-occurrence-based features, linear combinations of
co-occurrence-based features, graph-based features, popularity-based features, and
type-based features.</p>
      <p>Co-occurrence features are derived from the hypothesis that entities
which often occur in the same event or context are more likely to be related to
each other. The Spark system uses 11 different types of features obtained with
different co-occurrence measures. Let E1 and E2 be two entities and
S = {s1, s2, ..., sN} be the set of events, where si is the i-th event and N is the
total number of events. An event is defined as one observation under consideration
for measuring co-occurrence; for instance, every query in the query logs is an event.
The occurrence of an entity E is defined by sum_{i=1}^{N} o_i, where o_i = 1 if
event s_i contains E, and o_i = 0 otherwise. Similarly, co_i = 1 if event s_i
contains both E1 and E2 (co_i = 0 otherwise), oe1_i and oe2_i indicate whether
s_i contains E1 and E2 respectively, and, at the user level over U users,
cou_i, oue1_i, and oue2_i are the analogous indicators for user u_i.</p>
      <p>1. Probability (P1, P2): the ratio of the number of events that contain
the given entity to the total number of events, P = (sum_{i=1}^{N} o_i) / N.
The value of P for an entity is independent of the other entity, so it gives two
values, P1 and P2, for an entity pair consisting of E1 and E2.
2. Entropy (Ent1, Ent2): the standard entropy of an entity, defined by
Ent1 = -P1 log(P1), where P is the probability defined in feature 1. Like the
probability feature, it gives two values, Ent1 and Ent2, for an entity pair.
3. KL divergence (KL1, KL2): the KL divergence of an entity E. Like the
features above, it gives two values, KL1 and KL2, for an entity pair.
4. Joint probability (JPSYM): the ratio of the number of events that contain
both entities to the total number of events, JPSYM = (sum_{i=1}^{N} co_i) / N.
5. Joint user probability (PUSYM): similar to feature 4, but computed over
users rather than events, PUSYM = (sum_{i=1}^{U} cou_i) / U.
6. PMI (SISYM): the pointwise mutual information,
PMI(E1, E2) = log( P(E1, E2) / (P(E1) P(E2)) ).
7. Cosine similarity (CSSYM): the cosine similarity, calculated as
Cosine(E1, E2) = (sum_{i=1}^{N} co_i) / sqrt( (sum_{i=1}^{N} oe1_i) (sum_{i=1}^{N} oe2_i) ).
8. Conditional probability (CPASYM): the ratio of the number of events that
contain both E1 and E2 to the number of events that contain E1,
CPASYM(E1, E2) = (sum_{i=1}^{N} co_i) / (sum_{i=1}^{N} oe1_i) = P(E1, E2) / P(E1).
9. Conditional user probability (CUPASYM): similar to CPASYM, except that
it computes the score over users,
CUPASYM(E1, E2) = (sum_{i=1}^{U} cou_i) / (sum_{i=1}^{U} oue1_i).
10. Reverse conditional probability (RCPASYM): the reverse of CPASYM,
RCPASYM(E1, E2) = (sum_{i=1}^{N} co_i) / (sum_{i=1}^{N} oe2_i).
11. Reverse conditional user probability (RCUPASYM): the reverse of CUPASYM,
RCUPASYM(E1, E2) = (sum_{i=1}^{U} cou_i) / (sum_{i=1}^{U} oue2_i).</p>
      <p>Combined features are combinations of the co-occurrence features. The Spark
system uses 8 different types of combined features for every data source,
generating a total of 32 features. These are the following 8 features:
1. CF1 combines the conditional user probability and the prior probability of
the target entity: CF1 = CUPASYM * P2.
2. CF2 combines the conditional user probability and the prior probability of
the target entity: CF2 = CUPASYM / P2.
3. CF3 combines the reverse conditional probability and the prior probability
of the target entity: CF3 = RCPASYM * P2.
4. CF4 combines the reverse conditional probability and the entropy of the
target entity: CF4 = RCPASYM * Ent2.
5. CF5 combines the joint user probability and the prior probability of the
target entity: CF5 = PUSYM * P2.
6. CF6 combines the joint user probability and the prior probability of the
target entity: CF6 = PUSYM / P2.
7. CF7 combines the joint user probability and the entropy of the target
entity: CF7 = PUSYM * Ent2.
8. CF8 combines the joint user probability and the entropy of the target
entity: CF8 = PUSYM / Ent2.</p>
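      <p>As an illustration, the event-level co-occurrence measures above can be sketched over a toy event collection. The events and entity names below are invented for the example; in the actual system the events are queries, query sessions, Flickr tag sets, or tweets:</p>
      <p>
```python
import math

# Toy event collection: each event is the set of entities it mentions
# (invented data for illustration only).
events = [
    {"Tom Cruise", "Brad Pitt"},
    {"Tom Cruise"},
    {"Brad Pitt", "Angelina Jolie"},
    {"Tom Cruise", "Brad Pitt"},
]
N = len(events)

def count(entity):
    # sum of o_i: number of events containing the entity
    return sum(1 for s in events if entity in s)

def co_count(e1, e2):
    # sum of co_i: number of events containing both entities
    return sum(1 for s in events if e1 in s and e2 in s)

def probability(e):                      # feature 1: P
    return count(e) / N

def joint_probability(e1, e2):           # feature 4: JPSYM
    return co_count(e1, e2) / N

def pmi(e1, e2):                         # feature 6: SISYM
    return math.log(joint_probability(e1, e2) /
                    (probability(e1) * probability(e2)))

def cosine(e1, e2):                      # feature 7: CSSYM
    return co_count(e1, e2) / math.sqrt(count(e1) * count(e2))

def conditional(e1, e2):                 # feature 8: CPASYM
    return co_count(e1, e2) / count(e1)
```
      </p>
      <p>For instance, with these toy events, probability("Tom Cruise") is 0.75 and conditional("Tom Cruise", "Brad Pitt") is 2/3. The user-level variants are identical in shape, with users' aggregated event sets replacing single events.</p>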
      <p>Graph-based features use knowledge graphs like DBpedia and Freebase.
Spark computes 5 different features using knowledge graphs:
1. Graph similarity (GSCEG): the total number of shared connections
between the two given entities in the Yahoo! knowledge graph.
2. Entity popularity in movies (EPOPUMOVIE): the total number of directly
connected nodes in the movie-specific knowledge graph, used to compute the
entity popularity rank.
3. Facet popularity in movies (FPOPUMOVIE): the facet popularity rank in
the movie-specific knowledge graph.
4. Entity popularity in all (EPOPUALL): similar to EPOPUMOVIE, but
counting the total number of directly connected nodes in the complete Yahoo!
knowledge graph.
5. Facet popularity in all (FPOPUALL): the facet popularity rank in the
complete knowledge graph.</p>
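      <p>The graph similarity and entity popularity features reduce to simple neighbourhood operations on the entity graph; a minimal sketch, assuming the graph is given as an adjacency mapping (the toy graph below is invented for the example):</p>
      <p>
```python
# Invented toy entity graph: entity name mapped to the set of directly
# connected entities (a real graph has millions of nodes).
graph = {
    "Tom Cruise": {"Top Gun", "Nicole Kidman", "Mission: Impossible"},
    "Brad Pitt": {"Fight Club", "Nicole Kidman", "Top Gun"},
}

def graph_similarity(e1, e2):
    # GSCEG-style score: total number of connections shared by both entities.
    return len(graph.get(e1, set()).intersection(graph.get(e2, set())))

def entity_popularity(e):
    # EPOPUALL-style count: number of directly connected nodes.
    return len(graph.get(e, set()))
```
      </p>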
      <p>Popularity-based features:
1. Web search citation (WCTHWEB): the total number of hits in Yahoo! web
search results.
2. Web deep citation (WCDHWEB): the total number of user clicks in Yahoo!
web search results.
3. Entity volume in query (COVQ): the total number of occurrences of the
given entity in the query logs.
4. Entity volume in facet (COVF): the facet volume in the query logs.
5. Entity view volume in query (WPOP1, WPOP2): the total number of user
clicks for the given entity when the entity occurs in a query.</p>
      <p>Entity type features reflect the entity types and relation types present in the
knowledge bases. Spark uses two different entity type features:
1. Entity class type (ET1, ET2): the type of an entity defined in the
knowledge base. It provides two different feature values, ET1 and ET2, for an
entity pair consisting of the entities E1 and E2.
2. Relation type (RT): the relation type between the two given entities. For
instance, "Brad Pitt" and "Angelina Jolie" are connected by the relation type
"Partner" in DBpedia.</p>
      <p>
        Wikipedia-based features The Spark system does not use Wikipedia to
extract its features. However, in addition to the features reported by Blanco et
al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we experiment with additional Wikipedia-based features; we refer to the
resulting system as Spark+Wiki. Aggarwal et al. [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] presented an entity recommendation system,
"EnRG", which shows the effectiveness of using only Wikipedia-based features.
In this section, we explain these additional features.
      </p>
      <p>
        In order to obtain the Wikipedia-based features, we use Wikipedia as two types
of data sources: a collection of textual content and a collection of Wikipedia
hyperlinks. We use 7 types of co-occurrence features from Wikipedia, 6 of which
are already defined above: Probability (P1, P2), Joint probability (JPSYM),
Conditional probability (CPASYM), Cosine similarity (CSSYM), PMI (SISYM),
and Reverse conditional probability (RCPASYM). The co-occurrence features
described above only consider the presence of an entity, as the events (search
queries or tweets) used in Spark are very short. However, Wikipedia articles have
enough content to measure the importance of an entity to a given article (an
event in this case). Therefore, Wikipedia can provide the occurrence information
of entities together with importance weights, which can be used to build a
distributional vector for each entity. Spark+Wiki uses a Wikipedia-based
distributional semantic model (DSM) [
        <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
        ] as an additional
co-occurrence feature. The DSM score is calculated by computing the cosine
similarity between two distributional vectors. The DSM vector of an entity e is
defined by v = sum_{i=1}^{Nw} a_i c_i, where c_i is the i-th concept in the
Wikipedia concept space, a_i is the tf-idf weight of the entity e with the
concept c_i, and Nw is the total number of Wikipedia concepts. Since we use
Wikipedia both as a collection of textual content and as a collection of
hyperlinks, there are 16 features in total that compute their values using
Wikipedia.
      </p>
      <p>
        2.3 Ranking
In order to predict the ranking by combining all the features, Spark uses a
learning-to-rank approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], considering all the scores obtained from the different features.
As any learning algorithm requires training data, Blanco et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] built a
dataset that contains more than four thousand web search queries. Every query
refers to an entity defined in the knowledge graph and contains a list of entity
candidates. In total, the dataset consists of 47,623 entity pairs, which were
labeled by professional experts. The ranking is defined by learning a ranking function
f(.) that generates a score for an input query entity qi and an entity candidate
ej. Spark uses Stochastic Gradient Boosted Decision Trees (GBDT) to obtain the
ranking score and decide the appropriate label for a given pair.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>
        This section describes the evaluation of Spark and Spark+Wiki. As explained
above, Spark+Wiki is Spark with additional Wikipedia-based features. We
evaluate the performance on a dataset that consists of 47,623 query-entity pairs.
As Spark uses the GBDT ranking method, we tune the GBDT parameters by
splitting the dataset into 10 folds; the final parameters are obtained by cross
validation. Due to variations in the number of retrieved related entities per
query, we use Normalized Discounted Cumulative Gain (nDCG) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
as the performance metric. nDCGp is defined as the ratio of DCGp to the
maximum (ideal) DCGp.
      </p>
      <p>nDCGp = DCGp / IDCGp, where DCGp is defined by
DCGp = sum_{i=1}^{p} (2^{g(l_i)} - 1) / log2(i + 1)
and g(l_i) is the gain for the label l_i. nDCG gives different scores for different
values of p; therefore, we report the nDCG scores for p = 1, 5, and 10.</p>
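      <p>Under this definition, nDCG@p can be sketched directly. The gain values in the example are illustrative integers, not the editorial grades used in the actual dataset:</p>
      <p>
```python
import math

def dcg(gains, p):
    # DCG_p = sum over ranks i = 1..p of (2^g_i - 1) / log2(i + 1)
    return sum((2 ** g - 1) / math.log2(i + 1)
               for i, g in enumerate(gains[:p], start=1))

def ndcg(gains, p):
    # Normalize by the ideal DCG: gains sorted in decreasing order.
    idcg = dcg(sorted(gains, reverse=True), p)
    return dcg(gains, p) / idcg if idcg > 0 else 0.0
```
      </p>
      <p>A perfectly ordered result list gets nDCG 1.0; any inversion of a higher-gain item below a lower-gain one reduces the score.</p>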
      <p>
        3.1 Datasets
Blanco et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] reported the Spark performance on a dataset that consists
of 4,797 search queries obtained from commercial search engines. Every query
refers to an entity in DBpedia and contains a list of entity candidates. The entity
candidates are tagged by professional editors on a 5-label scale: Excellent, Prefer,
Good, Fair, and Bad. The dataset contains different types of entity candidates
such as person, location, movie, and TV show. Table 1 provides details about the
different types of instances in the dataset; it shows that most entities are of type
"location" or "person". Section 3.3 reports the performance for these specific
types in addition to the overall dataset.
We evaluate the performance of the Spark system and compare it with a model
built only over Wikipedia. In order to inspect whether the additional features
generated from Wikipedia can complement Spark's performance, we also perform
experiments with Spark+Wiki. We calculate nDCG@10, nDCG@5, and nDCG@1
as the evaluation metrics. In addition to performing experiments on the dataset
with all entity types, we also evaluate the systems on subsets containing only
person-type or only location-type entities. Spark combines the scores obtained
from the different types of features using GBDT. It contains 112 features in total:
56 features are co-occurrence-based, 32 are linear combinations of co-occurrence-based
features, 5 are graph-based, 6 are popularity-based, 3 are type-based, and the
remaining 10 are of types such as string length and Wikipedia clicks. The
56 co-occurrence-based features are built over 4 different data sources: query
terms (QT), query sessions (QS), Flickr tags (FL), and tweets (TW); that is,
14 co-occurrence-based features are generated from each data source.
Spark+Wiki has additional co-occurrence-based features built over Wikipedia.
Spark+Wiki uses Wikipedia as two types of data sources: a collection of
documents with textual content and a collection of documents with hyperlinks only.
However, it does not generate all 14 co-occurrence-based features for these data
sources. Spark+Wiki uses 8 co-occurrence-based features: Probability (P1, P2),
Joint probability (JPSYM), PMI (SISYM), Cosine similarity (CSSYM),
Conditional probability (CPASYM), Reverse conditional probability (RCPASYM),
and the Distributional semantic model (DSM) vector. The DSM feature was not
available in Spark, as the data sources used in Spark have only small documents
(queries or tweets). However, Wikipedia's characteristics allow us to build the
DSM vector over Wikipedia concepts [
        <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
        ]. As a result, Spark+Wiki consists of 128
features, 16 of which are additional to the Spark system presented by Blanco
et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In order to investigate the importance of the features, we build ranking
models by taking the features from one category at a time. We thus examine
the performance of all five models: co-occurrence-based, linear combinations
of co-occurrence-based features, graph-based, popularity-based, and type-based.
Further, we perform experiments with only co-occurrence-based features, as
they turn out to be the most significant features of the system; we calculate the
scores using co-occurrence-based features and compare the importance of
each data source separately.
This section presents the results obtained from the experiments described
above. Table 2 shows the retrieval performance of Spark and compares it with
Spark+Wiki and the Wikipedia-only model. The Wikipedia-based model
achieved comparable results on the full dataset and on person-type entities.
However, it could not cope as well with location-type entities. A possible reason
is that many of the locations are too specific and do not have enough
information on Wikipedia. Although the Wikipedia-based model could not
outperform Spark, the combination of both, i.e., Spark+Wiki, achieved higher scores for
all test cases. The Wikipedia-based model obtained relatively lower scores for
location-type entities, but it is still able to complement Spark's performance.
In order to inspect the effectiveness of the different features, we compute the
feature importance in our learning algorithm: we calculate the reduction in the
loss function for every split on a feature variable and then compute the total
reduction in the loss function. This indicates how often a given feature was
used in making the final decision by the learning algorithm. Table 3 shows the
importance of the top 20 features used in Spark+Wiki. The names of the features
listed in the table correspond to the acronyms explained in Section 3.2. The
co-occurrence features have the additional suffixes QT, QS, FL, TW, WT, and WL for
query terms, query sessions, Flickr tags, tweets, Wikipedia text, and Wikipedia
links, respectively. For instance, the feature CSSYMFL refers to the cosine similarity
computed over Flickr tags. Table 3 shows that relation type (RT) is the most
important feature in Spark+Wiki, which matches what was reported by Blanco et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Further, this table reports the effectiveness of the Wikipedia-based features:
      </p>
      <p>
there are 5 Wikipedia-based features among the top 10 most effective ones for the full
dataset. It also shows the advantage of using the additional DSM features. In
particular, for person-type entities, the Wikipedia-based DSM feature shows remarkable
importance. Moreover, Wikipedia turned out to be a useful data source for obtaining
background information about location-type entities. The Wikipedia document
collection created by keeping only hyperlinks is more effective for building the
DSM model than taking all the textual content. This is consistent with the
results reported by Aggarwal and Buitelaar [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] that
the hyperlink-based DSM outperforms the text-based DSM model for entity
relatedness and ranking. As we performed experiments by categorizing the features
based on their types, we also evaluate models built over subsets of features from
the same category. Table 4 shows the scores obtained from five different models
based on the feature categories: co-occurrence features, linear combinations of
co-occurrence features, graph-based features, popularity-based features, and
type-based features. It shows that the co-occurrence-based features are very
effective. Although the relation-type feature turned out to be the most important
individual feature (see Table 3), the type-based features are not very effective
without the other features. The co-occurrence-based features are built using
5 data sources: query terms, query sessions, Flickr tags, tweets, and Wikipedia.
Therefore, we report the scores generated by co-occurrence-based features over
the different data sources in Table 5. It shows that Wikipedia is the most effective
resource for all types of entities, except that for location-type entities Flickr tags
perform better than Wikipedia; this shows the usefulness of the Flickr data for
capturing specific and non-popular place names. Table 5 shows that Wikipedia-based
features are the most effective ones for building the co-occurrence-based
model. Consequently, we further investigate the importance of the Wikipedia-based
features. Table 6 shows that the probability obtained from textual content is the
most significant feature, while the DSM vectors over textual content (WT) and
hyperlinks (WL) also show good relevance for the model. In all the experiments,
the DSM over hyperlinks shows more importance than the DSM built over
textual content. A possible reason is that the DSM vector over textual content may
not capture the appropriate semantics of an ambiguous entity, whereas the
hyperlink-based DSM vector can differentiate between ambiguous surface forms.
For instance, Aggarwal and Buitelaar [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] showed
that the text-based DSM vector of the entity "NeXT"1 may not obtain the
relevant dimensions, while the hyperlink-based DSM vector obtained all the relevant
Wikipedia articles.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we presented an extensive evaluation of the entity recommendation
system "Spark". Spark uses more than 100 features and produces its final scores
by combining these features using a learning-to-rank algorithm. These features
are built over varying data sources: query terms, query sessions, Flickr tags,
and tweets. Therefore, we investigated the performance of these features
individually and by combining them based on their data source. Most of the data
sources used in Spark, such as users' query logs, are not publicly available.
Wikipedia, however, is a continuously growing encyclopedia that is publicly available.
We showed that a model built only over Wikipedia achieved accuracy comparable
to Spark. Moreover, since Spark does not utilize Wikipedia to build its features,
we also analyzed the effect of using Wikipedia as an additional resource, and
showed that Wikipedia-based features complement the overall performance of Spark.
1 http://en.wikipedia.org/wiki/NeXT</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This work is supported by a research grant from Science Foundation Ireland
(SFI) under Grant Number SFI/12/RC/2289 (INSIGHT) and by Yahoo! Labs.</p>
      <p>13. F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge.
In Proceedings of the 16th International Conference on World Wide Web, pages
697-706. ACM, 2007.
14. X. Yu, H. Ma, B.-J. P. Hsu, and J. Han. On building entity recommender
systems using user click log and Freebase knowledge. In Proceedings of the 7th ACM
International Conference on Web Search and Data Mining, pages 263-272. ACM,
2014.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Asooja</surname>
          </string-name>
          , G. Bordea, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Non-orthogonal explicit semantic analysis</article-title>
          .
          <source>Lexical and Computational Semantics (*SEM</source>
          <year>2015</year>
          ), pages
          <fpage>92</fpage>
          -
          <fpage>100</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Asooja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Vulcu</surname>
          </string-name>
          .
          <article-title>Is Brad Pitt related to Backstreet Boys? Exploring related entities</article-title>
          .
          <source>In Semantic Web Challenge ISWC</source>
          (
          <year>2014</year>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Asooja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ziad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Who are the American vegans related to Brad Pitt? Exploring related entities</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web Companion</source>
          , pages
          <fpage>151</fpage>
          –
          <lpage>154</lpage>
          . International World Wide Web Conferences Steering Committee,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Wikipedia-based distributional semantics for entity relatedness</article-title>
          .
          <source>In 2014 AAAI Fall Symposium Series</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          .
          <article-title>DBpedia: A nucleus for a web of open data</article-title>
          .
          <source>In The semantic web</source>
          , pages
          <fpage>722</fpage>
          –
          <lpage>735</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Blanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Cambazoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Torzec</surname>
          </string-name>
          .
          <article-title>Entity recommendations in web search</article-title>
          .
          <source>In International Semantic Web Conference (ISWC)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>K.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sturge</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <article-title>Freebase: a collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In Proceedings of the 2008 ACM SIGMOD international conference on Management of data</source>
          , pages
          <fpage>1247</fpage>
          –
          <lpage>1250</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <article-title>Greedy function approximation: a gradient boosting machine</article-title>
          .
          <source>Annals of Statistics</source>
          , pages
          <fpage>1189</fpage>
          –
          <lpage>1232</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Markovitch</surname>
          </string-name>
          .
          <article-title>Computing semantic relatedness using wikipedia-based explicit semantic analysis</article-title>
          .
          <source>In Proceedings of the 20th international joint conference on Artificial Intelligence</source>
          ,
          <source>IJCAI'07</source>
          , pages
          <fpage>1606</fpage>
          –
          <lpage>1611</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Harris</surname>
          </string-name>
          .
          <article-title>Distributional structure</article-title>
          .
          <source>Word</source>
          <volume>10</volume>
          (
          <issue>2–3</issue>
          ), pages
          <fpage>146</fpage>
          –
          <lpage>162</lpage>
          ,
          <year>1954</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <article-title>IR evaluation methods for retrieving highly relevant documents</article-title>
          .
          <source>In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>41</fpage>
          –
          <lpage>48</lpage>
          . ACM,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.</given-names>
            <surname>Pound</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <article-title>Ad-hoc object retrieval in the web of data</article-title>
          .
          <source>In Proceedings of the 19th international conference on World wide web</source>
          , pages
          <fpage>771</fpage>
          –
          <lpage>780</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>