=Paper=
{{Paper
|id=Vol-2482/paper6
|storemode=property
|title=Graph Analytical Re-Ranking for Entity Search
|pdfUrl=https://ceur-ws.org/Vol-2482/paper6.pdf
|volume=Vol-2482
|authors=Takahiro Komamizu
|dblpUrl=https://dblp.org/rec/conf/cikm/Komamizu18
}}
==Graph Analytical Re-Ranking for Entity Search==
Graph Analytical Re-ranking for Entity Search Takahiro Komamizu Nagoya University Nagoya, Japan taka-coma@acm.org keywords. Entity search is important for users who investigate entities themselves as well as relationships Abstract among entities. Due to its importance, several open tasks for entity search have been published (for in- Entity search is a fundamental task in Linked stance, INEX-LD [WKC+ 12], QALD [UNH+ 17], and Data (LD). The task is, given a keyword DBpedia-Entity [HNX+ 17]). DBpedia-Entity is the search query, to retrieve a set of entities in LD most recent open entity search task composing the which are relevant to the query. The state-of- existing open entity search tasks and contains com- the-art approaches for entity search are based prehensive evaluation results for existing entity search on information retrieval technologies such as methods. Therefore, this paper deals with this task. TF-IDF vectorization and ranking models. In DBpedia-Entity task, recent methods are in- This paper examines the approaches by apply- spired from information retrieval domains, such as ing a traditional evaluation metrics, recall@k, BM25 and language model (see their website1 ), how- and shows ranking qualities still room left ever, there are few methods using graph analyti- for improvements. In order to improve the cal methods (e.g., PageRank [PBMW99]). The ex- ranking qualities, this paper explores pos- isting methods are based on occurrences of terms; sibilities of graph analytical methods. LD BM25 is a common ranking model-based TF-IDF vec- is regarded as a large graph, graph analyti- torization, language model considers probabilities of cal approaches are therefore appropriate for co-occurrence of terms, and, fielded extension meth- this purpose. Since query-based graph ana- ods over the former basic methods are also included. lytical approaches fit to entity search tasks, Fielded extension methods give high weights for im- this paper proposes a personalized PageRank- portant attributes of documents (e.g., titles of Web based re-ranking method, PPRSD (Person- pages). On the other hand, interestingly, there are alized PageRank based Score Distribution), few methods using graph analytical methods such as for retrieved results by the state-of-the-art. PageRank, even though LD is represented as a graph The experimental evaluation recognizes im- in nature. This raises a question: Are graph analytical provements but its results are not satisfac- methods not appropriate for entity search tasks? tory, yet. For further improvements, this pa- per reports investigations about relationship To answer the question, this paper firstly analyzes between queries and entities in terms of path the existing methods. [HNX+ 17] indicates that exist- lengths on the graph, and discusses future di- ing methods achieve 0.46 NDCG@10 score and 0.55 rections for graph analytical approaches. NDCG@100 score, but it is not clear how far the achievements from goals are. NDCG (Normalized 1 Introduction Discounted Cumulative Gain) is a standard way of ranking evaluation, which reasonably compares rank- Linked Data (LD) [BHB09] which started by Sir ing methods, but it hides potentials for improvement. Tim Berners-Lee has become an important knowledge Therefore, this paper applies a traditional evaluation source, and entity search for LD [PMZ10] is a funda- metrics, recall@k, which is a ratio of relevant answers mental task which retrieves entities in LD for query in top-k results over the total number of relevant an- Copyright © CIKM 2018 for the individual papers by the papers' swers. Thus, recall@k indicates that how many rele- authors. Copyright © CIKM 2018 for the volume as a collection 1 http://tiny.cc/dbpedia-entity by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). vant answers are absent in top-k results. The anal- 2 State of Current Entity Search ysis results shown the Total column in Table 1, re- This work explores the future directions of entity call@10, recall@100, and recall@1000 are maximally search, to this end, this paper investigates the current 0.2872, 0.6912, and 0.8708, respectively. state of entity search, especially this paper sticks to a The investigation results indicate that there are still leading benchmark, DBpedia-Entity v2 [HNX+ 17]. room left for improving rankings. The low recall for As shown in the benchmark, there are various top-10 results and the high recall for top-1000 results approaches which are mainly based on informa- imply that large amount of relevant results are within tion retrieval and natural language processing tech- 1000 results but most of them are below top-10. There- niques. The list of approaches include fundamental ap- fore, this paper attempts to improve the ranking by re- proaches: BM25 [RZ09], BM25-CA [RZ09], LM (Lan- ranking, which arrange the ranking by applying differ- guage Modeling) [PC98], SDM (Sequential Depen- ent ranking criteria. It is reasonable to take graph dency Model) [MC05], PRMS (Probabilistic Model for topological features into account due to the nature Semistructured Data) [KXC09], and MLM-all (Mix- of LD. Therefore, this paper applies graph analytical ture of Language Models) [OC03]; fielded extension methods for re-ranking. The result for the re-ranking approaches: MLM-CA [OC03], FSDM (Fielded Se- method is expected to be an answer to the question. quential Dependence Model) [ZKN15], and BM25F- CA [RZ09]; extended approaches by entity linking In consequence, this paper arranges the aforemen- technique [HBB16] for query: LM-ELR [HBB16], tioned question to the following question. Do graph SDM-ELR [HBB16], and FSDM-ELR [HBB16]. analysis-based re-ranking methods improve the ranking These works are based on a fielded document con- quality? This paper attempts to take graph analyti- struction method in [Has18]. As an overall structure, cal methods into account and proposes a re-ranking each entity has 1000 fields together with three addi- method PPRSD (Personalized PageRank based Score tional fields. The 1000 fields are corresponding with Distribution) which distributes calculated relevance top 1000 frequent predicates in DBpedia, and the ad- scores by the state-of-the-art in a personalized PageR- ditional fields are heuristically constructed such that ank manner. Test of PPRSD gives the following an- one is “name” field which is constitution of predicates swer, the graph analysis-based re-ranking method can rdfs:label and foaf:name; another is “types” field improve the ranking quality but the improvement is which contains rdf:type predicate and predicates end- not very significant. ing in “subject”, and the other is “contents” field which In order to find future directions based on graph holds the contents of all fields of connected entities ex- analytical methods for improving entity search, this cept those connected by owl:sameAs to remove same paper performs investigations and provides insights. entities in different languages. Aforementioned ap- This paper poses a question for results of the prelimi- proaches use parts of the fielded documents as follows: nary evaluation by recall@k, that is, why recall@1000 BM25, LM and SDM use the contents field; and MLM- is not perfect yet? To answer the question, this pa- all, PRMS and FSDM use top-10 fields. The field per investigates relationship between query terms and extension approaches are differentiated by settings of relevant entities for the query, and the investigation field weights (e.g., MLM-all uses equal weights for all reveals that some terms only exist on distant literals fields, while PRMS learns weights for fields). from relevant entities. Additionally, this paper obtains To investigate the qualities of these approaches, this a clue for selection of predicates connecting to literals paper tests more intuitive metrics recall@k in addi- w.r.t. different distances from the entities. Based on tion to NDCG which is shown in [HNX+ 17]. The these investigations, this paper puts discussion on fu- NDCG results are copied to Table 4 (rows of not *- ture directions based on graph analytical approaches ed method names correspond to the original results for entity search. shown in [HNX+ 17]). The NDCG result shows com- parative ranking qualities among these approaches. The following sections discuss the detail for get- While, NDCG is not a clear indicator for distances ting the answers to the questions. Section 2 intro- from goals. Therefore, this paper investigates more duces briefly the state-of-the-art shown in [HNX+ 17] clear indicator, recall@k (Eqn. 1) which reveals ratio and showcases the preliminary evaluation in terms of of relevant results in top-k. recall@k metrics, and Section 3 explains the idea and detail of PPRSD, and Section 4 evaluates the state-of- the number of relevant items in top-k the-art and PPRSD using the test collection and shows recall @k = the total number of relevant items the answers to the aforementioned questions. Section 5 (1) displays additional investigations and insights for the Table 1 displays recall@k (k ∈ {10, 100, 1000}) and future directions, and Section 6 concludes this paper. it indicates that more than 80% of relevant results are Table 1: Recall@k (k = 10, 100, 1000). Each row corresponds with existing approaches, and the last row is maximum recall score among them. For each column, the best score is boldface, underlined, and lined in the bottom. The bottom row indicates gaps from recall@k values (k = 10, 100) from recall@1000, which claims that large amount of relevant results are below top-10. SemSearch ES INEX-LD ListSearch QALD-2 Total Model @10 @100 @1000 @10 @100 @1000 @10 @100 @1000 @10 @100 @1000 @10 @100 @1000 BM25 .2563 .6669 .9280 .1730 .4860 .7554 .1093 .4598 .7221 .1891 .4677 .6929 .1823 .5175 .7703 PRMS .3719 .7499 .9412 .2312 .5339 .7796 .1839 .5476 .7525 .2273 .5428 .7420 .2522 .5919 .8009 MLM-all .3887 .7705 .9412 .2343 .5527 .7796 .1840 .5655 .7525 .2280 .5706 .7420 .2571 .6136 .8009 LM .3812 .8236 .9412 .2425 .5807 .7796 .1899 .5772 .7525 .2355 .5910 .7420 .2607 .6413 .8009 SDM .3884 .8581 .9865 .2409 .6224 .8567 .1987 .6121 .8256 .2398 .5921 .7991 .2659 .6674 .8633 LM-ELR .3863 .8278 .9412 .2364 .5894 .7796 .1913 .5940 .7536 .2474 .5909 .7401 .2646 .6483 .8006 SDM-ELR .3898 .8581 .9865 .2366 .6307 .8567 .2105 .6180 .8256 .2589 .6172 .7991 .2739 .6782 .8633 MLM-CA .4096 .7843 .9420 .2249 .5917 .8051 .1861 .5834 .8038 .2377 .5953 .7894 .2639 .6370 .8329 BM25-CA .3991 .8326 .9766 .2372 .6266 .8603 .2110 .6261 .8431 .2650 .6157 .8164 .2782 .6727 .8708 FSDM .4459 .8515 .9581 .2390 .6153 .8191 .1980 .5999 .8175 .2466 .6102 .7970 .2812 .6667 .8455 BM25F-CA .4097 .8707 .9704 .2607 .6526 .8544 .2042 .6189 .8325 .2548 .6341 .8157 .2811 .6912 .8653 FSDM-ELR .4536 .8539 .9562 .2477 .6253 .8191 .2022 .6075 .8162 .2507 .6275 .7970 .2872 .6765 .8450 max .4536 .8707 .9865 .2607 .6526 .8603 .2110 .6261 .8431 .2650 .6341 .8164 .2872 .6912 .8708 gap .5329 .1158 — .5996 .3077 — .6321 .2170 — .5514 .1823 — .5836 .1796 — Table 2: Recall@k (k = 10, 100). Each row corresponds to the maximum recall@k value among re-ranked existing approaches. SemSearch ES INEX-LD ListSearch QALD-2 Total Re-ranking method @10 @100 @10 @100 @10 @100 @10 @100 @10 @100 PageRank .1545 .4664 .1171 .3639 .1059 .4438 .1561 .4519 .1344 .4198 Personalized PageRank .1632 .4779 .1228 .3822 .1146 .4524 .1613 .4587 .1397 .4355 included in top-1000 but only 20% to 45% of them The subsequent sections introduce naïve baseline are included in top-10, which indicates there are room approaches and the proposed re-ranking method, left for improving rankings. The recalls are calculated PPRSD. Section 3.1 introduces re-ranking methods on the top-1000 results presented in the benchmark via PageRank [PBMW99] and personalized PageR- data2 . The boldface and underlined cells in the ta- ank [Hav02], and introduces a preliminary evaluation ble show maximum recall scores for tasks and k. All of these methods. Then, Section 3.2 explains PPRSD methods have low recall@10 as well as recall@100, but which utilizes both results of the state-of-the-art and still high recall@1000, meaning that ranking perfor- advantages of personalized PageRank. mance should be improved. The gap row in the table emphasizes that top-10 results have large room left for 3.1 Naïve Graph Analytical Re-ranking improvements. As discussed above, graph analytical approaches are reasonable for re-ranking criteria, however, with a 3 PageRank-based Re-ranking little consideration, global evaluation methods like This work attempts to improve the ranking qualities PageRank do not make sense for ranking entities with by graph analytical re-ranking methods. LD is mod- respect to input keyword queries. Roughly speaking, eled as a labeled graph, it is therefore reasonable to PageRank evaluates vertices having lots of incoming apply graph analytical approaches to evaluate values links as important. Therefore, when PageRank is ap- of entities. In particular, this paper explores feasibil- plied to the data graph G, PageRank gives an order ity of PageRank [PBMW99], which is popular graph of vertices which is independent from input queries. analytical methods to originally evaluate Web pages Examinations for the global rankings show bad results and has been applied for many other domains. (this paper does not include this because it is obvious). This paper models LD data as data graph (Def. 1) In order to test PageRank and personalized PageR- ank in a re-ranking manner, this work utilizes an in- Definition 1 (Data Graph) Given LD data, data sight from the recall@k results in Table 1. The insight graph G = (V, E) is a graph, where set V = R ∪ L ∪ B is that the top-1000 results by existing methods in- of vertices are union of set R of entities, set L of lit- clude more than 80% of relevant results. Thus, the erals and set B of blank nodes, and set E ⊆ V × P × V idea of re-ranking with PageRank and personalized of edges between vertices with predicates P as labels. PageRank is to filter top-1000 result entities by the existing methods and to apply the graph analytical ap- 2 https://github.com/iai-group/DBpedia-Entity/tree/ proaches. To do so, an induced subgraph (Definition 2) master/runs/v2 for the top-1000 result entities are extracted. Definition 2 (Induced Subgraph) Given set V 0 of scores than those on ranks. Additionally, the rele- vertices, induced subgraph G0 = (V 0 , E 0 ) of graph G = vance scores are more sophisticated than just count- (V, E) over V 0 is a subgraph of G such that V 0 ⊆ V ing matching entities as naïve personalized PageRank- and E 0 = (V 0 × V 0 ) ∩ E. based re-ranking approach (s in Eqn. 3). To realize this idea, this work arranges the personal- On the induced subgraph G0 extracted from top- ized PageRank formulation shown in Eqn. 3 to include 1000 results, PageRank and personalized PageRank the relevance scores calculated by the state-of-the-art values are calculated as Eqn. 2 and Eqn. 3. In Eqn. 2, as Eqn. 4 called PPRSD (stands for Personalized pr is a PageRank vector with 1000 length, A is a PageRank based Score Distribution). 1000 × 1000 adjacency matrix of G0 , e is 1000-length vector which elements are all 1, and d is a damping pprsdq = (1 − d) · pprsdq A + d · t (4) factor which is the probability of random jumps. Sim- ilarly, in Eqn 3, pprq is a 1000-length PageRank vector where pprsdq is a 1000-length relevance score vector for query q, A is an adjacency matrix as PageRank, s is of PPRSD. The personalized vector s is redefined as t, 1000-length personalized vector for q, which elements where each element ti of entity vi ∈ V 0 stores a rele- corresponding with matching entities for q are 1 and vance score of q to vi calculated by one of the state-of- other elements are 0, and d is a damping factor. the-art method. Log likelihood-based relevance scores pr = (1 − d) · prA + d · e (2) (i.e., LM, MLM, SDM, FSDM, PRMS, and their vari- ations) are negative values in nature, therefore, these pprq = (1 − d) · pprq A + d · s (3) scores are converted to positive numbers by applying A preliminary experiment over these naïve re- exponential function. In addition, the converted scores ranking methods shows worse results than the state- are quite small (e.g., 10−34 ) because the values in the of-the-art. The preliminary experiment tests the feasi- log function are products of probabilities, therefore, bility of aforementioned methods (PageRank and per- the converted scores are multiplied by positive num- sonalized PageRank-based re-ranking methods) on the ber so as to make the scores comparable with those DBpedia-Entity v2 benchmark [HNX+ 17]. The re- of the other methods. As PageRank-based methods ranking approaches are applied for all the state-of- compute the relevance score vectors (i.e., pr in Eqn. 2 the-art methods listed in Table 1. Table 2 displays and ppr in Eqn. 3), PPRSD also computes the rele- maximum recall@k values among the applied methods vance score vector, pprsd, by the power method. Re- of PageRank and personalized PageRank, separately. sult entities ranked by PPRSD are of ordering in the Amongst PageRank and personalized PageRank, per- relevance scores. sonalized PageRank has achieved better performance than PageRank, therefore, taking relevance to queries 4 Experimental Evaluation into account results better ranking qualities. Compar- The experiment in this paper attempts to confirm ing recall@k of the state-of-the-art shown in Table 1, the re-ranking method, PPRSD, improves the rank- the re-ranking methods are mostly worse then them. ing qualities in terms of both recall@k and NDCG@k. Consequently, re-ranking methods should more rely on Since PPRSD relies on the results of the state-of- the state-of-the-art. the-art, this experiment uses the standard benchmark 3.2 Re-ranking by Score Distribution dataset [HNX+ 17]3 of entity search on DBpedia. Rel- evance scores for entities in the state-of-the-arts are The preliminary evaluation on the naïve re-ranking obtained from the website of the benchmark4 . This methods reveal two facts: one is personalized experimental evaluation attempts to answer the follow- PageRank-based re-ranking is superior to PageRank- ing question: Does the re-ranking method improve the based re-ranking, and the other is the state-of-the-art state-of-the-are? And, how large or small the improve- are still more powerful than simple graph analytical ments are? In order to answer the question, PPRSD approaches. Therefore, the facts suggest that person- and the state-of-the-art are compared by recall@k alized PageRank-based method with utilizing results (Eqn. 1) and NDCG@k (Eqn. 5) which is a ratio of of the state-of-the-art can be a better choice. The rest DCG@k (Eqn. 6) over the ideal value of DCG@k re- of this section introduces how to realize it. ferred as to IDCG@k. The main idea of the proposed approach is that utilizing relevance scores for re-ranking algorithm via DCG@k NDCG@k = (5) personalized PageRank. The state-of-the-art rank en- IDCG@k tities by their own relevance scores, the scores indi- 3 http://tiny.cc/dbpedia-entity cate relative relevance degrees among the resulting en- 4 https://github.com/iai-group/DBpedia-Entity/tree/ tities. That is, there are more or less gaps on relevance master/runs/v2 Table 3 shows PPRSD successfully improves rank- ing qualities of the state-of-the-art. The Total col- 0.30 0.30 umn represents the ranking qualities among all tasks. 0.25 0.25 0.20 0.20 The column indicates that 7 over 12 methods have recall@10 recall@10 0.15 0.15 been improved by PPRSD in recall@10 and 8 meth- 0.10 0.10 ods have also been improved by PPRSD but 2 meth- 0.05 0.05 ods have been degraded the ranking qualities in re- 0.00 0.0 0.2 0.4 0.6 0.8 1.0 0.00 0.00 0.05 0.10 0.15 0.20 call@100. This indicates that PPRSD successfully im- damping factor damping factor proves the ranking qualities. Note that PPRSD im- (a) Damping factor in 0 to 1. (b) Damping factor in 0 to 0.2. proves not only elementary approaches (e.g., BM25) but also more sophisticated approaches (e.g., BM25F- Figure 1: Effect of damping factor. Lines represent CA and FSDM-ELR). base methods for PPRSD. (a) shows recall@10 values Table 4 also shows PPRSD successfully improves for damping factor 0 to 1 and realizes damping factors ranking qualities of the state-of-the-art. Recall@k in 0 to 0.2 are optimal, therefore, (b) shows that range and NDCG@k obviously have correlation, therefore, in fine granularity. the improvements should be confirmed as the success Xk 2reli − 1 shown in recall@k. As expected, the best results are DCG@k = (6) all of PPRSD-based methods. It is worthy to note that log2 (i + 1) i=1 the improvement ratios shown in imp. are larger than The subsequent sections discuss the comparison of those of recall@k, indicating that PPRSD improves the ranking qualities between the original methods and the rankings not only by just more relevant entities in the re-ranked methods by PPRSD. More specifically, the rankings but also better positions of relevant enti- Section 4.1 introduces an empirical study for deter- ties in the rankings. Since NDCG@k is good at rela- mining damping factor d in Eqn. 4, and, based on the tive comparison between rankings, the results confirm choice of the damping factor, Section 4.2 discusses the the improvements of rankings. For example, BM25- comparison between the PPRSD-based methods over CA* is the best ranking method in terms of recall@10 the original methods. for QALD-2 task, however, BM25F-CA* is superior to BM25-CA* and the best in terms of NDCG@10 for the 4.1 Effect of Damping Factor same task. This means that BM25F-CA* have less rel- evant entities in top-10 rankings but more relevant en- Figure 1 showcases effects of damping factor in var- tities are in the earlier positions in the top-10 rankings. ious state-of-the-art which PPRSD is applied, and it Consequently, evaluation based on NDCG@k confirms reveals that smaller damping factor (i.e., around 0.1) the ranking improvements by PPRSD. achieves the best performances. In the figure, horizon- tal axis expresses damping factor which ranges 0 to 1.0 in Figure 1(a) and 0 to 0.2 in Figure 1(b), vertical axis 5 Investigation for Improvements represents recall@10, and lines in the figure represent This section explores possibilities of further improve- the state-of-the-art. FSDM-ELR and BM25F-CA per- ments for the state-of-the-art and PPRSD-based re- form the best among the state-of-the-art in the figure ranking methods in the following aspect: Why re- around damping factor 0.1. In consequence, keeping call@1000 is not 100% yet? Since PPRSD is based 10% of relevance scores is the best choice for PPRSD, on the state-of-the-art, the upper bound of ranking so d = 0.1 is used for the later experiments. qualities is limited by them and improving the state- of-the-art is also important for further improvement of 4.2 Overall Evaluation PPRSD. Therefore, this work investigates the reason Table 3 and Table 4 display the comparisons of values why top-1000 results have not been perfect yet. To an- of recall@k for the former and NDCG@k for the latter swer the question, this paper investigates path lengths among PPRSD and the state-of-the-art. The Model from relevant entities to entities which literals contain row of the table represents tasks of entity search and an input keyword query term (detail in Section 5.1). values of k, and the left-most column shows lists of The investigation reveals that there are still space left the state-of-the-art and re-ranked versions (which are for including literals within larger distances (i.e., 3, 4, represented by *) of them by PPRSD. In addition, each and 5 hops). Obviously, taking longer paths (or se- group of rows corresponding with a method includes quences of predicates) into account entails explosion imp. row which emphasizes the improvement ratio by of the number literals included into documents of enti- PPRSD. Cells contain recall@k values, and the best ties. As a result, each entity gets noisy documents, and value in a row is emphasized by boldface and underline. it is easy to imagine that the noisy documents degrade Table 3: Recall@k (k=10, 100). Model indicates task types of queries, and top-k indicates the selected k values (10 or 100). Each cell contains a recall@k value for corresponding condition. For each column, the best score is boldface and underlined. The most-left column lists the state-of-the-art and re-ranked versions of them by PPRSD (corresponding with *-ed names). Each group of rows corresponding with the state-of-the-art includes imp. row indicating the ratio of the improvement by PPRSD. SemSearch ES INEX-LD ListSearch QALD-2 Total Model @10 @100 @10 @100 @10 @100 @10 @100 @10 @100 BM25 .2563 .6669 .1730 .4860 .1093 .4598 .1891 .4677 .1823 .5175 BM25* .2735 .6952 .1867 .5144 .1279 .4809 .2036 .5044 .1983 .5466 imp. +6.52% +3.64% +5.38% +5.21% +13.54% +3.70% +7.19% +6.33% +7.57% +4.70% PRMS .3719 .7499 .2312 .5339 .1839 .5476 .2273 .5428 .2522 .5919 PRMS* .3719 .7499 .2312 .5339 .1839 .5476 .2273 .5428 .2522 .5919 imp. 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% MLM-all .3887 .7705 .2343 .5527 .1840 .5655 .2280 .5706 .2571 .6136 MLM-all* .3887 .7705 .2343 .5527 .1840 .5655 .2280 .5706 .2571 .6136 imp. 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% LM .3812 .8236 .2425 .5807 .1899 .5772 .2355 .5910 .2607 .6413 LM* .3812 .8222 .2425 .5807 .1899 .5772 .2355 .5910 .2607 .6410 imp. 0.00% -0.14% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% -0.03% SDM .3884 .8581 .2409 .6224 .1987 .6121 .2398 .5921 .2659 .6674 SDM* .3925 .8602 .2409 .6232 .1991 .6134 .2402 .5921 .2671 .6684 imp. +1.11% +0.24% 0.00% +0.08% +0.05% +0.16% +0.17% 0.00% +0.41% +0.13% LM-ELR .3863 .8278 .2364 .5894 .1913 .5940 .2474 .5909 .2646 .6483 LM-ELR* .3863 .8231 .2364 .5894 .1913 .5945 .2474 .5909 .2646 .6473 imp. 0.00% -0.43% 0.00% 0.00% 0.00% +0.08% 0.00% 0.00% 0.00% -0.12% SDM-ELR .3898 .8581 .2366 .6307 .2105 .6180 .2589 .6172 .2739 .6782 SDM-ELR* .3936 .8590 .2366 .6305 .2107 .6190 .2589 .6172 .2749 .6786 imp. +1.03% +0.09% 0.00% -0.03% +0.10% +0.18% 0.00% 0.00% +0.37% +0.06% MLM-CA .4096 .7843 .2249 .5917 .1861 .5834 .2377 .5953 .2639 .6370 MLM-CA* .4096 .7843 .2249 .5919 .1861 .5834 .2377 .5953 .2639 .6371 imp. 0.00% 0.00% 0.00% +0.03% 0.00% 0.00% 0.00% 0.00% 0.00% +0.02% BM25-CA .3991 .8326 .2372 .6266 .2110 .6261 .2650 .6157 .2782 .6727 BM25-CA* .4085 .8345 .2350 .6301 .2151 .6278 .2701 .6329 .2826 .6795 imp. +2.26% +0.12% -0.38% +0.35% +2.27% +0.38% +0.57% +2.79% +1.33% +0.97% FSDM .4459 .8515 .2390 .6153 .1980 .5999 .2466 .6102 .2812 .6667 FSDM* .4463 .8528 .2390 .6156 .1980 .5998 .2466 .6103 .2813 .6671 imp. +0.09% +0.15% 0.00% +0.05% 0.00% -0.02% 0.00% +0.02% +0.04% +0.06% BM25F-CA .4097 .8707 .2607 .6526 .2042 .6189 .2548 .6341 .2811 .6912 BM25F-CA* .4218 .8753 .2628 .6555 .2047 .6226 .2613 .6423 .2865 .6963 imp. +2.95% +0.45% +1.42% +0.34% +0.73% +0.69% +2.00% +1.39% +1.99% +0.72% FSDM-ELR .4536 .8539 .2477 .6253 .2022 .6075 .2507 .6275 .2872 .6765 FSDM-ELR* .4540 .8552 .2477 .6256 .2022 .6075 .2507 .6277 .2873 .6769 imp. +0.09% +0.15% 0.00% +0.05% 0.00% 0.00% 0.00% +0.03% +0.03% +0.06% ranking qualities. To obtain hints for preferable paths estimated from the preliminary evaluation on recall@k for literals, this paper investigates the commonalities in Table 1, that is, recall@1000 values are less than of tail predicates in the paths (detail in Section 5.2). 86% (except SemSearch ES task which is designed for The investigation reveals that tail predicates should be direct matching with terms). In other words, 14% are different for different lengths of the paths. below top-1000 results. In order to answer question why recall@1000 is not 5.1 Distance from Query Term perfect?, this paper attempts to realize the relation The state-of-the-art rely on terms occurring within two between relevant answers and the numbers of hops hops at most, modeled as fielded documents. Sec- from query terms. To this end, this work investi- tion 2 introduces the fields of entities taken into ac- gates the minimum distances from relevant entities to count for the state-of-the-art, and the contents field query terms by performing SPARQL queries in terms includes contents of one-hop away entities. This im- of the distances. SPARQL queries are generated with plies that no method considers terms occurring within a graph pattern of a sequential path from given entity longer hops away. r ∈ R to literal ` ∈ L which contains query term t, The fielded document construction limits the pos- and predicates and resources between r and ` are ful- sibilities to reach to the relevant results due to the filled by free variables. Figure 3 illustrates a n-length absence of query terms in the documents. This fact is graph pattern for entity r and query term t. Based on Table 4: NDCG@k (k=10, 100). Model indicates task types of queries, and top-k indicates the selected k values (10 or 100). Each cell contains an NDCG@k value for corresponding condition. For each column, the best score is boldface and underlined. The most-left column lists the state-of-the-art and re-ranked versions of them by PPRSD (corresponding with *-ed names). Each group of rows corresponding with the state-of-the-art includes imp. row indicating the ratio of the improvement by PPRSD. SemSearch ES INEX-LD ListSearch QALD-2 Total Model @10 @100 @10 @100 @10 @100 @10 @100 @10 @100 BM25 .2497 .4110 .1828 .3612 .0627 .3302 .2751 .3366 .2558 .3582 BM25* .2839 .4463 .2903 .3816 .2534 .3543 .2953 .3624 .2812 .3847 imp. +13.70% +8.59% +58.81% +5.65% +304.15% +7.30% +7.34% +7.66% +9.93% +7.40% PRMS .5340 .6108 .3590 .4295 .3684 .4436 .3151 .4026 .3905 .4688 PRMS* .5388 .6162 .3590 .4295 .3684 .4436 .3151 .4026 .3913 .4698 imp. +0.90% +0.88% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% +0.20% +0.21% MLM-all .5528 .6247 .3752 .4493 .3712 .4577 .3249 .4208 .4021 .4852 MLM-all* .5578 .6303 .3752 .4493 .3712 .4577 .3249 .4208 .4030 .4863 imp. +0.90% +0.90% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% +0.22% +0.23% LM .5555 .6475 .3999 .4745 .3925 .4723 .3412 .4338 .4182 .5036 LM* .5606 .6529 .3999 .4745 .3925 .4723 .3413 .4338 .4191 .5046 imp. +0.92% +0.83% 0.00% 0.00% 0.00% 0.00% +0.03% 0.00% +0.22% +0.20% SDM .5535 .6672 .4030 .4911 .3961 .4900 .3390 .4274 .4185 .5143 SDM* .5564 .6718 .4030 .4912 .3961 .4902 .3394 .4274 .4191 .5152 imp. +0.52% +0.69% 0.00% +0.02% 0.00% +0.04% +0.12% 0.00% +0.14% +0.17% LM-ELR .5554 .6469 .4040 .4816 .3992 .4845 .3491 .4383 .4230 .5093 LM-ELR* .5608 .6518 .4040 .4816 .3992 .4847 .3491 .4383 .4240 .5103 imp. +0.97% +0.76% 0.00% 0.00% 0.00% +0.04% 0.00% 0.00% +0.24% +0.20% SDM-ELR .5548 .6680 .4104 .4988 .4123 .4992 .3446 .4363 .4261 .5211 SDM-ELR* .5577 .6716 .4105 .4988 .4129 .4999 .3449 .4364 .4271 .5218 imp. +0.52% +0.54% +0.02% 0.00% +0.15% +0.14% +0.09% +0.02% +0.23% +0.13% MLM-CA .6247 .6854 .4029 .4796 .4021 .4786 .3365 .4301 .4365 .5143 MLM-CA* .6249 .6895 .4029 .4798 .4020 .4786 .3365 .4301 .4361 .5150 imp. +0.03% +0.60% 0.00% +0.04% -0.02% 0.00% 0.00% 0.00% -0.09% +0.14% BM25-CA .5858 .6883 .4120 .5050 .4220 .5142 .3566 .4426 .4399 .5329 BM25-CA* .6040 .7024 .4132 .5048 .4302 .5181 .3607 .4544 .4475 .5404 imp. +3.11% +2.05% +0.29% -0.04% +1.94% +0.76% +1.15% +2.67% +1.73% +1.41% FSDM .6521 .7220 .4214 .5043 .4196 .4952 .3401 .4358 .4524 .5342 FSDM* .6549 .7269 .4214 .5044 .4196 .4951 .3401 .4359 .4527 .5350 imp. +0.43% +0.68% 0.00% +0.02% 0.00% -0.02% 0.00% +0.02% +0.07% +0.15% BM25F-CA .6281 .7200 .4394 .5296 .4252 .5106 .3689 .4614 .4605 .5505 BM25F-CA* .6444 .7361 .4494 .5336 .4288 .5166 .3699 .4672 .4673 .5581 imp. +2.60% +2.24% +2.28% +0.76% +0.85% +1.18% +0.27% +1.26% +1.48% +1.38% FSDM-ELR .6563 .7257 .4354 .5134 .4220 .4985 .3468 .4456 .4590 .5408 FSDM-ELR* .6572 .7307 .4354 .5135 .4219 .4985 .3466 .4455 .4587 .5416 imp. +0.14% +0.69% 0.00% +0.02% -0.02% 0.00% -0.06% -0.02% -0.07% +0.15% the pattern, ASK query (which is an indicator func- recorded; and (5) the obtained distances for each rel- tion query in SPARQL) is generated to examine such evant entities are analyzed. Obtained distances for a pattern exists. Following SPARQL query displays ex- relevant entity of a query may be different term by amples of generated ASK queries for distance 2. term. Therefore, this investigation analyses minimum distance, average distance and maximum distance for ASK{ hri ?p0 ?v0. ?v0 ?p1 ?v1. each relevant entity of a query. Consequently, these ?v1 ?p2 ?l. ?l bif:contains ’t’. distances are individually gathered and calculate their FILTER isLiteral(?l).} averages to observe how long distances required to This investigation measures the minimum distance touch query terms from relevant entities. which satisfies the ASK query corresponding with the Figure 2 showcases the analyzed distances with re- distance. The procedure of this investigation is that: spect to tasks as well as with regardless of tasks (i.e., (1) given query q, relevant entity list Aq for q is ob- Total ). In the figure, bars represent ratios of relevant tained from the benchmark dataset; (2) parse q into entities having the number of hops (distances) to reach set Tq of terms; (3) examine ASK queries from length 0 from query terms, and dashed lines express cumulative to maximum length (5 for this investigation) for each ratios of relevance entities. Three kinds of bars (light- pair of relevant entity r ∈ Aq and term t ∈ Tq ; (4) gray and oblique stripe bars, gray and horizontal stripe as soon as the ASK query is satisfied, the distance is bars, and black and crossing stripe bars) correspond 1.0 1.0 1.0 1.0 1.0 0.8 Ratio of results 0.8 0.8 0.8 0.8 Ratio of results Ratio of results Ratio of results Ratio of results 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.0 0 1 2 3 4 0.0 0 1 2 3 4 0.0 0 1 2 3 4 0.0 0 1 2 3 4 0.0 0 1 2 3 4 The number of hops The number of hops The number of hops The number of hops The number of hops (a) Total (b) SemSearch ES (c) INEX-LD (d) ListSearch (e) QALD-2 Figure 2: The number of hops from relevant entities to query terms. Bars represent ratios of relevant entities having the number of hops (distances) to reach from query terms. Three kinds of bars (light-gray and oblique stripe bars, gray and horizontal stripe bars, and black and crossing stripe bars) correspond with minimum, average, and maximum distances, respectively. Dashed lines express cumulative ratios of relevance entities. Three kinds of lines (lines with triangles, those with circles, and those with squares) correspond with minimum, average, and maximum, respectively. An intuition to avoid this situation is to select 𝑝" 𝑝# 𝑝& 𝑝$%& 𝑝$%# 𝑝$ “good” paths from an entity which include meaningful r 𝑣" 𝑣# ... 𝑣$%& 𝑣$%# ... t ... literals for the entity. A naïve extension is to find paths from an entity to “good” entities and to include their documents (suppose the same approach to the state-of- n-hops the-art) into the document of the entity. This paper wants to clarify there is any difference between self- Figure 3: n-length path pattern generated for given descriptive literals and supportive literals for other en- entity r, query term t and distance n. Circular vertices tities. Self-descriptive literals explain well about target are resources and a square is a literal containing t. entities, while supportive literals explain supplemental facts about the target entities. Self-descriptive literals with minimum, average (rounded), and maximum dis- tend to be close to the targets, while supportive liter- tances. Similarly, three kinds of lines (lines with trian- als tend to relatively distant from the targets. There- gles, circles, and squares) correspond with minimum, fore, this investigation attempts to understand the dif- average (rounded), and maximum. ferences of predicates with ending literals (called tail Figure 2 indicates that at least one term is included predicates) between shorter and longer paths. The in- in literals directly connected with relevant entities, and vestigation is done in the following procedure: gathers Figure 2(a) indicates that most of the relevant enti- surveyed paths for each relevant entities using interme- ties are reachable from query terms within two hops diate results of the previous investigation (Section 5.1), on average, however, in terms of maximum distances, and analyzes the paths in terms of commonalities of still more than 10% relevant entities are not reach- the tail predicates. The commonalities are measured able within three hops. This fact answers the question for different lengths (i.e., 1 to 4) of the tail predicate se- why recall@1000 is not perfect? as some relevant en- |Y r ∩Y r | tities are still not found by the query terms due to quences by Jaccard index, Jaccard (Yir , Yjr ) = Yir ∪Yjr , | i j| the smaller distances to construct entity documents. where Yir is a set of i-length tail predicates of entity r This phenomenon is also marked on individual tasks and | · | is cardinality. except SemSeach ES task, which is a simple tasks so that queries in the task are more directly explaining Table 5 display commonalities of tail predicates requiring entities than others. among different lengths of paths for different tail lengths. The results reveal that commonalities of 5.2 Commonality of Tail Predicates of Paths tail predicates decrease as differences of path lengths increase. This fact indicates that literals reachable A simple solution for improving ranking qualities in in different path lengths should select different tail terms of the previous investigation is top include lit- predicates (e.g., rdfs:label is not always a good erals within more hops (i.e., 3 or more), however, it choice.). Due to the tremendous number of tail pred- is obvious that the solution incurs noisy entity docu- icates, the detailed analysis on what kind tail pred- ments by including unnecessary literals within larger icates are preferable in particular path lengths is left hops. The number of reachable entities in G increases for future work. Examples from rough analysis include very quickly as the distance increases. Therefore, ir- rdfs:label and rdfs:comment for 1-length paths and relevant entities contribute to entity documents. dbo:wikiPageWikiLinkText for 5-length paths. Table 5: Commonality (Jaccard index) of tail predicates of top-10 frequent paths from true results to query keywords. The numbers of top-most and left-most in the tables represent lengths of the paths. These tables show that only a part (less than 45%) of tail predicates which are related to basic documents of entities is shared with different lengths of paths. (a) tail length = 1 (b) tail length = 2 (c) tail length = 3 (d) tail length = 4 Path length Path length Path length Path length 1 2 3 4 5 2 3 4 5 3 4 5 4 5 1 1.000 0.399 0.220 0.250 0.198 2 1.000 0.205 0.250 0.190 3 1.000 0.250 0.282 4 1.000 0.449 2 0.399 1.000 0.429 0.342 0.316 3 0.205 1.000 0.325 0.316 4 0.250 1.000 0.481 5 0.449 1.000 3 0.220 0.429 1.000 0.389 0.439 4 0.250 0.325 1.000 0.449 5 0.282 0.481 1.000 4 0.250 0.342 0.389 1.000 0.449 5 0.190 0.316 0.449 1.000 5 0.198 0.316 0.439 0.449 1.000 5.3 Summary & Future Direction of-the-art by formulating as a re-ranking problem. For further improvements, this paper reports two inves- Summary. tigations about relationship between paths to literals This section investigates relationship between paths containing query terms and relevant entities to queries. and result entities in order to answer the question Why Results of the investigations support the improvement recall@1000 is not 100% yet? The answers of this inves- possibilities of graph analytical approaches, and devel- tigation are 2-fold: (1) literals in distant paths are ab- oping them is still left for future works. Consequently, sent from documents; and (2) setting of tail predicates this paper answers to the first question as Yes, they is universal for all lengths of paths. Although, the are appropriate, but there is still an issue on matching first problem is obvious, still the number of reachable entities in a graph. entities in more than two hops is extraordinary large, therefore, constructing documents from longer paths is Acknowledgments. computationally expensive. Additionally, in the naïve This work was partly supported by JSPS KAKENHI approach, the generated documents may include large Grant Number JP18K18056. amount of not quite relevant facts to entities. Future Directions. References To overcome the aforementioned problems, solutions [BHB09] Christian Bizer, Tom Heath, and Tim lies on graph analytical approaches as Section 3 show- Berners-Lee. Linked Data - The Story cases their possibilities. A basic idea is to select So Far. Int. J. Semantic Web Inf. Syst., appropriate reachable predicates and entities within 5(3):1–22, 2009. two or more hops. To this end, graph analyti- [BHP04] Andrey Balmin, Vagelis Hristidis, and cal approaches (e.g., PageRank, Random Walks) can Yannis Papakonstantinou. ObjectRank: be good choices. As discussed in this paper, non- Authority-Based Keyword Search in personalized PageRank and its families are not appro- Databases. In VLDB 2004, pages priate, meaning that global centralities do not help. 564–575, 2004. Therefore, customizable graph analytical approaches such as ObjectRank [BHP04] and random walk with [Has18] Faegheh Hasibi. Semantic Search with restart (RWR) [TFP08] are preferable. There are Knowledge Bases. PhD thesis, Norwe- some preliminary works based on this idea, namely gian University of Science and Technol- FORK [KOAK17], and RWRDoc [Kom18]. FORK has ogy, Trondheim, Norway, 2018. applies ObjectRank for entity search and it achieves better precision@k. While, RWRDoc has applies RWR [Hav02] Taher H. Haveliwala. Topic-sensitive for determining importances of reachable entities in PageRank. In WWW 2002, pages 517– terms of RWR scores and it slight improves in terms 526, 2002. of NDCG@k. These indicates that graph analytical approaches still leave space for improvements. [HBB16] Faegheh Hasibi, Krisztian Balog, and Svein Erik Bratsberg. Exploiting Entity 6 Conclusion Linking in Queries for Entity Retrieval. In ICTIR 2016, pages 209–218, 2016. This paper deals with entity search over Linked Data, analyzes the state-of-the-art in terms of recall@k, and [HNX+ 17] Faegheh Hasibi, Fedor Nikolaev, Chenyan reveals the possibilities of improvements of the state- Xiong, Krisztian Balog, Svein Erik Brats- of-the-art. Also, this paper indicates the feasibility of berg, Alexander Kotov, and Jamie Callan. graph analytical approaches for improving the state- DBpedia-Entity v2: A Test Collection for Entity Search. In SIGIR 2017, pages Napolitano. 7th Open Challenge on Ques- 1265–1268, 2017. tion Answering over Linked Data (QALD- 7). In ESWC 2017, pages 59–69, 2017. [KOAK17] Takahiro Komamizu, Sayami Okumura, Toshiyuki Amagasa, and Hiroyuki [WKC+ 12] Qiuyue Wang, Jaap Kamps, Kitagawa. FORK: Feedback-Aware Georgina Ramírez Camps, Maarten ObjectRank-Based Keyword Search over Marx, Anne Schuth, Martin Theobald, Linked Data. In AIRS 2017, pages 58–70, Sairam Gurajada, and Arunav Mishra. 2017. Overview of the INEX 2012 Linked Data Track. In CLEF 2012 Evaluation Labs [Kom18] Takahiro Komamizu. Learning Inter- and Workshop, 2012. pretable Entity Representation in Linked Data. In DEXA 2018, 2018. (to appear). [ZKN15] Nikita Zhiltsov, Alexander Kotov, and Fe- dor Nikolaev. Fielded Sequential Depen- [KXC09] Jinyoung Kim, Xiaobing Xue, and dence Model for Ad-Hoc Entity Retrieval W. Bruce Croft. A Probabilistic Retrieval in the Web of Data. In SIGIR 2015, pages Model for Semistructured Data. In ECIR 253–262, 2015. 2009, pages 228–239, 2009. [MC05] Donald Metzler and W. Bruce Croft. A Markov Random Field Model for Term Dependencies. In SIGIR 2005, pages 472– 479, 2005. [OC03] Paul Ogilvie and James P. Callan. Com- bining Document Representations for Known-item Search. In SIGIR 2003, pages 143–150, 2003. [PBMW99] Lawrence Page, Sergey Brin, Rajeev Mot- wani, and Terry Winograd. The PageR- ank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66, November 1999. [PC98] Jay M. Ponte and W. Bruce Croft. A Lan- guage Modeling Approach to Information Retrieval. In SIGIR 1998, pages 275–281, 1998. [PMZ10] Jeffrey Pound, Peter Mika, and Hugo Zaragoza. Ad-hoc Object Retrieval in the Web of Data. In WWW 2010, pages 771– 780, 2010. [RZ09] Stephen E. Robertson and Hugo Zaragoza. The Probabilistic Rele- vance Framework: BM25 and Beyond. FTIR, 3(4):333–389, 2009. [TFP08] Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Random walk with restart: fast solutions and applications. Knowl. Inf. Syst., 14(3):327–346, 2008. [UNH+ 17] Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Bastian Haarmann, Anastasia Krithara, Michael Röder, and Giulio