Which entities are relevant for the story?

Pooja H. Oza and Laura Dietz
University of New Hampshire, Durham, NH 03824
pho1003@wildcats.unh.edu, dietz@cs.unh.edu

Abstract

A crucial step in the construction of any event story or news report is to identify the entities involved in the story; such entities can come from a larger background knowledge graph or from a text corpus with entity links. Along with recognizing which entities are relevant to the story, it is also important to select entities that are relevant to all aspects of the story. In this work, we model and study different types of links between entities with the goal of identifying which link type is most useful for the entity retrieval task. Our approach demonstrates the efficacy of using entity links in relevant passages to identify relevant entities and relations, and it outperforms relational information from knowledge graph links and bibliographic coupling. We evaluate different approaches on a benchmark of open-domain queries. We find that the co-occurrence of entities in relevant text is one of the strongest indicators (0.14 MAP), and hence should always be considered when constructing a story.

1 Introduction

Entities are ubiquitous and central units for describing stories. Identifying which entities are relevant for a story is at the core of tasks such as event identification or news reporting. The goal of entity retrieval is to identify and retrieve entities that are relevant for an information need. In this work, we study the first crucial step of story construction, i.e., the identification and retrieval of a list of entities relevant to a story. We demonstrate how a text corpus is central for obtaining good retrieval performance. As a byproduct of our approach, we also provide a set of relevant text passages.
Previous work on entity retrieval focuses on the link structure of knowledge graphs, the text surrounding entities, and entity attributes such as type, category, and name, for example by matching the entity target type with the query target type [GB17] or by applying language models to evaluate the category match between queries and entities [BBDR10]. To identify and retrieve entities relevant to a story, we explore three different types of links between entities: (i) the relevant co-occurrence graph, (ii) unweighted link patterns, and (iii) relevance-weighted bibliographic coupling. In the relevant co-occurrence graph, we study the effectiveness of the co-occurrence of entities, whereas in unweighted link patterns we examine the impact of knowledge graph link structure on entity retrieval performance. In the third link type, relevance-weighted bibliographic coupling, we investigate the effect of combining knowledge graph link structure with the relevance of text documents. We provide further explanation of these three link types in Section 3.

Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Finlayson (eds.): Proceedings of the Text2Story'21 Workshop, Online, 1 April 2021, published at http://ceur-ws.org/

We envision that a story request is specified through text such as 'Radiocarbon dating background history', 'Antibiotics side-effects', 'Antibiotics resistance', or 'Deforestation causes'. A user may also be interested in retrieving entities relevant to a larger theme that diversifies over multiple stories, such as 'Antibiotics'. For such an information need, we envision the user specifying a multi-story request. We define a multi-story request as a list of multiple story requests connected through a common theme.
For example, a multi-story query such as ['Antibiotics side-effects', 'Antibiotics resistance'] gives us entities relevant to the theme common to these stories, such as 'Antibiotics'.

Task (Story-specific Entity Retrieval): Given a story request as a text query, (1) identify which entities in the knowledge base are relevant, and (2) associate each entity with a score of relative relevance. We assume access to a general-purpose knowledge base and a large text corpus.

The formulation of our task is related to the tasks of entity retrieval in information retrieval [ZKN15], semantic search [BFC+14], and question answering [BCFL13]. To construct a story, we need to retrieve a diverse set of relevant entities that cover all the facets of the story. We interpret a multi-story request as an overarching set of several stories sharing a common theme. The main challenge in entity retrieval is that it is not sufficient to rely on exact keyword matches as in TF-IDF. In Section 4, we show that keyword matching is not an effective approach for entity retrieval. Instead, we find it beneficial to combine several weak indicators of relevance to identify a relevant set of entities.

Contributions: To find a set of relevant entities, most related work either retrieves entities through matches in the knowledge base entry and/or examines link patterns in the knowledge base. We demonstrate that it is much more effective to retrieve relevant text passages and use the occurrence of entities, as well as the co-occurrence of two entities, as an indicator of entity relevance. While a full exploration of all prior approaches is out of scope for this work, our study suggests that more attention to the textual context in which entities occur is a worthwhile endeavor.

2 Related Work

We review below related work in question answering, information retrieval, and semantic search.
2.1 Question Answering and Semantic Search

Many question answering tasks ask to retrieve one (or few) entities in response to a concise description. The DrQA system [CFWB17] demonstrates that, by combining information retrieval with a neural network, many open-domain entity-centric questions can be answered from free text. An alternative is semantic parsing, in which the question is translated into a SPARQL query that selects the relevant entities [BCFL13]. Many issues revolve around matching terms in the question to properties and types, and around how to use the link structure most effectively [ZHW+14]. Combined knowledge graph and corpus approaches have demonstrated potential [SA16].

2.2 Entity Retrieval

Several information retrieval models have been modified to search in structured entries of knowledge graphs, using entity information such as name, type, category, and outlinks/inlinks [PMZ10, RCK12]. For example, the Fielded Sequential Dependence Model [ZKN15] uses unigram and sparse bigram matches of the query in the knowledge graph fields name, category, and neighbors. Garigliotti and Balog [GB17] explore the use of entity type (i.e., taxonomic) information to exploit the type-based similarity between queries and entities. While only rarely used, text-based indicators from the context of an entity were found to be a strong feature for entity ranking tasks [SDPP15, Die19].

2.3 Entity Set Expansion and Graph Walks

When applied to topically focused knowledge graphs, such as ConceptNet, random walks and other graph-based methods are effective in finding relevant entities [KZ12]. However, in application to large general-purpose

Figure 1: Depiction of three link types that are based on (a) link patterns in the knowledge base, such as bidirectional links or unweighted bibliographic coupling, (b) co-occurrence of entities in relevant passages, and (c) bibliographic / co-coupling measures that incorporate the relevance of shared entities.
Entities and entity links are represented as stick figures, which are associated with knowledge base metadata (depicted as rectangles with links) or contained in passages. Documents with dashed lines are of higher relevance.

knowledge graphs derived from Wikipedia, indicators from the link structure offer only disappointing performance [BFC+14]. A common approach is to start with a set of seed entities and then try to expand the set with additional entities that are similar to the seed set. For example, Pantel et al. [PCB+09] represent entities through a vector representation and then include the most similar entities. This method can be refined with neural networks [RPLVD19]. Most of the related work focuses on identifying a small set of correct entities, which explains the effectiveness of exact keyword matching and semantic parsing. In contrast, we focus on obtaining a set of entities that are relevant to different degrees, for which we show that a keyword matching approach is not sufficient. Instead, we find that one of the most useful indicators is the context of entity mentions in relevant text passages.

3 Approach: Identifying Entities Relevant for a Story

Previous work relies on keyword matches in entity names and categories, or on graph walks over related entities. We hypothesize that one of the most effective indicators for the relevance of entities lies in the text surrounding the entity mentions. We argue that entities which are co-mentioned in relevant text passages are likely to have a relevant connection. We demonstrate that co-coupling patterns defined on the co-occurrence graph are more effective than co-coupling patterns on the general-purpose knowledge base. We also explore a hybrid approach, where links for co-coupling and bibliographic coupling are weighted by relevance using text retrieval indicators.
We make the following claims:

• Claim 1: Finding entities through relevant passages is more effective than searching the knowledge base for entity entries (using short descriptions).

• Claim 2: To predict the relevance of an entity link, entity co-occurrences in relevant passages are more effective than link patterns on edges of a general-purpose knowledge base.

• Claim 3: Count-based bibliographic and co-coupling similarities can be improved by weighting edges in the knowledge base by the relevance of the shared entities.

We study these claims through the following approaches, both in isolation and in combination. We define three link types between entities, based on knowledge base links, co-occurrences, and relevance-based coupling, as depicted in Figure 1 and defined in Sections 3.1–3.3. Our approach takes as input a ranking of passages P for the relevant co-occurrence graph; for the relevance-weighted coupling measures, we use knowledge base entries E.

KG-Entity: We use the BM25 retrieval model to generate rankings of the top 1000 entities using the description field of knowledge base entries.

3.1 Relevant Co-occurrence Graph

We define the graph of relevant co-occurrences as follows: using a large corpus of text passages, we retrieve a ranking of passages for the story. Here we use the top 100 passages retrieved with the BM25 model. After entity linking each passage, we take note of pairs of entities e_i, e_j which co-occur in a passage P.

Relevance-based co-occurrences (Co-occ Relevance): From the passage ranking we derive a relevance weight for each edge by accumulating the reciprocal ranks of the passages in which the entities co-occur, as in Equation 1.

    f_ecr(e_i, e_j) = Σ_{P : e_i, e_j mentioned in P} 1/rank(P),   i ≠ j    (1)

We calculate the entity ranking by aggregating the edge weights of the incident links, akin to degree centrality.

Co-occurrence count (Co-occ Count): We also explore a simpler alternative by simply counting the number of co-occurrences.
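Both co-occurrence variants can be sketched in a few lines of Python (a minimal sketch; the retrieved passages are assumed to be given as a rank-ordered list of entity-link sets, and the function names are ours):

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_edges(ranked_passages, weighted=True):
    """Edge weights of the relevant co-occurrence graph.

    ranked_passages: rank-ordered list (rank 1 first) of the sets of
    entities linked in each retrieved passage. With weighted=True this
    computes Equation 1 (reciprocal-rank weights, Co-occ Relevance);
    with weighted=False it simply counts co-occurrences (Co-occ Count).
    """
    edge = defaultdict(float)
    for rank, entities in enumerate(ranked_passages, start=1):
        w = 1.0 / rank if weighted else 1.0
        for ei, ej in combinations(sorted(entities), 2):
            edge[(ei, ej)] += w  # accumulate over passages mentioning both
    return edge

def rank_entities(edge):
    """Degree-centrality ranking: sum the weights of incident edges."""
    score = defaultdict(float)
    for (ei, ej), w in edge.items():
        score[ei] += w
        score[ej] += w
    return sorted(score.items(), key=lambda kv: -kv[1])
```

For example, an entity mentioned in the top-ranked passage alongside many other relevant entities accumulates a high score, while an entity that co-occurs only in low-ranked passages contributes little.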
We only count occurrences in passages that were retrieved. Similar to Co-occ Relevance, we calculate the entity ranking by accumulating the edge weights of the incident links.

Mention Freq: We count the number of times an entity e_i is mentioned in the retrieved passages and rank the entities by this frequency.

Unlike the link types described below, the graph of relevant co-occurrences uses the knowledge base only through the entity linker; it does not incorporate any relations provided in the given general-purpose knowledge base.

3.2 Unweighted Link Patterns

As baselines to support Claim 3, we compare to several widely used link patterns that can be derived from any general-purpose knowledge base. In this line of work, we ignore the semantics of different relationship predicates. Concretely, we include the following approaches:

Direct links: For every pair of entities (e_i, e_j), we determine whether a direct link exists between e_i and e_j. We count the link occurrences and use the counts as relevance scores. We consider four types of direct links:

1. Outlinks: Number of outlinks from the entry of entity e_i to entity e_j.
2. Inlinks: Number of incoming links to the entry of e_i from e_j.
3. Bidi: 1 if a bidirectional link exists and 0 otherwise. For a bidirectional link, both entities e_i and e_j link to each other.
4. Undirected: Sum of outlinks and inlinks between e_i and e_j.

For each entity e_i in Direct links, we derive the entity ranking by accumulating the scores of the incident links.

Coupling (baseline): Bibliographic coupling and co-coupling are similarity measures that score a semantic relationship between two entities based on the number of shared inlinks or outlinks.

1. Count-based Bibliographic Coupling (Biblio Count): Number of outlinks (i.e., citations) shared by entities e_i and e_j.
2. Count-based Co-coupling (Co Count): Number of shared inlinks between entities e_i and e_j.
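Assuming the knowledge base adjacency is available as dictionaries mapping each entity to its sets of outlinks and inlinks (an illustrative representation, not part of the original formulation), the count-based coupling measures reduce to set intersections:

```python
def biblio_count(outlinks, ei, ej):
    """Count-based bibliographic coupling: shared outlinks of e_i and e_j."""
    return len(outlinks.get(ei, set()) & outlinks.get(ej, set()))

def co_count(inlinks, ei, ej):
    """Count-based co-coupling: shared inlinks of e_i and e_j."""
    return len(inlinks.get(ei, set()) & inlinks.get(ej, set()))
```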
Similar to the other link types, we rank the entities by aggregating the edge weights for each entity e_i.

3.3 Relevance-weighted Bibliographic Coupling

We further study a hybrid between link-based and content-based approaches. Traditionally, coupling measures count the number of shared pages that link to or from both entities e_i and e_j. We suggest relevance-weighted coupling measures that first retrieve a ranking of knowledge base entries E and then accumulate the relevance scores of the shared pages. We retrieve the ranking of entries E using the BM25 retrieval model.

Relevance-weighted Bibliographic Coupling (Biblio Relevance): Accumulated rank scores of entries E that both entities e_i and e_j link to, as in Equation 2.

    f_ebr(e_i, e_j) = Σ_{E : e_i, e_j linked to E} score(E),   i ≠ j    (2)

We calculate the entity rankings by aggregating the relevance scores of all edges incident to the entity e_i.

Relevance-weighted Co-Coupling (Co Relevance): Accumulated rank scores of entries E that link to both e_i and e_j (analogously to Equation 2). For the entity ranking of each entity e_i, we aggregate the edge weights of all incident links of e_i.

3.4 Combining Weighted Link-graphs

As some approaches might demonstrate their strengths only in combination, we study combinations with a learning-to-rank approach (L2R). Each of our approaches produces a separate feature for each entity. While many different methods are available, we opt for a list-wise learning-to-rank approach implemented in RankLib [DBC13], using coordinate ascent to optimize mean average precision (MAP), which has demonstrated robust state-of-the-art performance on small feature spaces in the past.

3.5 Multi-Story Request

We identify the entities relevant to a multi-story request from the results of the multiple stories connected to the request.
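Looking back at Section 3.3, Equation 2 can be sketched analogously to the count-based measures (a minimal sketch; `entry_scores` is a hypothetical dict of BM25 scores for the retrieved entries E, and `outlinks` maps each entity to the entries it links to):

```python
from collections import defaultdict
from itertools import combinations

def biblio_relevance(entry_scores, outlinks):
    """Relevance-weighted bibliographic coupling (Equation 2): for each
    entity pair, accumulate the retrieval scores of the shared entries E
    that both entities link to. Entries not retrieved contribute 0."""
    edge = defaultdict(float)
    for ei, ej in combinations(sorted(outlinks), 2):
        shared = outlinks[ei] & outlinks[ej]
        edge[(ei, ej)] = sum(entry_scores.get(E, 0.0) for E in shared)
    return edge
```

Co Relevance follows the same pattern with an inlink map in place of `outlinks`.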
We generate the result of a multi-story request by combining the top 100 ranked entities for each connected story and accumulating the relevance scores of the entities. We keep the top 100 entities from the merged list.

4 Evaluation

We evaluate the three claims of Section 3 on a large-scale benchmark from TREC Complex Answer Retrieval (CAR) [DG18].

4.1 Benchmark and Evaluation Paradigm

The evaluation is based on the TREC Complex Answer Retrieval "Y1 test" benchmark for evaluation and "Y1 train" for training the combination with learning-to-rank. The benchmark includes a knowledge base that is derived from a Wikipedia dump and preserves meta-information about names (title, disambiguation, and anchor text), the first paragraph, and the entity ids of inlinks and outlinks. The benchmark also includes a background corpus, which is derived from passages on Wikipedia.

Table 1: Results on selecting entities for 1952 stories.

Features               MAP              Rprecision       F1               In L2R
Co-occ Relevance       0.1485 ± 0.0052  0.1427 ± 0.0053  0.0689 ± 0.0017  X
Co-occ Count           0.0955 ± 0.0036  0.1033 ± 0.0039  0.0623 ± 0.0016  X
Mention-Freq           0.1111 ± 0.0035  0.1120 ± 0.0038  0.0723 ± 0.0016  X
Outlinks               0.0740 ± 0.0027  0.0731 ± 0.0029  0.0621 ± 0.0014
Inlinks                0.0724 ± 0.0025  0.0733 ± 0.0029  0.0652 ± 0.0014
Bidi                   0.0407 ± 0.0015  0.0341 ± 0.0018  0.0608 ± 0.0014
Undirected             0.0788 ± 0.0027  0.0811 ± 0.0030  0.0640 ± 0.0014  X
Biblio Count           0.0712 ± 0.0023  0.0711 ± 0.0028  0.0655 ± 0.0015
Co-coupling Count      0.0389 ± 0.0015  0.0365 ± 0.0019  0.0587 ± 0.0013
Biblio Relevance       0.0501 ± 0.0020  0.0497 ± 0.0024  0.0526 ± 0.0012
Co-coupling Relevance  0.0766 ± 0.0028  0.0777 ± 0.0031  0.0624 ± 0.0014
KG-Entity              0.0344 ± 0.0028  0.0355 ± 0.0028  0.0112 ± 0.0004
L2R                    0.1558 ± 0.0053  0.1461 ± 0.0054  0.0707 ± 0.0015

To ensure reproducibility, we use the hyperlinks provided with the text instead of an entity linker. The TREC CAR benchmark includes queries that are derived from the Wikipedia article outlines.
Articles corresponding to queries are removed from the knowledge base. The benchmark includes section-level queries, formed by concatenating the article title and heading text, and article-level queries that consist of only the article title. In line with the task statement in Section 1, a story request is a section-level query, and the article-level query forms the common theme of the multiple stories. For example, the query "Radiocarbon dating/Use in archaeology/Notable applications/Pleistocene/Holocene boundary in Two Creeks Fossil Forest" forms one of the stories associated with the multi-story request "Radiocarbon dating". The benchmark includes automatic assessments for 1952 stories that are associated with 132 multi-story requests. The validity of the automatic assessments was confirmed by manual assessors [DD20].

We evaluate the quality of the resulting entity set using the F1 measure. The quality of the relative relevance scores is evaluated as a ranking via mean average precision (MAP) and Rprecision. With R being the number of true entities, Rprecision measures the set precision among the first R entities of a predicted ranking. In contrast, MAP averages the precision at the ranks at which relevant entities are found, awarding partial credit for low-ranked entities.

4.2 Story Results

For each story, we evaluate the quality of the retrieved entities; results are presented in Table 1. We find that the relevance-based co-occurrence approach is the most effective for entity ranking. In fact, the three best approaches are all based on a passage ranking (supporting Claim 2). This improvement is also visible in the quality of the selected set, where relevance-weighted co-occurrence obtains a significantly higher F1. Overall, the F1 scores are low because we select the top 100 entities, while for most queries fewer than 100 entities are relevant. Rprecision reflects the quality when selecting the right number of entities.
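The two ranking measures can be sketched for a single query as follows (a minimal sketch; MAP is the mean of this per-query average precision over all queries, and the function names are ours):

```python
def rprecision(ranking, relevant):
    """Set precision among the first R entities, R = number of true entities."""
    R = len(relevant)
    return len(set(ranking[:R]) & relevant) / R if R else 0.0

def average_precision(ranking, relevant):
    """Average of the precision values at the ranks of relevant entities."""
    hits, total = 0, 0.0
    for k, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant) if relevant else 0.0
```

For instance, for the ranking [A, B, C, D] with relevant set {A, C}, Rprecision is 0.5, while average precision gives partial credit for C at rank 3.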
In Section 1 we claim that keyword matching is not sufficient for identifying a set of relevant entities. To support this claim, we evaluate the ranking of entities via their knowledge base entries as described in Section 3 (KG-Entity). This is indeed one of the worst-performing approaches. It also supports our Claim 1 that counting relevance-based entity mentions is more effective at identifying relevant entities than searching for entity entries in the knowledge base.

For Claim 3, we study whether a retrieval over knowledge base entries can serve as a relevance weight in bibliographic and co-coupling patterns, but we obtain mixed results: weighting shared inlinks by relevance (Co-coupling Relevance) improves performance, while weighting shared outlinks (Biblio Relevance) decreases it.

To analyze how multiple approaches work in combination, we use learning-to-rank with the four best approaches: the three context-based approaches and undirected links in the knowledge base. With this combination, we obtain a small but not significant improvement. The combination of all features performs worse, as some

Table 2: Results on selecting entities for 132 multi-story requests.
Features               MAP              Rprecision       F1               In L2R
Co-occ Relevance       0.2140 ± 0.0111  0.3092 ± 0.0106  0.3106 ± 0.0096  X
Co-occ Count           0.1669 ± 0.0085  0.2636 ± 0.0095  0.2714 ± 0.0087  X
Mention-Freq           0.1971 ± 0.0085  0.3032 ± 0.0087  0.3189 ± 0.0086  X
Outlinks               0.1509 ± 0.0071  0.2568 ± 0.0078  0.2722 ± 0.0078
Inlinks                0.1566 ± 0.0065  0.2630 ± 0.0078  0.2839 ± 0.0079
Bidi                   0.0538 ± 0.0043  0.1357 ± 0.0072  0.1665 ± 0.0077
Undirected             0.1640 ± 0.0071  0.2707 ± 0.0078  0.2856 ± 0.0081  X
Biblio Count           0.1469 ± 0.0068  0.2527 ± 0.0081  0.2700 ± 0.0079
Co-coupling Count      0.0812 ± 0.0052  0.1844 ± 0.0075  0.2129 ± 0.0077
Biblio Relevance       0.1099 ± 0.0064  0.2135 ± 0.0075  0.2320 ± 0.0080
Co-coupling Relevance  0.1339 ± 0.0068  0.2320 ± 0.0083  0.2448 ± 0.0086
KG-Entity              0.0172 ± 0.0020  0.0519 ± 0.0037  0.0499 ± 0.0031
L2R                    0.2534 ± 0.0116  0.3552 ± 0.0104  0.3175 ± 0.0092

features work extremely well for some cases, but not consistently. The relevance-based co-occurrence approach also performs better in terms of MAP and Rprecision (with the exception of Co-coupling Relevance).

4.3 Multi-story Request Results

We interpret a multi-story request as a list of stories connected to it. In the context of the TREC CAR benchmark, all queries derived from the same article form one multi-story request. We evaluate the performance obtained by each approach when using multiple stories, as described in Section 3. The results are given in Table 2. To demonstrate our Claim 1, we include one baseline: using each story to retrieve from knowledge base entries (KG-Entity), then combining the results as in Section 3.5. With a MAP of 0.0172 and an F1 of 0.05, the performance of this baseline is dramatically lower than any of the alternative approaches in our study, supporting Claim 1. We see that approaches that consider the relevance of passage contexts are still superior (supporting Claim 2).
However, while in the story results co-occurrence relevance improved over undirected links by 80% in terms of MAP, the advantage is only 30% for multi-story requests. In general, we see the same pattern for multi-story requests as in the story analysis: relevance-based weighting of shared nodes improves the performance of co-coupling but not of bibliographic coupling. We suspect that one reason why no convincing improvement is seen with relevance-weighted bibliographic coupling is that the quality of the underlying knowledge base ranking (KG-Entity) is inferior.

5 Conclusion

We study the problem of selecting a set of relevant entities for story construction. We find that links in the knowledge base are not effective in selecting relevant entities, possibly because the majority of links are not relevant for the topic at hand. In contrast, using entity co-occurrences in retrieved text passages, especially when the relative relevance of passages is incorporated as link strength, is between 30% and 80% more effective than the best link-based approach. Even combining the link-based approaches with relevance-weighted co-occurrence yields no significant improvement. While incorporating the relevance of shared entities can improve coupling approaches, this requires a high-quality ranking of entities. We found that one of the most commonly used approaches, namely retrieving entities through the names and abstracts of their knowledge base entries, is ineffective for story construction.

References

[BBDR10] Krisztian Balog, Marc Bron, and Maarten De Rijke. Category-based query modeling for entity search. In European Conference on Information Retrieval, pages 319–331. Springer, 2010.

[BCFL13] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2013.
[BFC+14] Christopher Boston, Hui Fang, Sandra Carberry, Hao Wu, and Xitong Liu. Wikimantic: Toward effective disambiguation and expansion of queries. Data & Knowledge Engineering, 90:22–37, 2014.

[CFWB17] Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading Wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017.

[DBC13] Van Dang, Michael Bendersky, and W. Bruce Croft. Two-stage learning to rank for information retrieval. In ECIR, pages 423–434. Springer, 2013.

[DD20] Laura Dietz and Jeff Dalton. Humans optional? Automatic large-scale test collections for entity, passage, and entity-passage retrieval. Datenbank-Spektrum, 2020.

[DG18] Laura Dietz and Ben Gamari. TREC CAR 2.0: A data set for complex answer retrieval, 2018.

[Die19] Laura Dietz. ENT Rank: Retrieving entities for topical information needs through entity-neighbor-text relations. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 215–224, 2019.

[GB17] Darío Garigliotti and Krisztian Balog. On type-aware entity retrieval. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pages 27–34. ACM, 2017.

[KZ12] Alexander Kotov and ChengXiang Zhai. Tapping into knowledge base for concept feedback: Leveraging ConceptNet to improve search results for difficult queries. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, pages 403–412. ACM, 2012.

[PCB+09] Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, and Vishnu Vyas. Web-scale distributional similarity and entity set expansion. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 938–947. Association for Computational Linguistics, 2009.

[PMZ10] Jeffrey Pound, Peter Mika, and Hugo Zaragoza. Ad-hoc object retrieval in the web of data.
In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 771–780, New York, NY, USA, 2010. Association for Computing Machinery.

[RCK12] Hadas Raviv, David Carmel, and Oren Kurland. A ranking framework for entity oriented search using Markov random fields. In Proceedings of the 1st Joint International Workshop on Entity-Oriented and Semantic Search, page 1. ACM, 2012.

[RPLVD19] Pushpendre Rastogi, Adam Poliak, Vince Lyzinski, and Benjamin Van Durme. Neural variational entity set expansion for automatically populated knowledge graphs. Information Retrieval Journal, 22(3-4):232–255, 2019.

[SA16] Denis Savenkov and Eugene Agichtein. When a knowledge base is not enough: Question answering over knowledge bases with external text data. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 235–244, 2016.

[SDPP15] Michael Schuhmacher, Laura Dietz, and Simone Paolo Ponzetto. Ranking entities for web queries through text and knowledge. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 1461–1470. ACM, 2015.

[ZHW+14] Lei Zou, Ruizhe Huang, Haixun Wang, Jeffrey Xu Yu, Wenqiang He, and Dongyan Zhao. Natural language question answering over RDF: A graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pages 313–324. ACM, 2014.

[ZKN15] Nikita Zhiltsov, Alexander Kotov, and Fedor Nikolaev. Fielded sequential dependence model for ad-hoc entity retrieval in the web of data. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 253–262. ACM, 2015.