<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Background Linking: Joining Entity Linking with Learning to Rank Models</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padova</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent years have been characterized by a strong democratization of news production on the web. In this scenario, it is rare to find self-contained news articles that provide useful background and context information. The problem of finding information that provides context to news articles has been tackled by the Background Linking task of the TREC News Track. In this paper, we propose a system to address the background linking task. Our system relies on the LambdaMART learning to rank algorithm trained on classic textual features and on entity-based features. The idea is that the entities extracted from the documents, as well as their relationships, provide valuable context to the documents. We analyzed how this idea can be used to improve the effectiveness of (re-)ranking methods for the background linking task.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity Linking</kwd>
        <kwd>Graph of Entities</kwd>
        <kwd>Learning to Rank</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        According to Pew Research studies carried out in 2018, 93% of American
adults consume at least some of their news online, via social media
recommendations, web browsing or advertising recommendations [
        <xref ref-type="bibr" rid="ref14 ref22">14,22</xref>
        ]. The adoption of
digital strategies marked the end of publisher-driven news delivery, shifting
the focus away from the publisher towards the story. In this scenario, users
can both consume news on the web and publish their own. This had
a substantial impact on news production. Indeed, it is increasingly
challenging to find self-contained news articles that provide context and background
information about the story being told. The National Institute of Standards and
Technology (NIST) recognized that Information Retrieval (IR) and Natural Language
Processing (NLP) play a primary role in solving this problem and, in
cooperation with the Washington Post, launched the first edition of the News Track in
TREC 2018 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This track is organized into two subtasks, Background Linking
and Entity Ranking: their goal is to provide the user with different means to
understand news articles. The former relies on the construction of a ranked list
of articles providing background information. The latter exploits, as a means of
contextualization, a ranking of named entities mentioned in the article the user
is reading. (Copyright 2021 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume
is published and copyrighted by its editors. IRCDL 2021, February 18-19, 2021,
Padua, Italy.)
      </p>
      <p>In this paper, we present an IR system based on Learning to Rank methods
to improve the ranking of documents in the context of the Background Linking task.</p>
      <p>
        In the literature, several solutions were proposed to address this task. Most of
them treat background linking as an ad hoc search task and rely on approaches
based on keyword extraction [
        <xref ref-type="bibr" rid="ref15 ref2 ref26">2,15,26</xref>
        ]. Other methods, instead, leverage
entities to identify documents’ topics [
        <xref ref-type="bibr" rid="ref16 ref17">16,17</xref>
        ]. In the 2019 edition of the TREC
Background Linking task instead, many participants exploited machine learning
methods [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], with a particular focus on learning to rank approaches [
        <xref ref-type="bibr" rid="ref20 ref5">5,20</xref>
        ].
      </p>
      <p>
        We propose a retrieval system that relies on the LambdaMART learning to rank
algorithm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and the classic BM25 model [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] to rank background articles. In
particular, we focus on the creation of feature vectors to be fed to learning to
rank methods such as LambdaMART; indeed, we combine features extracted
from two different representations of the same document: the unstructured
textual representation and a graph-based one. We consider a graph of entities
extracted from the textual documents and then linked to Wikipedia articles. In
this paper, we analyze advantages and limits of entity-based features in learning
to rank approaches and we discuss how they can be employed to improve the
final document ranking for the background linking task.
      </p>
      <p>The rest of the article is organized as follows: in Section 2 we provide some
background and related work, in Section 3 we describe the key components of
the proposed solution and in Section 4 we present a use case. Section 5 describes
the experimental setup and Section 6 reports the evaluation results.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        The TREC News Track aims to study how to provide contextual information
to users while reading news articles. To this end, two tasks are defined: Entity
Ranking and Background Linking. Background Linking Task concerns the
development of systems able to help users contextualize news articles as they are
reading them. Formally, given a source article (i.e., the query), the system should
retrieve a list of articles providing relevant background and context information
related to the source [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>The reference collection for this task is the Washington Post Corpus ver. 2
(https://trec.nist.gov/data/wapost/).
This collection contains about 590,000 news articles and blog posts published
from 2012 to 2017 by the Washington Post (https://www.washingtonpost.com/).
Each document is characterized by
a list of fields such as id, author, article URL, date of publication, title. The main
content is organized in one or more paragraphs; they may include HTML tags,
images and videos. Fifty topics have been provided for this task; each of them is
identified by a number and by the id and the URL of the topic’s source document.</p>
      <sec id="sec-2-1">
        <title>-</title>
        <p>Fig. 1: Pipeline of the proposed solution: textual documents, bag of
entities, initial graph, pruned graph, feature vectors, re-rankings of test
documents, final run.</p>
        <p>The topic’s source document plays the role of the query. Graded
relevance judgments range from 0 (not relevant) to 4
(recommended article).</p>
        <p>
          In the proposed solution, we leverage a graph-based document representation built
on the entities and relationships that we automatically extract from the
documents. In the literature, several approaches take advantage of entity-oriented
representations. Xiong et al. in [
          <xref ref-type="bibr" rid="ref24 ref25">24,25</xref>
          ] propose a Bag-of-Entities (BoE) model
where each document is represented as a bag-of-entities constructed via entity
linking. The ranking of documents is generated, considering the overlapping
entities between each document and the query. A similar approach is [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] that
describes a learning to rank approach where the training is based on both
Bag-of-Words (BoW) and BoE features. The experimental evaluation showed that
the combination of features depending on words and entities improves a BM25
baseline. What makes our retrieval system different from the solutions proposed
above is that we take the individual entities belonging to the BoE constructed
via entity linking and create a graph; this graph is more informative than
the classic BoE representation because it includes both the information related
to the individual entities and the information carried by their relationships.
        </p>
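        <p>To make the contrast concrete, the following sketch compares the two representations; the entities and relatedness values are invented, and plain Python containers stand in for a graph library:</p>

```python
# Sketch (invented data): a bag-of-entities (BoE) records only which
# entities occur, while a graph of entities also records the weighted
# relatedness edges between them.
from collections import Counter

doc_boe = Counter(["Tropical cyclone", "Storm", "Wind"])
query_boe = Counter(["Tropical cyclone", "Storm", "Flood"])

# BoE-style matching: overlap of the two entity multisets.
overlap = sum((doc_boe & query_boe).values())  # two shared entities

# Graph of entities: the same nodes plus weighted edges.
doc_graph = {
    ("Tropical cyclone", "Storm"): 0.51,
    ("Storm", "Wind"): 0.48,
}
# The graph keeps information the BoE discards: not only that two
# entities co-occur, but how strongly they are related.
relatedness = doc_graph[("Tropical cyclone", "Storm")]
```

        <p>Features derived from the edges (weights, connectivity) are exactly what the flat BoE representation cannot provide.</p>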
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proposed Solution</title>
      <p>The main phases composing the proposed solution are reported in Figure 1. In
the following, we describe each phase.</p>
      <p>
        Entity Linking. In this phase, we perform entity recognition on the textual
documents to extract a set of mentions to be linked with entities in a knowledge
repository (KR) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] – i.e., Wikipedia3 in our case. This process can be called
entity annotation. This phase’s output is a bag-of-entities for each document.
Graph Creation. This phase takes the bag-of-entities as input and creates an
undirected weighted graph, where the nodes are the entities and the edges are
based on the semantic relatedness between the nodes. The semantic relatedness
is a measure defined in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], which exploits the Wikipedia structure to find the
3 https://www.wikipedia.org/
relatedness between two entities. Two entities are semantically related if they
share a high number of entities linking to them [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In our implementation,
the entity pairs are connected by an edge whose weight is the numerical value
of the semantic relatedness. The output of this phase is a graph with one or
more connected components. There could be a strong imbalance between the
different components; in fact, one may include the largest part of the nodes,
while another contains very few entities. It is common to identify in the largest
connected component one or more communities – or groups of nodes highly
connected within themselves and poorly with the other groups [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Graph Pruning. In this phase we remove the meaningless entities from the graph.
Pruning is based on component removal, which keeps only the largest connected
component of the graph, and on community detection, which detects the largest
community, where the most representative entities usually are. To this end, we rely
on the Girvan-Newman algorithm for community detection [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] based on the “edge
betweenness”, a generalized version of the classic betweenness centrality measure
defined in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The edge betweenness of an edge corresponds to the number of
shortest paths between every pair of vertices that run along it [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Removing
some nodes from a graph may cause a loss of information, especially if two large
connected components coexist in the same graph (in this case the separation
between different components shows that the article discusses two weakly
correlated subjects). Since this case is extremely rare, the advantages brought by the
pruning phase outweigh the possible loss of information.
Features Extraction. This phase is about the definition and extraction of the
features representing the documents; there are document-based and query-based
features. The two feature sets comprise both textual and entity-based graph
features. Since the overall goal of the implementation is to study the
impact of the graphs of entities on the learning process, the core of this phase
lies in the extraction of the entity-based graph features. The query graph-based
features are the most informative about the relevance of a document because
they reflect the similarities between the document graph and the query one.
This is why most of the extracted features belong to this type. An
example of query graph-based features is the semantic relatedness [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] between
the most central node of each graph, where the centrality measure considered
is the betweenness centrality [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. High values highlight the correlation between
the document graph and the query one. The features computed independently
of the query graph, instead, are topological properties of the document graph.
This type of feature is less informative than the previous one because it does
not highlight any relation of the document with the query; however, properties
such as the node connectivity and the node degree proved useful for
relevance identification. Only a few features belong to the textual-based type, and
most of them depend on the query article. Some examples of query-based textual
features are the BM25 score and the Term Frequency (TF), while the document’s
length and the number of paragraphs are document-based textual features.
Table 1 presents an overview of the number of features extracted from
each document representation. For each document, a feature vector is finally
created. These vectors are required to perform learning to rank tasks.
Learning to Rank. We employ LambdaMART because it is one of the best
performing learning to rank methods [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in the literature. LambdaMART is a
list-wise approach based on Multiple Additive Regression Trees (MART) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
LambdaMART is trained to automatically construct a set of ranking models.
The ranking models trained on different sets of hyper-parameters are tested on
the test set containing a new list of queries; the documents in the test set are
ranked on the basis of the new scores assigned by the models. The output of this
phase is the set of final runs, one for each model.
      </p>
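      <p>The semantic relatedness used as an edge weight in the graph creation phase can be sketched with a Wikipedia-link-based measure in the spirit of the one we adopt: two entities are the more related, the more incoming links they share. The inlink sets, the toy Wikipedia size and the clipping to [0, 1] below are illustrative assumptions:</p>

```python
import math

def wlm_relatedness(inlinks_a, inlinks_b, n_wiki):
    """Link-based relatedness: compares the sets of pages linking to
    two entities against the total number of pages (n_wiki)."""
    shared = len(inlinks_a & inlinks_b)
    if shared == 0:
        return 0.0
    bigger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    score = 1.0 - (math.log(bigger) - math.log(shared)) / (
        math.log(n_wiki) - math.log(smaller)
    )
    return max(0.0, min(1.0, score))  # clip to [0, 1] (our assumption)

# Invented inlink sets over a toy "Wikipedia" of 1000 pages.
a = {f"page{i}" for i in range(50)}       # 50 inlinks
b = {f"page{i}" for i in range(25, 100)}  # 75 inlinks, 25 shared with a
w = wlm_relatedness(a, b, n_wiki=1000)
# An edge would be created here, since w reaches the 0.4 threshold
# used in our graph creation phase.
edge_created = w >= 0.4
```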
      <p>
        Fusion. In this phase we experiment with the fusion of multiple runs of the system to
analyze whether an increase in the number of merged runs corresponds to an
effectiveness improvement. Each run provided by the learning to rank phase
contains the same set of documents ranked differently. In order to create the
fusion runs we employ the combSUM method [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]: the new final score of a document
corresponds to the sum of the scores that document obtained in each individual
run. The documents are finally re-ranked according to their new scores.
      </p>
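      <p>A minimal sketch of the combSUM step (run scores are invented): the fused score of each document is the sum of its scores over the runs being merged, and the fused ranking sorts by that sum:</p>

```python
# combSUM sketch: sum each document's score across the runs to fuse,
# then re-rank by the summed score (scores are invented).
from collections import defaultdict

runs = [
    {"d1": 0.9, "d2": 0.5, "d3": 0.1},  # scores from run A
    {"d1": 0.2, "d2": 0.8, "d3": 0.3},  # scores from run B
]

fused = defaultdict(float)
for run in runs:
    for doc_id, score in run.items():
        fused[doc_id] += score

# Fused ranking: documents sorted by combined score, descending.
ranking = sorted(fused, key=fused.get, reverse=True)
```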
    </sec>
    <sec id="sec-4">
      <title>Use case</title>
      <p>To better illustrate how a graph of entities is created starting from a textual
article, we employ a small portion of an article about tropical storms, available at
https://www.washingtonpost.com/news/capital-weather-gang/wp/2014/10/07/super-typhoon-vongfong-explodes-becomes-most-intense-storm-on-earth-in-2014/.
Entity Linking. The mentions are detected and linked to Wikipedia entities.
Below, we report the textual fragment where the mentions are marked in boldface.</p>
      <p>Super Typhoon Vongfong explodes becomes most intense storm on Earth
in 2014. Super Typhoon Vongfong has rapidly intensified over the past 24
hours from the equivalent of a category two hurricane to a monster
typhoon [...]</p>
      <p>Fig. 2: Graph of the entities extracted from the fragment; edge labels
(e.g., 0.483, 0.514) report the semantic relatedness between entities such as
Storm and Tropical cyclone scales.</p>
      <p>In Table 2 we report the mentions and their linked entities. At a first glance at
the mention-entity associations, most of the entities are coherent with the
tropical storms, except for past tense and 24-hour clock, which are very general
or out of context.
Graph Creation. The extracted entities are the nodes of the graph, while the
edges are based on the semantic relatedness between the nodes. In Figure 2 we
show the resulting graph. The two connected components are highly unbalanced:
one component includes most of the entities, while the second has
only two. In the largest connected component lie the most meaningful
entities, while the smallest one contains the two out-of-context entities, past
tense and 24-hour clock.</p>
      <p>Graph Pruning. Starting from the graph in Figure 2, we first remove the
smallest connected component (right part in red). Then, we run the
Girvan-Newman algorithm, which removes the Earth and Explosion entities. These entities
are discarded because they have degree equal to one and do not belong to
the largest community. Moreover, if we consider these entities in the context
of tropical storms, they do not bring any contribution to context identification,
hence their removal increases the coherence of the graph with respect to the topic.</p>
    </sec>
    <sec id="sec-5">
      <title>Experimental Setup</title>
      <p>In this section, we describe the experimental setup. In particular, we discuss
some technical details, and we illustrate the tools used to implement each phase
of the proposed solution.</p>
      <p>
        Preprocessing. We preprocessed the Washington Post Corpus to remove from
each article all the fields without informative content. Then we indexed the
collection with Apache Solr (https://lucene.apache.org/solr/), an open-source
search platform based on Apache
Lucene, set with the default English tokenizer, the English stop-words list and
no stemmer. Once the collection was indexed, we used the default BM25 in Solr to
construct an initial ranking of 100 documents per topic: our baseline.
Entity Linking and Graph creation. Since it was unfeasible to analyze the entire
collection of documents in order to choose the most effective settings,
we relied on a sample of thirty documents, both relevant and not relevant, and
we adopted the settings that were most effective on this sample. The linking system
we adopted is TagMe [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], in particular, we performed article annotation via the
TagMe RESTful API, setting a confidence score ρ = 0.1. This parameter is a
threshold imposed to discard non-meaningful entities [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]: the lower this
parameter is, the more entities are recognized. For each article, we linked at most
thirty entities leading to graphs with at most thirty nodes. To create graphs as
consistent as possible with the original textual article, we set a threshold equal
to 0.4 on the value of semantic relatedness. Specifically, if two entities have a
semantic relatedness higher (or equivalent) than 0.4, an edge between them is
created. This threshold allowed us to obtain graphs subdivided into connected
components: such a structure highlights the distinction between meaningful and
non-meaningful entities. Pruning is performed only if the graph contains at least
ten nodes: if it does not, pruning will lead to a graph unable to represent the
original document correctly. To perform graph creation and pruning, we relied
on NetworkX Python library [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
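      <p>The edge-creation threshold and the component-based pruning step described above can be sketched as follows; entities and relatedness values are invented, and plain dictionaries stand in for the NetworkX graph we actually use:</p>

```python
# Sketch: connect two entities only if their semantic relatedness is
# at least 0.4, then keep the largest connected component.
from collections import defaultdict

relatedness = {
    ("Tropical cyclone", "Storm"): 0.51,
    ("Storm", "Wind"): 0.48,
    ("Past tense", "24-hour clock"): 0.45,     # off-topic pair
    ("Tropical cyclone", "Past tense"): 0.05,  # below threshold: no edge
}

adj = defaultdict(set)
for (a, b), w in relatedness.items():
    if w >= 0.4:  # threshold on semantic relatedness
        adj[a].add(b)
        adj[b].add(a)

def component(start):
    """Connected component containing `start`, via breadth-first search."""
    seen, frontier = {start}, [start]
    while frontier:
        frontier = [v for u in frontier for v in adj[u] - seen]
        seen.update(frontier)
    return seen

components, nodes = [], set(adj)
while nodes:
    comp = component(next(iter(nodes)))
    components.append(comp)
    nodes -= comp

largest = max(components, key=len)  # pruning keeps only this component
```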
      <p>
        Features Extraction. To extract the set of textual-based features from each
document, we exploited Scikit-learn [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], a Python library that provides the
implementation of measures like the TF and the Inverse Document Frequency (IDF).
For the extraction of the entity-based graph features, both document-based and
query-based, we exploited the NetworkX Python library [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We created for
each document a feature vector containing the following elements: the relevance
      </p>
      <sec id="sec-5-1">
        <title>-</title>
        <p>judgment of the document, the id of the query, the features extracted and the
id of the document.</p>
        <p>Learning to Rank. We relied on the implementation of LambdaMART provided
by RankLib (https://sourceforge.net/p/lemur/wiki/RankLib/), a library of learning
to rank algorithms developed by the Lemur
Project. We performed manual tuning, which allowed us to identify the best
combinations of hyperparameters and analyze how each parameter interacts
with the others. Table 3 summarizes the main characteristics of the training
and test sets involved in our implementation. The test set comprises the
vector representation of the documents belonging to the baseline. The training set,
instead, combines two sets of topics belonging to different TREC tracks. We
remark that even if both the training and test sets contain documents belonging
to the Washington Post Corpus, they rely on disjoint sets of topics.
Training on two different collections of documents allowed us to
enrich the training set but, at the same time, made our system suffer from
the limitations induced by transfer learning. In particular, what influenced
our system the most was the lack of consistency between the relevance grades
provided for the two tracks; indeed, we had to map the five possible relevance
grades offered for the Background Linking task to the three grades provided for
the Common Core Track of TREC 2017. The training set was finally split to
derive a validation set. We trained the algorithm by choosing more than 200
different combinations of hyperparameters, and we selected the seventy models
which maximized the nDCG@5 effectiveness on the validation set, and, at the
same time, were far from overfitting or underfitting conditions. Table 4
describes the hyperparameter values of the five most effective models.
We tested the seventy ranking models on the test set, and we obtained seventy
final runs: each one contains a different re-ranking of the baseline documents.
Fusion. To perform fusion we considered the ten most effective models and the
related final runs. We fused k runs (with k ∈ {2, 3, 5, 7, 10}) and, for each k, we
collected (10 choose k) runs – i.e. all the possible combinations of k distinct runs
taken from a set of ten runs.</p>
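        <p>The enumeration of fusion candidates can be sketched with the standard library (run names are placeholders):</p>

```python
# For each k, enumerate all "10 choose k" subsets of the ten best runs.
from itertools import combinations
from math import comb

run_ids = [f"run{i}" for i in range(10)]  # placeholder run names

subsets_per_k = {k: list(combinations(run_ids, k)) for k in (2, 3, 5, 7, 10)}
# e.g. k = 5 yields comb(10, 5) = 252 candidate fusions, while k = 10
# yields exactly one (the fusion of all ten runs).
```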
      </sec>
      <sec id="sec-5-2">
        <title>-</title>
        <p>
          Evaluation and metrics. In accordance with the guidelines of the task [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], we
considered nDCG@5 as the primary measure of effectiveness and also tested the
system’s performance in terms of nDCG@1, nDCG@10, nDCG@100, Precision at
Cut-off 1 (P@1) and reciprocal rank (recip rank). We analyzed two types of runs:
(1) the individual run, intended as the re-ranking produced by each one of the
seventy LambdaMART’s models, and (2) the fused run, intended as the result of
the fusion of k individual runs belonging to the set of ten runs dedicated to the
fusion approach. We used the trec_eval tool (https://trec.nist.gov/trec_eval/) to
evaluate the set of seventy individual runs and the five sets of fused runs (one
set of fused runs for each k); for each set we selected the run with the highest
nDCG@5, obtaining in total six selected runs.
        </p>
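        <p>For reference, the primary measure can be sketched as follows, assuming linear gains and a log2 discount (the exact gain function applied by trec_eval is left as an assumption); the graded judgments are invented:</p>

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: linear gains, log2 rank discount."""
    return sum(rel / math.log2(rank + 2)  # rank 0 gets discount log2(2) = 1
               for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_rels, all_rels, k=5):
    """nDCG@k: DCG of the system ranking over DCG of the ideal ranking."""
    ideal = sorted(all_rels, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked_rels[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Invented graded judgments (0-4) for one topic: the system's top five
# documents, and the full judged pool for the ideal ranking.
system = [4, 0, 2, 4, 1]
pool = [4, 4, 2, 1, 0, 0, 0]
score = ndcg_at_k(system, pool, k=5)
```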
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Evaluation and results</title>
      <p>In this section we describe the results obtained by our system. We assess the
retrieval effectiveness from both a quantitative and a qualitative viewpoint.
</p>
      <sec id="sec-6-1">
        <title>Quantitative evaluation</title>
        <p>The quantitative evaluation aims to describe how our system performs
on average. To this end, in Table 5 and Table 6 we illustrate the average results
of the six selected runs and the baseline. We indicated the baseline as BM25, the
best individual run as best run and the five best runs obtained with the fusion
approach as fus followed by the number of runs fused. The top scores of each
evaluation measure are highlighted in boldface.</p>
        <p>In Table 5 we describe the performances of our system in terms of reciprocal
rank, P@1 and nDCG@1: these metrics evaluate the effectiveness of one
document per topic in a run. Our goal is to evaluate whether our system can correctly
recognize a relevant document and place it in the ranking’s highest positions. In
general, all the generated runs show effectiveness slightly higher than the
baseline both for reciprocal rank and nDCG@1. In P@1 instead, three runs equalled</p>
        <sec id="sec-6-1-1">
          <title>7 https://trec.nist.gov/trec eval/</title>
          <p>the baseline (best run, fus2 and fus10 ) and three runs reported greater values
(fus3, fus5 and fus7 ). The most effective run is fus5, obtained fusing five
individual runs. It maximizes both reciprocal rank and P@1 evaluation measures.
best run and fus7 instead, maximize the nDCG@1. Observing the reported
results, we can notice that increasing the number of fused runs does not necessarily
correspond to an effectiveness improvement. The effectiveness of the fus5 and fus7
runs, in fact, is usually higher than that of fus2, fus3 and fus10: this
indicates that fusion can improve the performance up to a certain number of
fused runs; beyond that, the fusion becomes disadvantageous and the effectiveness
decreases. This is confirmed by fus10, which is the least effective run among the five
fused runs. This behavior is verified for all the measures proposed in Table 5.</p>
          <p>Table 6 reports the effectiveness of the proposed runs at different ranking
depths. This analysis considers the nDCG evaluation measure at different
cutoffs: 5, 10 and 100 (in the last case, the entire ranking associated
with each topic is evaluated). In this analysis, we study our system’s ability to produce
effective rankings. In particular, we are interested in how the effectiveness varies,
increasing the cut-off. The results reported in terms of nDCG@5 highlight that
the fus2, fus3, fus5 and fus7 runs have performances slightly above those of
the baseline, while the best run approximates it very well. The least effective
run in nDCG@5 is fus10, which achieves the lowest result. If we consider
more than five documents in the ranking, the BM25 model always reports the
highest effectiveness. The highest results achieved by our system for nDCG@10
and nDCG@100 are those of fus5, while the lowest ones are attributed to the
best run and fus10. As already noticed in Table 5, this reveals
that the fusion approach improves the system’s effectiveness, but it becomes
disadvantageous if more than five runs are fused.</p>
          <p>Fig. 3: This histogram represents the nDCG@5 effectiveness of the best run
compared to the nDCG@5 of the baseline. Each column represents a topic, and the
column’s height is computed as the difference between the nDCG@5 of the
best run and the nDCG@5 of the baseline for that topic.</p>
          <p>Discussion First, our hypothesis that fusing more runs leads to an effectiveness
improvement does not seem to be supported. Both in Table 5 and Table 6 the
fusion of all the available runs is the least effective run among the six proposed.
We also see that there is no correlation between the number of runs fused and
the effectiveness improvement. Despite this fact, the fusion approach proved
useful in our implementation, since among the selected runs the
fused ones achieve the highest results. Moreover, our system and the BM25 model
show two opposite behaviors: the more documents we consider in the ranking,
the more our system’s effectiveness decreases, while the opposite holds for the BM25
model. This contrast between our system and BM25 is primarily caused by
transfer learning. In fact, the effectiveness decrease detected after only five
documents is strictly related to the inability of our system to distinguish among
more than three relevance scores.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>Qualitative evaluation</title>
        <p>The qualitative evaluation describes the effectiveness on the individual topics
of a run. The histogram shown in Figure 3 describes the difference
between the nDCG@5 score of the best run for a given topic and that of
the baseline. In particular, all the columns above the abscissa identify the topics
where our system prevails over the baseline, and vice versa for the columns
below the abscissa. It is possible to notice an equivalent number of topics lying
above and below the abscissa; this means that the effectiveness improvement
brought by the topics where our system prevails over the baseline is perfectly
balanced by the effectiveness decrease caused by the topics where BM25
prevails over our system. This is the reason why, in the quantitative analysis, the
nDCG@5 of the best run (0.4090) approximated the one of the baseline (0.4097).
We conducted an analysis focused on the rankings associated with the best and
the worst performing topics of the best run – this study aimed at finding a
correlation between the documents’ feature vectors and our system’s
effectiveness. The best-performing topics present, in the first positions of their rankings,
documents whose relevance is identifiable both in the textual-based and in the
entity-based graph features. This shows that relying on features that leverage
the classic textual representation and the graph of entities can lead to an
effectiveness improvement. Analyzing the rankings associated with the topics where
the BM25 prevails, we identified three main reasons for our system’s poor
effectiveness in some topics. These are:
– The transfer learning: it prevented some topics from reaching a high level of
effectiveness.
– The graphs of entities: the documents may not present well-formed graphs;
in this case, the learner attributes a score on the basis of the textual features
only. This happens when the original article has not enough textual content
to construct a consistent graph.
– Errors in learning phase: this condition depends on the features in the
vectors; we observed that the value of a single feature may influence the
entire ranking of a topic.</p>
        <p>These conditions are unavoidable, and they represent the cases in which it is
convenient to use the BM25 model.
</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>This article presented a solution for the background linking task relying on
LambdaMART to obtain a list of background articles. We leveraged the
document’s textual and graph representations to extract a set of features used to
perform training. Our goal was to study whether the combination of features
belonging to different document representations can improve the system’s
effectiveness; in particular, we were interested in exploring the impact of the
features based on graphs of entities. The analysis conducted on the single
topics highlighted that there is a set of topics where our system outperformed the
BM25 model. In these cases, the graphs of entities played a crucial role because
the combination of textual-based features and graph-based ones allowed for an
effectiveness improvement. This implied that our initial hypothesis is confirmed
for this first set of topics. However, there was an equivalent number of topics
where our system was not highly effective. Transfer learning had a strong
negative impact on performance. The balance between the effective and ineffective
topics explained the similarity between the average nDCG@5 values obtained by
our system and by the baseline. We finally introduced the fusion approach as
a means to improve the overall effectiveness of our system. The results showed
that, contrary to our initial belief, fusing too many runs makes the performances
decrease; in particular, the optimal number of runs to merge, in the tested
context, is five.</p>
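        <p>As an illustration of the fusion step, the following sketch merges ranked runs by min-max score normalization followed by CombSUM, in the spirit of Montague and Aslam [18]. It is a hedged sketch: the run representation (dictionaries of document scores) is an assumption chosen for the example, not the exact code we used.</p>
        <preformat>
```python
# Sketch of score-based run fusion: min-max normalize each run's scores,
# then sum the normalized scores per document (CombSUM). Illustrative only.

def min_max_normalize(run):
    """run: dict doc_id -> raw score; returns doc_id -> score in [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {d: 0.0 for d in run}
    return {d: (s - lo) / (hi - lo) for d, s in run.items()}


def combsum(runs):
    """Fuse a list of runs; documents absent from a run contribute 0."""
    fused = {}
    for run in runs:
        for d, s in min_max_normalize(run).items():
            fused[d] = fused.get(d, 0.0) + s
    # Return doc ids sorted by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)
```
        </preformat>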
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Entity-oriented search</article-title>
          . Springer Nature (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bimantara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Engelhardt</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerwert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottschalk</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukosz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shaft</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berberich</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>htw saar@ trec 2018 news track</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.J.:</given-names>
          </string-name>
          <article-title>From ranknet to lambdarank to lambdamart: An overview</article-title>
          .
          <source>Learning</source>
          <volume>11</volume>
          (
          <issue>23-581</issue>
          ),
          <fpage>81</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Despalatović, L., Vojković, T.,
          <string-name>
            <surname>Vukicevic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Community structure in networks: Girvan-newman algorithm improvement</article-title>
          .
          <source>In: 2014 37th international convention on information and communication technology, electronics and microelectronics (MIPRO)</source>
          . pp.
          <fpage>997</fpage>
          -
          <lpage>1002</lpage>
          . IEEE (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lian</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Ictnet at trec 2019 news track</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ferragina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scaiella</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Fast and accurate annotation of short texts with wikipedia pages</article-title>
          .
          <source>IEEE software 29(1)</source>
          ,
          <fpage>70</fpage>
          -
          <lpage>75</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Foley</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montoly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pena</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Smith at trec2019: Learning to rank background articles with poetry categories and keyphrase extraction</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Freeman</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          :
          <article-title>A set of measures of centrality based on betweenness</article-title>
          .
          <source>Sociometry</source>
          pp.
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          (
          <year>1977</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meulman</surname>
            ,
            <given-names>J.J.:</given-names>
          </string-name>
          <article-title>Multiple additive regression trees with application in epidemiology</article-title>
          .
          <source>Statistics in medicine 22(9)</source>
          ,
          <fpage>1365</fpage>
          -
          <lpage>1381</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Girvan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Community structure in social and biological networks</article-title>
          .
          <source>Proceedings of the national academy of sciences 99(12)</source>
          ,
          <fpage>7821</fpage>
          -
          <lpage>7826</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Gonçalves, G., Magalhães, J.,
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Improving ad hoc retrieval with bag of entities</article-title>
          .
          <source>image 409(68.81)</source>
          ,
          <volume>116</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hagberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swart</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schult</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Exploring network structure, dynamics, and function using networkx</article-title>
          .
          <source>Tech. rep.</source>
          , Los Alamos National Lab.
          <source>(LANL)</source>
          , Los Alamos,
          <source>NM (United States)</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hasibi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bratsberg</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          :
          <article-title>On the reproducibility of the tagme entity linking system</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <fpage>436</fpage>
          -
          <lpage>449</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soboroff</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Trec 2018 news track</article-title>
          .
          <source>NewsIR@ ECIR</source>
          <volume>2079</volume>
          ,
          <fpage>57</fpage>
          -
          <lpage>59</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Paragraph as lead - finding background documents for news articles</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Leveraging entities in background document retrieval for news articles</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>MacFarlane</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez-Lopez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Dminr at trec news track</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Montague</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aslam</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Relevance score normalization for metasearch</article-title>
          .
          <source>In: Proceedings of the tenth international conference on Information and knowledge management</source>
          . pp.
          <fpage>427</fpage>
          -
          <lpage>433</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Unc sils at trec 2019 news track</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaragoza</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>The probabilistic relevance framework: BM25 and beyond</article-title>
          . Now Publishers Inc (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Soboroff</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Trec 2018 news track overview</article-title>
          .
          <source>In: TREC</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Milne</surname>
            ,
            <given-names>D.N.:</given-names>
          </string-name>
          <article-title>An effective, low-cost measure of semantic relatedness obtained from wikipedia links</article-title>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.Y.</given-names>
          </string-name>
          :
          <article-title>Bag-of-entities representation for ranking</article-title>
          .
          <source>In: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval</source>
          . pp.
          <fpage>181</fpage>
          -
          <lpage>184</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.Y.</given-names>
          </string-name>
          :
          <article-title>Word-entity duet representations for document ranking</article-title>
          .
          <source>In: Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval</source>
          . pp.
          <fpage>763</fpage>
          -
          <lpage>772</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Anserini at trec 2018: Centre, common core, and news tracks</article-title>
          .
          <source>In: Proceedings of the Twenty-Seventh Text REtrieval Conference (TREC</source>
          <year>2018</year>
          ), Gaithersburg, MD
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>