<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Domain-adapted named-entity linker using Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesca Frontini</string-name>
<email>Francesca.Frontini@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carmen Brando</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Gabriel Ganascia</string-name>
<email>Jean-Gabriel.Ganascia@lip6.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Istituto di Linguistica Computazionale CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Labex OBVIL, LIP6, UPMC, CNRS</institution>
          ,
          <addr-line>4 place Jussieu, 75005, Paris</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>We present REDEN, a tool for graph-based Named Entity Linking that allows for the disambiguation of entities using domain-specific Linked Data sources and different configurations (e.g. context size). It takes TEI-annotated texts as input and outputs them enriched with external references (URIs). The possibility of customizing indexes built from various knowledge sources by defining temporal and spatial extents makes REDEN particularly suited to handling domain-specific corpora such as enriched digital editions in the Digital Humanities.</p>
      </abstract>
      <kwd-group>
        <kwd>named-entity disambiguation</kwd>
        <kwd>evaluation</kwd>
        <kwd>linked data</kwd>
        <kwd>digital humanities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Semantic annotation plays an important role in the construction of enriched
digital editions in the Digital Humanities. The TEI annotation standard (http://www.tei-c.org/index.xml)
defines specific tags to mark up keywords in texts and to link them to
background knowledge by using external references. It is thus possible
to annotate topics, citations, places, and organization and person names. References
can point to internal identifiers or to external resources that are available in
the form of Linked Data (LD) on the Semantic Web. Such annotations can be
used to facilitate research on the enriched texts, by allowing the creation of
indexes and the execution of advanced queries. Moreover, they can also enhance
the use of texts by the final reader, since TEI can be easily converted into
other formats, notably Electronic Publication (EPUB). Devices such
as ebook readers could in the future make use of the enriched information in the
form of external links.</p>
      <p>The results of the work described here form part of an editorial pipeline
whose aim is to produce an enriched digital edition of a corpus of French
literary criticism and essays from the 19th century (http://obvil.paris-sorbonne.fr/corpus/critique/).
Such a corpus, once completed, is intended to allow researchers to gain a more
comprehensive perspective on the evolution of ideas both in literature and in
the sciences over the years.</p>
      <sec id="sec-1-1">
        <p>While the production of high-quality digital editions requires manual checking,
natural language processing techniques can speed up the process to a great
extent. In particular, the detection of person mentions is a very important step in
the enrichment of such a corpus, as it allows researchers to trace references to
authorities (authors, scientists, critics, etc.) over time and across different books.</p>
        <p>
          While existing Named Entity Recognition and Classification (NERC)
algorithms can help human beings to detect mentions of persons, previous
experience has shown that this step is not particularly difficult for trained annotators.
Conversely, the disambiguation and linking of mentions to an identifier can often
be painstaking when done manually, as it requires searching external sources,
such as mainstream online encyclopedias (Wikipedia) as well as domain-specific
databases, such as library databases. Named Entity Linking (NEL) algorithms [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
can thus be of great use, as they automatically assign an external identifier to
already detected mentions, performing disambiguation and semantic annotation
at the same time. They rely on external sources of structured or unstructured
information to perform this task, which is complex because an entity is usually
mentioned in the text in ambiguous forms.
        </p>
      <p>While current NEL algorithms perform well on contemporary news texts,
domain adaptation is, as always in NLP, an issue. Nineteenth-century essays,
for instance, often mention persons who were well known at the time but are
now hard to identify even for specialists. Thus, although existing NEL tools
are very efficient, they mostly rely on generic background knowledge such as
Wikipedia or DBpedia, which may not be sufficient to grant ample coverage for all
domains.</p>
      <p>The solution we present here is a NEL tool, REDEN, that makes use of
state-of-the-art graph-based algorithms and at the same time can easily leverage
knowledge from different sources (both generic and domain-specific). REDEN is
applied to already detected mentions and relies natively on knowledge drawn
from linked data sources for disambiguation, requiring only an index of superficial
forms and URIs to be used as candidates. Such indexes can be built out of
different sources using crawlers.</p>
      <p>The purpose of this paper is to show that such a solution can be efficiently
used for texts for which a domain-specific knowledge base is necessary. We
evaluate REDEN on ad-hoc test sets from our corpus of essays and compare the
results with those obtained with a freely available NEL tool. We first present
the current state of the art in NEL, then briefly describe the tool and focus
on the index-building facilities; some experiments are presented to compare our
proposed solution to the state of the art, and we conclude with some remarks
and ideas for future developments.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>Supervised methods usually perform very well in NLP tasks, but they rely on
pre-annotated corpora for training, which are not available for specific domains
such as French literature and the Digital Humanities more generally. Therefore
we concentrate on existing non-supervised approaches to NEL. These can be
broadly divided into two main families: those using text similarity along with
statistical models, and those using graph-based methods.</p>
      <p>
        The best-known tool from the first group is DBpedia Spotlight (DBSL) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which performs NER and DBpedia linking at the same time. It is based on
cosine similarities and a modification of TF-IDF weights. More specifically, the
recognition of possible annotations in a text is performed by substring matching
algorithms over indexes built from the Wikipedia dumps and DBpedia. DBSL
generates DBpedia candidates for each possible annotation in two possible ways,
using either a language-dependent implementation (available for English and Dutch
and more NLP-adapted) or a language-independent implementation, which is
more efficient but less precise. DBSL then selects the best candidates from the
set of annotations by computing scores that combine statistical methods, the
Wikipedia article links along with their anchor texts, and the textual context. This
and similar methods are known to be very efficient, but they are strongly
dependent on the availability of textual descriptions for each entry.
      </p>
      <p>
        Graph-based approaches rely on formalised knowledge in the form of a graph
built from a knowledge base (e.g. the Wikipedia article network, Freebase,
DBpedia, etc.). Reasoning can be performed through graph analysis operations. It
is thereby possible to at least partially reproduce the actual decision process
with which humans disambiguate mentions: a reader may decide that the
mention "James" refers to the philosopher "William James" and not to the writer "Henry
James" because it occurs in the same context as "Hume" and "Kant". In the
same way, such algorithms build a graph out of the available candidates for each
possible referent in a given context and use the relative position of each referent
within the graph to choose the correct referent for each mention. The graph is
built for a context (e.g. a paragraph) possibly containing more than one mention,
so that the disambiguation of one mention is helped by that of the others. Hybrid
approaches such as [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] can use both graph-based algorithms, more specifically
graph-search ones, and text similarity measures; they rely on a single LD source.
      </p>
      <p>
        Graph-based NEL is similar to graph-based Word Sense Disambiguation
(WSD) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], where a set of words in a given sentence needs to be labeled with the
appropriate sense label by using the information contained in a lexical database
such as WordNet. The key idea of this approach is that, for all ambiguous words
in the context, senses that belong to the same semantic field should be selected,
so that two ambiguous words can mutually disambiguate each
other. More specifically, a subgraph is built, consisting only of the relevant links
between the possible senses of the different words, and then, for each alternative
sense labeling, the most central one is chosen. This procedure, when applied to such
context-specific subgraphs, ensures that in the end the chosen senses for each
word are those that are best connected to each other.
      </p>
      <p>
        Centrality is an abstract concept and can be calculated with different
algorithms (for a discussion of the notion of centrality, see also [7]). In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] the experiment was carried out using the following algorithms:
      </p>
      <sec id="sec-2-1">
        <p>Indegree, Betweenness, Closeness, PageRank, as well as a combination of
all these metrics using a voting system. Results showed the advantages of using
centrality with respect to other similarity measures.</p>
        <p>
          This graph-based approach has been applied to NEL, where mentions take
the place of words and Wikipedia articles that of WordNet synsets. Here too,
centrality measures are computed on the Wikipedia structure in order to use its
large set of relations to disambiguate mentions. More specifically, in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] English
texts were disambiguated using a graph that relied only on the English Wikipedia
and was made up of the links and the categories found in Wikipedia articles.
So, for instance, the edges of the graph represented whether ArticleA links to
ArticleB or whether ArticleA has CategoryC. Here too, "local" centrality is used
to assign the correct link to an ambiguous mention. Such a WSD-based tool for NEL
is proposed by the DBpedia Graph Project (https://github.com/bernhardschaefer/dbpedia-graph;
related published work is not available at this time). However, it is highly dependent on
the DBpedia structure and only links to this broad-coverage data set, not to
other domain-specific ones.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Our disambiguation approaches</title>
      <p>With REDEN, we propose a graph-based, centrality-based approach that is
particularly suited to the digital humanities; it can exploit RDF sources
directly (thus allowing users to choose domain-adapted knowledge bases) and
takes TEI-annotated texts as input. The disambiguation algorithm processes a
file where NE mentions are already annotated (e.g. using the tag &lt;persName&gt;).
Possible referents for all mentions in a given context and for a given class (e.g.
person) are retrieved from an index built on the fly from selected LD sources. The
fusion of homologous individuals across different sources is performed thanks to
owl:sameAs or skos:exactMatch predicates; a graph is created where RDF
objects and subjects are vertices and RDF predicates represent edges. Irrelevant
edges are removed from the graph before calculating centrality: only edges which
involve at least two vertices representing candidate URIs are preserved. Using
the selected centrality measure, the best-connected candidate for each
mention is chosen as the referent, and an enriched version of the input TEI file is
produced.</p>
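<p>The pruning-and-centrality step described above can be sketched in a few lines. This is an illustrative toy version under simplifying assumptions, not REDEN's actual Java implementation: centrality is reduced to a plain degree count, and the mentions, URIs and edges are invented for the example.</p>

```python
from collections import defaultdict

def disambiguate(candidates, edges):
    """For each mention, pick the candidate URI with the highest degree in
    the subgraph induced by the candidate URIs of the whole context."""
    all_cands = {uri for uris in candidates.values() for uri in uris}
    degree = defaultdict(int)
    for s, o in edges:
        # Keep only edges linking two candidate URIs (irrelevant edges
        # are pruned before computing centrality).
        if s in all_cands and o in all_cands and s != o:
            degree[s] += 1
            degree[o] += 1
    # The best-connected candidate wins for each mention.
    return {m: max(uris, key=lambda u: degree[u])
            for m, uris in candidates.items()}

# Invented toy context: "James" is ambiguous, "Kant" and "Hume" are not.
candidates = {
    "James": ["db:William_James", "db:Henry_James"],
    "Kant":  ["db:Immanuel_Kant"],
    "Hume":  ["db:David_Hume"],
}
edges = [
    ("db:William_James", "db:Immanuel_Kant"),
    ("db:William_James", "db:David_Hume"),
    ("db:Henry_James",   "db:Leon_Edel"),  # endpoint not a candidate: pruned
]
print(disambiguate(candidates, edges)["James"])  # → db:William_James
```

<p>The philosopher wins because his vertex is connected to the candidates of the other mentions in the context, exactly the mutual-disambiguation effect described above.</p>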
      <p>To illustrate how REDEN works, we apply our approach to a paragraph
from a French text of literary criticism entitled "Une thèse sur le symbolisme"
(1936) written by Albert Thibaudet (1874-1936); the full text can be found at
http://obvil.paris-sorbonne.fr/corpus/critique/thibaudet_reflexions.html. Figure 1 shows an excerpt
of the resulting graph, where the chosen and correct mention candidates, out of
the nine present in the text, are marked in bold. We can observe that the
vertices yago:Symbolist_Poets, dbpedia:Charles_Baudelaire and
dbpedia:French_Dramatists_And_Playwrights are the ones influencing the centrality measure the most.</p>
      <p>
        We developed our implementation in Java; the source code and useful
resources can be found at https://github.com/cvbrandoe/REDEN. In [
        <xref ref-type="bibr" rid="ref1 ref4">4, 1</xref>
        ], we described the steps of
the algorithm in detail. In the following section, we detail the software component
for building, on the fly, a domain-adapted index of possible mention candidates.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Domain-adapted crawler of Linked Data</title>
      <p>As previously described, REDEN identifies and looks up mention candidates'
URIs in indexes built on the fly per class. So, for instance, the entries for a
person in the index correspond to superficial forms of that person's name and
the corresponding URIs from different Linked Data sets. REDEN then retrieves
the RDF data from the corresponding URIs found in the index for each selected
source.</p>
      <p>Clearly, one source can be more useful for certain domains than another; for
instance, the French National Library (BnF) linked data repository is the most
appropriate source for a domain such as French literary criticism. The BnF provides
the catalog of authors for all books ever published in France; its entries
contain information on dates of birth and death, gender, alternative names, works
authored, etc. Most importantly, the BnF provides equivalence links (owl:sameAs)
to other sources such as DBpedia or IdRef. Thus, the BnF can be chosen as the
entry point for building an index of authors. In particular, information on gender
and full names can be used to generate further alternative names, for instance:
surname only (Rousseau), initials + surname (J.J. Rousseau, JJ Rousseau, ...),
title + surname (M. Rousseau, M Rousseau), ... Equivalence links may also be
stored, so that later on REDEN can build fused graphs from different sources.</p>
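<p>The generation of alternative surface forms from gender and full-name information can be sketched as follows. This is an illustrative approximation: the variant patterns follow the examples above, but the function name and the title rule are our own.</p>

```python
def name_variants(given, surname, gender="male"):
    """Generate alternative surface forms for an author entry,
    following the patterns listed above (illustrative sketch)."""
    initials = [part[0] for part in given.replace("-", " ").split()]
    title = "M." if gender == "male" else "Mme"
    return {
        f"{given} {surname}",                # full form: Jean-Jacques Rousseau
        surname,                             # surname only: Rousseau
        f"{'.'.join(initials)}. {surname}",  # initials + surname: J.J. Rousseau
        f"{''.join(initials)} {surname}",    # compact initials: JJ Rousseau
        f"{title} {surname}",                # title + surname: M. Rousseau
    }

print(sorted(name_variants("Jean-Jacques", "Rousseau")))
```

<p>Each generated form becomes an index entry pointing to the same set of URIs, so that any of these spellings found in a text retrieves the same candidates.</p>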
      <p>Likewise, using DBpedia as the entry point, an index of places can be assembled.
For instance, the DBpedia entity Gare de Paris Saint-Lazare has the following
name forms (familiar, official and old ones): Gare Saint-Lazare,
Gare Saint-Lazare (Paris), Embarcadère de l'Ouest, ...</p>
      <p>We have therefore developed an index-building module which allows for the
configuration and implementation of a crawler for a chosen LD source. The
main purpose of this module is to facilitate the construction and the updating of
entity indexes suitable for the domain of the exploited corpora.
In particular, the temporal dimension and possibly the spatial dimension can be
used as filters. For instance, when working with 19th-century texts, a time filter
can a priori exclude the currently well-known singer Thierry Amiel as a candidate
for the mention "Amiel" and thus enable the identification of the correct referent,
the well-known writer Henri-Frédéric Amiel.</p>
      <p>The proposed module takes into account domain parameters such as the
temporal and spatial dimensions. For instance, a time period can be specified
with respect to the date of birth of the persons described in the corresponding
linked data set, or to any other property of the individual (date of death, periods
of activity). Concerning the spatial extent of the index, one may be interested
in including only persons who were born or have lived in Europe, because the
corpus probably concerns authors with these features. A bounding box can
be defined by specifying a rectangle given by four latitude and longitude
coordinates (some Web mapping tools readily provide this kind of information, e.g.
http://www.mapdevelopers.com/geocode_bounding_box.php). It is also straightforward
to define new domain parameters to filter by theme, such as Symbolism or Psychology.</p>
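<p>A minimal version of such a spatial filter can be written as a point-in-box test, assuming the crawler already has a latitude and longitude for each candidate. The coordinate values below are rough, illustrative bounds for Europe, not values taken from REDEN.</p>

```python
def in_bbox(lat, lon, south, west, north, east):
    """True if the point lies inside the latitude/longitude bounding box."""
    return south <= lat <= north and west <= lon <= east

# Rough, illustrative bounding box for Europe.
EUROPE = dict(south=35.0, west=-10.0, north=71.0, east=40.0)

print(in_bbox(48.85, 2.35, **EUROPE))   # Paris → True
print(in_bbox(40.71, -74.0, **EUROPE))  # New York → False
```

<p>Candidates whose coordinates fall outside the box are simply never added to the index, so they cannot compete at disambiguation time.</p>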
      <p>From a technical point of view, a crawler performs SPARQL queries against
a selected endpoint. The created indexes can benefit from text search engines,
which enable the optimal querying of candidates by REDEN and therefore reduce
query processing time.</p>
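<p>Such a crawler query with a temporal filter might look like the following sketch. The predicates (foaf:name, bio:birth) and the query shape are illustrative assumptions, since the actual vocabulary depends on the chosen endpoint; only the filtering idea is taken from the text above.</p>

```python
def author_index_query(born_after=None, born_before=None):
    """Build a SPARQL query for crawling author candidates with an optional
    temporal filter on the birth year. The foaf:name / bio:birth predicates
    are illustrative; real endpoints use their own vocabularies."""
    filters = []
    if born_after is not None:
        filters.append(f"xsd:integer(?birth) >= {born_after}")
    if born_before is not None:
        filters.append(f"xsd:integer(?birth) <= {born_before}")
    filter_clause = f"FILTER({' && '.join(filters)})" if filters else ""
    return f"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX bio:  <http://purl.org/vocab/bio/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
SELECT ?author ?name ?birth WHERE {{
  ?author foaf:name ?name ;
          bio:birth ?birth .
  {filter_clause}
}}"""

# Exclude authors born too late for a 19th-century corpus.
query = author_index_query(born_before=1880)
```

<p>Adapting the crawler to a new source then amounts to swapping the endpoint URL and the predicates in the query template.</p>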
      <p>Currently REDEN comes with two already implemented crawlers: one to
collect an index of place candidates from DBpedia and one to crawl an index of
author candidates from the LD repository of the BnF, containing alternative
names for every entity. In the case of authors, as BnF entries point to their DBpedia
equivalents when available, these two sources can be easily fused using the
aforementioned algorithm. In the case of places, there is no need to fuse other LD
sources, although one may be interested in including a place gazetteer available
as linked data, such as Pleiades (http://pleiades.stoa.org), when working with ancient texts.</p>
    </sec>
    <sec id="sec-5">
      <title>Tool assessment</title>
      <p>
        In previous experiments [
        <xref ref-type="bibr" rid="ref1 ref4">4, 1</xref>
        ], we discussed and evaluated our graph-based
method, which achieves state-of-the-art accuracy in NEL for the texts in our
domain. Here we assess REDEN as a tool, thus not only in terms of accuracy but
also of adaptability and usefulness. For this reason we compare the performance
of our linker with that of an available state-of-the-art tool, DBpedia Spotlight
(DBSL). As our texts are in French, the language-independent version of DBSL
is used. The comparison is therefore not entirely fair, as this version performs both NERC and
NEL at the same time, while we run REDEN on a TEI text with a pre-checked
set of classified mentions. Nevertheless, the comparison can help to give an idea
of the differences in coverage that our method offers.</p>
      <p>
        Since REDEN treats one class at a time, DBSL too is called with the chosen
class as a parameter. Moreover, we use the most relaxed configuration possible,
accepting any matching result with no filter on support or confidence, in order to
maximize recall. DBSL is installed locally and launched using the French DBpedia as
its knowledge base. As for REDEN, the chosen centrality measure is
degree centrality [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The manually linked test corpus consists of a French text of literary criticism
entitled "Une thèse sur le symbolisme" (A Thesis on Symbolism) published by
Albert Thibaudet in 1936, and a scientific essay entitled "L'Évolution créatrice"
(Creative Evolution) written by Henri Bergson and published in 1907. We
consider two measures of accuracy: A1, the proportion of correct links over the
mentions for which a link was chosen, and A2, the proportion of correctly
assigned links over the total number of mentions. The second measure counts
missing links as mistakes and can be seen as a measure of coverage (these measures
may be considered to roughly correspond to precision and recall, though such
terms seem less adapted to describing this type of task).</p>
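<p>The two measures can be stated precisely with a small helper. This is a sketch with invented data; <code>None</code> marks a mention for which no link was chosen.</p>

```python
def accuracy(decisions, gold):
    """A1: correct links over links assigned.
       A2: correct links over all gold mentions (missing links count as errors).
       decisions: mention -> chosen URI or None; gold: mention -> correct URI."""
    assigned = {m: u for m, u in decisions.items() if u is not None}
    correct = sum(1 for m, u in assigned.items() if gold.get(m) == u)
    a1 = correct / len(assigned) if assigned else 0.0
    a2 = correct / len(gold) if gold else 0.0
    return a1, a2

# Invented example: four gold mentions, two linked correctly, two left unlinked.
gold = {"m1": "db:A", "m2": "db:B", "m3": "db:C", "m4": "db:D"}
decisions = {"m1": "db:A", "m2": "db:B", "m3": None, "m4": None}
print(accuracy(decisions, gold))  # → (1.0, 0.5)
```

<p>A linker that abstains often can thus score a perfect A1 while its A2 reveals the coverage gap, which is exactly the contrast discussed in the experiments below.</p>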
      <sec id="sec-5-1">
        <title>Experiment 1: Locations</title>
        <p>Even though REDEN's strength lies in the possibility of merging resources, we
first compare the two algorithms on the French DBpedia alone (REDEN and DBSL
use the versions of 05/03/15 and of 17/07/2014, respectively; unfortunately, it is
technically not possible for REDEN to access a snapshot of the DBpedia RDF
resources from that particular date). Since DBSL
cannot process more than a paragraph at a time, REDEN too is configured to
use paragraphs as contexts for its graphs. We choose the task of linking location
mentions, which, in our texts, are rather sparse but not particularly domain-specific.
A priori, we expect most referents of location mentions to be present
in the knowledge base.</p>
        <p>As shown by the results, both linking algorithms are comparable. Both
achieve excellent correctness rates when they assign links, but they miss a certain
number of places. Missing places are often due to the lack of alternative names,
as for "Lacédémone" (another name for Sparta), or to entries that are
misclassified in DBpedia, such as "Berlin", which is classified as a Concept and not
as a Location (in the French version of DBpedia). REDEN achieves better
coverage even when using the same knowledge base, possibly because it better
exploits the available alternative names, or because DBSL does not provide an
answer when it is unable to find a correspondence between the textual context
of a mention and the description of a candidate.</p>
        <p>As previously stated, this evaluation is meant as a comparison of the
algorithms, since REDEN works on pre-detected mentions while DBSL has to perform
detection too. That said, in this experiment DBSL analyses only paragraphs
containing locations, and we force the type "location" in the input; in most cases
where DBSL misses a link, the name of the mention is written in capitalized
words and coincides exactly with the title of a DBpedia entry; it is thus possible
to conclude that, in such cases, it is not the detection but the linking that fails.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Experiment 2: Persons</title>
        <p>The second experiment aims to show how our tool can achieve much better
performance on this corpus thanks to the possibility of combining two knowledge
bases to target a specific domain. Here, the more difficult task of identifying
mentions of authors is evaluated. DBSL is run using the French DBpedia as usual;
REDEN is run on an index of authors built from the French DBpedia plus the BnF
linked data. Moreover, we exploit all the fine-tuning options offered by REDEN:
in order to best exploit contextual information, we increase the disambiguation
context, using the chapter for Thibaudet, a text with a high density of mentions,
and the whole book for Bergson, which is poorer in mentions (in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we showed that considering different context sizes can progressively improve
correctness rates; there we also assessed the impact of filtering the knowledge base
with a temporal window). Finally, we filter the index by preventing authors who
were not yet born or were too young at the time of publication of each work from
becoming candidates.</p>
        <p>As shown in Table 2, DBSL achieves a high correctness rate for persons when
it makes a choice, assigning almost perfectly each mention it finds on both corpora.
On the other hand, the number of entities found is much smaller. The reason
is clearly that REDEN exploits additional information, granting coverage for
authors who are not present in DBpedia.</p>
        <p>
          At the same time, we notice that DBSL correctly recognizes the mention of
(Alphonse de) "Lamartine" but not those of (Maurice) "Barrès" and (Alfred
de) "Vigny", although all three authors have a referent in DBpedia, and despite
the fact that we run the algorithm with the lowest possible confidence.
This may mean that the unstructured information on which DBSL relies is not
enough to make a decision, while the graph of relations that REDEN exploits
is richer in information. Generally speaking, REDEN's correctness rate is close
to the state of the art (even with respect to other graph-based algorithms, whose
stated correctness rate is around 0.85), although Thibaudet, which is richer in
mentions, gives better results.
        </p>
        <p>It is important to note that DBSL is more precise than REDEN in
those cases where it actually chooses a link (A1). In particular, it beats our
method when a paragraph contains only one ambiguous candidate mention. In
this case the unstructured textual context is a useful source of information for
disambiguation, while the graph-based algorithm cannot use background
information. Nevertheless, the greater coverage of the graph-based approach makes the
overall post-processing work of manual checking less cumbersome in terms of
both correction and integration.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and future work</title>
      <p>In this article we presented REDEN, a graph-based tool for Named Entity
Linking that can be easily extended to different online sources and tailored to suit
various needs. Domain adaptation is enabled by entity indexes built
on the fly using linked data crawlers. New sources for a particular domain can
be added by configuring a new crawler with the appropriate SPARQL query.
Experiments involving the disambiguation of person and location names in a
corpus of French essays of the 19th century show that the graph-based algorithm
gives state-of-the-art correctness results while allowing for more flexibility and
coverage.</p>
      <p>The experiment with DBSL highlighted some of the practical advantages
of our tool. First, DBSL is very dependent on DBpedia: configuring it to
use new knowledge bases such as the BnF does not yet seem possible. Furthermore,
REDEN works natively with RDF; the structured information used thus has
the potential to grow as new links are added to the Semantic Web. REDEN
uses online sources (endpoints), which allows it to easily download the
most up-to-date version of the data from the corresponding repository. These
data are cached by REDEN so that an RDF resource is downloaded only once.
DBSL, on the contrary, uses offline resources, which can be an advantage when
bandwidth is limited; however, updating the DBSL knowledge base looks to be
a very difficult and time-consuming process. Another technical limitation is the
size of the context that DBSL uses for its decision: the tool runs as a Web service
accepting HTTP calls, and the URL size is restricted. In REDEN, the size of the
context does not need to be fixed, but can be set by the user according to
predefined textual partitions in TEI; thus the tool can run using all mentions in
a paragraph, a chapter, or a whole text as the disambiguation context. Finally,
crawlers can be customized in such a way as to apply (time or space) filters on
candidates, thus making the tool domain-adapted.</p>
      <p>For these reasons we conclude that the proposed methodology is better
adapted to the task of linking named entities to a knowledge base other than
DBpedia. It thus seems particularly useful for digital editions in the humanities,
where texts are already in TEI format and specialists have good knowledge
of the texts and of the relevant sources of information.</p>
      <p>
        As future work, we intend to perform an exhaustive comparison of
REDEN with other graph-based NEL tools, such as AGDISTIS [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the DBpedia
Graph Project, focusing on domain-specific test corpora, in particular on texts
used in French literature and, more generally, in the Digital Humanities.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was supported by French state funds managed by the ANR within
the Investissements d'Avenir programme under reference ANR-11-IDEX-0004-02
and by an IFER Fernand Braudel Scholarship awarded by FMSH.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Brando</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frontini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganascia</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          :
          <article-title>Disambiguation of named entities in cultural heritage texts using linked data sets (accepted)</article-title>
          .
          <source>In: Proceedings of the First International Workshop on Semantic Web for Cultural Heritage in Conjunction with 19th East-European Conference on Advances in Databases and Information Systems</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Daiber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hokamp</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          :
          <article-title>Improving efficiency and accuracy in multilingual entity extraction</article-title>
          .
          <source>In: Proceedings of the 9th International Conference on Semantic Systems (I-Semantics)</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Freeman</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          :
          <article-title>A set of measures of centrality based on betweenness</article-title>
          .
          <source>Sociometry</source>
          pp.
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          (
          <year>1977</year>
          )
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Frontini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brando</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganascia</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          :
          <article-title>Semantic web based named entity linking for digital humanities and heritage texts</article-title>
          .
          <source>In: Proceedings of the First International Workshop Semantic Web for Scientific Heritage at the 12th ESWC 2015 Conference</source>
          . pp.
          <fpage>77</fpage>
          –
          <lpage>88</lpage>
          (
          <year>2015</year>
          ), http://ceur-ws.org/Vol-1364/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hachey</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Curran</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Graph-based named entity linking with Wikipedia</article-title>
          .
          <source>In: Web Information Systems Engineering – WISE 2011</source>
          , pp.
          <fpage>213</fpage>
          –
          <lpage>226</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNamee</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Entity linking: Finding extracted entities in a knowledge base</article-title>
          .
          <source>In: Multi-source, Multilingual Information Extraction and Summarization</source>
          , pp.
          <fpage>93</fpage>
          –
          <lpage>115</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Rochat</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Character Networks and Centrality</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Lausanne (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sinha</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Unsupervised graph-based word sense disambiguation using measures of word semantic similarity</article-title>
          .
          <source>In: ICSC</source>
          . vol.
          <volume>7</volume>
          , pp.
          <fpage>363</fpage>
          –
          <lpage>369</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Usbeck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Both</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>AGDISTIS – graph-based disambiguation of named entities using linked data</article-title>
          .
          <source>In: 13th International Semantic Web Conference</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>