<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Arretium or Arezzo? A Neural Approach to the Identification of Place Names in Historical Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rachele Sprugnoli</string-name>
          <aff>Fondazione Bruno Kessler, Via Sommarive</aff>
          <email>sprugnoli@fbk.eu</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>This paper presents the application of a neural architecture to the identification of place names in English historical texts. We test the impact of different word embeddings and compare the results with those obtained with the Stanford NER module of CoreNLP, before and after retraining it on a novel corpus of manually annotated historical travel writings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Named Entity Recognition (NER), that is, the
automatic identification and classification of proper
names in texts, is one of the main tasks of Natural
Language Processing (NLP). It has a long
tradition that started in 1996 with the first major event
dedicated to it, the Sixth Message Understanding
Conference (MUC-6)
        <xref ref-type="bibr" rid="ref14">(Grishman and Sundheim,
1996)</xref>
        . In the field of Digital Humanities (DH),
NER is considered one of the important
challenges to tackle for the processing of large cultural
datasets
        <xref ref-type="bibr" rid="ref17">(Kaplan, 2015)</xref>
        . However, the language variety of
historical texts differs greatly from
that of the contemporary texts that NER systems
are usually developed to annotate, so current
systems need to be adapted.
      </p>
      <p>
        In this paper, we focus on the identification of
place names, a specific sub-task that in DH is
envisaged as the first step towards the complete
geoparsing of historical texts, whose final aim is
to discover and analyse spatial patterns in
various fields, from environmental history to literary
studies, from historical demography to
archaeology
        <xref ref-type="bibr" rid="ref12">(Gregory et al., 2015)</xref>
        . More specifically, we
propose a neural approach applied to a new
manually annotated corpus of historical travel writings.
In our experiments we test the performance of
different pre-trained word embeddings, including a
set of word vectors we created from
historical texts. The resources employed in the experiments
are publicly released together with the model that
achieved the best results in our task<sup>1</sup>.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Different domains - such as Chemistry,
Biomedicine and Public Administration
        <xref ref-type="bibr" rid="ref10 ref15 ref18 ref24 ref25">(Eltyeb and Salim, 2014; Habibi et al., 2017; Passaro
et al., 2017)</xref>
        - have dealt with the NER task by
developing domain-specific guidelines and
automatic systems based on both machine learning
and deep learning algorithms
        <xref ref-type="bibr" rid="ref19 ref21 ref29 ref4 ref9">(Nadeau and Sekine,
2007; Ma and Hovy, 2016)</xref>
        . In the field of Digital
Humanities, applications have been proposed for
the domains of Literature, History and Cultural
Heritage
        <xref ref-type="bibr" rid="ref29 ref33 ref4">(Borin et al., 2007; Van Hooland et al.,
2013; Sprugnoli et al., 2016)</xref>
        . In particular, the
computational treatment of historical newspapers
has received much attention and is, at the moment,
the most investigated text genre
        <xref ref-type="bibr" rid="ref16 ref20 ref22 ref23 ref28 ref9">(Jones and Crane,
2006; Neudecker et al., 2014; Mac Kim and
Cassidy, 2015; Neudecker, 2016; Rochat et al.,
2016)</xref>
        .
      </p>
      <p>
        Person, Organization and Location
are the three basic types adopted by
general-purpose NER systems, even if different entity
types can be detected as well, depending on
the guidelines followed for the manual
annotation of the training data
        <xref ref-type="bibr" rid="ref32 ref8">(Tjong Kim Sang and
De Meulder, 2003; Doddington et al., 2004)</xref>
        . For
example, political, geographical and functional
locations can be merged into a single type or
identified by different types: in any case, their
detection has assumed particular importance in
the context of the spatial humanities framework,
which puts geographical analysis at the center of
humanities research
        <xref ref-type="bibr" rid="ref2">(Bodenhamer, 2012)</xref>
        .
However, in this domain, the lack of pre-processing
tools, linguistic resources, knowledge bases and
gazetteers is considered a major limitation to
the development of NER systems with good
accuracy
        <xref ref-type="bibr" rid="ref28 ref9">(Ehrmann et al., 2016)</xref>
        .
        <fn id="fn1"><label>1</label><p>https://dh.fbk.eu/technologies/place-names-historical-travel-writings</p></fn>
      </p>
      <p>
        Compared to previous works, our study focuses
on a text genre that has been little investigated in NLP
but is of great importance from a historical and
cultural point of view: travel writings are indeed
a source of information for many research areas
and are also the most representative type of
intercultural narrative
        <xref ref-type="bibr" rid="ref1 ref5">(Burke, 1997; Beaven,
2007)</xref>
        . In addition, we address the problem of poor
resource coverage by releasing new historical
word vectors and testing an architecture that does
not require any manual feature selection, and thus
needs neither text pre-processing nor gazetteers.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Manual Annotation</title>
      <p>
        We manually annotated a corpus of 100,000
tokens divided into 38 texts taken from a collection
of English travel writings (both travel reports and
guidebooks) about Italy published between the second
half of the 19th century and the 1930s
        <xref ref-type="bibr" rid="ref31">(Sprugnoli, 2018)</xref>
        . The tag Location
was used to mark all named entities (including
nicknames like city on the seven hills) referring to:
        <list list-type="bullet">
          <list-item><p>geographical locations: landmasses (Janiculum Hill, Vesuvius), bodies of water (Tiber, Mediterranean Sea), celestial bodies (Mars), natural areas (Campagna Romana, Sorrentine Peninsula);</p></list-item>
          <list-item><p>political locations: areas defined by sociopolitical groups, such as cities (Venice, Palermo), regions (Tuscany, Lazio), kingdoms (Regno delle due Sicilie), nations (Italy, Vatican);</p></list-item>
          <list-item><p>functional locations: areas and places that serve a particular purpose, such as facilities (Hotel Riposo, Church of St. Severo), monuments and archaeological sites (Forum Romanum) and streets (Via dell’Indipendenza).</p></list-item>
        </list>
The three aforementioned definitions correspond
to three entity types of the ACE guidelines, i.e.,
LOC (locations), GPE (geo-political entities) and
FAC (facilities), respectively. We extended the latter type to
cover material cultural assets, that is, the built
cultural heritage made up of buildings, sites and
monuments that constitute relevant locations in the
travel domain.
      </p>
      <p>
        The annotation required 3 person-days of work
and, at the end, 2,228 proper names of locations
were identified in the corpus, among which 657
were multi-token (29.5%). The inter-annotator
agreement, calculated on a subset of 3,200 tokens,
achieved a Cohen’s kappa coefficient of 0.93
        <xref ref-type="bibr" rid="ref6">(Cohen, 1960)</xref>
        , in line with previous results on named
entities annotation in historical texts
        <xref ref-type="bibr" rid="ref28 ref9">(Ehrmann et
al., 2016)</xref>
        .
      </p>
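      <p>As an illustration of how a token-level agreement figure of this kind can be computed, the following minimal Python sketch implements Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), for two annotators. The function name and the toy label sequences are assumptions made for the example; they are not the data or the script used for the corpus.</p>
      <preformat>
```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical token-level BIO labels from two annotators (toy data).
annotator_1 = ["B-LOC", "I-LOC", "O", "O", "B-LOC", "O", "O", "O"]
annotator_2 = ["B-LOC", "I-LOC", "O", "O", "O", "O", "O", "O"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```
      </preformat>
      <p>Because most tokens carry the O label, chance agreement is high in this setting, which is why kappa is usually preferred to raw percentage agreement for annotations of this type.</p>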
      <p>
        The annotation highlighted the presence of
specific phenomena characterising place names in
historical travel writings. First, the same
place can be recorded with variations in spelling
across different texts but also within the same text: for
example, modern names can appear together with
the corresponding ancient names (Trapani
gradually assumes the form that gave it its Greek name
of Drepanum), and places can be referred to
using both the English name and the original one, the
latter occurring in particular in code-mixing
passages
        <xref ref-type="bibr" rid="ref30">(Sprugnoli et al., 2017)</xref>
        such as (Byron
himself hated the recollection of his life in Venice,
and I am sure no one else need like it. But he is
become a cosa di Venezia, and you cannot pass
his palace without having it pointed out to you by
the gondoliers.). Second, some names are written
with the original Latin alphabet graphemes, such
as Ætna and Tropaea Marii. Third, some names
are misspelled, e.g., Cammaiore instead
of Camaiore and Momio instead of Mommio. In
addition, there are several long multi-token proper
names, especially in the case of churches and other
historical sites (e.g. House of the Tragic Poet,
Church of San Pietro in Vincoli), but also
abbreviated names used to anonymise personal addresses
(e.g. Hotel B.). The travel writings included in the
corpus are about cities and regions throughout Italy,
so there is a high diversity in the mentioned
locations, from valleys in the Alps (Val Buona) to
small villages in Sicily (Capo S. Vito). However,
even if the main topic of the corpus is the
description of travels in Italy, there are also references to
places outside the country, typically used to make
comparisons (Piedmont, in Italy, is nothing at all
like neighbouring Dauphiné or Savoie).
      </p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>Experiments for the automatic identification of
place names were carried out using the annotated
corpus described in the previous section. The
corpus, in BIO format, was divided into a training, a
test and a development set following an 80/10/10
split. For the classification, we tested two
approaches: we retrained the NER module of
Stanford CoreNLP with our in-domain annotated
corpus, and we used a BiLSTM implementation,
evaluating the impact of different word embeddings,
including three new historical pre-trained word
vectors.</p>
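      <p>To make the data preparation concrete, the sketch below shows one way of encoding annotated place names as BIO tags (B- for the first token of a name, I- for the following ones, O elsewhere) and of cutting a list of items into 80/10/10 portions. The helper names, the example sentence and the unit of the split (a flat list) are assumptions made for illustration, not a description of the actual preparation scripts.</p>
      <preformat>
```python
def to_bio(tokens, spans, label="LOC"):
    """Turn (start, end) token spans, end exclusive, into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def split_80_10_10(items):
    """Split a list into training, development and test portions (80/10/10)."""
    n_train = len(items) * 8 // 10
    n_dev = len(items) // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


# Hypothetical sentence containing a long multi-token name and a single-token one.
tokens = ["We", "visited", "the", "Church", "of", "San", "Pietro",
          "in", "Vincoli", "in", "Rome", "."]
tags = to_bio(tokens, [(3, 9), (10, 11)])
for token, tag in zip(tokens, tags):
    print(token, tag)
```
      </preformat>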
      <sec id="sec-4-1">
        <title>Retraining of Stanford NER Module</title>
        <p>
          The NER system integrated in Stanford CoreNLP
is an implementation of Conditional Random
Field (CRF) sequence models
          <xref ref-type="bibr" rid="ref11">(Finkel et al., 2005)</xref>
          trained on a corpus made up of several datasets
(CONLL, MUC-6, MUC-7, ACE) for a total of
more than one million tokens<sup>2</sup>. The model
distributed with CoreNLP is
therefore based on contemporary texts, most of them
of the news genre but also weblogs, newsgroup
messages and broadcast conversations. We
evaluated this model (belonging to the 3.8.0 release of
CoreNLP) on our test set and then we trained a
new CRF model using our training data.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Neural Approach</title>
        <p>
          We adopted an implementation of
BiLSTM-CRF developed by the Ubiquitous Knowledge
Processing Lab (Technische Universität
Darmstadt)<sup>3</sup>. This architecture exploits casing
information, character embeddings and word
embeddings; no feature engineering is required
          <xref ref-type="bibr" rid="ref24 ref26 ref27 ref3 ref30">(Reimers
and Gurevych, 2017a)</xref>
          . We chose this
implementation because the authors provide
recommended hyperparameter configurations for several
sequence labelling tasks, including NER, which we
took as a reference for our own experiments. More
specifically, the setup suggested by Reimers and
Gurevych (2017a) for the NER task is summarised
below:
          <fn id="fn2"><label>2</label><p>https://nlp.stanford.edu/software/CRF-NER.html</p></fn>
          <fn id="fn3"><label>3</label><p>https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf</p></fn>
        </p>
        <list list-type="bullet">
          <list-item><p>dropout: 0.25, 0.25</p></list-item>
          <list-item><p>classifier: CRF</p></list-item>
          <list-item><p>LSTM-Size: 100</p></list-item>
          <list-item><p>optimizer: NADAM</p></list-item>
          <list-item><p>word embeddings: GloVe Common Crawl 840B</p></list-item>
          <list-item><p>character embeddings: CNN</p></list-item>
          <list-item><p>miniBatchSize: 32</p></list-item>
        </list>
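        <p>The setup summarised above can be written down as a plain Python dictionary, from which one variant per set of word vectors is derived. This is only a sketch: the key names mirror the summary and are illustrative, and they are not guaranteed to match the option names expected by any particular implementation; the helper function is hypothetical.</p>
        <preformat>
```python
# Plain-dictionary rendering of the summarised setup (illustrative key names).
NER_SETUP = {
    "dropout": (0.25, 0.25),
    "classifier": "CRF",
    "LSTM-Size": 100,
    "optimizer": "NADAM",
    "word embeddings": "GloVe Common Crawl 840B",
    "character embeddings": "CNN",
    "miniBatchSize": 32,
}


def with_embeddings(setup, name):
    """Return a copy of the setup in which only the word embeddings differ."""
    variant = dict(setup)
    variant["word embeddings"] = name
    return variant


# One configuration per set of word vectors compared in the experiments.
variants = [
    with_embeddings(NER_SETUP, name)
    for name in ("GloVe", "Levy", "fastText", "HistoGlove", "HistoFast", "HistoLevy")
]
for variant in variants:
    print(variant["word embeddings"], variant["LSTM-Size"], variant["optimizer"])
```
        </preformat>
        <p>Keeping every other value fixed makes the word vectors the only factor that changes between the neural experiments.</p>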
        <p>
          Starting from this configuration, we evaluated
the performance of the NER classifier with
different pre-trained word embeddings. Given that
the score of a single run is not significant, because
different seed values produce different results
          <xref ref-type="bibr" rid="ref24 ref26 ref27 ref3 ref30">(Reimers and Gurevych, 2017b)</xref>
          , we ran the
system three times and calculated the average of
the test scores corresponding to the epoch with the
highest result on the development set. We used
Keras version 1.0<sup>4</sup> with Theano 1.0.0<sup>5</sup> as backend;
training stopped after 10 epochs without
improvement on the development set.
        </p>
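        <p>The scoring procedure described above (for each run, take the test score at the epoch with the best development score, then average over runs) can be sketched as follows. The function names and the per-epoch numbers are hypothetical and serve only to show the selection logic.</p>
        <preformat>
```python
from statistics import mean


def select_test_score(history):
    """history: list of (dev_f1, test_f1) pairs, one per epoch of a single run."""
    best_dev, test_at_best = max(history, key=lambda pair: pair[0])
    return test_at_best


def average_over_runs(runs):
    """Average, over runs with different seeds, the test score picked on dev."""
    return mean(select_test_score(history) for history in runs)


# Hypothetical per-epoch (dev F1, test F1) values for three seeds; toy numbers.
runs = [
    [(80.0, 78.0), (84.0, 82.0), (83.0, 83.0)],
    [(79.0, 77.0), (85.0, 84.0), (84.0, 85.0)],
    [(81.0, 80.0), (82.0, 81.0), (86.0, 83.0)],
]
print(average_over_runs(runs))
```
        </preformat>
        <p>Selecting the epoch on the development set, rather than on the test set, keeps the reported test score free of selection bias.</p>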
      </sec>
      <sec id="sec-4-3">
        <title>Pre-trained Word Embeddings</title>
        <p>
          We tested a set of word vectors available online, all
with 300 dimensions, built on corpora of
contemporary texts and widely adopted in several NLP
tasks, namely: (i) GloVe embeddings, trained on
a corpus of 840 billion tokens taken from
Common Crawl data
          <xref ref-type="bibr" rid="ref25">(Pennington et al., 2014)</xref>
          ; (ii)
Levy and Goldberg embeddings, produced from
the English Wikipedia with a dependency-based
approach
          <xref ref-type="bibr" rid="ref10 ref18 ref25">(Levy and Goldberg, 2014)</xref>
          ; (iii) fastText
embeddings, trained on the English Wikipedia
using sub-word information
          <xref ref-type="bibr" rid="ref3">(Bojanowski et al.,
2017)</xref>
          . By taking these
pre-trained embeddings into consideration, we cover different types of
word representation: GloVe is based on linear
bag-of-words contexts, Levy and Goldberg on dependency
parse trees, and fastText on a bag of character n-grams.
        </p>
        <p>
          In addition, we employed word vectors we
developed using the GloVe, fastText and Levy and
Goldberg algorithms on a subset of the Corpus
of Historical American English (COHA)
          <xref ref-type="bibr" rid="ref7">(Davies,
2012)</xref>
          made up of more than 198 million words. The
chosen subset contains more than 3,800 texts
belonging to four genres (i.e., fiction, non-fiction,
newspaper, magazine) published in the same
temporal span as our corpus of travel writings. These
historical embeddings, named HistoGlove,
HistoFast and HistoLevy, are available online<sup>6</sup>.
          <fn id="fn4"><label>4</label><p>https://keras.io/</p></fn>
          <fn id="fn5"><label>5</label><p>http://deeplearning.net/software/theano/</p></fn>
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <p>Table 1 shows the results of our experiments in
terms of precision (P), recall (R) and F-measure
(F1): the score obtained with the Stanford NER
module before and after the retraining is compared
with the one achieved with the deep learning
architecture and different pre-trained word
embeddings.</p>
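      <p>For reference, precision, recall and F1 over place names can be computed from BIO-tagged sequences as in the sketch below. It assumes exact-match scoring at the level of whole entity spans, which is a common convention but is an assumption here; the function names and the toy sequences are illustrative.</p>
      <preformat>
```python
def extract_spans(tags):
    """Return the set of (start, end) entity spans encoded by a BIO tag sequence."""
    spans, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        if start is not None and not tag.startswith("I-"):
            spans.add((start, i))
            start = None
        if tag.startswith("B-"):
            start = i
    return spans


def precision_recall_f1(gold_tags, predicted_tags):
    """Exact-match precision, recall and F1 over entity spans."""
    gold, predicted = extract_spans(gold_tags), extract_spans(predicted_tags)
    correct = len(gold.intersection(predicted))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: one of two gold names is found, plus one spurious prediction.
gold = ["B-LOC", "I-LOC", "O", "B-LOC", "O"]
predicted = ["B-LOC", "I-LOC", "O", "O", "B-LOC"]
print(precision_recall_f1(gold, predicted))
```
      </preformat>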
      <p>The neural approach performs remarkably
better than the CRF sequence models, with a
difference ranging from 11 to 14 points in terms of F1
depending on the word vectors used. The
original Stanford module produces highly unbalanced
results, with the lowest recall and F1 but a
precision above 82. In all the other experiments, scores
are more balanced, even if in the majority of the
neural experiments recall is slightly higher than
precision, meaning that the BiLSTM is better able to
generalise the observations of named entities from
the training data. Although our training data are
small compared to the corpora used for the
original Stanford NER module, retraining on them yields an
improvement of 13.1 and 5.9 points in recall and F1
respectively, demonstrating the positive impact of
having in-domain annotated data.</p>
      <p>As for word vectors, dependency-based
embeddings are not the best word representation for the
NER task, having the lowest F1 among the
experiments with the neural architecture. It is worth
noting that GloVe, suggested as the best word
vectors by Reimers and Gurevych (2017a) for the
NER task on contemporary texts, does not achieve
the best scores on our historical corpus. Linear
bag-of-words contexts are however confirmed as
the most appropriate word representation for the
identification of named entities, given that
HistoGloVe produces the highest scores for all
three metrics.</p>
      <p>The improvement obtained with the neural
approach combined with historical word vectors and
in-domain training data is evident when looking
in detail at the results over the three files
constituting the test set. These texts were extracted
from two travel reports, “A Little Pilgrimage in
Italy” (1911) and “Naples Riviera” (1907), and one
guidebook, “Rome” (1905). The text taken from
the latter book is particularly challenging because of the
presence of many Latin place names and locations
related to the ancient (and even mythological)
history of the city of Rome, e.g. Grotto of Lupercus,
Alba Longa. As displayed in Table 2, Neural
HistoGloVe increases the F1 score by 9.8 points on the
first file, 12.7 on the second and 25.3 on the third.
        <fn id="fn6"><label>6</label><p>http://bit.do/esiaS</p></fn>
      </p>
      <p>[Table 1: precision (P), recall (R) and F1 for the following systems: Stanford NER, Retrained Stanford NER, Neural HistoLevy, Neural Levy, Neural HistoFast, Neural GloVe, Neural FastText, Neural HistoGlove.]</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Works</title>
      <p>In this paper we presented the application of a
neural architecture to the automatic identification of
place names in historical texts. We chose to work
on an under-investigated text genre, namely travel
writings, which presents a set of specific linguistic
features that make the NER task particularly
challenging. The deep learning approach, combined
with an in-domain training set and in-domain
historical embeddings, outperforms the linear CRF
classifier of the Stanford NER module without the
need for feature engineering. The
annotated corpus, the best model and the historical word
vectors are all freely available online.</p>
      <p>As for future work, we plan to experiment with
a finer-grained classification so as to distinguish
different types of locations. Another
aspect worth studying is the georeferencing of
identified place names, so as to map the geographical
dimension of travel writings in Italy. An example
of visualisation is given in Figure 1, where the
locations automatically identified in the test file
taken from the book “Naples Riviera” are
displayed: place names have been georeferenced
using the Geocoding API<sup>7</sup> offered by Google and
displayed through the Carto<sup>8</sup> web mapping tool.
Another interesting line of work would be the detection
of the itineraries of past travellers: this application
could have a potential impact on the tourism
sector, suggesting historical routes as alternatives to the
more beaten and congested ones and helping tourists
rediscover long-forgotten sites.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The author wants to thank Manuela Speranza for
her help with inter-annotator agreement.
        <fn id="fn7"><label>7</label><p>https://developers.google.com/maps/documentation/geocoding/start</p></fn>
        <fn id="fn8"><label>8</label><p>https://carto.com/</p></fn>
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Tita</given-names>
            <surname>Beaven</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A life in the sun: Accounts of new lives abroad as intercultural narratives</article-title>
          .
          <source>Language and Intercultural Communication</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>188</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>David J</given-names>
            <surname>Bodenhamer</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>The spatial humanities: space, time and place in the new digital age</article-title>
          .
          <source>In History in the Digital Age</source>
          , pages
          <fpage>35</fpage>
          -
          <lpage>50</lpage>
          . Routledge.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>5</volume>
          :
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Lars</given-names>
            <surname>Borin</surname>
          </string-name>
          ,
          Dimitrios Kokkinakis, and Leif-Jöran Olsson
          .
          <year>2007</year>
          .
          <article-title>Naming the past: Named entity and animacy recognition in 19th century Swedish literature</article-title>
          .
          <source>In Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Peter</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Varieties of cultural history</article-title>
          . Cornell University Press.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <year>1960</year>
          .
          <article-title>A coefficient of agreement for nominal scales</article-title>
          .
          <source>Educational and psychological measurement</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Mark</given-names>
            <surname>Davies</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Expanding horizons in historical linguistics with the 400-million word Corpus of Historical American English</article-title>
          . Corpora,
          <volume>7</volume>
          (
          <issue>2</issue>
          ):
          <fpage>121</fpage>
          -
          <lpage>157</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>George R</given-names>
            <surname>Doddington</surname>
          </string-name>
          , Alexis Mitchell, Mark A Przybocki, Lance A Ramshaw, Stephanie Strassel, and
          <string-name>
            <given-names>Ralph M</given-names>
            <surname>Weischedel</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation</article-title>
          .
          <source>In LREC</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>837</fpage>
          -
          <lpage>840</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Maud</given-names>
            <surname>Ehrmann</surname>
          </string-name>
          , Giovanni Colavizza, Yannick Rochat, and Frédéric Kaplan.
          <year>2016</year>
          .
          <article-title>Diachronic evaluation of NER systems on old newspapers</article-title>
          .
          <source>In Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016)</source>
          , number EPFL-CONF221391
          , pages
          <fpage>97</fpage>
          -
          <lpage>107</lpage>
          . Bochumer Linguistische Arbeitsberichte.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Safaa</given-names>
            <surname>Eltyeb</surname>
          </string-name>
          and
          <string-name>
            <given-names>Naomie</given-names>
            <surname>Salim</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Chemical named entities recognition: a review on approaches and applications</article-title>
          .
          <source>Journal of cheminformatics</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <fpage>17</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Jenny Rose</given-names>
            <surname>Finkel</surname>
          </string-name>
          , Trond Grenager, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Incorporating non-local information into information extraction systems by Gibbs sampling</article-title>
          .
          <source>In Proceedings of the 43rd annual meeting on association for computational linguistics</source>
          , pages
          <fpage>363</fpage>
          -
          <lpage>370</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Gregory</surname>
          </string-name>
          , Christopher Donaldson, Patricia Murrieta-Flores, and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Rayson</surname>
          </string-name>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>International Journal of Humanities and Arts Computing</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Ralph</given-names>
            <surname>Grishman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Beth</given-names>
            <surname>Sundheim</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>Message understanding conference-6: A brief history</article-title>
          .
          <source>In COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics</source>
          , volume
          <volume>1</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Maryam</given-names>
            <surname>Habibi</surname>
          </string-name>
          , Leon Weber,
          <string-name>
            <given-names>Mariana</given-names>
            <surname>Neves</surname>
          </string-name>
          , David Luis Wiegandt, and
          <string-name>
            <given-names>Ulf</given-names>
            <surname>Leser</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Deep learning with word embeddings improves biomedical named entity recognition</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>33</volume>
          (
          <issue>14</issue>
          ):
          <fpage>i37</fpage>
          -
          <lpage>i48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Alison</given-names>
            <surname>Jones</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gregory</given-names>
            <surname>Crane</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection</article-title>
          .
          <source>In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'06)</source>
          , pages
          <fpage>31</fpage>
          -
          <lpage>40</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          Frédéric Kaplan.
          <year>2015</year>
          .
          <article-title>A map for big data research in digital humanities</article-title>
          .
          <source>Frontiers in Digital Humanities</source>
          ,
          <volume>2</volume>
          :
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Omer</given-names>
            <surname>Levy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Dependency-Based Word Embeddings</article-title>
          .
          <source>In ACL (2)</source>
          , pages
          <fpage>302</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Xuezhe</given-names>
            <surname>Ma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eduard</given-names>
            <surname>Hovy</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>1064</fpage>
          -
          <lpage>1074</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Sunghwan Mac</given-names>
            <surname>Kim</surname>
          </string-name>
          and
          <string-name>
            <given-names>Steve</given-names>
            <surname>Cassidy</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Finding names in Trove: named entity recognition for Australian historical newspapers</article-title>
          .
          <source>In Proceedings of the Australasian Language Technology Association Workshop 2015</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Nadeau</surname>
          </string-name>
          and
          <string-name>
            <given-names>Satoshi</given-names>
            <surname>Sekine</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A survey of named entity recognition and classification</article-title>
          .
          <source>Lingvisticae Investigationes</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Clemens</given-names>
            <surname>Neudecker</surname>
          </string-name>
          , Lotte Wilms, Wille Jaan Faber, and Theo van Veen.
          <year>2014</year>
          .
          <article-title>Large-scale refinement of digital historic newspapers with named entity recognition</article-title>
          .
          <source>In Proc IFLA Newspapers/GENLOC Pre-Conference Satellite Meeting.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Clemens</given-names>
            <surname>Neudecker</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>An Open Corpus for Named Entity Recognition in Historic Newspapers</article-title>
          .
          <source>In LREC.</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Lucia C</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Lenci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Anna</given-names>
            <surname>Gabbolini</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>INFORMed PA: A NER for the Italian Public Administration Domain</article-title>
          .
          <source>In Fourth Italian Conference on Computational Linguistics CLiC-it</source>
          <year>2017</year>
          , pages
          <fpage>246</fpage>
          -
          <lpage>251</lpage>
          . Accademia University Press.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Nils</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>Iryna</given-names>
            <surname>Gurevych</surname>
          </string-name>
          . 2017a.
          <article-title>Optimal hyperparameters for deep LSTM-networks for sequence labeling tasks</article-title>
          .
          <source>arXiv preprint arXiv:1707.06799</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Nils</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>Iryna</given-names>
            <surname>Gurevych</surname>
          </string-name>
          . 2017b.
          <article-title>Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging</article-title>
          .
          <source>In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>338</fpage>
          -
          <lpage>348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Yannick</given-names>
            <surname>Rochat</surname>
          </string-name>
          , Maud Ehrmann, Vincent Buntinx, Cyril Bornet, and Frédéric Kaplan.
          <year>2016</year>
          .
          <article-title>Navigating through 200 years of historical newspapers</article-title>
          .
          <source>In iPRES 2016</source>
          , number EPFL-CONF-218707
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Rachele</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          , Giovanni Moretti, Sara Tonelli, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Menini</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Fifty years of European history through the lens of computational linguistics: the De Gasperi Project</article-title>
          .
          <source>Italian Journal of Computational Linguistics</source>
          , pages
          <fpage>89</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>Rachele</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          , Sara Tonelli, Giovanni Moretti, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Menini</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A little bit of bella pianura: Detecting Code-Mixing in Historical English Travel Writing</article-title>
          .
          <source>In Proceedings of the Fourth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>Rachele</given-names>
            <surname>Sprugnoli</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>“Two days we have passed with the ancients...”: a Digital Resource of Historical Travel Writings on Italy</article-title>
          .
          <source>In Book of Abstracts of AIUCD 2018 Conference. AIUCD.</source>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
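          <string-name>
            <given-names>Erik F</given-names>
            <surname>Tjong Kim Sang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fien</given-names>
            <surname>De Meulder</surname>
          </string-name>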
          .
          <year>2003</year>
          .
          <article-title>Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition</article-title>
          .
          <source>In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume</source>
          <volume>4</volume>
          , pages
          <fpage>142</fpage>
          -
          <lpage>147</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <given-names>Seth</given-names>
            <surname>Van Hooland</surname>
          </string-name>
          , Max De Wilde, Ruben Verborgh, Thomas Steiner, and Rik Van de Walle.
          <year>2013</year>
          .
          <article-title>Exploring entity recognition and disambiguation for cultural heritage collections</article-title>
          .
          <source>Digital Scholarship in the Humanities</source>
          ,
          <volume>30</volume>
          (
          <issue>2</issue>
          ):
          <fpage>262</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>