<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Digital Humanities and Portuguese Processing: a research pathway</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S. Ri</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>r mın´ i</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivo S</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>quim S</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>F. Gon</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Fin</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ulo Qu</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CIDEHUS, Instituto Politécnico de Portalegre</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CIDEHUS, Universidade de Évora</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Computer Science, Universidade de Évora</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>IRIT</institution>
          ,
          <addr-line>Université de Toulouse</addr-line>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>PPGLA, Universidade Federal do Rio Grande do Sul</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper reflects on the whole path of work in digital humanities, in the light of the text-processing projects under development at CIDEHUS. These projects deal with a rich heritage related to Portuguese culture, history and language. The paper discusses the many challenges to be faced and how NLP techniques may broaden the capabilities of organising and sharing knowledge related to these resources.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural language processing</kwd>
        <kwd>Portuguese language</kwd>
        <kwd>Digital Humanities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper presents and discusses the challenges faced by projects under
development at CIDEHUS (Centre for History, Culture and Societies) in the area of
Digital Humanities, particularly those related to text processing. What the
interdisciplinary projects discussed here have in common is that their primary
knowledge source is Portuguese texts, mostly pre-contemporary ones. They differ
mainly in the historical period of the sources, in whether the sources are
manuscripts or printed material, and in their stage of digitisation, which ranges
from digital images to PDF files and digitised texts. The paper first presents an
overview of the collections of interest. We then discuss the common starting points
for dealing with material that is not yet digitised, and the first organisation
requirements once it has been digitised or transcribed. Next, we present the more
advanced processing undertaken in some of the collections under study, and discuss
current research goals and how NLP is required or involved. Finally, we present our
concluding remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>DH at CIDEHUS: dealing with sources from the 15th century to date</title>
      <p>
        The Centre for History, Culture and Societies (CIDEHUS) is an interdisciplinary
research centre of the University of Évora. It develops research at the intersection
of heritage, language, history, social sciences and more. CIDEHUS has been
conducting research in the area of Digital Humanities, dealing with texts, 3D
models, georeferencing, maps, music, tourism experiences, etc. In this paper we
focus on the DH research that is most closely related to text collections. The
collections under analysis cover a broad range of periods, as listed below:
– From the 14th to the 16th century, “Cortes Portuguesas” is a source under
analysis by Hermínia Vilar.
– From the 16th and 17th centuries: (i) the letters collected in “Livro das
Monções” are studied in a project led by Ana Sofia Ribeiro [<xref ref-type="bibr" rid="ref13">13</xref>]; (ii) António
Vieira's unfinished work (“História do Futuro”) is studied by Ana Paula
Banza [<xref ref-type="bibr" rid="ref2">2</xref>].
– From the 18th century, we have: (i) the Parish Memories, with related projects
led by Fernanda Olival [<xref ref-type="bibr" rid="ref15">15</xref>] [<xref ref-type="bibr" rid="ref11">11</xref>]; (ii) the medical books and treatises of Curvo
Semedo, studied in a project led by Maria Finatto [<xref ref-type="bibr" rid="ref4">4</xref>] [<xref ref-type="bibr" rid="ref12">12</xref>]; and (iii) the first
Portuguese nursing handbook, “Postilla Religiosa, e Arte de Enfermeiros”
(1741), studied by Filomena Gonçalves [<xref ref-type="bibr" rid="ref5">5</xref>].
– Addressing contemporary scientific literature, we have the research
conducted by Ivo Santos based on archaeology reports [<xref ref-type="bibr" rid="ref16">16</xref>].
The heterogeneity of these sources poses a series of challenges, from digitisation
to natural language processing and the organisation of the acquired knowledge
in semantically structured databases. In the next section, we discuss these
challenges.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Initial steps</title>
      <p>DH projects may benefit from current AI and NLP techniques for better
access to the sources, including digitisation, the addition of metadata, information
extraction, knowledge representation techniques, annotated corpora, and the creation
of data sets and linked data. NLP and AI methods, however, must be adapted to
different needs, and better user interfaces are required if the developed methods
are to be used outside the context of programming frameworks. We start by presenting
the initial steps for DH projects, which are full of challenges themselves.</p>
      <sec id="sec-3-1">
        <title>Starting points: digitisation</title>
        <p>
          OCR technology is often applied to sources available as PDF files. In our current
projects, the Monsoon letters (“O livro das Monções”), the books and
treatises of Curvo Semedo, and some of the archaeology reports are at this stage.
OCR output quality varies considerably depending on the format and quality of the
input. It is a basic but still unsolved problem [<xref ref-type="bibr" rid="ref19">19</xref>]. OCR outputs must often
pass through extra processing or manual correction to make the
source fit for the next steps.
        </p>
        <p>
          Besides the challenges posed by OCR quality, some projects need
digitisation from original manuscripts, which is a different problem again.
Transcription tools such as Transkribus [<xref ref-type="bibr" rid="ref7">7</xref>] help digitisation in these cases.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Processing digitised sources</title>
        <p>
          Once a source, or corpus of study, has been digitised, the organisation of its
metadata is of great relevance. The digital material must identify itself well:
it should state which collection it belongs to, where each file sits within that
collection, and so on. The metadata can also describe the document structure, when
required. It is also important to separate metadata from the actual data. For
transcribed material, for instance, headers, page numbering and comments should be
identified in an organised way. This is essential for further processing.
Metadata is also important for connecting a source with other sources, and for
connecting them to linked open data [<xref ref-type="bibr" rid="ref10">10</xref>].
        </p>
        <p>
          Beyond metadata organisation, there is the problem of normalisation. In
Digital Humanities it is often the case that the sources, being from a distant
time, present linguistic variation in both spelling and morphosyntax.
Understanding these differences and being able to translate or map older
writings to current standards is an essential step towards other processing
levels [<xref ref-type="bibr" rid="ref3">3</xref>].
        </p>
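        <p>The spelling side of this normalisation step can be sketched as a dictionary lookup over word tokens. This is only a minimal illustration under stated assumptions: the function name normalise and the four mapping entries are hypothetical examples of well-known pre-reform spellings, not data or code from the projects discussed here, and morphosyntactic variation is not handled at all.</p>
        <preformat preformat-type="code">
```python
import re
from typing import Mapping

# Illustrative mapping from historical spellings to modern Portuguese.
# These entries are assumptions chosen for the sketch, not project data.
VARIANTS = {
    "he": "é",
    "hum": "um",
    "huma": "uma",
    "pharmacia": "farmácia",
}

# In Python 3, \w is Unicode-aware, so accented letters stay inside a token.
WORD = re.compile(r"\w+")


def normalise(text: str, variants: Mapping[str, str] = VARIANTS) -> str:
    """Replace known historical word forms and leave everything else as is."""

    def swap(match):
        word = match.group(0)
        modern = variants.get(word.lower())
        if modern is None:
            return word
        # Preserve an initial capital from the source token.
        return modern.capitalize() if word[0].isupper() else modern

    return WORD.sub(swap, text)


print(normalise("Esta he huma receita de pharmacia."))
# prints: Esta é uma receita de farmácia.
```
        </preformat>
        <p>A lookup of this kind only covers variants that are already listed and cannot resolve forms that are ambiguous out of context, which is one reason why the language-model approaches mentioned next are attractive.</p>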
        <p>
          Another way of mitigating writing variants is to create language models
that include substantial corpora from other time periods, with their naturally
occurring variants, or to add a final training phase (fine-tuning) to adapt the model
to the variants [<xref ref-type="bibr" rid="ref1">1</xref>].
        </p>
        <p>From this point, with digitised, normalised texts or with suitable language
models, we can make better use of existing NLP tools, such as machine
translation, named entity recognition, event extraction,
coreference resolution, question answering, and many others. However, such
developments require not only current NLP tools but also closer interaction with
scholars, to make the final adaptations to their needs and to provide suitable interfaces.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Next (more advanced) steps</title>
      <p>The DH group at CIDEHUS has advanced in processing some of its collections,
using AI and natural language processing tools to create knowledge bases that
may be useful for other researchers. Below we describe some of the
resources developed.</p>
      <sec id="sec-4-1">
        <title>Resources under development</title>
        <p>
          The Parish Memories. One example is the work done with the Parish
Memories, a rich collection that has been very well studied in Portugal. Digitised
versions of the microfilms of the originals are available at the Portuguese National
Archive (Arquivo Nacional da Torre do Tombo). There are also many printed books
reproducing parts of the material. A first digital version was made freely available
through the CIDEHUS Digital Portal (http://www.cidehusdigital.uevora.pt). The
application of NLP techniques is now possible thanks to past projects that worked on
the manual transcription of the original manuscripts. From this collection, a named
entity dataset was automatically built using previously developed systems for named
entity recognition [<xref ref-type="bibr" rid="ref17">17</xref>],
based on machine learning techniques and language models. The initial entity
categories considered were the usual ones: person, location, and organisation. The
named entities extracted from the Parish Memories constitute a dataset that
was made available to the community. The digitised texts are provided with
their respective lists of named entities [<xref ref-type="bibr" rid="ref20">20</xref>]. As the dataset results from a completely
automated process, a curation phase is still foreseen in future
developments of this source.
        </p>
        <p>
          The Archaeology Corpus. A Portuguese corpus is being built in the domain
of archaeology. The main sources of information considered are reports of
archaeological works, academic theses and specialised bibliography. Among these
sources, the “Portal do Arqueólogo” (Archaeological Data Management Tool)
stands out for housing structured information. In this portal, the information
is distributed in three main groups: sites, works and projects. A project can be
made up of several archaeological works, and a work can refer to one or more
sites. As of June 2021, this corpus had 36275 records of archaeological sites and
39947 works. The goal is to extract and organise information using NLP
methods. Analyses of this corpus make it possible to identify periods of
more intensive archaeological work and many other aspects of archaeology in
Portugal. This initial corpus and the analyses made upon it are described in more
detail in [<xref ref-type="bibr" rid="ref16">16</xref>].
        </p>
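        <p>The relation between projects, works and sites described above can be sketched as a small data model. This is a hedged illustration only: the class and field names (Site, Work, Project, year and so on) and the toy records are assumptions made for the example and do not reflect the actual schema or contents of the portal. The helper works_per_year shows the kind of count that could reveal periods of more intensive archaeological work.</p>
        <preformat preformat-type="code">
```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Site:
    site_id: str  # hypothetical identifier
    name: str


@dataclass
class Work:
    work_id: str
    year: int  # assumed field: year in which the work took place
    sites: List[Site] = field(default_factory=list)  # one or more sites


@dataclass
class Project:
    project_id: str
    works: List[Work] = field(default_factory=list)  # several works


def works_per_year(projects: List[Project]) -> Counter:
    """Count archaeological works by year across all projects."""
    return Counter(w.year for p in projects for w in p.works)


# Toy records, invented purely to exercise the model.
site_a = Site("S1", "Example site A")
site_b = Site("S2", "Example site B")
project = Project(
    "P1",
    [
        Work("W1", 1998, [site_a]),
        Work("W2", 1998, [site_a, site_b]),
        Work("W3", 2005, [site_b]),
    ],
)

counts = works_per_year([project])
print(counts.most_common(1))
# prints: [(1998, 2)]
```
        </preformat>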
      </sec>
      <sec id="sec-4-2">
        <title>Further research goals</title>
        <p>
          The various projects of the group have different processing needs. For the work
being developed on Padre António Vieira's História do Futuro
[<xref ref-type="bibr" rid="ref2">2</xref>], for instance, the assessment of semantic similarity [<xref ref-type="bibr" rid="ref9">9</xref>] is quite significant.
Although semantic similarity is a well-studied problem, finding a suitable,
applicable tool and creating helpful interfaces for analysing the source remain
challenges.
        </p>
        <p>
          Event extraction is another interesting NLP task, with resources developed
for Portuguese [<xref ref-type="bibr" rid="ref14">14</xref>]. The challenge, in this case, is to adapt these tools to the
language of the period, an effort under exploration in the project related to the
Monsoon letters [<xref ref-type="bibr" rid="ref13">13</xref>].
        </p>
        <p>
          Ontology linking is one of the goals of the Curvo Semedo project [<xref ref-type="bibr" rid="ref12">12</xref>], where
health terminology can be found and mapped to existing ontologies [<xref ref-type="bibr" rid="ref18">18</xref>]. These
mappings are a way of enriching the resource and establishing which areas of
medicine and anatomy, and which kinds of diseases and medications, were known at
that time, according to that source. Currently, the vocabulary (terminology
and related expressions) used in Semedo's manuals is being contrasted with
that used in the nursing manual printed in 1741 [<xref ref-type="bibr" rid="ref5">5</xref>]. This contrast
may help to identify different perspectives, concepts and behaviours of the
various health care professionals and practitioners of that time.
        </p>
        <p>
          Regarding the “Cortes Portuguesas” source, one of the purposes is to
identify and analyse concepts occurring in the speeches of the Courts and the
argumentation used in the royal legislation. The aim is to trace the origins
of an ethics of behaviour for royal officers, the establishment of control and
accountability systems, and the control of corruption in late medieval societies.
In addition to the tasks mentioned above, argumentation mining [<xref ref-type="bibr" rid="ref8">8</xref>] may
accelerate the scholar's analysis in this kind of project.
        </p>
        <p>In fact, all these different tasks are complementary, and one ambitious goal
would be a unified environment combining them to study these sources and
others. On the other hand, there are many other possibilities, as dealing with
textual information towards meaning is an endless effort.</p>
      </sec>
      <sec id="sec-4-3">
        <title>FAIR Data and Ontology Development</title>
        <p>
          Naturally, as the studies mentioned here evolve, data will be
produced, as was the case for the named entities in the Parish Memories [<xref ref-type="bibr" rid="ref20">20</xref>].
Efforts to share data still have a long way to go in terms of standardisation. It is
very important to make data compliant with the FAIR principles (Findability,
Accessibility, Interoperability, and Reusability) [<xref ref-type="bibr" rid="ref21">21</xref>]. An essential step
towards improving the FAIRness of data is using vocabularies and ontologies for data
and metadata representation [<xref ref-type="bibr" rid="ref6">6</xref>]. While diverse vocabularies are available for
metadata representation (such as DCAT and PROV-O), existing domain
ontologies must be extended, or new ones developed, to better fit the specificities
of each corpus or source.
        </p>
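        <p>As a hedged illustration of metadata representation with such vocabularies, the sketch below builds a JSON-LD style description of a dataset using DCAT, Dublin Core Terms and PROV-O terms. Only the vocabulary terms and namespace URIs are real. The identifier, title and descriptions are placeholders invented for the example, and the record is not the actual metadata of any dataset mentioned here.</p>
        <preformat preformat-type="code">
```python
import json

# All values are placeholders for illustration. Only the vocabulary terms
# and their namespace URIs (DCAT, Dublin Core Terms, PROV-O) are real.
record = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "prov": "http://www.w3.org/ns/prov#",
    },
    "@id": "https://example.org/dataset/parish-memories-entities",
    "@type": "dcat:Dataset",
    "dct:title": "Named entities from the Parish Memories (placeholder title)",
    "dct:language": "pt",
    "dcat:keyword": ["named entities", "Parish Memories"],
    # Provenance: records that the data came from an automatic process.
    "prov:wasGeneratedBy": {
        "@type": "prov:Activity",
        "dct:description": "Automatic named entity recognition, not yet curated",
    },
}

# Serialise without escaping accented characters.
serialised = json.dumps(record, ensure_ascii=False, indent=2)
print(serialised)
```
        </preformat>
        <p>Stating the provenance explicitly, as in the prov:wasGeneratedBy entry, is one way to tell a reuser that the data came from an automated process and still awaits curation.</p>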
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Concluding Remarks</title>
      <p>CIDEHUS has a rich portfolio of projects with great potential to be explored
through advances in the area of Digital Humanities. Ideally, all collections in
this portfolio would follow the same digitisation standards, such as
those proposed by TEI (the Text Encoding Initiative) and the FAIR principles
of Open Data. AI techniques are needed to address problems in all phases,
from the initial steps related to OCR quality, manuscript transcription, the
addition of metadata and normalisation, to the processing phases required
for translation, information retrieval and extraction, the creation of knowledge
bases, and their association with ontologies. It is crucial that a human-centred AI
perspective be taken into account, so as to provide suitable user interfaces for
accessing the sources and the extracted data. All these projects and collections,
even with different goals, may gain by coexisting in the same environment and
thus sharing the same powerful tools to deal with texts and their
encoded knowledge.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arevalo</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonteyn</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>MacBERTh: Development and evaluation of a historically pre-trained language model for English (1450-1950)</article-title>
          .
          <source>In: ICON Workshop on Natural Language Processing for Digital Humanities</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Banza</surname>
            ,
            <given-names>A.P.:</given-names>
          </string-name>
          <article-title>A edição digital da História do Futuro, de António Vieira: arquivo e ferramentas</article-title>
          . In: Actas da Jornada de Humanidades Digitais do CIDEHUS (to appear) (
          <year>2022</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cameron</surname>
            ,
            <given-names>H.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonçalves</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Linguistic and orthographical classic Portuguese variants challenges for NLP</article-title>
          .
          <source>In: Proceedings of the 14th International Conference on the Computational Processing of Portuguese</source>
          . pp.
          <fpage>43</fpage>
          -
          <lpage>48</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Finatto</surname>
            ,
            <given-names>M.J.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Gonçalves, M.F.:
          <article-title>Portuguese corpora of the 18th century: old medicine texts for teaching and research activities</article-title>
          .
          <source>In: Proceedings of the conference on Language Technologies Digital Humanities</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Gonçalves, M.F.:
          <article-title>A Arte de Enfermeiros (1741): aspetos do léxico relativo a doenças e remédios no século XVIII</article-title>
          . Revista Panace@
          <volume>21</volume>
          (
          <issue>52</issue>
          ) (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Guizzardi</surname>
          </string-name>
          , G.:
          <article-title>Ontology, Ontologies and the “I” of FAIR</article-title>
          .
          <source>Data Int</source>
          .
          <volume>2</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>181</fpage>
          -
          <lpage>191</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kahle</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colutto</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hackl</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , Mühlberger, G.:
          <article-title>Transkribus - a service platform for transcription, recognition and retrieval of historical documents</article-title>
          .
          <source>In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)</source>
          .
          <source>vol. 4</source>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          . IEEE (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Argument mining: a survey</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>45</volume>
          (
          <issue>4</issue>
          ),
          <fpage>765</fpage>
          -
          <lpage>818</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lopez-Gazpio</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maritxalar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uria</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
          </string-name>
          , E.:
          <article-title>Interpretable semantic textual similarity: Finding and explaining differences between sentences</article-title>
          .
          <source>Knowledge-Based Systems 119</source>
          ,
          <fpage>186</fpage>
          -
          <lpage>199</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeeven</surname>
          </string-name>
          , V.:
          <article-title>A brief overview of metadata formats</article-title>
          .
          <source>DESIDOC Journal of Library &amp; Information Technology</source>
          <volume>24</volume>
          (
          <issue>4</issue>
          ) (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Olival</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cameron</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
          </string-name>
          , R.:
          <article-title>As memórias paroquiais: do manuscrito ao digital</article-title>
          . In: Actas da Jornada de Humanidades Digitais do CIDEHUS (to appear) (
          <year>2022</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finatto</surname>
            ,
            <given-names>M.J.:</given-names>
          </string-name>
          <article-title>Information extraction from historical texts: a case study</article-title>
          .
          <source>In: Workshop on Digital Humanities and Natural Language Processing, collocated with PROPOR</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ribeiro</surname>
            ,
            <given-names>A.S.:</given-names>
          </string-name>
          <article-title>O projecto Monsoon: perspectivas digitais da Índia portuguesa</article-title>
          . In: Actas da Jornada de Humanidades Digitais do CIDEHUS (to appear) (
          <year>2022</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sacramento</surname>
            ,
            <given-names>A.d.S.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Joint event extraction with contextualized word embeddings for the Portuguese language</article-title>
          .
          <source>In: Brazilian Conference on Intelligent Systems</source>
          . pp.
          <fpage>496</fpage>
          -
          <lpage>510</lpage>
          . Springer (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olival</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sequeira</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Excavating the data pit: the Portuguese Parish Memories (1758) as a gold standard</article-title>
          .
          <source>In: Workshop on Digital Humanities and Natural Language Processing, collocated with PROPOR</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
          </string-name>
          , R.:
          <article-title>Semantic information extraction in archaeology: Challenges in the construction of a Portuguese corpus of megalithism</article-title>
          .
          <source>In: 15th International Conference on Metadata and Semantics Research</source>
          , Springer Communications in
          <source>Computer and Information Science Series</source>
          , Vol.
          <volume>1537</volume>
          . (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>dos Santos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Assessing the impact of contextual embeddings for Portuguese named entity recognition</article-title>
          .
          <source>In: 2019 8th Brazilian Conference on Intelligent Systems (BRACIS)</source>
          . pp.
          <fpage>437</fpage>
          -
          <lpage>442</lpage>
          . IEEE (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Schriml</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arze</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nadendla</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>Y.W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazaitis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Felix</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kibbe</surname>
            ,
            <given-names>W.A.</given-names>
          </string-name>
          :
          <article-title>Disease ontology: a backbone for disease semantic integration</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>40</volume>
          (
          <issue>D1</issue>
          ),
          <fpage>D940</fpage>
          -
          <lpage>D946</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>van Strien</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beelen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ardanuy</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hosseini</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGillivray</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colavizza</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Assessing the impact of OCR quality on downstream NLP tasks</article-title>
          .
          <source>In: ICAART 2020 - Proceedings of the 12th International Conference on Agents and Artificial Intelligence</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olival</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cameron</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sequeira</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Enriching the 1758 Portuguese Parish Memories (Alentejo) with named entities</article-title>
          .
          <source>Journal of Open Humanities Data</source>
          <volume>7</volume>
          ,
          <issue>20</issue>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumontier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aalbersberg</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          , et al.:
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          .
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>