<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IberLEF 2019 Portuguese Named Entity Recognition and Relation Extraction Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Collovini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>quim S</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>o Consoli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ulo Qu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>rlo Souz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>iro Cl</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Federal University of Bahia (UFBA)</institution>
          ,
          <addr-line>Bahia</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pontifical Catholic University of Rio Grande do Sul (PUCRS)</institution>
          ,
          <addr-line>Porto Alegre</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Évora</institution>
          ,
          <addr-line>Évora</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>390</fpage>
      <lpage>410</lpage>
      <abstract>
        <p>This work provides an overview of the Portuguese Named Entity Recognition (NER) and Relation Extraction (RE) shared tasks at IberLEF 2019, their participant systems and results. These tasks sought to challenge Portuguese NER and RE systems by offering five new datasets for testing, two for RE and three for NER. Of these new datasets, two in particular offered a novel challenge for NER: the first composed of official police documents and the second composed of a hospital's clinical notes. These two cannot be published due to their sensitive nature, but the other three have been released for public use.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>NLP</kwd>
        <kwd>Named Entity Recognition</kwd>
        <kwd>NER</kwd>
        <kwd>Relation Extraction</kwd>
        <kwd>RE</kwd>
        <kwd>Portuguese Language</kwd>
        <kwd>IberLEF 2019</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Information Extraction (IE), a task in the field of Natural Language Processing
(NLP), consists of obtaining relevant information from texts and representing
it in a structured way. This representation can, for example, take the form of
a list, table or graph, all of which can easily be used for storage, indexing, and
query processing by standard database management systems.</p>
      <p>Two examples of IE tasks are Named Entity Recognition (NER)
and Relation Extraction (RE). NER aims to identify a given text's
Named Entities (NEs) and classify them into categories (Organization, Place, Person, among
others) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]; RE aims to identify relations that occur between said entities [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
For instance, the "affiliation" relation between Person-type and
Organization-type NEs is one of the relations sought by RE systems. According to [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], the
identification of NEs is the first step towards the semantic analysis of a text,
and is crucial to relation extraction systems. In the literature, we find several
works that consider NER to be an integral part of RE systems [
        <xref ref-type="bibr" rid="ref12 ref16 ref2">2, 12, 16</xref>
        ], given
that NER can help identify NEs that may hold some kind
of relation with one another.
      </p>
      <p>Collovini et al.</p>
      <p>In order to explore and contribute to the state of RE and NER for Portuguese,
we proposed three workshop tasks that were part of the IberLEF 2019 (Iberian
Languages Evaluation Forum). The first involved annotating Portuguese texts
with NER systems, while the second and third focused on the extraction of open
relations in Portuguese texts. Participants were free to apply for any combination
of activities, be it only one, two or all of them.</p>
      <p>This paper is organized as follows: Section 2 describes Task 1 and its
participants' results; Section 3 describes Task 2 and its participants' results; Section
4 describes Task 3 and its participants' results; and Section 5 presents the
concluding remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>Task 1: Named Entity Recognition</title>
      <p>The first task we proposed was NER. As explained previously, this is the task
of identifying NEs within a given text and classifying them into one of several
relevant categories or into a default category known as Miscellaneous. Our
objective with this task was to evaluate the participating systems' performance on
three test datasets composed of texts in different genres. The first test dataset,
composed of assorted news stories, memorandums, e-mails, interviews and
magazine articles, was annotated for the following categories: Person (tagged PER),
Place (tagged PLC), Organization (tagged ORG), Value (tagged VAL) and Time
(tagged TME). The second and third datasets, the former composed of a
hospital's clinical notes on patients and the latter of police documents, were
annotated only for the Person (tagged PER) category.</p>
      <p>
        The task consisted of the following steps:
- Development Phase: participants were required to develop
a computational approach to NER. This approach, hereafter referred to as the
system, had to be capable of solving NER tasks for the proposed textual
genres. Participants were free to develop their solution however they saw fit,
so long as they complied with the requirements described in the training and
test phases;
- Training Phase: participants chose their training datasets. They were free
to use any datasets they wished to train their systems for the proposed textual
genres;
- Test Phase: the coordinators evaluated the capacity and
reproducibility of the submitted systems:
  - Reproduction Stage: the participants' proposed systems
were reproduced and executed by the coordinators. Any system that the
coordinators were unable to reproduce or execute would have been
disqualified;
  - Evaluation Stage: the proposed corpora were fed into all systems
that passed the Reproduction Stage. The expected output was to be in
the ".txt" format, so that it could be evaluated via the CoNLL-2002
script [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>Test data</title>
        <p>This task makes use of three datasets composed of texts in different textual
genres in order to evaluate how the submitted systems behave when exposed to
each set's particularities.</p>
        <p>
          General Dataset: this dataset was built from two existing corpora, SIGARRA
[
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and Second HAREM [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. SIGARRA was chosen because it is a more recently
published dataset and was not used as training data by any of the participants.
We added the Second HAREM dataset only to supply examples of the
Value category, which is not annotated in SIGARRA.
        </p>
        <p>SIGARRA, a dataset of news stories collected by the SIGARRA system
created by the University of Porto, is composed of 1000 news stories taken from
17 Organic Units within the University. SIGARRA is annotated for the following
NE categories: Person, Place, Organization, Date, Time, Event, Organic Unit
and Course. In our corpus, however, we only considered the first five categories
(Person, Place, Organization, Date and Time). We mapped the Date and Time
categories to a single category (Time), as in the HAREM golden
collection.</p>
        <p>Second HAREM is a dataset comprising 129 Brazilian and European
Portuguese texts with 7255 manually annotated named entities. Although it is
annotated for 10 categories, we only used sentences annotated for the Value category.</p>
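        <p>The category selection and the merging of Date and Time described above can be sketched as a simple lookup. This is only an illustrative sketch: the function name map_tag, the SIGARRA label strings and the use of BIO prefixes (B-, I-, O) are assumptions made for the example, not details given by the corpus authors.</p>
        <preformat>
```python
# Hypothetical label strings: the text lists the categories, not the exact tags.
SIGARRA_TO_TASK = {
    "Person": "PER",
    "Place": "PLC",
    "Organization": "ORG",
    "Date": "TME",  # Date and Time are merged into a single Time category
    "Time": "TME",
}


def map_tag(tag):
    """Map one BIO tag (assumed scheme) to the task's tag set.

    Categories that were not kept (Event, Organic Unit, Course) become "O".
    """
    if tag == "O":
        return "O"
    prefix, _, category = tag.partition("-")
    mapped = SIGARRA_TO_TASK.get(category)
    if mapped is None:
        return "O"
    return prefix + "-" + mapped
```
        </preformat>
        <p>Under these assumptions, map_tag("B-Date") yields "B-TME", while a tag for a discarded category such as "I-Event" is mapped to "O".</p>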
        <p>In total, 5055 sentences made up of 179892 tokens were extracted.</p>
        <p>Clinical Dataset: clinical notes are textual data recorded by hospital
workers (nurse technicians, nurses, medical doctors...) about each of the
hospital's patients, past and present. This kind of text contains names of patients,
doctors and residents, results of medical exams and other assorted medical
information. The Person category was manually annotated in a subset of the clinical
notes. The manual annotation was carried out by 4 annotators, each of whom
annotated the clinical notes; afterwards, the cases on which there was no
consensus were discussed. The tool used for this annotation was
WebAnno (https://webanno.github.io/webanno/), a web-based tool designed for
annotation tasks. We chose this tool because it has a feature
that allows direct export to the CoNLL-2002 format.</p>
        <p>Clinical notes present particular challenges when it comes to their textual
structure: words that should be separated by a space are not (for example
"AnaR1"), and there are several medical abbreviations. The pre-processing before
annotation included only tokenization, so as to preserve the original formatting of this
data. In cases such as the above example, we understand "AnaR1" to be a
Person, and we understand "####Paulo" to also be a Person.</p>
        <p>In total, the 50 notes contain 9523 tokens and 77 named
entities of the Person category. As this data is of a sensitive nature, we cannot
distribute it.</p>
        <p>Police Dataset: a textual dataset from Brazil's Federal Police was
manually annotated for the Person category. The data is divided into ten Testimony
texts, ten Statement texts, and ten Interrogatory texts. These include names
of deputies, scriveners and witnesses. This corpus contains well-structured and
grammatically correct texts, as they are all official documents. As with
the clinical notes, the only pre-processing technique used on the text prior to
annotation was tokenization.</p>
        <p>In total, the 30 texts contain 1,388 sentences, 37,706 tokens and
916 named entities of the Person category. As with the clinical notes, we cannot
distribute this dataset.</p>
        <p>Table 1 shows the number of sentences and tokens per dataset. The number
of named entities per category is shown in Table 2.</p>
        <p>Our evaluation process was divided into three stages, as shown in Figure 1 and
described below:
i. The participants' systems are executed using one of the three proposed datasets
as the input. Each system's output should have two columns: the
first containing the dataset's tokens (one per line) and the
second their predicted tags (as per the CoNLL-2002 format);
ii. A third column, containing the expected tags, is then aligned with the other
two;
iii. The file generated in stage ii is used as input for the CoNLL-2002 evaluation
script, which calculates the final metrics.</p>
        <p>In stage ii, the algorithm checks whether or not each of the output's tokens
is the same as the expected token. Should the tokens be different, the algorithm
returns the expected token and stops the alignment. This ensures that each
system's output preserves the original dataset's integrity, for better and more
accurate evaluation.</p>
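        <p>The alignment check of stage ii can be sketched as follows. This is a minimal illustration rather than the coordinators' actual script: the function name align_columns and the representation of each file as a list of (token, tag) pairs are assumptions made for the example.</p>
        <preformat>
```python
def align_columns(predicted, expected):
    """Build three-column rows: (token, predicted tag, expected tag).

    Both arguments are lists of (token, tag) pairs in document order.
    Returns (rows, mismatch). mismatch is None when every token matched;
    otherwise it is the expected token at which the alignment stopped.
    """
    rows = []
    for index, (gold_token, gold_tag) in enumerate(expected):
        if index == len(predicted):
            # The system output ended before the dataset did.
            return rows, gold_token
        token, tag = predicted[index]
        if token != gold_token:
            # The system altered or dropped a token: stop the alignment.
            return rows, gold_token
        rows.append((token, tag, gold_tag))
    return rows, None
```
        </preformat>
        <p>When mismatch is None, the rows can be written out one per line and passed to the evaluation script; otherwise the returned token shows where the system's output departed from the dataset.</p>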
        <p>That said, none of the five evaluated systems output the expected sequence
of tokens. As already mentioned, the Clinical Dataset has repeated sequences
of the "#" character, and words joined by " ". All of these particularities are
part of the text and of the language developed in the medical context. We then
identified the instances where the sequence broke, and found that the systems were
ignoring specific tokens, or capturing only part of them. An example of a partially
captured token is "seg sex" (a medical abbreviation indicating the passage of
time), where one system discarded everything that came after the " ".</p>
        <p>Having alerted the participants to this, we asked them to alter their
systems so that they fully preserved the structure of the text, and then to
resubmit them. The results shown in the next section come from this
second submission of the systems.</p>
        <p>We had five participants in total, one of whom submitted two systems, for a total of six evaluated systems:
1. BiLSTM-CRF-ELMo
2. CRF-LG
3. NLPyPort
4. CVT
5. BiLSTM-CRF-FlairBBP
6. Linguakit</p>
        <p>
          The evaluated systems used different training sets. The datasets used in the
training process were: First HAREM [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], Second HAREM [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], MiniHAREM [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ],
WikiNER [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], Paramopama [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], LeNER-Br [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], FreeLing Corpus [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and
Datalawyer. Table 3 shows which datasets each participant used for training.
Linguakit is not mentioned in Table 3 since it is based on rules and heuristics.
        </p>
        <p>According to the results, no single system had the best
F-measure for all datasets. The system with the best F-measure for the Police
Dataset and the Clinical Dataset was System 4. We also noted, based on Figure 2 (a)
and (b), that the best three systems for these two datasets used approaches based
on Neural Networks and Language Models. The results for the Police Dataset in
particular showed a remarkable difference between approaches that were based
on Neural Networks and those that were not.</p>
        <p>System 5 achieved the best F-measure for the General Dataset. However,
Figure 3 (a) shows that the results for the General Dataset had the least
F-measure variance out of all test datasets. This can be explained by the fact
that the General Dataset is structurally similar to the datasets used to train
the systems. Figure 3 (b) shows the F-measure for the Organization category,
where the highest score, achieved by System 5, was over 45%. For the Person
category, Figure 3 (c), System 1 had the best performance with an F-measure
of over 80%. For the Place category, Figure 3 (d), the best performance was
achieved by System 5 with an F-measure of over 58%. Figure 3 (e) shows results
for the Time category, where the highest F-measure was achieved by System 5. For
the Value category, Figure 3 (f), the best F-measure was achieved by System 4.
Overall, the systems with higher F-measures used approaches based on Neural
Networks. However, still on the General Dataset, the systems based on rules
showed competitive results for the Organization category, as seen in Figure 3
(b). Detailed results are presented in Appendix 1.</p>
        <p>Fig. 2: Systems' F-measure for the Police and Clinical Datasets: (a) Police Dataset - Person; (b) Clinical Dataset - Person.</p>
        <p>Fig. 3: Systems' F-measure for the General Dataset: (a) Overall; (b) Organization; (c) Person; (d) Place; (e) Time; (f) Value.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Task 2: Relation Extraction for Named Entities</title>
      <p>
        The task of open relation extraction (RE) from texts faces many challenges,
as it requires large amounts of linguistic knowledge and sophistication in the
language processing techniques employed to solve it. We proposed an RE task
that involved the automatic extraction of a relation descriptor expressing any
type of relation between a pair of NEs of the Person, Place and Organization
categories in Portuguese texts. The relation descriptor is defined as the text
chunk that describes the explicit relation occurring between these entities in a
sentence [
        <xref ref-type="bibr" rid="ref1 ref7">7, 1</xref>
        ].
      </p>
      <p>For example, we have the relation descriptor "diretor de" (director of) that
occurs between the NEs "Ronaldo Lemos" (PER) and "Creative Commons" (ORG) in
the sentence below:
"No próximo Sábado, Ronaldo Lemos, diretor da Creative Commons, irá
participar de um debate [...]"
(Next Saturday, Ronaldo Lemos, director of Creative Commons, will participate
in a debate [...])</p>
      <p>The relation descriptor identified in the sentence is represented as a triple:
(Ronaldo Lemos, diretor de, Creative Commons).</p>
      <sec id="sec-3-1">
        <title>This RE task consisted of the following steps:</title>
        <p>- Systems Development Phase: the coordinators made a small
annotated dataset available for the participants' use in developing their RE
systems;
- Test Phase: the test phase included two options for participants:
  - Test 1: participants had to extract relation descriptors
between NE pairs (all belonging to one of the following categories:
PER, PLC or ORG) from data provided by the coordinators. This data
was already annotated with NE information when provided, and as such
did not require participants to apply a NER system;
  - Test 2: the data provided was not annotated with NE
information. As such, the objective was to extract and
classify the NEs in the test sentences (using only the categories PER, PLC
and ORG), and then to extract the relation
descriptors between pairs of the recognized NEs;
- Evaluation Phase: the participants sent their results from the
Test Phase. Results were submitted for Test 1 only. Afterwards, the analyzed
results were sent back to the participants. The metrics used in the evaluation
phase were Precision, Recall and F-measure.</p>
        <sec id="sec-3-1-1">
          <title>Resources</title>
          <p>
            For the purpose of accomplishing this task, the coordinators provided subsets
of Portuguese texts annotated with RE information as described in [
            <xref ref-type="bibr" rid="ref1 ref7">7, 1</xref>
            ]. The
authors presented a subset of the Golden Collections from the two HAREM
conferences [
            <xref ref-type="bibr" rid="ref22 ref6">22, 6</xref>
            ], to which they added manual annotation of RE information
expressed between NEs belonging to certain categories (ORG, PER and PLC).
This resulted in a total of 516 RE annotated text instances, which were added to
the 236 RE annotated texts from the Summ-it++ corpus [
            <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
            ], for a total of 752
instances (positive and negative) of RE annotated texts. In Table 4 examples of
positive relation instances are shown.
          </p>
          <p>
            The organizers selected three subsets of positive examples from the RE dataset, one for
each step of Task 2. Table 5 shows the data distributed by NE pairs, for a total of
390 examples. For the Systems Development Phase, 90 positive examples
annotated with relation descriptors (seeds) were made available to the participants.
The data available for the Test Phase (Test 1 and Test 2) was not annotated with the
relation descriptors.
          </p>
          <p>
            We considered two scores for the Task 2 evaluation metrics: a completely correct
relations score and a partially correct relations score. These were adapted from
First HAREM's evaluation metrics for named entities [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ].
          </p>
          <p>- Completely Correct Relations (CCR): when all terms that make up the
relation descriptor in the key are equal to those of the relation descriptor in the
system's output. The score for each completely correct relation is 1, which
represents a full hit (see Appendix 2);
- Partially Correct Relations (PCR): when at least one of the terms in the
relation descriptor of the system's output corresponds to a term in the relation
descriptor of the key. The score for a partially correct relation is calculated
as shown in Appendix 2.</p>
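          <p>The distinction between the two scores can be illustrated with the sketch below. The full-hit score of 1 comes from the definition above, but the partial-credit formula is defined in Appendix 2 and is not reproduced in this section. The sketch therefore ASSUMES a HAREM-style weighting of 0.5 times the ratio of common terms to distinct terms, which may differ from the formula actually used. The function name relation_score, the whitespace tokenization and the case-insensitive comparison are likewise assumptions.</p>
          <preformat>
```python
def relation_score(key_descriptor, system_descriptor):
    """Score one system relation descriptor against the key descriptor."""
    key_terms = key_descriptor.lower().split()
    system_terms = system_descriptor.lower().split()
    if key_terms == system_terms:
        return 1.0  # Completely Correct Relation: full hit
    common = set(key_terms).intersection(system_terms)
    if not common:
        return 0.0  # no shared term: neither CCR nor PCR
    distinct = set(key_terms).union(system_terms)
    # ASSUMED partial credit; the actual formula is given in Appendix 2.
    return 0.5 * len(common) / len(distinct)
```
          </preformat>
          <p>Under this assumed weighting, the output "diretor" scored against the key "diretor de" shares one of two distinct terms and receives 0.25, whereas an identical descriptor receives 1.</p>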
        </sec>
        <sec id="sec-3-1-2">
          <title>Results</title>
          <p>For Task 2, the FactPyPort system participated in Test 1. Table 6 shows the
results. Of the 149 examples in Test 1, the FactPyPort system identified 144
examples, of which 106 were Completely Correct Relations (CCR). There was
no evaluation for Test 2 since there were no registered participants.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Task 3: General Open Relation Extraction</title>
      <p>The task of general open relation extraction aims to identify structured
representations of the information contained in unstructured sources, such as textual
documents. This task faces many challenges, considering the generality of the
problem, as well as the linguistic knowledge required to perform such a task
automatically.</p>
      <p>
        This task involves the automatic extraction of any relation descriptor
expressing any type of semantic relation between a pair of entities or concepts
mentioned in Portuguese sentences. As before, a relation descriptor is defined as
the text chunk that describes the explicit semantic relation occurring between
these entities in a sentence [
        <xref ref-type="bibr" rid="ref1 ref7">7, 1</xref>
        ]. This task generalizes Task 2 by
removing the requirement that the entities be named in the text, meaning that
any relation between two Noun Phrases (NPs) is to be considered.
      </p>
      <p>For example, the relation descriptor "diretor de" (director of) occurs
between the noun phrases "Ronaldo Lemos" and "uma organização sem fins
lucrativos" (a non-profit organization) in the sentence below:
"No próximo Sábado, Ronaldo Lemos, diretor de uma organização sem fins
lucrativos, irá participar de um debate [...]"
(Next Saturday, Ronaldo Lemos, director of a non-profit organization, will
participate in a debate [...])</p>
      <p>The relation descriptor identified in the sentence is represented as a triple:
(Ronaldo Lemos, diretor de, uma organização sem fins lucrativos).</p>
      <p>The idea of this proposal is to request the participation of systems/solutions
for the task of RE between NPs in Portuguese texts. The systems' results were
evaluated using a set of annotated test data provided by the coordinators.</p>
      <p>
        For the purpose of accomplishing this task, the coordinators provided two
sets of Portuguese texts. The first one, composing Test 1, was annotated with NP
information so that the systems only had to identify the relation descriptors, similar
to what was provided in Task 2. The second, composing Test 2, was presented without
any annotation, aiming to evaluate the systems' capacity to identify relations
and their arguments in texts. Both come from a set of 25 sentences annotated
with NP and RE information, presented in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>This RE task consisted of the following steps:
- Systems Development Phase: the coordinators made a
small annotated dataset (seeds) available for the participants' use in
developing their RE systems;
- Test Phase: the test phase included two options for participants:
  - Test 1: participants had to extract relation descriptors
between NP pairs from data provided by the coordinators. This data was
already annotated with NP information when provided, and as such
did not require participants to apply a NER system;
  - Test 2: the data provided was not annotated with NP
information. As such, the goal was to extract and classify
the NPs in the test sentences, and then to extract the
relation descriptors between pairs of the recognized NPs;
- Evaluation Phase: the participants sent their results from
the Test Phase. They could submit results from Test 1, Test 2 or both for
evaluation by the coordinators. Afterwards, the analyzed results were sent back
to the participants. The metrics used in the evaluation phase were Precision,
Recall and F-measure.</p>
      <sec id="sec-4-1">
        <title>Resources</title>
        <p>
          Task 3 was evaluated using the Portuguese Open IE corpus proposed by Glauber
et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This corpus is composed of 442 relation triples extracted from 25
sentences obtained from sources such as the Portuguese section of Wikipedia,
the CETENFolha corpus, movie reviews from Adoro Cinema and the Europarl
corpus v7.0 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The relations were manually extracted by 5 human annotators
in two rounds.
        </p>
        <p>
          Since the annotation of all possible extractions from a sentence is a highly
subjective and, therefore, difficult task to perform systematically, the authors
imposed some restrictions on the form of extractions that may appear in the
corpus [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]:
        </p>
        <p>C1 When there is a word chain through a preposition forming a noun phrase
(NP), we first select the fragment that is composed of a noun, proper noun
or pronoun, its respective determiners and direct modifiers (articles,
numerals, adjectives and some pronouns).</p>
        <p>C2 When a sentence has a transitive verb with a preposition (indirect mode), the
preposition will be attached to the relation descriptor.</p>
        <p>C3 We call minimal fact (minimal) any extracted fact having as arguments NPs
composed only of a noun, proper noun or pronoun with its determiners and
direct modifiers.</p>
        <p>C4 If there are fragments with a noun function (preposition chain) that modify
arguments in minimal facts, new facts (not minimal) must be added by the
annotator (see C3 second triple example).</p>
        <p>C5 A fact must only be extracted from a sentence if it contains a proper noun
or pronoun in at least one of the arguments.</p>
        <p>C6 For n-ary facts, if there is no significant loss of information, the annotator
must extract multiple binary facts.</p>
        <p>C7 Coordinating conjunctions with an additive function can generate multiple
extracted facts and also a fact with the coordinated conjunction.</p>
        <p>C8 Relations and arguments in the extracted facts must agree in number.</p>
        <p>The corpus was divided into three fragments containing randomly selected
relations: a training fragment (train dataset) composed of 90 relation triples that
were made available to the participants in the Systems Development Phase; and
two test fragments, each composed of 176 relation triples, used to evaluate the
systems' submissions for Test 1 (test 1 dataset) and Test 2 (test 2 dataset). The
three fragments are pairwise disjoint and each fragment contains extractions
from all the 25 sentences that compose the original annotated corpus.</p>
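        <p>A split with these sizes (90 + 176 + 176 = 442) can be sketched as below. The text only states that the relations were selected randomly, so the shuffling procedure, the seed and the names split_corpus and covers_all_sentences are assumptions. Each triple is assumed to be a tuple whose first element identifies its source sentence. A plain random split does not by itself guarantee that every fragment covers all 25 sentences, so the sketch includes a separate check for that property.</p>
        <preformat>
```python
import random


def split_corpus(triples, sizes=(90, 176, 176), seed=0):
    """Randomly split the triples into disjoint train, test 1 and test 2 fragments."""
    if sum(sizes) != len(triples):
        raise ValueError("fragment sizes must add up to the corpus size")
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    first_cut = sizes[0]
    second_cut = sizes[0] + sizes[1]
    train = shuffled[:first_cut]
    test1 = shuffled[first_cut:second_cut]
    test2 = shuffled[second_cut:]
    return train, test1, test2


def covers_all_sentences(fragment, n_sentences=25):
    """Check that a fragment has at least one triple from every sentence."""
    return len({triple[0] for triple in fragment}) == n_sentences
```
        </preformat>
        <p>If covers_all_sentences fails for any fragment, one simple option is to redraw the split with a different seed until all three fragments pass.</p>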
      </sec>
      <sec id="sec-4-2">
        <title>Evaluation</title>
        <p>General Open Relation Extraction is concerned with identifying all possible
information contained within a sentence, without any a priori restriction on the
kind of relation to be extracted. Since the train and test datasets were composed
of relations extracted from the same 25 sentences, the evaluation for Task 3 needs
to consider the possibility that a system correctly identifies a relation that
is not in the test dataset under consideration. We therefore evaluated the systems
under different scenarios.</p>
        <p>For Test 1, since the relations to be extracted were predefined by
setting the arguments, the participating systems needed only to identify the
relation descriptor. As such, we used a single evaluation scenario, taking the
test 1 dataset as the golden resource and comparing it to the participants' systems'
outputs.</p>
        <p>For Test 2, however, because the training and test datasets
were constructed by taking examples of extractions from the same set of 25
sentences, we evaluated the participating systems in four
different scenarios. The reason for this is that in the simplest scenario
(Scenario 1), in which only the test 2 dataset is used to compute the evaluation
metrics, the systems may be penalized for identifying correct relations which are not in
this dataset, i.e. relations that were assigned to the train or test 1 datasets
when the corpus was split. As such, we consider the following scenarios:
Scenario 1) The test 2 dataset is the golden resource. Since the system may
perform correct extractions that do not appear in the test 2 dataset, e.g. relations
contained in the test 1 or train datasets, the evaluation of the system's precision
may be affected.</p>
        <p>Scenario 2) The test 2 dataset contains the target relations to be identified, but
we matched the extractions performed by the participant systems against
the relations in both the test 2 and train datasets. The metrics are then computed
considering only those relations that were matched with some relation in the
test 2 dataset, since the relations in the train dataset were known a priori
by the systems and cannot be used in the evaluation. In this evaluation,
the systems may suffer in precision, because the relations in the
test 1 dataset are not considered, and also in recall, because some relations that
could be partially matched with relations in the test 2 dataset may have
been matched with relations in the training dataset.</p>
        <p>Scenario 3) In this scenario, we consider as target relations those contained
in the test 1 and test 2 datasets, and matched the extractions performed by
the participant systems against them, as well as against the relations contained in the
training dataset. The metrics are then computed considering only those
relations that were matched with some target relation, disregarding those
that were matched with relations in the training dataset. In this evaluation,
the systems may suffer in recall, because some relations that could
be partially matched with relations in the test 1 or test 2 datasets may have
been matched with relations in the training dataset.</p>
        <p>Scenario 4) In this scenario, we consider the union of all three datasets as the
golden resource and compute the evaluation metrics for each system. In this
evaluation, the systems may gain in precision and recall, since the
relations in the train dataset were provided in advance for the systems
to train over.</p>
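        <p>The four scenarios differ only in which relations count as targets and which matched relations are set aside. The sketch below illustrates this for exact matches only; the partial matching used in the actual evaluation is not modelled. It also encodes one possible reading of "disregarding" relations matched with the train dataset, namely that they are dropped from both the numerator and the denominator of precision. The function names and the representation of relations as hashable triples are assumptions.</p>
        <preformat>
```python
def evaluate(extractions, targets, ignored=frozenset()):
    """Exact-match precision, recall and F-measure against a target set."""
    targets = set(targets)
    # Extractions matching an ignored (train) relation are set aside.
    considered = set(extractions) - set(ignored)
    hits = considered.intersection(targets)
    precision = len(hits) / len(considered) if considered else 0.0
    recall = len(hits) / len(targets) if targets else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def evaluate_scenarios(extractions, train, test1, test2):
    """Compute the metrics for Scenarios 1 to 4 described above."""
    train, test1, test2 = set(train), set(test1), set(test2)
    return {
        1: evaluate(extractions, test2),
        2: evaluate(extractions, test2, ignored=train),
        3: evaluate(extractions, test1 | test2, ignored=train),
        4: evaluate(extractions, train | test1 | test2),
    }
```
        </preformat>
        <p>In this reading, an extraction that reproduces a train relation lowers precision in Scenario 1, is neutral in Scenarios 2 and 3, and counts as a hit in Scenario 4.</p>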
        <p>
          As for Task 2, we adapted the First HAREM's evaluation metrics for named
entity recognition [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] to the task of relation extraction considering both a
completely correct relations score and a partially correct relations score (see
Appendix 2). We also followed the same matching strategy as used in Task 2.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Results</title>
        <p>
          For Test 1, we had two groups participating in the task: group 1 submitted two
systems - DEPENDENTIE [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and DPTOIE; group 2 submitted three systems
- ICEIS [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], INFERPORTIE [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] and PRAGMATICOIE [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>Considering only the exact matches, the system DPTOIE had the best result
with an Fexact of 3.4% and the systems ICEIS and DEPENDENTIE achieved 11%
of Fexact, while the systems INFERPORTIE and PRAGMATICOIE had
negligible results. When considering partial matches, DPTOIE had the best results,
achieving an Fpartial score of 4.3%.</p>
        <p>
          For Test 2, we had three groups participating in the task: the two groups that
participated in Test 1 and a third group which submitted the system Linguakit 2,
an adapted version of the relation extraction module of Linguakit [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] described
in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>In Scenario 1, considering only the exact matches, the system Linguakit 2
had the best results with an Fexact of 8.8%. When considering partial matches,
DPTOIE had the best results, achieving an Fpartial score of 28.3%.</p>
        <p>In Scenario 2, considering only the exact matches, the system Linguakit 2
had the best results with an Fexact of 9.35%. When considering partial matches,
DPTOIE had the best results, achieving an Fpartial score of 23.25%. Notice that,
as conjectured, the systems' precision was positively impacted in
evaluation Scenario 2, but because some relations were partially
matched with relations in the training dataset, Rpartial was negatively
impacted in this scenario.</p>
        <p>In Scenario 3, considering only the exact matches, the system Linguakit 2
had the best results with an Fexact of 7.56%. When considering partial matches,
DPTOIE had the best results, achieving an Fpartial score of 18.81%. Notice that,
since the number of target relations has increased greatly from the previous
scenarios to Scenario 3, we can perceive a decrease in the systems' recall, associated
with an increase in their precision.</p>
        <p>In Scenario 4, considering only the exact matches, the system Linguakit 2 had
the best results with an Fexact of 7.71%. When considering partial matches,
DPTOIE had the best results, achieving an Fpartial score of 20.21%. Given that no
extraction of the systems is excluded from the evaluation, it is noticeable that the
systems' recall scores have improved compared to those achieved in Scenario
3, especially considering partial matches. Notice, however, that the performance
of some systems, such as DEPENDENTIE and ICEIS, has decreased because
the number of target relations to be extracted has increased. This
indicates that these systems had partially matched some of the relations in the
training dataset, which decreases their overall performance
when those relations are considered in the evaluation.</p>
        <p>Although we evaluated four scenarios, the overall
evaluation of the systems has remained consistent across the different scenarios.
This indicates that our evaluation results are robust, considering the
particularities of the dataset used in this task. Overall, the systems DPTOIE
and Linguakit 2 performed best in all evaluation scenarios, with Linguakit
2 dominating the exact-match evaluations and DPTOIE the partial-match
evaluations. These facts indicate that, while both systems were able to extract
a great deal of the manually identified relations in the corpus, Linguakit 2 is
the most consistent with the restrictions on extractions
imposed by the dataset, while DPTOIE is capable of extracting a great number
of relations from the sentences.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Concluding Remarks</title>
      <p>In this work, we presented three tasks involving annotating Portuguese texts
with NER systems and open relation extraction in Portuguese texts. As a result,
we had a total of eleven teams registered to participate in the proposed tasks.
Seven of them sent their results, while four dropped out. In the end, a total of
thirteen submissions from 6 different institutions were evaluated, as presented
in Table 12.</p>
      <p>Table 12 (rows: Task 1, Task 2, Task 3).</p>
      <p>As a contribution of this work, we made available annotated datasets for RE
in Portuguese; we evaluated different systems/solutions for NER and RE; and
we had the opportunity to test the solutions/systems for NER on various
textual genres. The resources to reproduce this work are available in our GitHub
(https://github.com/jneto04/iberlef-2019). As future work, we would like to
propose an evaluation of the systems in which the same datasets are used for training
the systems.</p>
    </sec>
    <sec id="sec-6">
      <title>Appendix 1 - Detailed systems results</title>
      <p>System: BiLSTM-CRF-ELMo</p>
      <p>System: NLPyPort</p>
    </sec>
    <sec id="sec-7">
      <title>Appendix 2 - Metrics calculation details</title>
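      <p>The evaluation metrics are Precision, Recall and F-measure, where:
CR = correct relations,
IR = identified relations,
TR = total relations.</p>
      <p>Considering only the Completely Correct Relations (CCR):
        <disp-formula><label>(1)</label><tex-math>P_{exact} = \frac{\mathit{CR}}{\mathit{IR}}</tex-math></disp-formula>
        <disp-formula><label>(2)</label><tex-math>R_{exact} = \frac{\mathit{CR}}{\mathit{TR}}</tex-math></disp-formula>
        <disp-formula><label>(3)</label><tex-math>F_{exact} = \frac{2 \times P_{exact} \times R_{exact}}{P_{exact} + R_{exact}}</tex-math></disp-formula>
      </p>
      <sec id="sec-7-1">
        <title>The score for a partially correct relation is calculated as</title>
        <p>
          <disp-formula><label>(4)</label><tex-math>\mathit{PCR} = 0.5 \times \frac{\#\,\text{correct terms in the annotation}}{\text{greatest value from terms in the key and the system's output}}</tex-math></disp-formula>
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>Considering both CCR and PCR:</title>
        <p>
          <disp-formula><label>(5)</label><tex-math>P_{partial} = \frac{\mathit{CR} + \mathit{PCR}}{\mathit{IR}}</tex-math></disp-formula>
          <disp-formula><label>(6)</label><tex-math>R_{partial} = \frac{\mathit{CR} + \mathit{PCR}}{\mathit{TR}}</tex-math></disp-formula>
          <disp-formula><label>(7)</label><tex-math>F_{partial} = \frac{2 \times P_{partial} \times R_{partial}}{P_{partial} + R_{partial}}</tex-math></disp-formula>
        </p>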
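        <p>As a worked illustration of the metrics in this appendix, the sketch below computes the exact and partial scores from raw counts. It is not the official scorer: the function names are hypothetical, the counts are invented, PCR in the partial formulas is assumed to be the sum of the partial credits of all partially correct relations, and "greatest value from terms in the key and the system's output" is read as the larger of the two term counts.</p>
        <preformat><![CDATA[
```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def partial_credit(correct_terms: int, key_terms: int, system_terms: int) -> float:
    """Credit for one partially correct relation: half the share of correct
    terms, relative to the longer of the key and the system output (assumed reading)."""
    return 0.5 * correct_terms / max(key_terms, system_terms)


def evaluate(cr: int, pcr: float, ir: int, tr: int) -> dict:
    """Exact and partial precision, recall and F-measure.

    cr  = completely correct relations
    pcr = summed credit of partially correct relations (assumption)
    ir  = relations identified by the system
    tr  = total relations in the gold standard
    """
    p_exact = cr / ir if ir else 0.0
    r_exact = cr / tr if tr else 0.0
    p_partial = (cr + pcr) / ir if ir else 0.0
    r_partial = (cr + pcr) / tr if tr else 0.0
    return {
        "p_exact": p_exact,
        "r_exact": r_exact,
        "f_exact": f_measure(p_exact, r_exact),
        "p_partial": p_partial,
        "r_partial": r_partial,
        "f_partial": f_measure(p_partial, r_partial),
    }


# Invented counts: 2 exact hits, one partial hit with 2 correct terms
# (key has 4 terms, system output has 3), 5 extractions, 8 gold relations.
pcr_total = partial_credit(correct_terms=2, key_terms=4, system_terms=3)
result = evaluate(cr=2, pcr=pcr_total, ir=5, tr=8)
```
]]></preformat>
        <p>With these invented counts, the single partial hit contributes a credit of 0.25, so the partial precision and recall are slightly higher than their exact counterparts.</p>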
        <p>To compute the partially correct score, we matched each system
output R1 to the relation in the corpus (R2) that maximizes the following matching
score, in which match(R1, R2) denotes the number of terms in common between
the two extractions and len(R) denotes the number of terms in the extraction:
          <disp-formula><label>(8)</label><tex-math>\mathrm{score}(R_1, R_2) = \bigl(2 \times \mathrm{match}(R_1, R_2) - (\mathrm{len}(R_1) + \mathrm{len}(R_2))\bigr) - \left|\mathrm{len}(R_1) - \mathrm{len}(R_2)\right|</tex-math></disp-formula>
        </p>
        <p>Notice that the matching score minimizes mismatches between the relations
R1 and R2, i.e. it is maximal when the relations R1 and R2 are an exact match,
i.e. the same relation. When the match is only partial, the score privileges matches
with the fewest mismatched terms. Notice that the term |len(R1) -
len(R2)| is used to guarantee that the relations are the closest possible, i.e. it
is used to rule out match candidates with a high number of matched tokens but
differing too much from relation R1.</p>
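        <p>The matching strategy can be sketched as follows, reading Equation (8) as score(R1, R2) = (2 x match(R1, R2) - (len(R1) + len(R2))) - |len(R1) - len(R2)|, i.e. assuming the operators missing from the printed formula are a product and subtractions. The function names, the whitespace-free token lists and the use of a multiset intersection for match are illustrative assumptions, not the organizers' implementation.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Sequence


def match(r1: Sequence[str], r2: Sequence[str]) -> int:
    """Number of terms the two relations have in common (multiset intersection)."""
    return sum((Counter(r1) & Counter(r2)).values())


def score(r1: Sequence[str], r2: Sequence[str]) -> int:
    """Matching score: 0 for identical relations, negative otherwise."""
    mismatch_term = 2 * match(r1, r2) - (len(r1) + len(r2))
    length_term = abs(len(r1) - len(r2))
    return mismatch_term - length_term


def best_match(extraction: Sequence[str], gold: List[Sequence[str]]) -> Sequence[str]:
    """Gold relation that maximizes the matching score for an extraction."""
    return max(gold, key=lambda candidate: score(extraction, candidate))


# Hypothetical example relations, already split into terms.
extraction = ["Maria", "works", "at", "PUCRS"]
same = ["Maria", "works", "at", "PUCRS"]
longer = ["Maria", "works", "at", "PUCRS", "in", "Porto", "Alegre"]
other = ["Maria", "studies", "at", "UFBA"]

chosen = best_match(extraction, [longer, other])
```
]]></preformat>
        <p>Under this reading, the candidate <monospace>longer</monospace> shares all four terms with the extraction but is penalized twice for its three extra terms (score -6), so the same-length candidate <monospace>other</monospace> (score -4) is chosen. This is the behaviour the length term is described as enforcing.</p>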
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We thank the CNPQ, CAPES and FAPERGS for their financial support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Collovini de Abreu,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Vieira</surname>
          </string-name>
          , R.: Relp:
          <article-title>Portuguese open relation extraction</article-title>
          .
          <source>Knowledge Organization</source>
          <volume>44</volume>
          (
          <issue>3</issue>
          ),
          <volume>163</volume>
          –
          <fpage>177</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agichtein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gravano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Snowball: Extracting relations from large plain-text collections</article-title>
          .
          <source>In: 5th ACM International Conference on Digital Libraries</source>
          . pp.
          <volume>85</volume>
          –
          <issue>94</issue>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Antonitsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Figueira</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amaral</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Summ-it++: an enriched version of the summ-it corpus</article-title>
          .
          <source>In: Proceedings of the Language Resources and Evaluation Conference (LREC)</source>
          . pp.
          <year>2047</year>
          –
          <year>2051</year>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. de Araujo,
          <string-name>
            <given-names>P.H.L.</given-names>
            ,
            <surname>de Campos</surname>
          </string-name>
          , T.E.,
          <string-name>
            <surname>de</surname>
            <given-names>Oliveira</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.R.</given-names>
            , Stauffer, M.,
            <surname>Couto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Bermejo</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Lener-br: A dataset for named entity recognition in brazilian legal text</article-title>
          .
          <source>In: International Conference on Computational Processing of the Portuguese Language</source>
          . pp.
          <volume>313</volume>
          –
          <fpage>323</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Carreras</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chao</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Freeling: An open-source suite of language analyzers</article-title>
          .
          <source>In: LREC</source>
          . pp.
          <volume>239</volume>
          –
          <issue>242</issue>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Carvalho</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>H.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Segundo harem: Modelo geral, novidades e avaliação</article-title>
          . In: Mota,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <surname>D</surname>
          </string-name>
          . (eds.)
          <article-title>Desafios na avaliação conjunta do reconhecimento de entidades mencionadas: O Segundo HAREM (</article-title>
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Machado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
          </string-name>
          , R.:
          <article-title>Extracting and structuring open relations from portuguese text</article-title>
          .
          <source>In: Computational Processing of the Portuguese Language - 12th International Conference, PROPOR 2016. Lecture Notes in Computer Science</source>
          , vol.
          <volume>9727</volume>
          , pp.
          <volume>153</volume>
          –
          <fpage>164</fpage>
          . Springer, Tomar, Portugal (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>dos Santos</surname>
            ,
            <given-names>H.D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
          </string-name>
          , R.:
          <article-title>Annotating relations between named entities with crowdsourcing</article-title>
          .
          <source>In: Natural Language Processing and Information Systems - 23rd International Conference on Applications of Natural Language to Information Systems</source>
          ,
          <string-name>
            <surname>NLDB</surname>
          </string-name>
          <year>2018</year>
          . pp.
          <volume>290</volume>
          –
          <fpage>297</fpage>
          . Paris, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gamallo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García</surname>
          </string-name>
          , M.:
          <article-title>Multilingual open information extraction</article-title>
          .
          <source>In: Proceedings of Progress in Artificial Intelligence - 17th Portuguese Conference on Artificial Intelligence</source>
          ,
          <string-name>
            <surname>EPIA</surname>
          </string-name>
          <year>2015</year>
          . pp.
          <volume>711</volume>
          –
          <fpage>722</fpage>
          .
          <string-name>
            <surname>Coimbra</surname>
          </string-name>
          ,
          <string-name>
            <surname>Portugal</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gamallo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Linguakit: uma ferramenta multilingue para a análise linguística e a extração de informação</article-title>
          .
          <source>Linguamática</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <volume>19</volume>
          –
          <fpage>28</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Glauber</surname>
            , R., de Oliveira,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Challenges of an annotation task for open information extraction in portuguese</article-title>
          .
          <source>In: International Conference on Computational Processing of the Portuguese Language</source>
          . pp.
          <volume>66</volume>
          –
          <fpage>76</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hasegawa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sekine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grishman</surname>
          </string-name>
          , R.:
          <article-title>Discovering relations among named entities from large corpora</article-title>
          .
          <source>In: ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics</source>
          . p.
          <fpage>415</fpage>
          .
          <article-title>Association for Computational Linguistics</article-title>
          , Morristown, NJ, USA (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Junior</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macedo</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bispo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbosa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Paramopama: a brazilian-portuguese corpus for named entity recognition</article-title>
          .
          <source>Encontro Nac. de Int. Artificial e Computacional</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall series in artificial intelligence, Pearson Education Ltd</article-title>
          ., London,
          <volume>2</volume>
          <fpage>edn</fpage>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Koehn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Europarl: A parallel corpus for statistical machine translation</article-title>
          .
          <source>In: MT summit</source>
          . vol.
          <volume>5</volume>
          , pp.
          <volume>79</volume>
          –
          <issue>86</issue>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bollegala</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matsuo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ishizuka</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Using graph based method to improve bootstrapping relation extraction</article-title>
          .
          <source>In: CICLing (2)</source>
          . pp.
          <volume>127</volume>
          –
          <issue>138</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranchhod</surname>
            , E.: Avaliação de reconhecimento de entidades mencionadas: Princípio de harem. In: Santos,
            <given-names>D</given-names>
          </string-name>
          . (ed.)
          <article-title>Avaliação Conjunta: Um novo paradigma no processamento computacional da língua portuguesa</article-title>
          ,
          <source>chap. 14</source>
          , pp.
          <volume>161</volume>
          –
          <fpage>176</fpage>
          . IST Press (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Nothman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ringland</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Curran</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Learning multilingual named entity recognition from wikipedia</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>194</volume>
          ,
          <fpage>151</fpage>
          –
          <fpage>175</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>de Oliveira</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Dependentie: an open information extraction system on portuguese by a dependence analysis</article-title>
          .
          <source>Encontro Nacional de Inteligência Artificial e Computacional</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Pires</surname>
            ,
            <given-names>A.R.O.</given-names>
          </string-name>
          :
          <article-title>Named entity extraction from Portuguese web text</article-title>
          .
          <source>Master's thesis</source>
          , Faculdade de Engenharia da Universidade de Porto, Porto, Portugal (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sang</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erik</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Introduction to the conll-2002 shared task: language-independent named entity recognition</article-title>
          .
          <source>In: Proceedings of CoNLL-2002</source>
          . pp.
          <volume>155</volume>
          –
          <issue>158</issue>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardoso</surname>
          </string-name>
          , N.:
          <article-title>Breve introdução ao HAREM</article-title>
          ,
          <source>chap. 1</source>
          , pp.
          <volume>1</volume>
          –
          <fpage>16</fpage>
          .
          <string-name>
            <surname>Linguateca</surname>
          </string-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardoso</surname>
          </string-name>
          , N.:
          <article-title>Reconhecimento de entidades mencionadas em português: Documentação e atas do HAREM, a primeira avaliação conjunta na área</article-title>
          .
          <source>Linguateca</source>
          , Lisboa,
          <string-name>
            <surname>PT</surname>
          </string-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Pragmatic information extraction in brazilian portuguese documents</article-title>
          .
          <source>In: International Conference on Computational Processing of the Portuguese Language</source>
          . pp.
          <volume>46</volume>
          –
          <fpage>56</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Inferportoie: A portuguese open information extraction system with inferences</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>25</volume>
          (
          <issue>2</issue>
          ),
          <volume>287</volume>
          –
          <fpage>306</fpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Inference approach to enhance a portuguese open information extraction</article-title>
          .
          <source>In: ICEIS (1)</source>
          . pp.
          <volume>442</volume>
          –
          <issue>451</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>