<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NER and Open Information Extraction for Portuguese</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pablo Gamallo</string-name>
          <email>pablo.gamallo@usc.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Garcia</string-name>
          <email>marcos.garcia.gonzalez@udc.gal</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patricia Martín-Rodilla</string-name>
          <email>patricia.martin.rodilla@usc.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Investigación en Tecnoloxías Intelixentes (CiTIUS) University of Santiago de Compostela</institution>
          ,
          <addr-line>Galiza</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidade da Coruña, CITIC, Grupo LyS, Departamento de Letras</institution>
          ,
          <addr-line>Galiza</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>457</fpage>
      <lpage>467</lpage>
      <abstract>
        <p>This article describes the systems we developed to participate in the IberLEF 2019 Portuguese Named Entity Recognition and Relation Extraction Tasks (NerReIberLEF2019). Our objective is to compare rule-based and neural-based approaches. For this purpose, we applied our systems to two specific subtasks: Named Entity Recognition (Task 1) and General Open Information Extraction (Task 3) in Portuguese texts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The use of neural networks in language technology and natural language
processing (NLP) is rising so rapidly that non-neural methods, including rule-based
strategies, are currently suffering a sharp decline in popularity. However, it is
important to know in which specific NLP tasks neural-based methods outperform other
strategies and in which they do not. In a recent work [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
the authors assessed whether certain grammatical phenomena are more challenging for
neural networks to learn than others. It is also important to take into account the
characteristics of the target language, since experiments on a Germanic language such
as English, a Romance language such as Portuguese, or a Uralic language such as
Finnish, which has a very rich morphology, are not directly comparable. In a recent
work [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] that compared neural and rule-based parsing methods for Finnish,
rule-based methods still outperformed neural networks by a considerable margin.
      </p>
      <p>
        In this article, we directly compare a neural-based tool for Named Entity
Recognition (NER) with a rule-based system using the same test dataset. In addition, we
test a rule-based strategy for Open Information Extraction (OIE), which is a complex
task traditionally addressed through unsupervised or rule-based approaches. To the best
of our knowledge, the recent work reported in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is the first to address the OIE task with a neural approach, with
promising results. However, that system still depends on traditional strategies, as
the neural OIE model described in the paper was trained with highly confident binary
extractions bootstrapped from a state-of-the-art OIE system [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        To evaluate the proposed systems and make the corresponding comparisons, we
participated in the IberLEF 2019 Portuguese Named Entity Recognition and Relation
Extraction Tasks (NerReIberLEF2019) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].<sup>1</sup> The main goal of the shared task is to allow participants to
apply their systems to several tasks, including NER and OIE in Portuguese texts. These
shared tasks are part of IberLEF 2019.
      </p>
      <p>The article is organized as follows. In Section 2, we describe the two systems
submitted for the NER task, while Section 3 describes the properties of our OIE approach.
Experiments and evaluation are reported in Section 4, and conclusions are addressed in
Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Named Entity Recognition for Portuguese</title>
      <p>Two very different NER strategies have been developed, one rule-based and one
neural-based. They are described in the following subsections.</p>
      <sec id="sec-2-1">
        <title>2.1 A Rule-Based Approach with External Resources</title>
        <p>
          We have adapted the NER module integrated in LinguaKit [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and described in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].<sup>2</sup>
The NER module consists of two kinds of rules: first, identification heuristics that
select named entities from texts, and second, classification rules that assign each
previously identified named entity to one of the classes Location, Person,
Organization, or Miscellaneous. All rules rely on external resources.
        </p>
        <p>Identification heuristics make use of lexicographic resources such as a lexicon of
tokens and lemmas. Rules take into account letter capitalization of tokens, their
position in the sentence, and lexicon membership. Considering these elements, a basic
identification rule is the following example:</p>
        <p>If a token with initial uppercase letter starts a sentence and it is not part of the
regular lexicon, then it is a named entity candidate.</p>
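        <p>The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LinguaKit implementation: the function name, the toy lexicon, and the treatment of capitalized tokens outside sentence-initial position are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def is_ne_candidate(token, is_sentence_initial, lexicon):
    """Return True if `token` is a named entity candidate."""
    if not token or not token[0].isupper():
        return False
    if is_sentence_initial:
        # Sentence-initial capitals are ambiguous, so the lexicon decides.
        return token.lower() not in lexicon
    # Assumption (not stated in the text): a capitalized token in any
    # other position is always taken as a candidate.
    return True


# Hypothetical toy lexicon of regular (lowercased) word forms.
LEXICON = {"a", "o", "é", "uma", "cidade", "galega", "ontem"}

print(is_ne_candidate("Santiago", True, LEXICON))
print(is_ne_candidate("Ontem", True, LEXICON))
```
]]></preformat>
        <p>With this toy lexicon, “Santiago” is kept as a candidate, whereas sentence-initial “Ontem” (yesterday) is rejected because its lowercased form is a regular lexicon entry.</p>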
        <p>Classification rules take identified named entities as input and assign them a
semantic class. Two external resources are required: gazetteers for locations, persons,
and organizations, and lists of trigger words for the same three classes. These
resources were automatically generated from Wikipedia. Given an identified named entity
(NE), the classification algorithm works as follows. First, it checks whether the NE is
an unambiguous expression appearing in just one gazetteer. If so, the NE is assigned
the class of that gazetteer. Second, if the NE appears in several gazetteers (ambiguity)
or in none of them (unknown), a disambiguation process searches for relevant trigger
words in its linguistic context. For instance, “Santiago” is an ambiguous NE that can
be either a person or a location. In the expression “Santiago é uma cidade galega”
(Santiago is a Galician town), it refers to a town and should therefore be classified
as a location. Here, the common noun “cidade” (town) is a trigger word in the list for
locations, and it selects the appropriate class of the NE. If the context of the target
NE contains several trigger words of different classes, preference is given to the
closest one. If two triggers are at the same distance, preference is given to the one on
the left. If the NE cannot be disambiguated by means of contextual triggers, we check
whether its constituent expressions belong to the gazetteers or trigger word lists and
apply the previous rules to them. If no rule applies, the NE is classified as
miscellaneous.</p>
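        <p>The classification steps can be sketched as follows. This is a minimal illustration, not the LinguaKit code: the toy gazetteers and trigger lists, the function name, and the context window of three tokens are assumptions, and the fallback on constituent expressions is omitted for brevity.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical toy resources; the real ones were generated from Wikipedia.
GAZETTEERS = {
    "LOCATION": {"santiago", "lisboa"},
    "PERSON": {"santiago", "maria"},
    "ORGANIZATION": {"nvidia"},
}
TRIGGERS = {
    "LOCATION": {"cidade", "rio"},
    "PERSON": {"senhor", "presidente"},
    "ORGANIZATION": {"empresa"},
}


def classify(tokens, start, end, window=3):
    """Classify the named entity spanning tokens[start:end]."""
    name = " ".join(tokens[start:end]).lower()
    classes = [c for c, names in GAZETTEERS.items() if name in names]
    if len(classes) == 1:
        return classes[0]  # unambiguous: found in exactly one gazetteer
    # Ambiguous or unknown: look for the closest trigger word in context.
    best = None  # (distance, side, class), with side 0 = left, 1 = right
    for i, tok in enumerate(tokens):
        if start <= i < end:
            continue
        dist = start - i if i < start else i - end + 1
        if dist > window:  # the window size is an assumption
            continue
        for cls, words in TRIGGERS.items():
            if tok.lower() in words:
                cand = (dist, 0 if i < start else 1, cls)
                # Closest trigger wins; on a tie, the left one wins.
                if best is None or cand[:2] < best[:2]:
                    best = cand
    return best[2] if best is not None else "MISCELLANEOUS"


sentence = "Santiago é uma cidade galega".split()
print(classify(sentence, 0, 1))
```
]]></preformat>
        <p>For the example sentence, “Santiago” appears in two gazetteers, so the trigger “cidade” in its right context decides in favour of the location class.</p>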
        <p>To adapt the NER module to the shared task requirements, we added specific
rules for dates, currencies, and measures. The new rules rely on external lists of
currency names and measures as well as their usual abbreviations, e.g., cm for
centimeters, or min for minutes.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 A Neural-Based Approach with Cross-View Training</title>
        <p>
          Our neural network approach to NER in Portuguese is based on Cross-View
Training (CVT), which performs semi-supervised learning by combining supervised and
unsupervised methods [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].<sup>3</sup> CVT improves the representations of a bidirectional long
short-term memory (Bi-LSTM) encoder by training it on unlabeled data in addition to
the annotated data. In a NER scenario, CVT uses the unlabeled data to learn the
different contexts in which a named entity occurs, as well as other properties
(e.g., sequences of characters) of each entity type.
        </p>
        <p>On the one hand, a CVT model needs annotated data for the desired task to be
trained on. On the other hand, it also requires a large unlabeled corpus for the
unsupervised learning process.</p>
        <p>We obtained our supervised training data from the following resources:</p>
        <list list-type="bullet">
          <list-item>
            <p>The corpus used to train the FreeLing NER modules for Portuguese [
          <xref ref-type="bibr" rid="ref11 ref15 ref8">8,15,11</xref>
          ]. As it had been labeled only with ‘enamex’ entities (Person, Place, and
Organization), Value (VAL) and Time (TME) tags were automatically added with LinguaKit.
A brief revision was then carried out to correct the most frequent errors.</p>
          </list-item>
          <list-item>
            <p>LeNER [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. This dataset of Brazilian legal texts was preprocessed by removing
all the NE tags other than the ones used in the shared task.</p>
          </list-item>
          <list-item>
            <p>HAREM [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. We used the NLTK-format corpus provided by [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].<sup>4</sup></p>
          </list-item>
        </list>
        <p>
          It is worth noting that the annotation guidelines used in these three corpora
differ, particularly those of HAREM. For instance, HAREM labels the initial
prepositions of prepositional phrases containing temporal expressions as ‘TEMPO’
(“Durante<sub>B-TEMPO</sub> os<sub>I-TEMPO</sub> desolados<sub>I-TEMPO</sub> anos<sub>I-TEMPO</sub> Reagan<sub>I-TEMPO</sub>”), while the
other datasets consider that they do not belong to the named entity [
          <xref ref-type="bibr" rid="ref1 ref8">8,1</xref>
          ]. In this respect,
we automatically removed some differences by harmonizing the HAREM annotation
with the one used in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Apart from that, there are several other differences in
the annotation of each resource, such as the representation of contractions, which some
datasets keep as a single token while others split into two elements. Obviously,
training machine learning models on mixed resources from several datasets has an impact
on the training process.
        </p>
        <p>After the automatic processing of the corpora, they were merged into a single file,
which was randomly split into two sets (training and development). The training set
has 898,157 tokens, while the development set has 50,120 tokens.</p>
        <p>
          As unlabeled corpora, we combined resources from different varieties of Portuguese,
totaling about 600 million tokens: Wikipedia (300M), Jornal Público (215M), Jornal do
Brasil (60M), and Europarl (31M). Additionally, we initialized the CVT model with
the pre-trained GloVe embeddings described in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The word embeddings have 300
dimensions, and the LSTM has 1024 hidden units.
        </p>
        <p>Figure 1 shows the performance of our model on the development set as a function
of the number of training steps.<sup>5</sup> As can be seen, the improvement after 200k steps
is very small, so we stopped at 250k with the following results: 88.89 precision, 93.11
recall, and 90.95 F1.<sup>6</sup></p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Open Information Extraction for Portuguese</title>
      <p>
        The OIE system we used for the shared task is an adapted version of the
corresponding module included in LinguaKit and described in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The OIE module
consists of two tasks: identification of argument structures and generation of relations
(triples).
      </p>
      <sec id="sec-3-1">
        <title>3.1 Argument Structure</title>
        <p>Each clause has an argument structure that relies on a verb. To identify argument
structures, the system takes as input a parsed sentence represented in the
dependency-based CoNLL-X format. For each verb (V), the system selects all
dependents whose syntactic function can be part of its argument structure. The functions
considered to build an argument structure are the following: subject (S), direct object
(O), attribute (A), and all complements headed by a preposition (C). No distinction is
therefore made between obligatory and optional arguments. Five types of argument
structures were defined: SVO, SVC+, SVOC+, SVA, and SVAC+, where “C+” means one or more
complements.</p>
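        <p>As an illustration, the selection of dependents and the naming of the resulting structure can be sketched as below. The input is a simplified stand-in for a CoNLL-X parse: each constituent is assumed to be already chunked and mapped to one of the functions V, S, O, A or C, and a subject is assumed to be required, since all five types start with S. The tuple layout and function name are hypothetical and do not reflect the LinguaKit code.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

VALID_TYPES = {"SVO", "SVC+", "SVOC+", "SVA", "SVAC+"}


def argument_structures(constituents):
    """Group the dependents of each verb and name the resulting structure.

    `constituents` is a list of (id, text, head_id, function) tuples.
    """
    verbs = {cid: text for cid, text, _, func in constituents if func == "V"}
    dependents = defaultdict(list)
    for _, text, head, func in constituents:
        if head in verbs and func in ("S", "O", "A", "C"):
            dependents[head].append((func, text))
    structures = []
    for vid, verb in verbs.items():
        funcs = {func for func, _ in dependents[vid]}
        if "S" not in funcs:
            continue  # assumption: every structure needs a subject
        label = "SV"
        label += "O" if "O" in funcs else ""
        label += "A" if "A" in funcs else ""
        label += "C+" if "C" in funcs else ""
        if label in VALID_TYPES:
            structures.append((label, verb, dependents[vid]))
    return structures


clause = [
    (1, "Erlynne", 2, "S"),
    (2, "se envolve", 0, "V"),
    (3, "com Robert", 2, "C"),
]
print(argument_structures(clause))
```
]]></preformat>
        <p>For the clause “Erlynne se envolve com Robert”, the sketch returns a single structure of type SVC+ with the subject “Erlynne” and the complement “com Robert”.</p>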
        <p>Within a sentence, it is possible to find several argument structures
corresponding to different clauses. Table 1 shows three argument structures extracted
from one of the input sentences of the test dataset provided by the organizers
of NerReIberLEF2019. This example is quite complex, as it includes a relative clause
whose antecedent is not just a nominal phrase but a whole clause: the antecedent
of “o que” (which) is the clause “Erlynne se envolve com Robert” (Erlynne gets
involved with Robert). Our system wrongly replaces the relative pronoun with “Robert”
because the dependency parser identified this proper noun, and not the verb “se envolve”
(gets involved), as the antecedent. While the first two argument structures in Table 1
are correct, the third one is wrong because of that incorrect dependency between the
relative clause and its antecedent.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Three argument structures extracted by our system from the sentence “Erlynne se
envolve com Robert, o que gera rumores de que ele está traindo Meg com a visitante” (Erlynne gets
involved with Robert, which generates rumors that he is betraying Meg with the visitor), which is
part of the test dataset of Task 3 at NerReIberLEF2019.</p>
          </caption>
          <table>
            <thead>
              <tr><th></th><th>Type</th><th>Constituents</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>SVC</td><td>S=“Erlynne”, V=“se envolve”, C=“com Robert”<break/>Erlynne, gets involved, with Robert</td></tr>
              <tr><td>2</td><td>SVOC</td><td>S=“ele”, V=“está traindo”, O=“Meg”, C=“com a visitante”<break/>he, is betraying, Meg, with the visitor</td></tr>
              <tr><td>3</td><td>SVO</td><td>S=“Robert”, V=“gera”, O=“rumores”<break/>Robert, generates, rumors</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Generation of Triples</title>
        <p>Once the argument structures have been detected in the previous task, the OIE
system builds a set of verbal relations (triples) with two arguments. These Arg1-Verb-Arg2
relations represent basic propositions or facts standing for minimal units of coherent,
meaningful, and non-over-specified information. For example, from the second
argument structure in Table 1, two triples are generated, as shown in Table 2.</p>
        <p>To adapt the LinguaKit module to the requirements of Task 3 at NerReIberLEF2019,
we made some minor adjustments to the OIE system, taking into account the annotation
criteria found in the training/development dataset provided by the organizers. In
particular, the shared task criteria state that any relation between two noun phrases
is to be considered. The main adjustment was therefore to prevent the system from
generating arguments headed by verbs. To do so, the subordinate verb is placed within
the relation, after the main verb, giving rise to a composite verbal phrase. In this
way, the nominal argument of the subordinate verb is treated as the argument of the
whole verbal phrase and thus becomes the second argument of the corresponding triple.
For instance, if we apply the official LinguaKit module to a sentence like
“Mohsen Makhmalbaf decide realizar uma chamada” (Mohsen Makhmalbaf decides to
make a call), it yields the following triple:</p>
      </sec>
      <sec id="sec-3-3">
        <title>Argument_1 Relation Argument_2</title>
        <p>Argument_1=“Mohsen Makhmalbaf”, Relation=“decide”, Argument_2=“realizar uma chamada”</p>
        <p>(Mohsen Makhmalbaf, decides, to make a call)</p>
        <p>However, in the version adapted to the criteria of NerReIberLEF2019, the output is a
slightly different triple:</p>
      </sec>
      <sec id="sec-3-4">
        <title>Argument_1 Relation Argument_2</title>
        <p>Argument_1=“Mohsen Makhmalbaf”, Relation=“decide realizar”, Argument_2=“uma chamada”</p>
        <p>(Mohsen Makhmalbaf, decides to make, a call)</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Evaluation</title>
      <sec id="sec-4-1">
        <title>4.1 NER Task</title>
        <p>37,706 tokens. In total, the annotators extracted 916 named entities of the Person
category. As shown in Table 3, there is a big difference between the two strategies, CVT
and LinguaKit, and the same split appears among the other participants: there are three
systems with low F1 scores between 30 and 40% (like LinguaKit), and three with very
high scores between 88 and 90% (like CVT), with CVT obtaining the highest F1 value.
It will be necessary to analyze the test dataset to explain such large
differences among systems.</p>
        <p>The Clinical Dataset consists of clinical notes annotated for the Person
category. Clinical notes present particular challenges, such as names with codes inside;
for example, the annotators must understand that “AnaR1” or “####Paulo” refer to Person
entities. The corpus is small: it consists of 50 notes with 50 sentences and 9,523
tokens. The total number of Person entities is 77. The neural system CVT performs
clearly better than LinguaKit, even though its F1 value remains modest. CVT
achieves the best score among all participants, whose F1 values are all modest
(ranging between 10 and 41%).</p>
        <p>
          The evaluation with the General Dataset takes into account 5 categories: Person,
Place, Organization, Time and Value. It was built from two different annotated
corpora: SIGARRA [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and Second HAREM (Relation Version) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The total dataset
contains 5,054 sentences with 179,892 tokens. The named entities were classified
following this distribution: 2,159 Person (PER), 1,593 Place (PLC), 2,320 Organization
(ORG), 3,826 Time items (TME), and 106 Values of quantities (VAL). Table 5 shows the
results of our two systems, CVT and LinguaKit, including micro and macro-average. In
this corpus, the gap between the two systems is smaller. Even though the
neural-based approach outperforms the rule-based one in both micro and macro-average,
the latter performs better on TME entities, which are, in fact, the most frequent class of
named entities in this dataset.
        </p>
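        <p>As a reminder of how the two averages differ, the macro-average is the unweighted mean of the per-class scores, whereas the micro-average pools the counts of all classes, so that frequent classes such as TME weigh more. The sketch below computes only the macro-average, using the per-class CVT precision values reported for the General Dataset. The function name is hypothetical and the official scorer may differ in its details.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def macro_average(per_class_scores):
    """Unweighted mean over classes: every class counts the same."""
    return sum(per_class_scores.values()) / len(per_class_scores)


# CVT precision (%) per class, as reported for the General Dataset.
cvt_precision = {
    "PER": 75.64,
    "ORG": 54.24,
    "PLC": 55.93,
    "TME": 58.68,
    "VAL": 96.23,
}

print(round(macro_average(cvt_precision), 2))
```
]]></preformat>
        <p>The result, 68.14, agrees with the macro-averaged precision reported for CVT.</p>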
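        <table-wrap>
          <caption>
            <p>System: CVT</p>
          </caption>
          <table>
            <thead>
              <tr><th>Class</th><th>Prec</th><th>Rec</th><th>F1</th></tr>
            </thead>
            <tbody>
              <tr><td>PER</td><td>75.64%</td><td>58.83%</td><td>66.18%</td></tr>
              <tr><td>ORG</td><td>54.24%</td><td>28.04%</td><td>39.27%</td></tr>
              <tr><td>PLC</td><td>55.93%</td><td>42.47%</td><td>48.28%</td></tr>
              <tr><td>TME</td><td>58.68%</td><td>58.57%</td><td>58.62%</td></tr>
              <tr><td>VAL</td><td>96.23%</td><td>96.23%</td><td>96.23%</td></tr>
              <tr><td>Micro-AV</td><td>61.27%</td><td>46.07%</td><td>52.60%</td></tr>
              <tr><td>Macro-AV</td><td>68.14%</td><td>56.82%</td><td>61.71%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The pure OIE task corresponds, in fact, to Test 2 of Task 3 at NerReIberLEF2019.
The objective is to generate verbal relations with two nominal arguments, that is, triples
referring to basic propositions.</p>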
        <p>In the evaluation, two scoring metrics were considered: a completely correct
relations score and a partially correct relations score. A relation is completely
correct (exact matching) when all the terms that make up the relation descriptor in the
key are equal to those of the relation descriptor in the system’s output. A relation is
partially correct (partial matching) when at least one of the terms in the relation
descriptor of the system’s output corresponds to a term in the relation descriptor of
the key.</p>
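        <p>The two matching criteria can be sketched as follows. This is an illustration, not the official scorer: it assumes that a relation descriptor is a plain string, that its terms are whitespace-separated tokens, and that the comparison is case-insensitive. The function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def terms(descriptor):
    """Split a relation descriptor into lowercased terms (assumption)."""
    return descriptor.lower().split()


def exact_match(key, output):
    """Completely correct: all terms of the key equal those of the output."""
    return terms(key) == terms(output)


def partial_match(key, output):
    """Partially correct: at least one output term occurs in the key."""
    return bool(set(terms(key)) & set(terms(output)))


print(exact_match("decide realizar", "decide realizar"))
print(exact_match("decide realizar", "decide"))
print(partial_match("decide realizar", "decide"))
```
]]></preformat>
        <p>Under these assumptions, the output “decide” is only a partial match for the key “decide realizar”, since one of its terms is shared but the descriptors are not identical.</p>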
        <p>
          Test 2 consists of a set of golden triples extracted from 25 sentences. A description
of the constraints for extractions of relations and arguments is reported in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          Table 6 shows the results obtained by the 6 systems involved in this task. Most of
them have been described in previous work, for instance, DEPENDENTIE [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ],
INFERPOROIE [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and ICEIS [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. The OIE of LinguaKit is a more recent version
of DepOIE [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and ArgOIE [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. It clearly outperforms the other systems in terms of
Precision, both in exact and partial matching. However, in terms of F1, LinguaKit ranks
first only in exact matching. In partial matching, the DPTOIE system performs better
than LinguaKit because its Recall is higher. It is worth noting that all systems have
very low recall, which shows the difficulty of the task.
        </p>
        <p>
          In this article, we compared a neural-based tool for NER with a rule-based system using
the datasets of NerReIberLEF2019 Task 1. Moreover, we compared a rule-based
strategy with the other systems participating in the OIE shared task at
NerReIberLEF2019 (Task 3 - Test 2).
        </p>
        <p>For the NER task, the neural-based system, trained on a corpus of about 900k tokens
and provided with pre-trained word embeddings, clearly outperformed the rule-based
strategy in all datasets: legal, medical, and general. Concerning the OIE task, we could
not make the same kind of comparison, as there is no training corpus for this specific
task, which has not been modelled so far using neural classifiers due to its excessive
complexity. In this case, the precision of our rule-based tool was clearly higher than
that of the other systems in the competition. However, it will be necessary to analyze
the test dataset in order to know how to improve the recall, which remains very low.</p>
        <p>
          In future work, we will explore the possibility of developing a hybrid strategy
mixing rules and neural networks, as in the recent sentiment analysis study described
in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], where the proposed technique combines a deep learning approach (namely,
Convolutional Neural Networks) with a rule-based method to improve aspect-level sentiment
analysis.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has received financial support from DOMINO project
(PGC2018-102041B-I00, MCIU/AEI/FEDER, UE), eRisk project (RTI2018-093336-B-C21), the
Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016-2019,
ED431G/08), the Spanish Ministry of Economy, Industry and Competitiveness under
its Competitive Juan de la Cierva Postdoctoral Research Programme (FJCI-2016-28032
and IJCI-2016-29598) and the European Regional Development Fund (ERDF). We
gratefully acknowledge the support of NVIDIA Corporation with the donation of the
Titan Xp GPU used for this research.</p>
      <fn-group>
        <title>Notes</title>
        <fn><label>1</label><p>http://www.inf.pucrs.br/linatural/wordpress/iberlef-2019/</p></fn>
        <fn><label>2</label><p>LinguaKit is freely available at: https://github.com/citiususc/Linguakit</p></fn>
        <fn><label>3</label><p>https://github.com/tensorflow/models/tree/master/research/cvt_text</p></fn>
        <fn><label>4</label><p>https://github.com/arop/ner-re-pt/tree/master/datasets/harem/nltk</p></fn>
        <fn><label>5</label><p>These values were obtained with the tagging_scorer script provided by CVT (see footnote 3).</p></fn>
        <fn><label>6</label><p>We achieved F1 &gt; 95% using different corpora combinations (with more harmonized
annotations) in preliminary experiments. However, we decided to submit this model as the training
corpora were actually more balanced with regard to the different sources.</p></fn>
      </fn-group>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>de Araujo</surname>
            ,
            <given-names>P.H.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Campos</surname>
            ,
            <given-names>T.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Oliveira</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stauffer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Couto</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bermejo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>LeNER-Br: A Dataset for Named Entity Recognition in Brazilian Legal Text</article-title>
          .
          <source>In: International Conference on Computational Processing of the Portuguese Language</source>
          . pp.
          <fpage>313</fpage>
          -
          <lpage>323</lpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised sequence modeling with cross-view training</article-title>
          .
          <source>In: EMNLP</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Inferportoie: A portuguese open information extraction system with inferences</article-title>
          .
          <source>Natural Language Engineering</source>
          <volume>25</volume>
          (12
          <year>2018</year>
          ). https://doi.org/10.1017/S135132491800044X
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , a Xavier,
          <string-name>
            <surname>C.C.</surname>
          </string-name>
          :
          <article-title>Portuguese named entity recognition and relation extraction tasks at IberLEF 2019</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cui</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Neural open information extraction</article-title>
          .
          <source>In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          . pp.
          <fpage>407</fpage>
          -
          <lpage>413</lpage>
          . Association for Computational Linguistics, Melbourne, Australia (Jul
          <year>2018</year>
          ), https://www.aclweb.org/anthology/P18-2065
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>H.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carvalho</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Second HAREM: Advancing the state of the art of named entity recognition in Portuguese</article-title>
          .
          <source>In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)</source>
          .
          European Language Resources Association (ELRA), Valletta, Malta (May
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gamallo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piñeiro</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez-Castaño</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pichel</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction</article-title>
          .
          <source>In: 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS)</source>
          . pp.
          <fpage>239</fpage>
          -
          <lpage>244</lpage>
          (
          <year>2018</year>
          ). https://doi.org/10.1109/SNAMS.2018.8554689
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gamallo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A resource-based method for named entity extraction and classification</article-title>
          . In:
          <string-name>
            <surname>Antunes</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinto</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          (eds.)
          <source>Progress in Artificial Intelligence, 15th Portuguese Conference on Artificial Intelligence, EPIA 2011, Lisbon, Portugal, October 10-13, 2011, Proceedings</source>
          . Lecture Notes in Computer Science, vol.
          <volume>7026</volume>
          . Springer (
          <year>2011</year>
          ). https://doi.org/10.1007/978-3-642-24769-9
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gamallo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multilingual Open Information Extraction</article-title>
          .
          <source>In: Progress in Artificial Intelligence - 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal, September 8-11</source>
          . pp.
          <fpage>711</fpage>
          -
          <lpage>722</lpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gamallo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández-Lanza</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Dependency-based open information extraction</article-title>
          .
          <source>In: ROBUS-UNSUP 2012: Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012)</source>
          . pp.
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          . Avignon, France (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Extracção de relações semânticas. Recursos, ferramentas e estratégias</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Santiago de Compostela (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Oliveira</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Challenges of an annotation task for open information extraction in Portuguese</article-title>
          .
          <source>In: Computational Processing of the Portuguese Language - 13th International Conference, PROPOR 2018, Canela, Brazil, September 24-26, 2018, Proceedings</source>
          . pp.
          <fpage>66</fpage>
          -
          <lpage>76</lpage>
          (
          <year>2018</year>
          ). https://doi.org/10.1007/978-3-319-99722-3_7
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hartmann</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shulby</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Treviso</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aluísio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Portuguese word embeddings: Evaluating on word analogies and natural language tasks</article-title>
          .
          <source>In: Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology</source>
          . pp.
          <fpage>122</fpage>
          -
          <lpage>131</lpage>
          . Sociedade Brasileira de Computação, Uberlândia, Brazil (Oct
          <year>2017</year>
          ), https://www.aclweb.org/anthology/W17-6615
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mausam</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Open information extraction systems and downstream applications</article-title>
          .
          <source>In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence</source>
          . pp.
          <fpage>4074</fpage>
          -
          <lpage>4077</lpage>
          . IJCAI'16, AAAI Press (
          <year>2016</year>
          ), http://dl.acm.org/citation.cfm?id=3061053.3061220
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Padró</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Analizadores multilingües en freeling</article-title>
          .
          <source>Linguamática</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <fpage>13</fpage>
          -
          <lpage>20</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pires</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Named entity extraction from Portuguese web text</article-title>
          .
          <source>Master's thesis</source>
          , Universidade do Porto (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pirinen</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          :
          <article-title>Neural and rule-based Finnish NLP models-expectations, experiments and experiences</article-title>
          .
          <source>In: Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages</source>
          . pp.
          <fpage>104</fpage>
          -
          <lpage>114</lpage>
          . Association for Computational Linguistics, Tartu, Estonia (Jan
          <year>2019</year>
          ), https://www.aclweb.org/anthology/W19-0309
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ravfogel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Linzen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Studying the inductive biases of RNNs with synthetic variations of natural languages</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          . pp.
          <fpage>3532</fpage>
          -
          <lpage>3542</lpage>
          . Association for Computational Linguistics, Minneapolis, Minnesota (Jun
          <year>2019</year>
          ), https://www.aclweb.org/anthology/N19-1356
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Ray</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakrabarti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A mixed approach of deep learning method and rule-based method to improve aspect level sentiment analysis</article-title>
          .
          <source>Applied Computing and Informatics</source>
          (
          <year>2019</year>
          ). https://doi.org/10.1016/j.aci.2019.02.002, http://www.sciencedirect.com/science/article/pii/S2210832718303156
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seco</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardoso</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilela</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>HAREM: An advanced NER evaluation contest for Portuguese</article-title>
          .
          <source>In: 5th International Conference on Language Resources and Evaluation - LREC-2006</source>
          . pp.
          <fpage>1986</fpage>
          -
          <lpage>1991</lpage>
          . Genova, Italy (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sena</surname>
            ,
            <given-names>C.F.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          :
          <article-title>Inference approach to enhance a Portuguese open information extraction</article-title>
          .
          <source>In: Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS</source>
          . pp.
          <fpage>442</fpage>
          -
          <lpage>451</lpage>
          . INSTICC, SciTePress (
          <year>2017</year>
          ). https://doi.org/10.5220/0006338204420451
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Souza De Oliveira</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Dependentie: An open information extraction system on Portuguese by a dependence analysis</article-title>
          .
          <source>In: ENIAC-2017 XIV Encontro Nacional de Inteligência Artificial e Computacional</source>
          (Oct
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>