    Adapting NER (CRF+LG) for Many Textual Genres⋆

 Juliana Pirovani1 , James Alves2 , Marcos Spalenza2 , Wesley Silva2 , Cristiano
                   da Silveira Colombo2 , and Elias Oliveira2
1
     Universidade Federal do Espírito Santo (UFES), 29.500-000 - Alegre - ES - Brasil
                               juliana.campos@ufes.br
    2
      Programa de Pós-Graduação em Informática, Universidade Federal do Espírito
                   Santo (UFES), 29.075-910 - Vitória - ES - Brasil
                          {james,elias}@lcad.inf.ufes.br



        Abstract. Named Entity Recognition is the task of automatically iden-
        tifying named entities and classifying them into predefined categories
        such as person, place and organization, among other categories considered
        relevant in specific domains. This task is important and challenging, es-
        pecially when the system must recognize named entities in many textual
        genres, including genres that differ from those for which it was trained.
        CRF+LG is a hybrid system for Named Entity Recognition in Portuguese
        texts that combines the labeling obtained by a Conditional Random
        Fields model with a term classification obtained by a Local Grammar
        as an additional informed feature. This paper reports our initial efforts
        to adapt the CRF+LG system to many textual genres, in accordance
        with the Portuguese Named Entity Recognition task proposed at IberLEF
        2019. We adapted the LG to capture rules of textual genres that do not
        appear in the examples of the training corpus and thus assist the Named
        Entity Recognition, even when no training set is available for a given
        textual genre. CRF+LG was also trained on an augmented training
        corpus.

        Keywords: Named Entity Recognition · Conditional Random Fields ·
        Local Grammars · Domain Adaptation


1     Introduction

Named Entity Recognition (NER) is the task of automatically identifying and
classifying named entities (NEs) in free text. These NEs correspond to names
of persons, places, organizations, among other categories considered relevant
in specific domains. This task is important because it is a fundamental step
  Copyright © 2019 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0). IberLEF 2019, 24 Septem-
  ber 2019, Bilbao, Spain.
⋆
  The second author was financed in part by the Coordenação de Aperfeiçoamento de
  Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001
          Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)




of preprocessing for several applications such as question answering systems [20],
relation and event extraction [5] and entity-oriented search [6]. Indeed, NEs are
an essential source of information in textual information retrieval.
    NER is a very challenging task because several categories of named entities
are written similarly and appear in similar contexts. In addition, NER depends
on the language, the training corpus and the domain [17]. Considering the
domain dependency, the same category of NE can be written in different ways
depending on the textual genre under analysis. For example, in e-mail texts it is
common to see person names after words such as Hello and Good afternoon,
whereas in memorandum texts it is common to see person names after words
such as Public servants and Professor. Consistent training sets including texts
from different genres are not always available.
    In 1995, the Message Understanding Conference [13] included the NER task
for the first time, for English, carrying out a joint assessment of the area.
Thereafter, several similar events have emerged, such as ACE [8], CoNLL [24],
HAREM [12, 27] and TAC [14]. HAREM was an initiative for Portuguese or-
ganized by Linguateca [11]. The annotated corpora used in the First and Second
HAREM, known as the Golden Collections (GC), are used as a gold standard
reference for NER systems in Portuguese.
    This year (2019), the Portuguese NER task was one of the tasks proposed at
the Iberian Languages Evaluation Forum (IberLEF) [23]. The objective of this
task is to evaluate the submitted systems on many textual genres. The partic-
ipants were free to choose their own training datasets. The categories person,
place, organization, value and time were evaluated on datasets whose main
textual genres are news, memorandums, e-mails, interviews and magazine ar-
ticles; the person category was also evaluated on clinical notes and police texts.
    This paper presents our initial efforts to adapt the CRF+LG system [21] to
many textual genres, in accordance with this task proposed at IberLEF 2019.
CRF+LG is a hybrid system for Portuguese NER that combines the labeling ob-
tained by a Conditional Random Fields (CRF) model with a term classification
obtained by a Local Grammar (LG) as an additional informed feature. The idea
of this system was to improve the performance of NER systems based on ma-
chine learning while using less training data. In order to participate in IberLEF
2019, we examined datasets from different textual genres, adapted the LG, and
retrained the model on an augmented training corpus.
    The remainder of this paper is organized as follows. In Section 2 we discuss
related works that both support some of our arguments and complement points
of view we discuss in this paper. The methodology is explained in Section 3,
where we enumerate each of the steps necessary to perform training and testing
and describe the adaptations made to this architecture for IberLEF. We also
introduce some challenges found in the datasets used for training, which de-
creased the performance of the learning process. Section 4 discusses the results
yielded by our algorithm, which was run by the IberLEF organizers. We also
discuss some aspects faced








when dealing with cross-domain datasets. Our conclusions are presented in
Section 5.


2   Related Work

Named Entity Recognition systems can be developed using the following ap-
proaches: linguistic [17, 22], machine learning [4, 25, 29] or hybrid [19, 30]. Some
of the main NER systems for Portuguese are described below.
    The system proposed by [25] is based on the CharWNN Deep Neural Net-
work, which uses word-level and character-level representations to perform se-
quential classification. The system was tested for Portuguese and Spanish and,
for Portuguese, the GC of the First HAREM was used as the training set and
the MiniHAREM as the test set. The approach was compared to the ETLCMT
system [26], an ensemble method based on Entropy Guided Transformation
Learning (ETL), and outperformed that system in both the total (10 categories
of HAREM) and selective (categories person, place, organization, time and
value) scenarios.
    A Deep Neural Network architecture with word-level and character-level rep-
resentations was also used in [4]. A combination of these representations is fed
into a bidirectional Long Short-Term Memory with Conditional Random Fields
(Bi-LSTM-CRF) to perform sequential classification. The authors evaluated dif-
ferent combinations of training hyperparameters, such as the word embeddings
model, tagging scheme, word capitalization feature and number of hidden units
per LSTM, obtaining optimal values for the parameters with the greatest im-
pact on the performance of the model. A very similar architecture was used
by [7] for two sequence labeling tasks (POS-tagging and NER), obtaining very
close results.
    A hybrid approach to Portuguese NER is presented in [18, 21], using the
machine learning approach CRF [10] and the linguistic approach LG [9]. The
classification obtained from the LG is sent as an additional feature to the learn-
ing process of the CRF prediction model, and the CRF model assigns the final
label of the NEs. This approach is a good way to take into account human ex-
pertise for capturing rules that do not appear in the examples of the annotated
corpus used for training the CRF. A study of the limits of CRF's performance
when using the result of another classifier as an additional feature was also
presented.
    The systems that used Neural Networks [4, 7, 25] presented superior results
using massive corpora for unsupervised learning of features, which was not the
case of the work presented in [21]. However, the results obtained by [21] outper-
form the results of systems reported in the literature that were evaluated under
equivalent conditions: a system that uses only CRF [1] and the system based on
the CharWNN presented in [25] without the unsupervised pre-training.








3   Methodology

In order to participate in the IberLEF 2019, we have used the architecture of our
system CRF+LG[21]. CRF+LG does not use massive corpora for unsupervised
learning of features. The LG is a good way to take into account the human ex-
pertise for capturing the rules and a way to perform the NER using the linguistic
approach when there is no available training corpus. The Figure 1 presents an
overview of the methodology used, demonstrating how the steps to perform the
training occur.




                               Fig. 1. Train Workflow


     Initially, each input file goes through the sentence segmentation process (step
1). Segmentation was performed using the Unitex (http://unitexgramlab.org/)
tool. Unitex uses LGs to describe the different ways of indicating the end of a
sentence. For this work, the LG that performs sentence segmentation in Unitex
was changed so as not to segment sentences at colons (:) and semicolons (;).
This flexibility is a strength of the tool.
     A copy of the target files has its tags removed, since the GC used contains
the NE markings (step 2). The LG built in this work is applied to these un-
marked files and the NEs it identifies are annotated (step 3). In parallel, the
segmented files are tokenized using the OpenNLP (http://opennlp.apache.org/)
library (step 4). This library is based on machine learning and performs common
NLP tasks such as segmentation, tokenization, POS-tagging, etc.
     In order to represent NER as a sequence labeling problem, a label must be
assigned to each token of the text. The BIO notation was used (steps 4 and 5).
Next, several features [18] are added for each token of the files, including the
NE label previously assigned by the LG (step 6). These features are used
during supervised learning of the CRF prediction model (step 7).
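    The BIO labeling step can be sketched as follows. This is a minimal illustration of the notation, not the system's actual annotation code; the sentence and entity span are invented for the example.

```python
# Minimal sketch of BIO labeling: B- marks the first token of an NE,
# I- marks tokens inside it, and O marks tokens outside any NE.
def bio_labels(tokens, entities):
    """Assign BIO labels given (start, end, category) entity spans over tokens."""
    labels = ["O"] * len(tokens)
    for start, end, cat in entities:
        labels[start] = "B-" + cat          # first token of the NE
        for i in range(start + 1, end):
            labels[i] = "I-" + cat          # remaining tokens of the NE
    return labels

tokens = ["diz", "Moncef", "Kaabi", "que", "a", "reforma", "continua"]
entities = [(1, 3, "PESSOA")]               # "Moncef Kaabi" spans tokens 1..2
print(list(zip(tokens, bio_labels(tokens, entities))))
```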








    The methodology used for testing is similar, but the input files do not have
the NE tags. In addition to the files containing the tokens and features, the
CRF receives the previously trained model to predict a label for each token.
    The next two sections briefly describe how the system obtains a tip from
the LG and how the CRF works. The last section describes the adaptations
made to participate in the IberLEF event.

3.1   Local Grammars (LG)
An LG created in Unitex is represented as a set of one or more graphs. The
LG used by CRF+LG consists of 10 graphs, one for each of the NE categories
considered by HAREM.
   To construct each graph, we observed in which context each type of NE
appeared in the training file and which words could indicate the presence of
an NE. We observed that, for example, words with the first letter capitalized
preceded by the preposition em (in) were labeled as place. We also observed
that some NEs of the person category are preceded by words such as diz (says),
explicou (explained), afirmou (stated), etc.
   Thus, the graphs created capture some simple heuristics for the recognition
of NEs in the training set. An example of a rule in the graph created for the
person category is presented in Figure 2.




      Fig. 2. Example of rule in the graph that recognizes the Person category


    This graph recognizes words such as diz (says) or afirmou (stated) followed
by words with the first letter capitalized, as identified by the code <FIRST> in
Unitex dictionaries. Among the capitalized words, prepositions may appear;
their recognition is detailed in the graph Preposicao.grf, included as a sub-
graph. Examples of occurrences identified by this graph were:
diz <PESSOA> Moncef Kaabi </PESSOA>
afirmou <PESSOA> José SÓCRATES </PESSOA>
afirma <PESSOA> Jason Knight </PESSOA>.
    Note that each identified person appears between the tags <PESSOA>
(<PERSON>) and </PESSOA> in the concordance file containing the list of
identified occurrences.
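    An approximation of this rule can be written as a regular expression. The pattern below is an illustrative sketch, not the actual Unitex graph: the trigger verbs and the handling of prepositions between capitalized words mimic the rule of Figure 2.

```python
import re

# Illustrative approximation of the Figure 2 rule: a trigger verb
# (diz/afirma/afirmou) followed by capitalized words, optionally linked
# by the prepositions de/da/do/dos/das (as in the Preposicao.grf subgraph).
PERSON_RULE = re.compile(
    r"\b(?:diz|afirma|afirmou)\s+"
    r"((?:[A-ZÀ-Ú][\wÀ-ú]*|de|da|do|dos|das)"
    r"(?:\s+(?:[A-ZÀ-Ú][\wÀ-ú]*|de|da|do|dos|das))*)"
)

def annotate_person(text):
    """Wrap each matched name in <PESSOA> tags, as in the concordance file."""
    return PERSON_RULE.sub(
        lambda m: m.group(0).replace(m.group(1), f"<PESSOA>{m.group(1)}</PESSOA>"),
        text,
    )

print(annotate_person("afirmou Jason Knight que tudo correu bem"))
```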








3.2     Conditional Random Fields (CRF)

Conditional Random Fields (CRF) is a machine learning method for structured
prediction proposed by [10]. It is used for labeling sequential data based on a
conditional approach.
    Let X = (x_1, x_2, ..., x_n) be a sequence of words in a text; we want to determine
the best sequence of labels Y = (y_1, y_2, ..., y_n) for these words, corresponding
to the categories of NEs (the 10 categories of HAREM or the label O in this
work). The CRF models a conditional distribution p(Y|X) that represents the
probability of obtaining the output Y given the input X.
    In this work, we used a linear-chain CRF that predicts the output variables
Y as a sequence for sequences of input variables X. According to [28], a linear-
chain CRF is a conditional distribution that takes the form shown in Equation
1:

    p(y|x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\left\{ \sum_{k=1}^{K} \theta_k f_k(y_t, y_{t-1}, x_t) \right\}    (1)

      where Z(x) is a normalization function given by Equation 2:

    Z(x) = \sum_{y} \prod_{t=1}^{T} \exp\left\{ \sum_{k=1}^{K} \theta_k f_k(y_t, y_{t-1}, x_t) \right\}    (2)
    F = \{f_k(y_t, y_{t-1}, x_t)\}_{k=1}^{K} is a set of feature functions that must be fixed
according to the problem. An example is a function that takes the value 1
when the word begins with a capitalized letter (a component of the input vector
x_t), its label (y_t) is Person and the previous label (y_{t-1}) is Other, and 0
otherwise. The vector x_t contains all the components of the global observations
x that are needed for computing features at time t. \theta = \{\theta_k\}_{k=1}^{K} is a vector of
weights that must be estimated from the training set, usually by maximum
likelihood learning. The weights depend on each feature function: the more
discriminating the function, the higher its estimated weight will be.
    The MALLET (http://mallet.cs.umass.edu/) toolkit was used in this work
to estimate the vector of weights and then apply the obtained CRF model to
label the test set. The CRF model combines the weights of each feature function
to determine the probability of a certain label (y_t).
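    The role of the feature functions and weights can be sketched as follows. The two feature functions and the weight values are invented for illustration (in CRF+LG the features include the LG tip and the weights are estimated by MALLET); the code computes the unnormalized factor of Equation 1 for a single position t.

```python
import math

# Sketch of how linear-chain CRF feature functions combine at one position t:
# exp{ sum_k theta_k * f_k(y_t, y_{t-1}, x_t) }, the per-position factor of Eq. 1.

def f_cap_person_after_other(y_t, y_prev, x_t):
    """Fires when a capitalized word is labeled Person right after label O."""
    return 1.0 if x_t["capitalized"] and y_t == "B-PESSOA" and y_prev == "O" else 0.0

def f_lg_agrees(y_t, y_prev, x_t):
    """Fires when the CRF label agrees with the tip provided by the LG."""
    return 1.0 if x_t["lg_label"] == y_t else 0.0

features = [f_cap_person_after_other, f_lg_agrees]
weights = [1.8, 2.5]  # hypothetical learned weights theta_k

def potential(y_t, y_prev, x_t):
    """Unnormalized factor exp{sum_k theta_k f_k(y_t, y_{t-1}, x_t)}."""
    return math.exp(sum(w * f(y_t, y_prev, x_t) for w, f in zip(weights, features)))

x_t = {"capitalized": True, "lg_label": "B-PESSOA"}
print(potential("B-PESSOA", "O", x_t))   # both features fire: exp(1.8 + 2.5)
print(potential("O", "O", x_t))          # neither fires: exp(0) = 1.0
```

A label sequence whose per-position potentials are large receives a high probability once normalized by Z(x), which is how a discriminating feature such as the LG tip pushes the CRF toward the right label.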

3.3     Adaptation of CRF+LG to IberLEF
CRF+LG was built to recognize the 10 named entity categories of HAREM
(person, place, organization, value, time, event, abstraction, work, thing and
other). The system was therefore initially adapted to consider only the five cat-
egories of IberLEF (person, place, organization, value and time) during the
CRF training phase. Nevertheless, we kept the recognition of the 10 categories
by the LG because we believe this helps the system to disambiguate NEs.








    The Golden Collections of the First and Second HAREM, considered a
reference for Named Entity Recognition systems in Portuguese, were used in
previous experiments [19] as training and testing sets, respectively, for the eval-
uation of CRF+LG. Several errors occurred due to inconsistencies between the
GCs of the First and Second HAREM. For example, in the GC of the First
HAREM, strings such as 2004 preceded by the preposition em (in) are considered
NEs of the time category; CRF+LG learned this and labeled all similar strings
preceded by em as time. However, in the GC of the Second HAREM, the
preposition em is part of the NE, so all these NEs were wrongly labeled. The
same happened in other situations for the categories time, value and person.
    Some of these major inconsistencies were removed by Pirovani [21] and others
were removed during this work. The goal was to obtain a more consistent, nor-
malized dataset, composed of the three GCs of HAREM (First HAREM, Mini
HAREM and Second HAREM), to use for training.
    The GCs of the HAREM include documents from different textual genres
such as news, web texts, literary fiction, transcribed oral interviews, technical
texts, journalistic and personal blog, essays and FAQ questions [12, 27]. How-
ever, the task of the IberLEF proposes to evaluate the systems in other specific
textual genres such as memorandums, e-mails, magazine articles, clinical notes
and police texts.
    In order to train CRF+LG to this task, we have researched and reviewed
other corpus from different textual genres in Portuguese:
 1. SIGARRA [16]: The SIGARRA corpus has 905 articles, manually annotated
    with eight NE categories: hour, event, organization, course, person, location,
    date and organic unit.
 2. WikiNER [15]: This corpus is a silver-standard, automatically annotated
    corpus containing three NE categories: person, location and organization.
    We created 592 subsets and reviewed 40 parts, adding annotations for the
    value and time categories and fixing automatic annotation mistakes.
 3. LeNER-BR [2]: LeNER-BR was manually annotated with a focus on legal
    documents. This dataset has 70 documents with the following categories of
    NEs: organization, person, time, locations, law and decisions regarding law
    cases.
 4. aTribuna [21]: This dataset has 100 newspaper documents with 2714 NEs
    person manually annotated.
 5. Administrative orders (http://gedoc.ifes.edu.br/): We also manually anno-
    tated 20 administrative orders of the Instituto Federal de Educação, Ciência
    e Tecnologia do Espírito Santo (IFES).
    Our initial intention was to use these datasets to 1) identify new rules to
insert into the LG and 2) combine them to increase the training set and thus
improve the prediction model. However, some inconsistencies observed between
the GCs of HAREM and others, such as LeNER-BR and SIGARRA, made it
difficult to integrate all these datasets into a unique training set.
    The LG used in CRF+LG was built by analyzing only the GC of the First
HAREM. By analyzing some texts from these new domains, we observed some very








strict patterns for writing NEs, and several adaptations were introduced into
the LG to recognize these patterns. Here are some examples:
 1. Sequences of words with the first letter capitalized or numbers beginning
    with words such as Sala, Salão, Auditório and Anfiteatro as place category.
 2. Recognition of dates (time category) with dots (25.12.2010).
 3. Recognition of dates preceded by words such as até, a partir de, entre, dia
    and desde.
 4. Recognition of values preceded by abbreviations or words such as num., N.,
    art., Art., matrícula and siape.
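    Two of the patterns above can be approximated with regular expressions. These are illustrative sketches under our reading of the patterns, not the actual Unitex graphs; the trigger-word list and date formats are assumptions for the example.

```python
import re

# Illustrative regexes for two adapted patterns: dates written with dots
# (item 2) and dates preceded by trigger words such as "até", "desde",
# "entre", "dia" and "a partir de" (item 3).
DOTTED_DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")
TRIGGERED_DATE = re.compile(
    r"\b(?:até|desde|entre|dia|a partir de)\s+(\d{1,2}/\d{1,2}/\d{4})"
)

print(DOTTED_DATE.search("Reunião em 25.12.2010 confirmada").group(0))
print(TRIGGERED_DATE.search("válido até 01/02/2020").group(1))
```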
    One of the main inconsistencies observed among the datasets was the dif-
ferent sets of annotated NE categories. For example, the SIGARRA corpus does
not have the value category annotated, although there are NEs of this category
in its texts. Another example of inconsistency is NEs annotated in different
ways: sometimes specific lowercase words form part of NEs and sometimes they
do not. For example, rainha (queen) in rainha Elizabeth (queen Elizabeth) and
mais de (more than) in mais de 30 (more than 30). This certainly deteriorates
model learning because of the lack of correct and consistent annotation.


4   Experimental Results
Before submitting the system to IberLEF, we repeated some of the experi-
ments performed in [21]. Initially, the LG built in [21] and the new version of our
LG submitted to IberLEF were applied individually to the GC of the Second
HAREM to evaluate the new rules inserted.
    Although the precision obtained by the adapted LG was lower, indicating
that more NEs were misidentified (false positives) due to the new rules, these
rules also increased recall by 9 percentage points. Thus, the gain obtained by
the adapted LG in comparison to the original LG was approximately 7 per-
centage points in F-measure. The decrease in precision is one of the effects
faced when the domain of the test dataset changes. This experiment suggests
that continuous adaptation of the LG is a necessity.
    CRF+LG was also rerun using the adapted LG. The GCs of the First HAREM
and Second HAREM were used as training and testing sets respectively. The final
gain in F-measure was about 4 percentage points achieving 63.11% in F-measure.
These results are another example of how the combination CRF+LG can im-
prove the NER. In this experiment we were able to identify 31 more entities due
to the new version of the LG.
    We also performed experiments combining several of the datasets presented
in the previous section (the normalized HAREM GCs, SIGARRA, selected
sentences from WikiNER, aTribuna and the administrative orders) as training
sets. CRF+LG prediction models were obtained for all combinations and
applied to a testing set that we created for this purpose. This dataset contains
only 15 annotated texts from different textual genres. The model that








presented the best results in this initial test was submitted to IberLEF. This
model was trained with the normalized HAREM GCs and the 20 administrative
orders.

4.1   IberLEF Task Results
The IberLEF organizers evaluated the submitted systems on two manually an-
notated datasets: the Clinical dataset, with 50 sentences and 77 NEs, and the
Police dataset from Brazil's Federal Police, with 1388 sentences and 916 NEs.
Both datasets were annotated with only the person category. The systems were
also evaluated on the General dataset, containing the SIGARRA dataset, with
the NE categories date and hour mapped to a single time category, and a subset
of sentences from the GC of the Second HAREM (SecHAREM) annotated with
only the value category, since SIGARRA does not have this category annotated.
   The IberLEF organizers used the precision (P), recall (R) and F-measure
(F) [3] metrics and computed the results using the CoNLL-2002 standard eval-
uation script (http://www.cnts.ua.ac.be/conll2002/ner/bin/conlleval.txt). The
results for our model are shown in Table 1.
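    The metrics can be sketched as follows. This is a simplification of the entity-level scoring performed by the CoNLL-2002 script, not the script itself; the entity tuples are invented for illustration. An entity counts as a true positive only if both its span and its category match a gold entity.

```python
# Entity-level precision, recall and F-measure: an entity is correct only
# if both its span (start, end) and its category match the gold annotation.
def evaluate(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Entities as (start, end, category) tuples, invented for the example.
gold = [(0, 2, "PER"), (5, 6, "TME"), (9, 11, "ORG")]
pred = [(0, 2, "PER"), (5, 6, "VAL")]  # second entity has the wrong category
print(tuple(round(v, 4) for v in evaluate(gold, pred)))  # (0.5, 0.3333, 0.4)
```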


                  Corpus            Category    P       R       F
                  Police Dataset    PER       29.59%  58.41%  39.28%
                  Clinical Dataset  PER       14.29%  10.09%  11.83%
                  General Dataset   Overall   56.26%  56.66%  56.46%
                  (SIGARRA +        ORG       42.27%  32.31%  36.63%
                  SecHAREM)         PER       57.39%  62.14%  59.67%
                                    PLC       37.35%  51.38%  43.26%
                                    TME       71.33%  74.91%  73.08%
                                    VAL       80.19%  82.52%  81.34%
                           Table 1. IberLEF Task Results



    The first column lists the datasets: the Police dataset in the first row,
followed by the Clinical dataset and the combined SIGARRA + SecHAREM.
Whereas for the first two datasets only the person entity was evaluated, for the
combined dataset all five entities were evaluated: ORG (organization), PER
(person), PLC (location), TME (time) and VAL (value).
    The best result obtained by our approach was for the identification of the
value category (81.34% in F-measure), in the last row of Table 1 for the General
dataset, whereas our worst result was for the person category on the Clinical
dataset, in the second row (11.83% in F-measure).
    Note that, based on the results in Table 1, our approach did not reach the
same level of performance on the first two datasets as it did on the Overall
evaluation of the combined dataset. Although these datasets (Police and
Clinical) were not released by the IberLEF organization







because the information is of a sensitive nature, we believe these results are
due to NEs with structures very different from those for which the system was
trained. The Clinical dataset, for example, has a textual structure with words
that should be separated by a space but are not, several medical abbrevia-
tions of unusual terms, and odd sequences of special characters (AnaR1 and
###Paulo as person names). In order to recognize these very specific structures,
the system would need to be trained on texts from that same domain or have
knowledge of those structures inserted into the LG.
    The results obtained on the General dataset were better. The results for
the value category exceeded 81% in F-measure, and those for the time category
exceeded 73%. NEs of these categories have better-defined structures that are
easier to capture in LG rules and easier for the CRF to learn.
    To better understand our results, we applied the CRF+LG model to the
General dataset (https://github.com/jneto04/iberlef-2019) released by the or-
ganization. By analyzing the results obtained, we observed that many of the
NEs of the value category contain words such as mais de (more than), cerca
de (about), aproximadamente (approximately) and até (until), which should be
part of the NE. However, when normalizing the three GCs of HAREM for use
as a training set, these words were removed. So, instead of recognizing sequences
such as mais de 800 milhões, cerca de 600 km, aproximadamente 1,4 tonelada
and até 120 kg, CRF+LG recognized 800 milhões, 600 km, 1,4 tonelada and
120 kg, decreasing the value of the metrics.
    CRF+LG recognized sequences preceded by words such as Faculdade (Col-
lege), Universidade (University), Instituto (Institute) and Departamento (De-
partment) as organization (Faculdade de Ciências Médicas da Universidade Nova
de Lisboa, Departamento de Química). However, the IberLEF organization did
not consider the organic unit category of SIGARRA as an organization.
    We also believe that the use of the 20 administrative orders as training set
may have somewhat impaired the recognition of words in capital letters since
many NEs are written in uppercase in this dataset.
    It is important to note that the results obtained by the systems should not
be directly compared as the participants used different training corpora. In this
case, the CRF+LG also did not use massive corpora for unsupervised learning
of features. In order to compare the techniques used by the systems, they must
be trained in the same dataset and under equivalent conditions.


5   Conclusion
This paper is a result of the IberLEF task, whose objective is to evaluate in-
telligent algorithms on the NER problem across many textual genres. Our
model combines two strategies: a supervised learning algorithm, CRF, and a
tailored set of LGs used to give tips to the former. In [21] we showed that the
more valuable the tips offered to the CRF algorithm, the better its performance.








    In this paper we present the results yielded by the IberLEF organizers when
running our model over the three datasets used to compare the participating
systems. Two of these datasets were used solely to compute the performance
of the submitted algorithms on automatically annotating the person entity on
police texts and clinical notes.
    The LG adapted in this work for use with the CRF+LG approach obtained
a gain of 7 percentage points in F-measure in comparison to the original LG
and a final gain of approximately 4 percentage points combined with the CRF
according to the experiments presented in Section 4. These results show the
potential of LG for use in the NER task and the necessity of the continuous
adaptation of the LG.
    The results obtained by CRF+LG in the IberLEF task, especially for the
Police and Clinical datasets, show the difficulty of NER in new textual genres
containing very specific structures that differ from those for which the system
was trained. Our F-measure was below 12% on the Clinical dataset, which
presents particular challenges.
    When analyzing the results obtained by CRF+LG on the General dataset,
we observed some errors that could be avoided if we knew in advance which
words should or should not be part of the NEs; the LG and the training dataset
could then be tailored accordingly.
    We claim that IberLEF is a milestone towards building a more uniform and
better way to compare different approaches, measure their results and build
better datasets for experimentation.
    As future work, we intend to better understand how to decrease the impact
of learning from a different domain, so that a model trained on one domain can
be cheaply reused in another without the great impact observed in this paper.
Besides, the preprocessing stage of the algorithms also has a great impact on
the results. We are working on introducing an intelligence layer within this stage
in order to quickly learn the characteristics of different textual genres and thus
reduce the mistakes found during the experiments carried out in this work.


References
 1. Amaral, D.O.F.: O Reconhecimento de Entidades Nomeadas por Meio de Condi-
    tional Random Fields para a Língua Portuguesa. Master's thesis, Pontifícia Uni-
    versidade Católica do Rio Grande do Sul, Porto Alegre, Brasil (2013)
 2. Araujo, P., Campos, T., Oliveira, R., Stauffer, M., Couto, S., Bermejo, P.: LeNER-
    Br: a Dataset for Named Entity Recognition in Brazilian Legal Text. In: Interna-
    tional Conference on the Computational Processing of Portuguese (PROPOR).
    pp. 313–323. Lecture Notes in Computer Science (LNCS), Springer, Canela, RS,
    Brazil (September 24-26 2018)
 3. Baeza-Yates, R., Ribeiro-Neto, B.: Recuperação de Informação - 2ed: Conceitos e
    Tecnologia das Máquinas de Busca. Bookman Editora (2013)
 4. Castro, P.V.Q., da Silva, N.F.F., da Silva Soares, A.: Portuguese Named Entity
    Recognition Using LSTM-CRF. In: Villavicencio A. et al. (eds) Computational




          Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)




    Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Com-
    puter Science, vol 11122. pp. 83–92. Springer, Cham, Canela, RS (Sep 2018)
 5. Chan, Y.S., Roth, D.: Exploiting Syntactico-Semantic Structures for Relation Ex-
    traction. In: Proceedings of the 49th Annual Meeting of the Association for Com-
    putational Linguistics: Human Language Technologies-Volume 1. pp. 551–560. As-
    sociation for Computational Linguistics (2011)
 6. Cheng, T., Yan, X., Chang, K.C.C.: Supporting Entity Search: a Large-scale Pro-
    totype Search Engine. In: Proceedings of the 2007 ACM SIGMOD international
    conference on Management of data. pp. 1144–1146. ACM (2007)
 7. Costa, P., Paetzold, G.H.: Effective Sequence Labeling with Hybrid Neural-CRF
    Models. In: Villavicencio A. et al. (eds) Computational Processing of the Por-
    tuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122.
    pp. 490–498. Springer, Cham, Canela, RS (Sep 2018)
 8. Doddington, G.R., Mitchell, A., Przybocki, M.A., Ramshaw, L.A., Strassel, S.,
    Weischedel, R.M.: The Automatic Content Extraction (ACE) Program-Tasks,
    Data, and Evaluation. In: LREC. vol. 2, p. 1. European Language Resources As-
    sociation (ELRA), Lisboa, PORTUGAL (2004)
 9. Gross, M.: The Construction of Local Grammars. In: Roche, E., Schabès, Y. (eds.)
    Finite-State Language Processing, Language, Speech, and Communication, pp.
    329–354. MIT Press, Cambridge, Mass. (1997)
10. Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilis-
    tic Models for Segmenting and Labeling Sequence Data. In: Proceedings of the
    Eighteenth International Conference on Machine Learning, ICML 2001. vol. 1, pp.
    282–289. ACM, San Francisco, CA, USA (2001)
11. Linguateca: (2018), http://www.linguateca.pt/HAREM/, accessed: 02/03/2018
12. Mota, C., Santos, D.: Desafios na Avaliação Conjunta do Reconheci-
    mento de Entidades Mencionadas: O Segundo HAREM. Linguateca (2008),
    https://www.linguateca.pt/LivroSegundoHAREM/
13. MUC-7: MUC-7 Proceedings (2016), accessed: 11/10/2018
14. NIST: Text Analysis Conference (TAC) (2018),
    https://tac.nist.gov/2018/index.html, accessed: 24/05/2018
15. Nothman, J., Ringland, N., Radford, W., Murphy, T., Curran, J.R.: Learning Mul-
    tilingual Named Entity Recognition from Wikipedia. Artificial Intelligence 194,
    151–175 (2013)
16. Pires, A.R.O.: Named Entity Extraction from Portuguese Web Text. Ph.D. thesis
    (2017)
17. Pirovani, J.P.C., Oliveira, E.: Extração de Nomes de Pessoas em Textos em Por-
    tuguês: uma Abordagem Usando Gramáticas Locais. In: Computer on the Beach
    2015. pp. 1–10. SBC, Florianópolis, SC (March 2015)
18. Pirovani, J.P.C., Oliveira, E.: CRF+LG: A Hybrid Approach for the Portuguese
    Named Entity Recognition. In: Abraham A., Muhuri P., Muda A., Gandhi N.
    (eds) Intelligent Systems Design and Applications (ISDA 2017). Advances in In-
    telligent Systems and Computing. vol. 736, pp. 102–113. Springer, Cham, Delhi,
    India (2017). https://doi.org/10.1007/978-3-319-76348-4_11
19. Pirovani, J.P.C., Oliveira, E.: Portuguese Named Entity Recognition using Con-
    ditional Random Fields and Local Grammars. In: Calzolari, N. (Conference chair),
    Choukri, K., Cieri, C., Declerck, T., Goggi, S., Hasida, K., Isahara, H., Maegaard,
    B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., Piperidis, S., Tokunaga, T. (eds.)
    Proceedings of the Eleventh International Conference on Language Resources and
    Evaluation (LREC 2018). European Language Resources Association (ELRA),
    Miyazaki, Japan (May 2018)








20. Pirovani, J.P.C., Spalenza, M.A., Oliveira, E.: Geração Automática de Questões
    a Partir do Reconhecimento de Entidades Nomeadas em Textos Didáticos. In:
    XXVIII Brazilian Symposium on Computers in Education (Simpósio Brasileiro de
    Informática na Educação - SBIE 2017). vol. 28, pp. 1147–1156. Sociedade Brasileira
    de Computação (SBC), Recife, Brasil (2017)
21. Pirovani, J.P.C.: CRF+LG: Uma Abordagem Híbrida para o Reconhecimento de
    Entidades Nomeadas em Português. Ph.D. thesis (2019)
22. Rocha, C., Jorge, A., Sionara, R., Brito, P., Pimenta, C., Rezende, S.: PAMPO:
    Using Pattern Matching and Pos-tagging for Effective Named Entities Recognition
    in Portuguese (2016), http://arxiv.org/abs/1612.09535
23. Collovini, S., Santos, J., Consoli, B., Terra, J., Vieira, R., Quaresma, P., Souza,
    M., Claro, D.B., Glauber, R., Xavier, C.C.: Portuguese Named Entity Recognition
    and Relation Extraction Tasks at IberLEF 2019 (2019)
24. Sang, E.F., Meulder, F.: Introduction to the CoNLL-2003 Shared Task: Language-
    Independent Named Entity Recognition. In: Proceedings of the seventh conference
    on Natural language learning at HLT-NAACL 2003-Volume 4. pp. 142–147. Asso-
    ciation for Computational Linguistics, Stroudsburg, PA, USA (2003)
25. Santos, C.N., Guimaraes, V.: Boosting Named Entity Recognition with Neural
    Character Embeddings. In: Proceedings of the Fifth Named Entities Workshop,
    ACL 2015. pp. 25–33. Association for Computational Linguistics, Stroudsburg,
    PA, USA (2015)
26. Santos, C.N., Milidiú, R.L.: Entropy Guided Transformation Learning: Algorithms
    and Applications. Springer-Verlag London, London, United Kingdom (2012)
27. Santos, D., Cardoso, N.: Reconhecimento de Entidades Mencionadas em Português:
    Documentação e Actas do HAREM, a Primeira Avaliação Conjunta na Área. Lin-
    guateca (2007), http://www.linguateca.pt/aval_conjunta/LivroHAREM/LivroSantosCardoso2007.pdf
28. Sutton, C., McCallum, A.: An Introduction to Conditional Random Fields. Foun-
    dations and Trends® in Machine Learning 4, 267–373 (2012)
29. Yang, J., Zhang, Y., Dong, F.: Neural Reranking for Named Entity Recognition.
    arXiv preprint arXiv:1707.05127 (2017)
30. Zhang, B., Pan, X., Lin, Y., Zhang, T., Blissett, K., Kazemi, S., Whitehead, S.,
    Huang, L., Ji, H.: RPI BLENDER TAC-KBP2017 13 Languages EDL System. In:
    Proceedings of the Tenth Text Analysis Conference (TAC2017). NIST, Maryland,
    USA (2017)



