<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NLPyPort: Named Entity Recognition with CRF and Rule-Based Relation Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>João Ferreira</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hugo Goncalo Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Ro</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Informatics and Systems of the University of Coimbra</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>College of Education of the Polytechnic Institute of Coimbra</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Informatics Engineering of the University of Coimbra</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>468</fpage>
      <lpage>477</lpage>
      <abstract>
<p>This paper describes the application of the NLPyPort pipeline to Named Entity Recognition (NER) and Relation Extraction in Portuguese, more precisely in the scope of the IberLEF-2019 evaluation task on the topic. NER was tackled with CRF, based on several features, and trained on the HAREM collection, but results were low. This was partly caused by an issue with the submitted model, which had been trained on lowercase text, but apparently also due to the training data used, which highlights the different natures of HAREM, the source of the majority of the testing corpus, and SIGARRA. Relations were extracted with a set of rules bootstrapped from the examples provided by the organisation. We achieved an F1-score of 0.72, but were the only participants in this task. We also express our doubts concerning the utility of the extracted relations.</p>
      </abstract>
<kwd-group>
        <kwd>NLP</kwd>
        <kwd>NER</kwd>
        <kwd>CRF</kwd>
        <kwd>Relation Extraction</kwd>
        <kwd>PoS Tagging</kwd>
        <kwd>Pattern Based</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Natural Language Processing (NLP) often starts with initial (pre-)processing
tasks that add structural and linguistic information to the text. Modules for
those initial tasks are often assembled in pipelines, which are thus the
cornerstone of NLP. Ensuring the best results for each of those modules will often
provide better results for higher-level applications that rely on them.</p>
      <p>
Despite the existence of NLP pipelines with modules ready to use or trainable
for different languages, they generally fail to target language-specific aspects.
Having this in mind, we worked on improving some modules of the Natural
Language Toolkit (NLTK) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a popular NLP pipeline in Python, for Portuguese.
This resulted in NLPyPort [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which tackles similar aspects as its Java
counterpart, NLPPort [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Initial improvements focused on Tokenization and Part-of-Speech (PoS)
Tagging, and a new Lemmatization module was developed. A module for Named
Entity Recognition (NER) was then developed, based on Conditional Random
Fields (CRF) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for predicting the BIO tags of the entities, and assembled
into the pipeline using the CRF Suite [
        <xref ref-type="bibr" rid="ref13">13</xref>
]. More recently, we started to work on
a module for fact extraction, currently based on rules composed of PoS tags,
discovered in a bootstrapping fashion.
      </p>
      <p>
        This paper describes how NLPyPort was used in the IberLEF 2019 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] tasks
on Portuguese Named Entity Recognition and Relation Extraction. More
precisely, tasks 1 and 2 were addressed, respectively Named Entity Recognition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
and Relation Extraction for Named Entities [
        <xref ref-type="bibr" rid="ref1">1</xref>
]. This participation allowed us
to test the pipeline, as well as to pinpoint existing problems.
      </p>
<p>The remainder of this paper is organised as follows: Section 2
briefly describes the CRF configurations used in this shared task. Section 3
describes how the current version of FactPyPort, our fact extraction module, was
developed, using PoS tags as rules. Section 4 analyses the results obtained for both
the NER and Relation Extraction tasks, and explains the reasons found for failure
with post-evaluation experiments. Section 6 concludes the paper and discusses
lines for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>CRFs for Named Entity Recognition</title>
<p>In order to identify entities, a new NER module was assembled and integrated
into NLPyPort, using the already existing CRF Suite as a starting point. In the
past, CRFs have led to good results in sequence processing, including NER.
Moreover, CRF Suite turned out to be much simpler to train than NLTK's
NER module, and at least as easy to use. Although it relies on text in the CoNLL
2003 format, using BIO tags for delimiting entities, this format is quite common
and easy to convert to.</p>
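<p>For illustration, data in this format can be read with a short sketch like the following. The field layout here is simplified: actual CoNLL 2003 files carry additional columns (PoS tag, chunk tag) between the token and the BIO tag, so we keep only the first and last fields.</p>

```python
# Minimal sketch: read CoNLL-style lines into sentences of
# (token, BIO tag) pairs. Blank lines separate sentences.
def read_conll(lines):
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:  # sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))  # token and BIO tag
    if current:
        sentences.append(current)
    return sentences

sample = [
    "Coimbra B-LOC",
    "fica O",
    "em O",
    "Portugal B-LOC",
    "",
]
print(read_conll(sample))
```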
<p>Training a new model only requires new data, while the features to exploit can
also be set accordingly. In our case, we went further than just the sequence of
words and exploited the following features, in a 5-token context window:
punctuation sign; only ASCII characters; only lowercase characters; only uppercase
characters; only alphabetic characters; only numbers; alphanumeric; starts with
an uppercase character; ends with an 'a'; ends with an 's'; token shape ('A'
means uppercase, 'a' lowercase, '#' number, '-' punctuation); length; prefixes
and suffixes with lengths from 1 to 5.</p>
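<p>As an illustration, the features listed above can be encoded per token as a feature dictionary, in the style expected by CRF toolkits. The function and feature names below are ours, not necessarily those used in NLPyPort, and the 5-token context window would be obtained by additionally merging the dictionaries of neighbouring tokens.</p>

```python
import string

def token_features(tokens, i):
    """Sketch of the per-token features listed above
    (names are illustrative, not NLPyPort's own)."""
    t = tokens[i]
    feats = {
        "is_punct": all(c in string.punctuation for c in t),
        "is_ascii": t.isascii(),
        "is_lower": t.islower(),
        "is_upper": t.isupper(),
        "is_alpha": t.isalpha(),
        "is_digit": t.isdigit(),
        "is_alnum": t.isalnum(),
        "starts_upper": t[:1].isupper(),
        "ends_a": t.endswith("a"),
        "ends_s": t.endswith("s"),
        "length": len(t),
        # token shape: 'A' uppercase, 'a' lowercase, '#' digit, '-' other
        "shape": "".join(
            "A" if c.isupper() else "a" if c.islower()
            else "#" if c.isdigit() else "-" for c in t),
    }
    for n in range(1, 6):  # prefixes and suffixes, lengths 1 to 5
        feats[f"prefix{n}"] = t[:n]
        feats[f"suffix{n}"] = t[-n:]
    return feats

print(token_features(["Universidade", "de", "Coimbra"], 0)["shape"])
```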
<p>
        The PoS tags and lemmatized words were obtained with the previous
modules of the pipeline. These features are the same as those used by Lopes et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for NER
in Portuguese clinical text.
      </p>
      <p>
As mentioned earlier, our model can be trained with any collection where
named entities (NEs) are annotated, using the CoNLL 2003 format. For
Portuguese, we know of three collections of this kind: the First and the
Second HAREM [
        <xref ref-type="bibr" rid="ref17 ref9">17, 9</xref>
        ]; and SIGARRA [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Though not originally created in
this format, all of them were later made available this way, in the scope of
André Pires' MSc thesis [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] (https://github.com/arop/ner-re-pt), though
with some simplifications in the case of HAREM (e.g., removal of the type
attribute). However, besides not tackling exactly the same categories, an
important difference between the HAREM and SIGARRA collections is that the
former was annotated and extensively revised by a team of five people,
following well-defined guidelines, while the latter, to the best of our knowledge, was
annotated by a single MSc student. This is mainly why we decided to train our
model only on the HAREM collection. To meet the shared task guidelines, we
changed the names of the entity categories accordingly.
      </p>
    </sec>
    <sec id="sec-3">
      <title>FactPyPort</title>
      <p>
The FactPyPort module is the most recent addition to the NLPyPort pipeline,
with its current version developed specifically for extracting relations in the scope
of this shared task. It thus deserves a more detailed description. As NEs were
already delimited in this task, FactPyPort relies on a set of examples
for generating a number of PoS-based rules in a bootstrapping fashion, inspired
by earlier work by Hearst [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and by Pantel &amp; Pennacchiotti [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>PoS tag-based patterns</title>
<p>The core idea of the developed system was to take advantage of the existing
patterns between words, while maintaining a somewhat open set of relations.
Given that the goal of the shared task was not to find relations of a specific type,
but rather the words that described the relation, it was decided that all the rules had
to rely on patterns valid for any sentence. A way to achieve this kind of generalisation
was to consider not the words themselves, but rather their PoS tags.</p>
<p>This way, the system could take a sentence with previously annotated NEs
and the words that described the relations, then save the configurations of PoS
tags that correspond to the relation, and use them as a new rule. After extracting
a set of rules, the system assumes that a relation is present in a sentence if the
PoS-tag sequence matches that of a rule, and outputs the corresponding words.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Learning, ranking and rules</title>
<p>In order to learn the PoS patterns necessary for identifying relations, a set of
annotated example sentences was given. The following is an example of a pattern
converted into a rule:</p>
        <p>1x0111010xpunc art adj prp n prp art</p>
<p>The rule is divided into three parts, delimited by the x symbol. The first part
(in this case, a 1) is the rule ranking. Since the final system ended up sorting rules
by size, this was not used. In previous versions of the system, rules
were ranked by frequency, which was saved here.</p>
<p>The second part is a sequence of 1s and 0s, with one digit for each of the
words between the two NEs. A 1 indicates that the
word with the corresponding PoS tag is part of the relation, while a 0 indicates
that it is not.</p>
        <p>As mentioned earlier, the last part is the sequence of PoS tags between the
two NEs.</p>
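<p>Parsing a rule into these three parts can be sketched with a hypothetical helper (not the actual FactPyPort code):</p>

```python
def parse_rule(rule):
    """Split a rule like '1x0111010xpunc art adj prp n prp art'
    into its rank, inclusion mask and PoS-tag sequence."""
    rank, mask, tags = rule.split("x", 2)  # split on the first two x's
    return int(rank), [int(b) for b in mask], tags.split()

print(parse_rule("1x0111010xpunc art adj prp n prp art"))
```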
<p>To extract a new rule from the annotated data, the system first gets the
words between the two NEs and their PoS tags. It then finds the positions
of the words indicated as being part of the relation and sets the digits at the
corresponding indices to 1.</p>
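<p>Conversely, building a rule string from an annotated example can be sketched as follows (again a hypothetical helper, reproducing the example rule above):</p>

```python
def make_rule(pos_tags, relation_indices, rank=1):
    """Build a rule from the PoS tags of the words between two NEs
    and the indices of the words that form the relation."""
    mask = "".join("1" if i in relation_indices else "0"
                   for i in range(len(pos_tags)))
    return f"{rank}x{mask}x{' '.join(pos_tags)}"

# Words at indices 1-3 and 5 belong to the relation:
print(make_rule(["punc", "art", "adj", "prp", "n", "prp", "art"], {1, 2, 3, 5}))
```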
<p>In order to ensure better results, the rules are sorted by size. This way, we
start with high specificity and low coverage, and progress to a point of lower
specificity and higher coverage. A highly specific rule can look somewhat like:</p>
        <p>1x000000000000000010000100000111xart n punc pron pron v adv art
n n conj adv v v adv v prp art n punc v prp art n n punc punc
prp art n</p>
<p>A high-coverage rule, on the other hand, can be as simple as:</p>
        <p>1x1xpunc</p>
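<p>The matching procedure described above can be sketched as follows, trying more specific (longer) rules first; this is an illustrative reimplementation, not the actual FactPyPort code:</p>

```python
def match_rule(rules, pos_tags):
    """Try rules from most to least specific (longest PoS sequence
    first); return the relation mask of the first rule whose tag
    sequence equals the tags between the NEs, or None."""
    for mask, tags in sorted(rules, key=lambda r: -len(r[1])):
        if tags == pos_tags:
            return mask
    return None

rules = [
    ([0, 1, 1, 1, 0, 1, 0], ["punc", "art", "adj", "prp", "n", "prp", "art"]),
    ([1], ["punc"]),  # a high-coverage rule
]
print(match_rule(rules, ["punc"]))
```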
      </sec>
      <sec id="sec-3-3">
        <title>Generating additional rules</title>
<p>In order to get more rules from one example, a second step is performed. It consists of
stripping the sentence of the tags with 0. The result is a smaller rule with only 1s,
which means all its words are part of the relation. This can be useful because,
even if a longer sentence is not an exact match, parts of it may be, and therefore
a relation may still be found.</p>
<p>This is illustrated with the following example, showing a detected rule and the
reduced rule generated from it. First, the detected rule:</p>
        <p>1x000000010000100011xprp prop punc pron n v n adv art n n conj
v prp art n n adv</p>
<p>And the reduced generated rule:</p>
        <p>1x1111xadv v n adv</p>
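<p>The reduction step can be sketched with a hypothetical helper that reproduces the example above:</p>

```python
def reduce_rule(mask, tags, rank=1):
    """Keep only the tags whose mask digit is 1, yielding a smaller
    all-1s rule over the relation words themselves."""
    kept = [t for b, t in zip(mask, tags) if b == 1]
    return f"{rank}x{'1' * len(kept)}x{' '.join(kept)}"

# The detected rule from the example above:
mask = [int(b) for b in "000000010000100011"]
tags = ("prp prop punc pron n v n adv art n n conj "
        "v prp art n n adv").split()
print(reduce_rule(mask, tags))
```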
      </sec>
      <sec id="sec-3-4">
        <title>Training data</title>
<p>One of the challenges that comes with training a system of this kind is the low
number of available examples that identify relations, especially examples
adhering to the shared task guidelines. For this reason, the training data
comprised only the 100 examples released by the task organisers. We believed,
and later confirmed, that the system trained with only these examples was
fairly good, but that its results could be improved if it were trained with
more data. The PoS tags necessary for training were obtained with NLPyPort.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation and Result analysis</title>
<p>In the following sections, we present the results obtained for each of the tasks,
discuss them and share thoughts on how they could be improved.</p>
      <sec id="sec-4-1">
        <title>NER Results</title>
<p>The results for Task 1, in Table 1, took us by surprise, since we did
not expect them to be so low.</p>
<p>Once we saw the results, we noticed that we had trained our model on the
same corpus from which some test sentences were taken, HAREM. In fact, the higher
performance for the VAL category is explained by this. But it made us
wonder why it was not even higher.</p>
<p>Looking for a reason, we started inspecting the process that led to
the creation of the model, and soon noticed that we had submitted a faulty
model, trained on a lowercase version of the corpus, in which an important feature
of entities of many categories is lost (starting with uppercase). We cannot say
much about the results on the clinical and police datasets, because they are not
available. Yet, after fixing the previous issue and training on the same collection,
our results for the general dataset improved. The main difference was
that NEs of the VAL category were now perfectly recognised, as expected, given
that the training data was the same as the testing data. Also because of this, it
should be perceived as a meaningless result. Moreover, F1 improved by about 5
points for all the remaining categories, except TME, for which the tagging in
HAREM is probably inconsistent with SIGARRA. Even with this improvement,
performance is quite low, which suggests that SIGARRA and HAREM are very
different, and a (sequence prediction) model trained on one will not perform well
on the other.</p>
<p>Moving on, we could not help wondering what would have happened if, instead
of training the model with HAREM, we had opted to train it with
SIGARRA, or even with both HAREM and SIGARRA, given that they are
both publicly available and ready to use by our CRF. Therefore, in order to
validate the model, we trained the CRF with both SIGARRA and HAREM and
tested it on the general dataset.</p>
<p>As expected, F1 was above 92 for all categories, with an overall result of
95.36. Although this helped us confirm that the model is working well, we stress
that it is by no means an indicator of its quality, because all the testing data was
included in the training collections.</p>
<p>Even though the results were unsatisfying, our participation in this task
helped us validate and fix some issues with our NER module. Moreover, we
showed that training the CRF with the HAREM corpus is not enough to achieve
good NER performance for Portuguese on any textual source. Still, with
more data, it should be possible to improve the results obtained.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Relation Extraction Results</title>
<p>The rule-based version of FactPyPort was used for Task 2, Relation Extraction
for Named Entities. In this task, NEs were already tagged, thus making the goal
of the system easier: to identify relations between the tagged NEs. FactPyPort
was trained according to the description in Section 3 and the results obtained
this way are in Table 2. The "Exactly" column measures when the system was
able to select all the words that are part of the relation, while "Partial" also
considers cases where only part of the relation was identified.</p>
<table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Results of the FactPyPort system.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th colspan="3">Exactly</th>
                <th colspan="3">Partial</th>
              </tr>
              <tr>
                <th>Precision</th>
                <th>Recall</th>
                <th>F1</th>
                <th>Precision</th>
                <th>Recall</th>
                <th>F1</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>0.736</td>
                <td>0.711</td>
                <td>0.7235</td>
                <td>0.7662</td>
                <td>0.748</td>
                <td>0.757</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
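<p>As a sanity check, the F1 values in Table 2 are consistent, up to rounding of the reported precision and recall, with the usual harmonic mean:</p>

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values from Table 2
print(round(f1(0.736, 0.711), 4))   # exact matching
print(round(f1(0.7662, 0.748), 4))  # partial matching
```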
<p>Despite the simplicity of the model, the results in this task were interesting.
However, as ours was the only system participating in this task, we have no
baseline to compare against.</p>
<p>Still, upon analysing the results obtained and comparing them with the
expected results, we may speculate that a possible explanation for the high scores,
given the reduced training data, lies in the test data itself. For the three most
frequent relations in the testing data, Table 3 shows their number of occurrences
and the corresponding proportion. Those were also frequent relations in the
examples provided by the organisation and used for bootstrapping the rules. All
relations of those three types were correctly identified (partially, in a minority of
situations), with the system additionally suggesting some incorrect
relations of these kinds, especially of the type em. Table 4 has examples of the
latter. This suggests that there is some overfitting.</p>
<p>We further noticed that five identified relations had an empty type ("") (see
Table 4), even though, in the expected output, there were no instances of this
kind. This happens because the system tries to find a relation between the given
NEs and, when no rule matches, returns an empty relation. This could be
turned off, and possibly minimised with more training data.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Working notes</title>
<p>The NLPyPort pipeline and its source code are available at
https://github.com/jdportugal/NLPyPort. The source code for running the NER system used
for Task 1, in both the submitted and the current versions, is available at
https://github.com/jdportugal/CRFIBERLEF. The README file contains all
the instructions.</p>
<p>The FactPyPort system used for Task 2 is available at
https://github.com/jdportugal/FactPyPortIBERLEF. To run it, all the requirements should be
installed (requirements.txt file) and the following line should be run in the
terminal:</p>
      <p>python FactPort.py input_file &gt; output_file</p>
      <p>The output file will contain the results in the format specified for Task 2.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
<p>Named Entity Recognition and Relation Extraction are arguably highly relevant
NLP tasks. As time goes by, NER for the Portuguese language is improving and
getting results closer to those for English, but, at the same time, there is still
much room for improvement. In fact, we ended up confirming that training a
CRF on one of the most popular collections with annotated NEs in Portuguese
is not enough for a well-performing model on another collection where NEs
of similar categories are annotated. Coming up with a proper reason for this
would require further analysis, but the different natures of both collections, as
well as their annotation guidelines and processing, might have an impact on this.
Moreover, we used a simplified version of HAREM, which is originally a much
richer collection.</p>
<p>In any case, with our pipeline and the integrated CRF module, we hope that
more people will have access to an easy-to-use tool for the computational
processing of Portuguese, namely in Python, and that interesting applications may
come from this. We will certainly run further experiments, such as analysing
the impact of different feature sets, among those used.</p>
<p>We should also mention that, once we found that no training data had been released
for the NER task, nor detailed guidelines, we had several doubts regarding our
participation. Although the NE categories targeted in this task are common and
broadly used, not everything is always consensual and several specificities need
to be clarified. Despite this, the organisers minimised this issue by answering
questions through the tasks' Google Group.</p>
<p>Despite the interesting results achieved, our approach to the Relation
Extraction task was fairly simple. Relations were identified based on a set of rules
applied to the sequence of PoS tags between NEs, bootstrapped from a few
examples provided by the organisers.</p>
<p>Although more training examples could be used for generating additional
rules, increasing coverage and, thus, performance, we are not sure whether the
relations covered in this task are the most useful for us. Information extraction
has the goal of acquiring meaningful data, while the majority of the relation
types considered (e.g., 'de', '(' or 'em') are too vague, difficult to interpret and,
thus, hardly useful.</p>
      <p>
With this in mind, we plan to develop new versions of FactPyPort,
following alternative approaches, covering Open Information Extraction [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], or
training a CRF for specific types of relation, as others did for Portuguese [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
possibly using available data (e.g., by Batista et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). Future versions will, of
course, be integrated in NLPyPort.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
<p>This work was supported by FCT's INCoDe 2030 initiative, in the scope of the
demonstration project AIA, "Apoio Inteligente a Empreendedores (Chatbots)".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Collovini de Abreu,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Vieira</surname>
          </string-name>
, R.:
          <article-title>RelP: Portuguese open relation extraction</article-title>
          .
          <source>Knowledge Organization</source>
          <volume>44</volume>
          (
          <issue>3</issue>
          ),
<fpage>163</fpage>
          –
          <lpage>177</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Batista</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forte</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martins</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>M.J.:</given-names>
          </string-name>
<article-title>Extracção de relações semânticas de textos em português explorando a DBpedia e a Wikipédia</article-title>
          .
          <source>Linguamatica</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
<fpage>41</fpage>
          –
          <lpage>57</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
          </string-name>
          , E.:
          <article-title>NLTK: The Natural Language Toolkit</article-title>
          .
          <source>In: Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions</source>
          . pp.
<fpage>214</fpage>
          –
          <lpage>217</lpage>
          .
ACL
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Machado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>A sequence model approach to relation extraction in Portuguese</article-title>
          .
          <source>In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC</source>
          <year>2016</year>
). pp.
          <fpage>1908</fpage>
          –
          <lpage>1912</lpage>
          . ELRA, Portoroz, Slovenia (May
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>dos Santos</surname>
            ,
            <given-names>H.D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
          </string-name>
          , R.:
          <article-title>Annotating relations between named entities with crowdsourcing</article-title>
          .
          <source>In: Natural Language Processing and Information Systems - 23rd International Conference on Applications of Natural Language to Information Systems</source>
          ,
NLDB
          <year>2018</year>
          . pp.
<fpage>290</fpage>
          –
          <lpage>297</lpage>
          . Paris, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Collovini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Claro</surname>
            ,
            <given-names>D.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glauber</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
,
          <string-name>
            <surname>Xavier</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          :
          <article-title>Portuguese named entity recognition and relation extraction tasks at IberLEF 2019</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2019</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christensen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Mausam: Open information extraction:
          <article-title>The second generation</article-title>
          .
          <source>In: Proceedings of 22nd International Joint Conference on Arti cial Intelligence</source>
          . pp.
<fpage>3</fpage>
          –
          <lpage>10</lpage>
          .
          <source>IJCAI</source>
          <year>2011</year>
          , IJCAI/AAAI, Barcelona, Spain (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ferreira</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
,
          <string-name>
            <surname>Goncalo Oliveira</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Improving NLTK for processing Portuguese</article-title>
          .
          <source>In: Symposium on Languages, Applications and Technologies (SLATE</source>
          <year>2019</year>
          ). pp.
<fpage>18:1</fpage>
          –
          <lpage>18:9</lpage>
          . OASIcs, Schloss Dagstuhl (
          <year>June 2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carvalho</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
,
          <string-name>
            <surname>Goncalo Oliveira</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Second HAREM: advancing the state of the art of named entity recognition in Portuguese</article-title>
          .
          <source>In: Proceedings of 7th International Conference on Language Resources and Evaluation (LREC 2010)</source>
          . ELRA, Valletta, Malta (May
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Automatic acquisition of hyponyms from large text corpora</article-title>
          .
          <source>In: Proceedings of 14th Conference on Computational Linguistics</source>
          . pp.
          <fpage>539</fpage>
          -
          <lpage>545</lpage>
          . COLING '92, ACL Press, Morristown, NJ, USA (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.C.N.</given-names>
          </string-name>
          :
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          .
          <source>In: Proc. 18th International Conference on Machine Learning</source>
          . pp.
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          . ICML '01, Morgan Kaufmann (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lopes</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teixeira</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonçalo Oliveira</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Named entity recognition in Portuguese neurology text using CRF</article-title>
          .
          <source>In: Proceedings of 19th EPIA Conference on Artificial Intelligence</source>
          . In press (
          <year>September 2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Okazaki</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>CRFsuite: a Fast Implementation of Conditional Random Fields (CRFs)</article-title>
          (
          <year>2007</year>
          ), http://www.chokkan.org/software/crfsuite/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pantel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennacchiotti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Espresso: leveraging generic patterns for automatically harvesting semantic relations</article-title>
          .
          <source>In: Proceedings of 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>113</fpage>
          -
          <lpage>120</lpage>
          . ACL Press, Sydney, Australia (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Pires</surname>
            ,
            <given-names>A.R.O.</given-names>
          </string-name>
          :
          <article-title>Named Entity Extraction from Portuguese Web Text</article-title>
          .
          <source>Master's thesis</source>
          , Faculdade de Engenharia da Universidade do Porto (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonçalo Oliveira</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>NLPPort: A Pipeline for Portuguese NLP</article-title>
          .
          <source>In: Proceedings of 7th Symposium on Languages, Applications and Technologies (SLATE '18)</source>
          . pp.
          <fpage>18:1</fpage>
          -
          <lpage>18:9</lpage>
          . OpenAccess Series in Informatics, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany (
          <year>June 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seco</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardoso</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilela</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>HAREM: An advanced NER evaluation contest for Portuguese</article-title>
          .
          <source>In: Proceedings of 5th International Conference on Language Resources and Evaluation (LREC'06)</source>
          . ELRA, Genoa, Italy (May
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>