<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Contextual Representations and Semi-Supervised Named Entity Recognition for Portuguese Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro Vitor Quinta de Castro</string-name>
          <email>pedrovitorquinta@inf.ufg.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nadia Felix Felipe da Silva</string-name>
          <email>nadia@inf.ufg.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anderson da Silva Soares</string-name>
          <email>anderson@inf.ufg.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidade Federal de Goiás</institution>
          ,
          <addr-line>Goiânia GO 74690-900</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>411</fpage>
      <lpage>420</lpage>
      <abstract>
        <p>Named Entity Recognition is a Natural Language Processing task which is difficult to adapt across different domains. In this work, we propose a Semi-Supervised approach using Deep Learning models in order to support three different domains for the Portuguese language: general, police and medical. We perform the self-training of a model with an architecture based on a Bidirectional Long Short-Term Memory network with a Conditional Random Fields sequential classifier, using five Portuguese corpora. The word representations of the proposed model are contextual and provided by ELMo's language model. The results achieve a competitive performance in the IberLEF evaluation forum.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Neural Networks</kwd>
        <kwd>Portuguese Language</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Information Extraction (IE) is the process of obtaining structured data from
sources which cannot be interpreted directly by machines, like texts [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. This
is particularly important considering the amount of textual information which is
exchanged every minute on the internet [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. Named Entity Recognition (NER)
is the Natural Language Processing (NLP) task which focuses on identifying and
classifying named entities in this unstructured textual information, making
them interpretable and accessible to different communication channels.
      </p>
      <p>When dealing with multiple domains, a NER prediction model needs to be
able to handle not only the difference in lexicon between them, but also the
difference in morphological features. This adds an additional layer of complexity
to the task, requiring a more scalable model to perform well in this challenge.</p>
      <p>
        This paper describes our participation in IberLEF (Iberian Languages
Evaluation Forum), Task 1: Named Entity Recognition [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. We present a system
based on different deep learning architectures for both the NER model and the word
representations. We propose semi-supervised training in order to deal with the
different domains targeted by the evaluation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The first Deep Learning architectures to be applied in NER models were based
on CNNs [
        <xref ref-type="bibr" rid="ref32 ref5">5, 32</xref>
        ], and later on Recurrent Neural Networks (RNN) [
        <xref ref-type="bibr" rid="ref11 ref17 ref22 ref4 ref9">9, 11, 4, 17, 22</xref>
        ].
The reason why Deep Learning models perform well on NLP tasks is that
they learn latent features from words, as well as the interactions between them,
during training on specific tasks such as NER.
      </p>
      <p>
        Collobert et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a model based on a Multilayer Perceptron with
a convolutional layer, and the following works for NER were mostly based on
bidirectional LSTMs, with a few differences between them. Huang et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
used a biLSTM-CRF network with manually selected features, combined with
features from SENNA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] word embeddings. Chiu and Nichols [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] used a biLSTM
model without the CRF layer for classification, and had their best results with
character level features extracted from a CNN layer, concatenated with SENNA
embeddings. Lample et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and Ma and Hovy [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] used similar approaches
based on biLSTM-CRF models, with the difference that [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] used a biLSTM
to extract character level features, combined with Word2Vec [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]
representations, while [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] used a CNN to extract the character level features, which were
combined with GloVe [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] embeddings. These works show that biLSTM-CRF
networks became a standard architecture for NER models (as well as for other
sequential NLP classification tasks). Subsequent works focused on the representation
of the words, instead of the actual NER model. Language models have been the
primary architecture for contextualized word representations.
      </p>
      <p>
        Peters et al. [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], Devlin et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and Akbik et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] developed different
architectures for contextual word representations based on bidirectional
language models and evaluated their performance on the NER task (as well as
on other NLP tasks). Both [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] and [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] used a biLSTM-CRF baseline
NER model for evaluating their representation models, while [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] evaluated their
model by adding a neural layer to the language model, performing the NER
classification with it. The ELMo (Embeddings from Language Model)
representations from [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] are provided by the biLM language model, which is based on 2
biLSTM networks, with 2 layers each, and the model's input is a character level
representation provided by a CNN network. By contrast, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] created BERT, a
language model based on the Transformer [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] architecture, which is based only
on the neural attention mechanism. The authors of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] created a language
model at the character level, so that its objective was not to predict words,
but characters. The architecture of their CharLM model is also based on a biLSTM
network. Table 1 lists the models presented in this section with their respective
F-Score performance on the English benchmark from CoNLL-2003 [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ].
      </p>
      <p>
        For the Portuguese language, the first work that used a Deep Learning approach
was that of dos Santos and Guimarães [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], who adapted the architecture from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
and proposed CharWNN. For this work, besides using character level features
from a CNN, the authors also used word embeddings pre-trained with
the Word2Vec tool [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. Da Costa and Paetzold [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Quinta de Castro et al.
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] used a BiLSTM-CRF architecture with minor differences between them. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
concatenated character level features from a BiLSTM network with FastText [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
word embeddings, prior to passing this concatenation through another BiLSTM
network. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] used an approach similar to that of [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and concatenated the character
level features from a BiLSTM network with the representations of a second
BiLSTM, which processed pre-trained Wang2Vec [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] embeddings.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Model</title>
      <p>
        In this work, we propose a system based on different deep learning architectures,
similar to the one used by [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]: a Bidirectional Long Short-Term Memory
(BiLSTM)[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] NER model with a Conditional Random Fields (CRF)[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] sequential
classifier, fed by the contextual word representations from an ELMo [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] language
model, combined with character level representations from a Convolutional
Neural Network (CNN) [
        <xref ref-type="bibr" rid="ref18 ref8">8, 18</xref>
        ]. Our system differs from [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] in that we do not
use pre-trained word embeddings, and we use two different ELMo models, one
for the general domain of the Portuguese language and one for the police domain.
      </p>
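      <p>The combination just described can be made concrete with a per-token sketch: two 1024-dimensional ELMo vectors (one per domain model) concatenated with the 128-dimensional character-level CNN features. The dimensions follow Section 3's description; the function itself is ours and illustrative, not the AllenNLP implementation.</p>

```python
import numpy as np

def combine_representations(elmo_general, elmo_police, char_cnn):
    """Concatenate per-token vectors from the two domain ELMo models
    (1024-dim each) with character-level CNN features (128-dim).
    All inputs are (seq_len, dim) arrays."""
    return np.concatenate([elmo_general, elmo_police, char_cnn], axis=-1)

seq_len = 7
rep = combine_representations(
    np.zeros((seq_len, 1024)),  # general-domain ELMo
    np.zeros((seq_len, 1024)),  # police-domain ELMo
    np.zeros((seq_len, 128)),   # character-level CNN features
)
print(rep.shape)  # (7, 2176)
```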
      <p>
        The ELMo embeddings are obtained using the biLM (bidirectional Language
Model) [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] architecture. This architecture is based on 2 BiLSTM networks, each
of them responsible for one direction of the bidirectional language model: one
keeps a representation while making predictions in the forward direction of
the text, and the other does so for the reverse direction. The first layer of the biLM model
produces character level features from the training words using two CNNs, one
for each direction of the text, each of them with 2048 convolutional filters. They
produce a representation with a total dimension of 4096, which is fed to the
first BiLSTM layer of the biLM model. Each layer of the model (the CNN and
the two BiLSTMs) projects the input it receives to a vector of dimension 1024.
These three projections constitute the ELMo embeddings produced by the
biLM model. The size of the biLM training vocabulary determines the number
of words that can be predicted in the Softmax layer of the model, as shown in
Figure 1.
      </p>
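      <p>In the ELMo scheme of [30], the three 1024-dimensional layer projections are typically collapsed into a single vector per token by a softmax-normalized weighted sum of the layers. A numpy sketch (the weights and the scaling factor gamma are illustrative here; in practice they are learned by the downstream task):</p>

```python
import numpy as np

def elmo_combine(layers, w, gamma=1.0):
    """Collapse the three 1024-dim biLM layer outputs for one token into a
    single ELMo vector via a softmax-normalized weighted sum."""
    s = np.exp(w) / np.exp(w).sum()  # softmax over the layer weights
    return gamma * sum(si * hi for si, hi in zip(s, layers))

# Three toy layer outputs; equal weights give the plain layer average.
layers = [np.ones(1024), 2 * np.ones(1024), 3 * np.ones(1024)]
vec = elmo_combine(layers, w=np.array([0.0, 0.0, 0.0]))
print(vec.shape, round(float(vec[0]), 2))  # (1024,) 2.0
```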
      <p>[Figure 1: The biLM architecture: forward and backward character-level CNNs (2048 filters each) feed two stacked BiLSTM layers per direction; the CNN and each BiLSTM layer project to 1024 dimensions, and a Softmax layer over the vocabulary performs the language-model prediction.]</p>
      <p>
        The BiLSTM-CRF architecture used in this work is the same as in the
AllenNLP framework [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], following a parameterization similar to the one described
in [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] for the NER task. The CNN network used for producing character level
features from words uses embeddings of dimension 16 and 128 convolutional
filters of size 3, with the ReLU [
        <xref ref-type="bibr" rid="ref12 ref26">12, 26</xref>
        ] activation function. The BiLSTM network
used for encoding the words has 2 layers, with 200 hidden units each. Figure 2
shows the dimensionality of the word representations obtained from the CNN and
the two ELMo embeddings used. The two ELMo models were trained on separate
domains: for the general Portuguese domain we used a Portuguese Wikipedia
[
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] dump, and for the police domain we used a 1.6 billion word corpus created
from public documents from Brazil's Labor Courts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The Portuguese ELMo
model we trained is publicly available at https://allennlp.org/elmo. For the
IberLEF evaluation, we performed fine-tuning of this ELMo on this combined
dataset, following [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
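      <p>The character-level encoding just described can be sketched as follows. The parameter sizes (16-dimensional character embeddings, 128 filters of width 3, ReLU, max-pooling over character positions) come from the text; the function, the toy alphabet and the random parameters are ours:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def char_cnn_features(char_ids, emb, filters):
    """One word's character-level features: embed characters (16-dim),
    slide 128 width-3 filters over them, apply ReLU, then max-pool."""
    x = emb[char_ids]                        # (n_chars, 16)
    width = 3
    n = len(char_ids)
    convs = []
    for i in range(n - width + 1):
        window = x[i:i + width].reshape(-1)  # flattened (3*16,) window
        convs.append(filters @ window)       # (128,) filter responses
    h = np.maximum(np.stack(convs), 0.0)     # ReLU, (n-2, 128)
    return h.max(axis=0)                     # max-pool -> (128,)

emb = rng.normal(size=(50, 16))              # toy alphabet of 50 characters
filters = rng.normal(size=(128, 3 * 16))
feat = char_cnn_features([3, 7, 1, 9, 4], emb, filters)
print(feat.shape)  # (128,)
```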
      <p>[Figure 2: Word representations fed to the NER model: character-level features (dimension 128) are concatenated with the two ELMo embeddings (dimension 1024 each); forward and backward LSTMs with 200 hidden units each encode the words, and their concatenated hidden states (dimension 400) feed the CRF layer.]</p>
      <p>
        For the Portuguese NER task, IberLEF specified the evaluation of models in
three different domains: general, police and clinical. For the specific domains only
person names (PER category) are annotated, while the general domain dataset is
annotated with 5 different categories: person, place (PLC), organization (ORG),
value (VAL) and time (TME). The following public corpora were used for the
model proposed in this work: WikiNER [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], LeNER-Br [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], HAREM I [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]
and MiniHAREM [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] golden collections, and Paramopama [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. We also used a
private legal corpus provided by the Datalawyer company, consisting of 76
annotated documents from the Brazilian Labor Court. The only dataset annotated
with all five categories is HAREM. These corpora have the following categories
annotated in them:
– HAREM: Place, Organization, Person, Time, Value, Abstraction, Work,
      <p>Event, Thing and Other;
– LeNER-Br: Legal Case, Law, Place, Organization, Person and Time;
– Paramopama: Place, Organization, Person and Time;
– WikiNER: Place, Miscellaneous, Organization and Person;
– Datalawyer: Function, Legal Basis, Place, Organization, Person, Court,
Settlement Value, Pleaded Value, Conviction Value, Court Costs and District.</p>
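      <p>Restricting each corpus to the five IberLEF categories, as done in the training procedure that follows, can be sketched with BIO-style tags (the helper and the tag strings are illustrative, not the paper's code):</p>

```python
RELEVANT = {"PER", "PLC", "ORG", "VAL", "TME"}

def strip_irrelevant(tags):
    """Replace annotations outside the five IberLEF categories with 'O',
    keeping BIO prefixes for the relevant ones."""
    return [t if t.split("-")[-1] in RELEVANT else "O" for t in tags]

print(strip_irrelevant(["B-PER", "B-LAW", "I-LAW", "B-TME"]))
# ['B-PER', 'O', 'O', 'B-TME']
```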
      <p>Since only the HAREM datasets contain all the categories needed for the
IberLEF evaluation, we adopted a semi-supervised training approach, using an
initial NER model to perform the self-training of the final model. This training
followed the procedure below:
1. For each one of the datasets, we ignored all the entities that were not
annotated as one of the 5 relevant categories for this evaluation; their annotation
was removed;
2. We merged the datasets from HAREM, LeNER-Br and Paramopama, and
randomly split them into training, validation and test sets;
3. The resulting datasets from the previous step were used to train a NER
model for bootstrapping Time and Value annotations for the datasets that
did not contain these categories;
4. The bootstrap model was used to annotate:
4.1. Time and Value entities in the WikiNER dataset;
4.2. Value entities in the LeNER-Br dataset;
4.3. Value entities in the Paramopama dataset;
4.4. Time and Value entities in the Datalawyer dataset.
5. The resulting bootstrapped corpora were merged and split into training,
validation and test sets;
6. The resulting datasets from the previous step were used to train the final
NER model that was submitted to the IberLEF evaluation.</p>
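      <p>Step 4 of the procedure above, together with the constraint that existing annotations are never overridden, amounts to a per-token merge: bootstrap predictions fill only unannotated tokens, and only with the Time/Value categories. A sketch (function name and BIO tag strings are ours):</p>

```python
def merge_bootstrap(gold_tags, predicted_tags, allowed=frozenset({"TME", "VAL"})):
    """Keep every existing annotation; fill only 'O' tokens with
    bootstrap-model predictions restricted to Time/Value."""
    merged = []
    for gold, pred in zip(gold_tags, predicted_tags):
        if gold == "O" and pred != "O" and pred.split("-")[-1] in allowed:
            merged.append(pred)   # bootstrap fills an unannotated token
        else:
            merged.append(gold)   # existing annotations are never overridden
    return merged

gold = ["B-PER", "O", "O", "O"]
pred = ["B-ORG", "B-TME", "I-TME", "O"]  # B-ORG is ignored: token already tagged? no, but ORG is not bootstrapped
print(merge_bootstrap(gold, pred))  # ['B-PER', 'B-TME', 'I-TME', 'O']
```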
      <p>None of the existing annotations was removed or overridden during the
bootstrapping of the datasets. Only words that prior to this process had no category
associated with them were classified as either Time or Value, according to the
bootstrap model.</p>
      <p>
        Models Evaluation
Prior to submitting the NER model with word representations from two ELMo models and
a CNN (henceforth referred to as 2xELMo+CNN), we performed the training of
two other models with different types of word representation: (i) ELMo+CNN
and (ii) ELMo+CNN+Wang2Vec [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. These two models use only the
general-domain ELMo. We trained these three models using the
same configuration, and performed an additional evaluation of them on the
following datasets: MiniHAREM, the test datasets from the Datalawyer company and
LeNER-Br, and the full datasets from Paramopama and WikiNER. For all of
them, except MiniHAREM, we evaluated both variants: with and without
bootstrapped Time and Value entities. The model with the best F-Score was
ELMo+CNN+Wang2Vec, followed by 2xELMo+CNN.
      </p>
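      <p>The model selection described above, grouping per-dataset F-Scores by model and comparing the means, can be sketched as follows; the scores here are placeholders, not the paper's results:</p>

```python
from statistics import mean

# Placeholder (model, dataset, F-score) triples -- illustrative values only.
results = [
    ("2xELMo+CNN", "MiniHAREM", 0.70), ("2xELMo+CNN", "LeNER-Br", 0.80),
    ("ELMo+CNN", "MiniHAREM", 0.68),   ("ELMo+CNN", "LeNER-Br", 0.75),
]

# Group F-scores by model, then rank models by their mean.
by_model = {}
for model, dataset, f1 in results:
    by_model.setdefault(model, []).append(f1)
mean_f1 = {m: mean(v) for m, v in by_model.items()}
best = max(mean_f1, key=mean_f1.get)
print(best, round(mean_f1[best], 3))  # 2xELMo+CNN 0.75
```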
      <p>We also evaluated the three models on all nine datasets (MiniHAREM,
Datalawyer, LeNER-Br, Paramopama and WikiNER, with the last four being
evaluated on both the original and the bootstrapped dataset). 2xELMo+CNN
had the best results for the MiniHAREM dataset, as well as for the datasets in
the police domain (the Datalawyer and LeNER-Br datasets). ELMo+CNN had the
best results for Paramopama and WikiNER. After grouping these evaluation
results by model, the best mean F-Score was that of the 2xELMo+CNN variant.
Since 2xELMo+CNN performed better in the police domain (which is relevant
for the IberLEF evaluation), we chose this model for the task evaluation.</p>
      <p>For the Portuguese NER task of the Iberian Languages Evaluation Forum, we
experimented with different systems based on deep learning architectures, for
both the NER model and the word representations. For the NER model we used the
BiLSTM-CRF architecture, which became a reference for sequential classification
NLP tasks. For word representations we experimented with character level
features from Convolutional Neural Networks, Wang2Vec pre-trained word
embeddings, and the ELMo embeddings from a biLM language model. We
evaluated different models with different types of word representations on 5 different
corpora, and submitted a system based on 2 different ELMo models, combined with
character level features. Our model was trained in a semi-supervised scenario, in
order to account for the lack of certain categories in the corpora used.</p>
      <p>Our main contribution is the use of ELMo embeddings for the Portuguese
NER task, which have not been reported so far in the related literature. Our
pre-trained ELMo model is publicly available at https://allennlp.org/elmo.</p>
      <p>For future work, instead of training a single NER model with different ELMo
representations for different domains, we will experiment with an ensemble of
different models, each one trained separately on a different domain.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>Thanks to Datalawyer (https://www.datalawyer.com.br/) for the financial
support and for providing the legal dataset used for training the submitted model.
This work was developed in the Deep Learning Brazil research group. Our research
is sponsored by Copel Energy Distribution, Data-H Artificial Intelligence,
CyberLabs Artificial Intelligence, Americas Health and iFood Food Delivery.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Akbik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blythe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vollgraf</surname>
          </string-name>
          , R.:
          <article-title>Contextual string embeddings for sequence labeling</article-title>
          .
          <source>In: COLING</source>
          <year>2018</year>
          , 27th International Conference on Computational Linguistics. pp.
          <volume>1638</volume>
          –
          <issue>1649</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. AllenNLP:
          <article-title>An open-source nlp research library, built on pytorch</article-title>
          . (
          <year>2018</year>
          ), https://allennlp.org/, [Online; accessed 06-July-2019]
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Quinta de Castro,
          <string-name>
            <given-names>P.V.</given-names>
            , Felix Felipe
            <surname>da Silva</surname>
            , N., da Silva
          </string-name>
          <string-name>
            <surname>Soares</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Portuguese named entity recognition using lstm-crf</article-title>
          . In: Villavicencio,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Abad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Caseli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Gamallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Ramisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Goncalo</surname>
          </string-name>
          <string-name>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Paetzold</surname>
          </string-name>
          , G.H. (eds.)
          <source>Computational Processing of the Portuguese Language</source>
          . pp.
          <volume>83</volume>
          –
          <fpage>92</fpage>
          . Springer International Publishing,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chiu</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nichols</surname>
          </string-name>
          , E.:
          <article-title>Named entity recognition with bidirectional LSTM-CNNs</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>4</volume>
          ,
          <issue>357</issue>
          –370 (Dec
          <year>2016</year>
          ), https://www.aclweb.org/anthology/Q16-1026
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Collobert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuksa</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>12</volume>
          ,
          <issue>2493</issue>
          –2537 (Nov
          <year>2011</year>
          ), http://dl.acm.org/citation.cfm?id=
          <volume>1953048</volume>
          .
          <fpage>2078186</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. da Costa,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Paetzold</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.H.:</surname>
          </string-name>
          <article-title>Effective sequence labeling with hybrid neural-crf models</article-title>
          . In: Villavicencio,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Abad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Caseli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Gamallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Ramisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Goncalo</surname>
          </string-name>
          <string-name>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Paetzold</surname>
          </string-name>
          , G.H. (eds.)
          <source>Computational Processing of the Portuguese Language</source>
          . pp.
          <volume>490</volume>
          –
          <fpage>498</fpage>
          . Springer International Publishing (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv:
          <year>1810</year>
          .
          <volume>04805</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fukushima</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position</article-title>
          .
          <source>Biological Cybernetics</source>
          <volume>36</volume>
          (
          <issue>4</issue>
          ),
          <volume>193</volume>
          –202 (Apr
          <year>1980</year>
          ), https://doi.org/10.1007/BF00344251
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>rahman Mohamed</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>Speech recognition with deep recurrent neural networks</article-title>
          .
          <source>CoRR abs/1303</source>
          .5778 (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <volume>1735</volume>
          –1780 (Nov
          <year>1997</year>
          ), http://dx.doi.org/10.1162/neco.
          <year>1997</year>
          .
          <volume>9</volume>
          .8.
          <fpage>1735</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bidirectional lstm-crf models for sequence tagging</article-title>
          .
          <source>CoRR abs/1508</source>
          .
          <year>01991</year>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jarrett</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
          </string-name>
          , Y.:
          <article-title>What is the best multi-stage architecture for object recognition?</article-title>
          <source>In: 2009 IEEE 12th International Conference on Computer Vision</source>
          . pp.
<fpage>2146</fpage>
–
<lpage>2153</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
<article-title>Bag of tricks for efficient text classification</article-title>
          .
<source>In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers</source>
. pp.
<fpage>427</fpage>
–
<lpage>431</lpage>
. Association for Computational Linguistics, Valencia, Spain (Apr
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Junior</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macedo</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bispo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbosa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Paramopama: a Brazilian-Portuguese Corpus for Named Entity Recognition</article-title>
          .
          <source>Tech. rep.</source>
          , Universidade Federal de Sergipe (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
15. de Justiça,
<string-name>
  <surname>C.N.</surname>
</string-name>
:
<article-title>Processo judicial eletrônico (PJe)</article-title>
(
<year>2019</year>
), http://www.cnj.jus.br/tecnologia-da-informacao/processo-judicial-eletronicopje, [Online; accessed 06-July-2019]
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
<surname>Lafferty</surname>
,
<given-names>J.D.</given-names>
</string-name>
,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.C.N.</given-names>
          </string-name>
          :
<article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          .
          <source>In: Proceedings of the Eighteenth International Conference on Machine Learning</source>
          . pp.
<fpage>282</fpage>
–
<lpage>289</lpage>
. ICML '01
          , Morgan Kaufmann Publishers Inc. (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawakami</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Neural architectures for named entity recognition</article-title>
          .
          <source>In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
<fpage>260</fpage>
–
<lpage>270</lpage>
          .
Association for Computational Linguistics (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Le Cun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boser</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denker</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henderson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hubbard</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jackel</surname>
          </string-name>
          , L.D.:
          <article-title>Handwritten digit recognition with a back-propagation network</article-title>
          .
          <source>In: Proceedings of the 2Nd International Conference on Neural Information Processing Systems</source>
          , pp.
<fpage>396</fpage>
–
<lpage>404</lpage>
          . NIPS'89, MIT Press, Cambridge, MA, USA (
          <year>1989</year>
), http://dl.acm.org/citation.cfm?id=2969830.2969879
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trancoso</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
<article-title>Extension of the original word2vec using different architectures</article-title>
          . url: https://github.com/wlin12/wang2vec
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trancoso</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Two/too simple adaptations of Word2Vec for syntax problems</article-title>
          .
          <source>In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
<fpage>1299</fpage>
–
<lpage>1304</lpage>
          .
Association for Computational Linguistics (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
21. Luz de Araujo, P.H., de Campos, T.E.,
<string-name>
  <surname>de Oliveira</surname>
  ,
  <given-names>R.R.R.</given-names>
</string-name>
,
<string-name>
  <surname>Stauffer</surname>
  ,
  <given-names>M.</given-names>
</string-name>
,
<string-name>
  <surname>Couto</surname>
  ,
  <given-names>S.</given-names>
</string-name>
,
<string-name>
  <surname>Bermejo</surname>
  ,
  <given-names>P.</given-names>
</string-name>
          :
<article-title>LeNER-Br: a dataset for named entity recognition in Brazilian legal text</article-title>
          .
          <source>In: International Conference on the Computational Processing of Portuguese (PROPOR)</source>
          . pp.
<fpage>313</fpage>
–
<lpage>323</lpage>
          .
<source>Lecture Notes on Computer Science (LNCS)</source>
, Springer, Canela, RS, Brazil (September 24-26
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
          </string-name>
          , E.:
<article-title>End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF</article-title>
.
<source>In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
. pp.
<fpage>1064</fpage>
–
<lpage>1074</lpage>
. Association for Computational Linguistics (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Augenstein</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Natural language processing for the semantic web</article-title>
          .
          <source>Synthesis Lectures on the Semantic Web: Theory and Technology</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
<fpage>1</fpage>
–
<lpage>194</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
<article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>CoRR</source>
          (
          <year>2013</year>
          ), http://arxiv.org/abs/1301.3781
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
<given-names>D.</given-names>
</string-name>
(eds.):
<article-title>Desafios na avaliação conjunta do reconhecimento de entidades mencionadas: O Segundo HAREM</article-title>
          .
<source>Linguateca</source>
(
<year>2008</year>
), http://www.linguateca.pt/LivroSegundoHAREM/, ISBN: 978-989-20-1656-6
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
<article-title>Rectified linear units improve restricted Boltzmann machines</article-title>
          .
          <source>In: Proceedings of the 27th International Conference on International Conference on Machine Learning</source>
          . pp.
<fpage>807</fpage>
–
<lpage>814</lpage>
. ICML'10, Omnipress
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Nothman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ringland</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Curran</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
<article-title>Learning multilingual named entity recognition from Wikipedia</article-title>
          .
<source>Artificial Intelligence</source>
          <volume>194</volume>
          ,
<fpage>151</fpage>
–
<lpage>175</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
28. Nuno Cardoso:
<article-title>HAREM e Mini-HAREM: uma análise comparativa</article-title>
(7
<year>2006</year>
)
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
:
<article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
<fpage>1532</fpage>
–
<lpage>1543</lpage>
          .
Association for Computational Linguistics (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers). pp.
<fpage>2227</fpage>
–
<lpage>2237</lpage>
. Association for Computational Linguistics (Jun
<year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
31.
<string-name>
  <given-names>Sandra</given-names>
  <surname>Collovini</surname>
</string-name>
,
<string-name>
  <given-names>Joaquim</given-names>
  <surname>Santos</surname>
</string-name>
, B.C.J.T.R.V.P.Q.M.S.D.B.C.R.G.,
          <string-name>
            <surname>Xavier</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          :
<article-title>Portuguese named entity recognition and relation extraction tasks at IberLEF 2019</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
32. dos Santos,
<string-name>
  <surname>C.</surname>
</string-name>
, Guimarães, V.:
          <article-title>Boosting named entity recognition with neural character embeddings</article-title>
          .
          <source>In: Proceedings of the Fifth Named Entity Workshop</source>
          . pp.
<fpage>25</fpage>
–
<lpage>33</lpage>
          . Association for Computational Linguistics, Beijing, China (Jul
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardoso</surname>
          </string-name>
          , N.:
<article-title>Reconhecimento de entidades mencionadas em português: Documentação e actas do HAREM, a primeira avaliação conjunta na área</article-title>
          .
<source>Linguateca</source>
(November
<year>2007</year>
), http://www.linguateca.pt/LivroHAREM/, ISBN: 978-989-20-0731-1
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Schultz</surname>
          </string-name>
          , J.:
<article-title>How much data is created on the internet each day?</article-title>
url: https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day/
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Tjong Kim Sang</surname>
            ,
            <given-names>E.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Meulder</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
<article-title>Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition</article-title>
          .
          <source>In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4</source>
          . pp.
<fpage>142</fpage>
–
<lpage>147</lpage>
. CoNLL '03, Association for Computational Linguistics
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Kaiser</surname>
  ,
  <given-names>Ł.</given-names>
</string-name>
,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pp.
<fpage>5998</fpage>
–
<lpage>6008</lpage>
          . Curran Associates, Inc. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
Wikipedia:
<article-title>Wikipedia - a free encyclopedia</article-title>
(
          <year>2019</year>
          ), https://www.wikipedia.org/, [Online; accessed 06-July-2019]
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38. Word2vec:
          <article-title>Tool for computing continuous distributed representations of words</article-title>
          . (
          <year>2013</year>
), https://code.google.com/archive/p/word2vec/, [Online; accessed 06-July-2019]
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>