<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Vicomtech at MEDDOPROF: Automatic Information Extraction and Disambiguation in Clinical Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>SNLT group at Vicomtech Foundation</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Basque Research</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Technology Alliance (BRTA)</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikeletegi Pasealekua</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donostia/San-Sebastian</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Spain</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ezotova</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>agarciap</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>mcuadrosg@vicomtech.org</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Languages and Computer Systems. University of the Basque Country (UPV-EHU)</institution>
          ,
          <addr-line>Leioa</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the Vicomtech NLP team in the MEDDOPROF shared task. The challenge consists of the automatic detection of occupations and employment status, as well as their normalization or entity mapping, within medical documents in Spanish. The competition is split into three tasks: NER, CLASS and NORM. We have participated using a multitask joint model based on Transformers, which tries to solve all three tasks at once. However, the NORM task, which consists of disambiguating the detected entities against thousands of different possible codes, can be solved more effectively using other approaches. For that reason, we have submitted an additional sequence-to-sequence approach and a semantic-search approach to deal with the NORM task. We achieve an F1-score of 77% for the NER task, 70% for the CLASS task, and 48% for the NORM task.</p>
      </abstract>
      <kwd-group>
        <kwd>Clinical Text</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Automatic Indexing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This article presents the participation of the Vicomtech NLP team in
MEDDOPROF, the Medical Documents Profession Recognition shared task
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The shared task consists of developing systems for the automatic detection of
occupations and employment status, as well as their normalization or entity
mapping, within medical documents in Spanish. The target data is
a corpus of clinical case reports from heterogeneous medical specialities.
      </p>
      <p>
        The competition is divided into three tasks. The first task, NER, requires
automatically finding mentions of occupations and classifying each of them as a
profession, an employment status or an activity. The second task, CLASS,
requires classifying mentions of occupations to determine whether they are related
to the patient, to a family member, to a health professional or to someone else.
Finally, the third task, NORM, requires mapping the task 1 predictions to one of
the codes in a list of unique concept identifiers. We refer the reader to the shared
task overview article [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for more detailed information about MEDDOPROF.
      </p>
      <p>The rest of the document is structured as follows. Section 2 introduces the
data provided by the organizers of the challenge. Sections 3 and 4 describe
our submitted systems and the training setup, respectively. Section 5 presents
the official results. In Section 6, we discuss some decisions taken during the
development and training phases, inherent flaws of our systems, and potential
improvements. Finally, Section 7 provides some concluding remarks and hints
for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Data description</title>
      <p>The provided corpus is a collection of 1844 clinical cases from over 20 different
specialties annotated with professions and employment statuses. The gold
annotations for NER and CLASS are provided in Brat format [12] (see Figure
1), while the codes for the NORM task are provided as a .tsv file with codes
assigned to each profession/activity in the corpus (see Table 1). Note
that the NER and NORM tasks are related, because the input
for the NORM task is the set of entities detected in the NER task.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Document</th><th>Mention</th><th>Start offset</th><th>End offset</th><th>Code</th></tr>
          </thead>
          <tbody>
            <tr><td>caso clinico psiquiatria95</td><td>haber dejado el ejercito</td><td>2562</td><td>2586</td><td>SCTID: 73438004</td></tr>
            <tr><td>caso clinico psiquiatria23</td><td>le ha llevado a despedirse</td><td>2127</td><td>2153</td><td>SCTID: 73438004</td></tr>
            <tr><td>caso clinico urologia302</td><td>medico de Atencion Primaria</td><td>185</td><td>212</td><td>2211.1</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In addition, the organizers provided an extension of the dataset with an
extra set of labels for different entities: symptoms, diseases, procedures, negation
markers, negation spans, etc. These entities do not count toward the
competition evaluation, but the organizers encourage the participants to make use of
them to develop more interesting and complete systems.</p>
      <p>The organizers also provide a list of valid codes related to professions, labour
activities and occupations from the SNOMED Clinical Terms (www.snomed.org)
(50 codes) and the European Skills, Competences, Qualifications and Occupations
(ESCO) (ec.europa.eu/esco) classifications (3508 codes). Both SNOMED CT
and ESCO are described as machine-readable multilingual thesauri with an
ontological foundation.</p>
      <p>The core concepts of the SNOMED CT ontology are: concept codes (numerical codes that
identify clinical terms, organized in hierarchies); descriptions (textual
descriptions of concept codes); and relationships between concepts. The
comprehensive coverage of SNOMED CT includes a large variety of concepts such as symptoms,
diagnoses, procedures, body structures, etc. The use of SNOMED CT within this
competition is restricted to classifying activities and employment status.
ESCO is a multilingual classification of skills, competences, qualifications and
occupations relevant for the EU labour market and education.</p>
      <p>We have approached the challenge with a joint multitask end-to-end model based
on Transformers, trying to solve the three tasks at the same time. However, the
NORM task can be solved more effectively using other techniques and separate
models, so we have competed with several different approaches for this third task.</p>
      <sec id="sec-2-1">
        <title>Multitask joint model</title>
        <p>The multitask joint model tries to solve all the tasks, including the detection of
the extended entity set, using a single model based on transformers.</p>
        <p>
          Except for the NORM task, the other tasks are treated as regular
sequence-labelling tasks. At the core, there is a pre-trained BERT model that encodes the
texts, converting each token into a contextual word embedding. These
embeddings are the base for several classification heads that perform IOB tagging
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
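        <p>As an illustration of the IOB scheme mentioned above, the following minimal sketch converts a sequence of IOB tags into labelled token spans. It is not the code of the submitted system: the function name, the example sentence and the underscore in the label SITUACION_LABORAL are assumptions made for this example.</p>
        <preformat><![CDATA[
```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert IOB tags into (start, end_exclusive, label) token spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an I- tag carrying the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # An I- tag without a preceding B- is tolerated and opens a span.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# Hypothetical tokenised sentence and head output, for illustration only.
tokens = ["Paciente", "medico", "de", "Atencion", "Primaria", "jubilado"]
tags = ["O", "B-PROFESION", "I-PROFESION", "I-PROFESION", "I-PROFESION",
        "B-SITUACION_LABORAL"]
spans = iob_to_spans(tags)
for start, end, label in spans:
    print(label, " ".join(tokens[start:end]))
```
]]></preformat>
        <p>In a multitask setting, each classification head would emit its own tag sequence over the same tokens, and each sequence could be decoded independently in this way.</p>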
        <p>This regular sequence-labelling approach solves the NER task and the CLASS
task. However, the joint model also tries to deal with the NORM task, treating
it as a hierarchical classification problem.</p>
        <p>The hierarchical classification consists of a set of classification heads, one
per non-terminal node of the hierarchy, that are trained at the same time.
The ESCO codes, used in this task to identify the professions, follow a
hierarchical structure in which the first digit is the most coarse-grained category. Each
following digit adds a more fine-grained definition of the profession the code
describes. The key difference from a flat classifier is that, instead of trying to
select a code from a flat list of potentially thousands of codes, a tree is built
with the codes, level by level. Each node of the tree has only a limited number
of child nodes according to the actual ESCO hierarchy. These non-terminal
nodes are turned into classifiers, and the number of children of each node is the
output size of its classifier.</p>
        <p>For training, each ESCO code is decoded so that only the appropriate
classifiers have something to predict. The rest of the nodes are forced to predict a
special "OUT" value. At inference time, the final code is reconstructed from the
root of the hierarchy, classifier after classifier, following the hierarchy structure.
The resulting code is emitted when the current classifier predicts the special value
"OUT", or when a leaf node (with no further children in the hierarchy) is reached.</p>
        <p>This approach is suitable for this kind of task, but has several disadvantages.
It is computationally expensive depending on the size of the hierarchy: for the
ESCO codes involved in this competition it resulted in about 800 node classifiers.
Also, it is complex to implement, and each hierarchy may have subtleties that
must be taken into account when modelling the tree structure and how the codes
are encoded/decoded into a set of nodes. Finally, since there are a lot of nodes
to train, the amount of training data available in the competition might not be
enough.</p>
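        <p>The encoding and decoding of codes over the hierarchy, as described in this subsection, can be sketched as follows. This is a simplified illustration and not the submitted implementation: it assumes that each digit of the four-digit prefix and each dot-separated suffix is one level of the tree, and it replaces the trained node classifiers with a lookup function.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Dict, List, Set, Tuple

OUT = "OUT"
Path = Tuple[str, ...]


def code_to_path(code: str) -> Path:
    """Split '5223.7' into ('5', '2', '2', '3', '7') (assumed level scheme)."""
    head, *tail = code.split(".")
    return tuple(head) + tuple(tail)


def path_to_code(path: Path, head_len: int = 4) -> str:
    """Inverse of code_to_path for codes with a head of up to head_len digits."""
    head = "".join(path[:head_len])
    return ".".join([head, *path[head_len:]])


def build_tree(codes: List[str]) -> Dict[Path, Set[str]]:
    """Map every non-terminal node (a path prefix) to its child symbols."""
    children: Dict[Path, Set[str]] = {}
    for code in codes:
        path = code_to_path(code)
        for depth in range(len(path)):
            children.setdefault(path[:depth], set()).add(path[depth])
    return children


def encode(code: str, children: Dict[Path, Set[str]]) -> Dict[Path, str]:
    """Training targets: one label per node classifier, OUT when off-path."""
    path = code_to_path(code)
    targets = {node: OUT for node in children}
    for depth in range(len(path)):
        targets[path[:depth]] = path[depth]
    return targets


def decode(predict: Callable[[Path], str], children: Dict[Path, Set[str]]) -> str:
    """Walk from the root until a classifier says OUT or a leaf is reached."""
    node: Path = ()
    while node in children:  # leaves have no classifier
        symbol = predict(node)
        if symbol == OUT:
            break
        node = node + (symbol,)
    return path_to_code(node)


codes = ["5223", "5223.7", "2211.1"]
tree = build_tree(codes)
print(len(tree), "node classifiers")
for code in codes:
    targets = encode(code, tree)
    # A perfect classifier is simulated by looking up the training targets.
    print(code, decode(targets.__getitem__, tree))
```
]]></preformat>
        <p>In this sketch the number of keys in the tree corresponds to the number of node classifiers, which is the quantity that grows with the size of the hierarchy.</p>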
        <p>For these reasons, we have tried several additional approaches for the NORM
task, which are described in the next subsections.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Seq2Seq Translation System for NORM Task</title>
        <p>
          The second approach to the NORM task uses the self-attention Transformer
architecture [14]. We adapt sequence-to-sequence modelling (seq2seq) [
          <xref ref-type="bibr" rid="ref2">13, 2</xref>
          ] to
the task of mapping terms from clinical texts to their codes in the SNOMED CT
and ESCO classifications. The term description is the source input for the encoder and
its code in the corresponding ontology is the target for the decoder. The high-level
architecture of the system is depicted in Figure 3.
        </p>
        <p>In order to train the mapping system, we prepare the training corpus as
follows. The set of valid codes consists of 3,558 unique codes, some of which have
several synonymous definitions; specifically, the number of synonyms varies from
1 to 38. We split the multiple term definitions and assign the corresponding
code to each synonym, which yields a dataset arranged as shown in Table 3. Here we
can see that one code may be represented by several highly similar definitions.</p>
        <p>Furthermore, we combine the training set of the NORM task with the
descriptions of all codes from the SNOMED CT and ESCO ontologies, which results
in 297 unique codes with a distribution that ranges from 1 to 182 examples per
code. Finally, we obtain 15,869 examples for the training set and reserve 346 examples
for the development set (10% of the original training set provided for the task). The dataset
is highly unbalanced: only 10% of the codes have more than 10 examples.</p>
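        <p>The corpus preparation described above can be sketched as follows: each synonym becomes its own (term, code) pair, the annotated mentions are appended, and the imbalance is measured as the share of codes with more than a given number of examples. This is only an illustrative sketch with a handful of entries taken from the tables of this article; the function names and the dictionary layout are assumptions.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (term description, code)


def expand_synonyms(code_descriptions: Dict[str, List[str]]) -> List[Pair]:
    """Emit one (term, code) training pair per synonym of each code."""
    return [(term, code)
            for code, synonyms in code_descriptions.items()
            for term in synonyms]


def share_of_codes_above(pairs: List[Pair], threshold: int) -> float:
    """Fraction of distinct codes with more than `threshold` examples."""
    counts = Counter(code for _, code in pairs)
    frequent = sum(1 for n in counts.values() if n > threshold)
    return frequent / len(counts)


# Toy excerpt of the valid-code list (assumed layout: code -> synonyms).
ontology = {
    "1330.6": ["jefe de producto de las TIC",
               "jefa de producto de las TIC",
               "product manager de las TIC"],
    "SCTID: 73438004": ["desempleado"],
    "SCTID: 106541005": ["trabajador"],
}
# Annotated mentions from the task training set are added as extra pairs.
annotated = [("haber dejado el ejercito", "SCTID: 73438004"),
             ("le ha llevado a despedirse", "SCTID: 73438004")]

pairs = expand_synonyms(ontology) + annotated
print(len(pairs), "pairs")
print(share_of_codes_above(pairs, threshold=2))
```
]]></preformat>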
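        <p>During the text preprocessing step, the source terms are lower-cased, stripped
of punctuation and tokenized at word level. The target codes are not
preprocessed, just tokenized by whitespace. The number of tokens in the source examples varies
from 1 to 22, and in the target set it is 1 or 2 depending on the code type. We train the
model with the Transformer parameters shown in Table 4.</p>
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>Term description</th><th>Code</th></tr>
            </thead>
            <tbody>
              <tr><td>jefe de producto de las TIC</td><td>1330.6</td></tr>
              <tr><td>jefa de producto de las TIC</td><td>1330.6</td></tr>
              <tr><td>encargado de la gestion de productos de las TIC</td><td>1330.6</td></tr>
              <tr><td>encargada de la gestion de productos de las TIC</td><td>1330.6</td></tr>
              <tr><td>product manager de las TIC</td><td>1330.6</td></tr>
              <tr><td>jefa de producto de las TI</td><td>1330.6</td></tr>
              <tr><td>jefa de producto de las TICs</td><td>1330.6</td></tr>
              <tr><td>jefe de producto de las TI</td><td>1330.6</td></tr>
              <tr><td>jefe de producto de las TICs</td><td>1330.6</td></tr>
              <tr><td>desempleado</td><td>SCTID: 73438004</td></tr>
              <tr><td>trabajador</td><td>SCTID: 106541005</td></tr>
              <tr><td>refugiado</td><td>SCTID: 446654005</td></tr>
            </tbody>
          </table>
        </table-wrap>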
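        <p>A minimal sketch of the preprocessing step is given below. It assumes that punctuation is deleted rather than replaced by a space and that only ASCII punctuation needs to be handled; both are assumptions of this example, as are the function names.</p>
        <preformat><![CDATA[
```python
import string
from typing import List

# Translation table that deletes ASCII punctuation (assumption: characters
# such as inverted question marks would need to be added for general Spanish).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_source(term: str) -> List[str]:
    """Lower-case, strip punctuation and split the term into word tokens."""
    return term.lower().translate(_PUNCT_TABLE).split()


def preprocess_target(code: str) -> List[str]:
    """Codes are left untouched and only split on whitespace."""
    return code.split()


print(preprocess_source("Jefa de producto de las TIC."))
print(preprocess_target("1330.6"))
print(preprocess_target("SCTID: 73438004"))
```
]]></preformat>
        <p>With this tokenization an ESCO code yields a single target token, while a SNOMED CT code written with the "SCTID:" prefix yields two, which is consistent with the target lengths reported above.</p>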
        <p>This approach has several advantages. First, it turns an extremely large
multi-class classification problem into a straightforward sequence-to-sequence
problem. Another advantage is that the pairs of codes and their descriptions,
which are already defined in ontologies and vocabularies, can be leveraged as
extra training instances complementing the actual training data.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Semantic similarity mapping for NORM Task</title>
        <p>The next mapping system for the NORM task is based on the idea of semantic
search. Semantic search is an information retrieval method that leverages a
semantic similarity measure to retrieve semantically close documents. The main
objective of semantic similarity is to measure the distance between the vectors
that represent a pair of words, sentences, or documents. The key concepts of
semantic search are the following: the query, the collection of documents, and the degree of
relevance between a query and the retrieved documents.</p>
        <p>We adapt the method to map terms written in natural language to the codes
in the SNOMED CT and ESCO classifications. In this case, a term previously
detected by the NER system (see Subsection 3.1) as PROFESION, ACTIVIDAD
or SITUACION LABORAL is used as the query to search for the closest document.
The collection of documents is represented by the SNOMED CT and ESCO
ontologies provided by the organizers. The codes are separated by synonyms as shown
in Table 3, so each code has several descriptions. The descriptions are the
documents to search through. To compute the similarity between a term and
a code description we use the cosine distance.</p>
        <p>
          We have experimented with different pretrained language models to create
a common vector space for terms and code descriptions. We have selected the LaBSE
model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], because it obtained the best F-score during the experimentation.
LaBSE is a BERT sentence embedding model supporting 109 languages. It is
developed using masked language modelling [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and translation language
modelling [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] with a translation ranking task using bi-directional dual encoders.
        </p>
        <p>Since the type of a term (i.e. whether it is a profession, an activity or an
employment status) is detected in the previous task, we execute the mapping process
in two ways: 1) searching in SNOMED CT and ESCO separately; 2) searching in a
database where the SNOMED CT and ESCO codes are merged. Figure 4 depicts the
basic algorithm of semantic search applied to the NORM task, independently of
the database.</p>
        <p>For the separate search, we select the closest description under the
following condition: if the assigned tag is SITUACION LABORAL or ACTIVIDAD,
the term is searched in the SNOMED CT database (50 codes); if the tag is
PROFESION, the term is searched in the ESCO database (3508 codes). In our
experiments, the terms tagged as SITUACION LABORAL, and thus mapped to
SNOMED CT codes, reached a micro-F1 score of 0.577, while
PROFESION terms mapped to ESCO obtained a micro-F1 score of 0.215. This suggests
that, as could be expected, the performance of the semantic search method is
influenced by the number of target elements and the extent to which they are
semantically separable.</p>
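        <p>The separate search described above can be sketched as a nearest-neighbour lookup under cosine similarity, with the index chosen by the tag of the detected term. The sketch below is illustrative only: the three-dimensional vectors are hand-written stand-ins for sentence embeddings (no embedding model is loaded), and the function names and the label SITUACION_LABORAL with an underscore are assumptions. Maximising cosine similarity is equivalent to minimising cosine distance.</p>
        <preformat><![CDATA[
```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Index = List[Tuple[Vector, str]]  # (description embedding, code)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_code(query: Vector, index: Index) -> Tuple[str, float]:
    """Return the code of the most similar description and its similarity."""
    vec, code = max(index, key=lambda item: cosine_similarity(query, item[0]))
    return code, cosine_similarity(query, vec)


def map_term(query: Vector, tag: str, snomed: Index, esco: Index) -> Tuple[str, float]:
    """Route PROFESION terms to ESCO and every other tag to SNOMED CT."""
    return nearest_code(query, esco if tag == "PROFESION" else snomed)


# Toy indexes with made-up vectors; real ones would hold one embedding per
# code description (synonym).
snomed_index = [([1.0, 0.0, 0.1], "SCTID: 73438004"),
                ([0.0, 1.0, 0.1], "SCTID: 106541005")]
esco_index = [([0.2, 0.1, 1.0], "2211.1"),
              ([0.9, 0.1, 0.8], "1330.6")]

status_code, _ = map_term([0.9, 0.1, 0.0], "SITUACION_LABORAL", snomed_index, esco_index)
profession_code, _ = map_term([0.1, 0.1, 0.9], "PROFESION", snomed_index, esco_index)
print(status_code, profession_code)
```
]]></preformat>
        <p>For the merged search, the same lookup would be run once over the concatenation of both indexes, ignoring the tag.</p>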
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Training setup and submitted systems</title>
      <p>We have participated in all the tasks proposed by the competition. The first
two tasks, NER and CLASS, have been addressed only with the multitask joint
model. For the NORM task we have submitted different runs using different
approaches. The first approach is the same multitask model, since it aims to
predict all the information requested in the competition in a single step.</p>
      <p>
        In order to train the multitask joint model we used IXAmBERT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and
BETO [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as the pre-trained BERT models that form the core of the model.
We have experimented with the two because both of them are pre-trained using
Spanish data. After validation on the development set, BETO seemed to obtain a
slight advantage, so we finally decided to make the submission using the
BETO-based multitask model.
      </p>
      <p>
        The multitask joint model has been implemented in Python 3.7 with
HuggingFace's transformers library [15] (github.com/huggingface/transformers) and
it has been trained on an Nvidia GeForce RTX 2080 Ti GPU with 11 GB of
memory. The learning rate was set to 2E-5 and the optimizer was AdamW [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
During the training the micro-F1 score of the predictions on the development
set was monitored, with 100 epochs of early stopping patience. That means that
the model continued training until reaching 100 consecutive epochs without any
improvement in the validation metric.
      </p>
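      <p>The early stopping criterion can be illustrated with the following sketch, which replays a list of development scores instead of training a model. The function name and the toy scores are assumptions of this example; a patience of 3 is used only to keep the example short, whereas the setup described above uses 100.</p>
      <preformat><![CDATA[
```python
from typing import Iterable, Tuple


def run_with_early_stopping(dev_scores: Iterable[float], patience: int) -> Tuple[int, float, int]:
    """Stop after `patience` consecutive epochs without a new best score.

    Returns (best_epoch, best_score, epochs_run), with 0-based epoch indexes.
    """
    best_score, best_epoch, epochs_run = float("-inf"), -1, 0
    for epoch, score in enumerate(dev_scores):
        epochs_run = epoch + 1
        if score > best_score:
            # New best validation score: remember it and reset the counter.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_score, epochs_run


# Made-up micro-F1 values on the development set, one per epoch.
scores = [0.50, 0.60, 0.58, 0.59, 0.57, 0.90]
print(run_with_early_stopping(scores, patience=3))
```
]]></preformat>
      <p>Note that in this toy run the late improvement at the last epoch is never seen, which is the usual trade-off when choosing the patience value.</p>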
      <p>
        The Transformer model for the NORM task was implemented with the OpenNMT toolkit [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
and its PyTorch-based framework OpenNMT-py (opennmt.net/OpenNMT-py).
The model was trained on an Nvidia GeForce RTX 2080 Ti GPU with 11 GB
of memory, with the learning rate set to 2E-5 and the AdamW optimizer [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for
10,000 steps. The best model was selected by the micro-F1 score.
      </p>
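      <p>For reference, a micro-averaged F1 score over annotated mentions can be computed as in the sketch below, where true positives, false positives and false negatives are summed over all documents before computing precision and recall. This is a generic illustration and not necessarily identical to the official scorer of the task; the data layout, the document names and the example offsets are assumptions.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Set, Tuple

Mention = Tuple[int, int, str]  # (start offset, end offset, label)


def micro_f1(gold_by_doc: Dict[str, Set[Mention]],
             pred_by_doc: Dict[str, Set[Mention]]) -> float:
    """Micro-averaged F1 with exact matching of (start, end, label)."""
    tp = fp = fn = 0
    for doc in set(gold_by_doc) | set(pred_by_doc):
        gold = gold_by_doc.get(doc, set())
        pred = pred_by_doc.get(doc, set())
        tp += len(gold.intersection(pred))
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations for two hypothetical documents.
gold = {"doc1": {(185, 212, "PROFESION")},
        "doc2": {(2562, 2586, "SITUACION_LABORAL"), (10, 20, "ACTIVIDAD")}}
pred = {"doc1": {(185, 212, "PROFESION")},
        "doc2": {(2562, 2586, "SITUACION_LABORAL"), (30, 40, "PROFESION")}}
score = micro_f1(gold, pred)
print(round(score, 3))
```
]]></preformat>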
      <p>
        Semantic similarity inference was implemented with the Sentence Transformers
library [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] on an Nvidia GeForce RTX 2080 Ti GPU with 11 GB of memory.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>The multitask joint model performs reasonably well for the NER and CLASS
tasks. The F-scores for the NER and CLASS tasks achieved by our multitask
joint model are 25.6% and 31.7% above the baseline, respectively. For the NORM
task, the best performing system is the one that uses a sequence-to-sequence
approach based on Transformers.</p>
      <p>The score for the NORM task, even for the best performing system, is below
the baseline score. A possible explanation is that the most frequent codes are
repeated many times, so the baseline approach can easily find those
common codes.</p>
      <p>Since the NORM task input is the output of the NER task, and the
participants do not have access to any gold-labelled input for the NORM task, the
competing systems need to rely on the imperfect outcomes of the corresponding
NER system. This results in an error accumulation that lowers the final
score. To clarify this point, it would be interesting to compare our results with
those of other participants.</p>
      <p>At the time of writing these working notes, the official ranking with the
scores from all the participants has not been published yet, so we cannot assess
to what extent our results are competitive.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>In order to better understand the behaviour and the result of some of our
submitted systems, we have carried out some error analysis to pose some discussion
points for future work.</p>
      <p>In the NORM task we see the following challenging issues:
        <list list-type="bullet">
          <list-item><p>The Seq2Seq translation system (see Subsection 3.2) seems to be biased by the unbalanced dataset: some codes have only one description while others have more than 130.</p></list-item>
          <list-item><p>The hierarchical structure of the ESCO classification and its short descriptions lead to many semantically close terms that are labelled with different codes. This leads to codes that are "almost" correctly predicted, in the sense that only the most fine-grained part of the code is incorrect. However, this counts as an error regardless of how close the predicted code is to the correct one. For instance, the term "vendedora en un comercio pequeño" ("salesperson in a small business") is manually labelled with code 5223 (Asistentes de venta de tiendas y almacenes, sales assistants of shops and warehouses) and the system predicts code 5223.7 (vendedor especializado/vendedora especializada, specialized salesperson).</p></list-item>
          <list-item><p>The performance of the Seq2Seq translation system is highly influenced by hyperparameters and other factors that deserve further experimentation.</p></list-item>
          <list-item><p>The semantic mapping method presented in this article is straightforward and does not require previous training. However, the system fails mainly in mapping semantically close terms. It performs better when the search database is of moderate size and the documents are more semantically separable.</p></list-item>
        </list>
      </p>
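      <p>The notion of an "almost" correct code can be made measurable by counting how many hierarchy levels the gold and predicted codes share. The following sketch is only an illustration for error analysis and is not part of the official evaluation, which counts exact matches; the level scheme (one level per digit of the prefix, one per dot-separated suffix) and the function names are assumptions.</p>
      <preformat><![CDATA[
```python
from typing import Tuple


def to_levels(code: str) -> Tuple[str, ...]:
    """Split '5223.7' into ('5', '2', '2', '3', '7') (assumed level scheme)."""
    head, *tail = code.split(".")
    return tuple(head) + tuple(tail)


def shared_depth(gold: str, predicted: str) -> int:
    """Number of leading hierarchy levels on which both codes agree."""
    depth = 0
    for g, p in zip(to_levels(gold), to_levels(predicted)):
        if g != p:
            break
        depth += 1
    return depth


def relaxed_match(gold: str, predicted: str) -> float:
    """Shared depth divided by the depth of the longer code (1.0 if equal)."""
    longest = max(len(to_levels(gold)), len(to_levels(predicted)))
    return shared_depth(gold, predicted) / longest


# The example from the error analysis: gold 5223, predicted 5223.7.
print(shared_depth("5223", "5223.7"), relaxed_match("5223", "5223.7"))
```
]]></preformat>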
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In these working notes we have presented Vicomtech's participation in the
MEDDOPROF shared task. We have participated with a multitask joint model based
on Transformers, which addresses the three tasks: NER, CLASS and NORM. In
addition, we have presented two further systems for the NORM task. The
multitask joint model handles the three tasks at the same time, although the
NORM task can be better tackled using other approaches, such as a
sequence-to-sequence approach to map terms to codes. The quantitative results
seem reasonable, but at the time of writing the official score ranking
has not been published, so we cannot compare against other
participants to conclude whether our proposed systems are competitive.</p>
      <p>All in all, the objective of the proposed tasks is relevant and interesting, and
it is still far from being solved. In order to keep improving the results, apart
from trying new approaches, more experimentation will be needed to improve
some design decisions and choose better hyperparameter settings, which seem to
strongly influence the performance of the systems.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partially funded by the projects DeepText (KK-2020-00088,
SPRI, Basque Government) and DeepReading (RTI2018-096846-B-C21,
MCIU/AEI/FEDER, UE).</p>
      <p>12. Stenetorp, P., Pyysalo, S., Topic, G., Ohta, T., Ananiadou, S., Tsujii, J.: BRAT: A
Web-based Tool for NLP-assisted Text Annotation. In: Proceedings of the
Demonstrations at the 13th Conference of the European Chapter of the Association for
Computational Linguistics (EACL '12). pp. 102-107 (2012)</p>
      <p>13. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to Sequence Learning with Neural
Networks. In: Proceedings of the 27th International Conference on Neural
Information Processing Systems - Volume 2. pp. 3104-3112. NIPS'14, MIT Press,
Cambridge, MA, USA (2014)</p>
      <p>14. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
Kaiser, L., Polosukhin, I.: Attention Is All You Need. In: Proceedings of the
Thirty-first Conference on Advances in Neural Information Processing Systems (NeurIPS
2017). pp. 5998-6008 (2017)</p>
      <p>15. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P.,
Rault, T., Louf, R., Funtowicz, M., Brew, J.: HuggingFace's Transformers:
State-of-the-art Natural Language Processing. arXiv:1910.03771 pp. 1-11 (2019)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Cañete, J.,
          <string-name>
            <surname>Chaperon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuentes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Spanish Pre-Trained BERT Model and Evaluation Data</article-title>
          .
          <source>In: Proceedings of the Practical ML for Developing Countries Workshop</source>
          at the Eighth International Conference on Learning
          <source>Representations (ICLR</source>
          <year>2020</year>
          ). pp.
          <volume>1</volume>
          -
          <issue>9</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cho</surname>
            , K., van Merrienboer,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <volume>1724</volume>
          -
          <fpage>1734</fpage>
          . Association for Computational Linguistics, Doha, Qatar (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lample</surname>
          </string-name>
          , G.:
          <article-title>Cross-lingual Language Model Pretraining</article-title>
          . In: Wallach,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Beygelzimer</surname>
          </string-name>
          , A.,
          <string-name>
            <surname>d'</surname>
            Alche-Buc,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R</given-names>
          </string-name>
          . (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          . vol.
          <volume>32</volume>
          . Curran Associates, Inc. (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <volume>4171</volume>
          -
          <issue>4186</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arivazhagan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Language-agnostic BERT Sentence Embedding (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senellart</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rush</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>OpenNMT: Open-Source Toolkit for Neural Machine Translation</article-title>
          .
          <source>In: Proceedings of ACL</source>
          <year>2017</year>
          ,
          <article-title>System Demonstrations</article-title>
          . pp.
          <volume>67</volume>
          -
          <fpage>72</fpage>
          . Association for Computational Linguistics, Vancouver, Canada (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lima-Lopez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farre-Maduell</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranda-Escalada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Briva-Iglesias</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>67</volume>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Loshchilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Decoupled Weight Decay Regularization</article-title>
          .
          <source>In: Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019)</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>18</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Otegi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campos</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque</article-title>
          .
          <source>In: Proceedings of The 12th Language Resources and Evaluation Conference</source>
          . pp.
          <fpage>436</fpage>
          –
          <lpage>442</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ramshaw</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcus</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          :
          <article-title>Text Chunking Using Transformation-based Learning</article-title>
          .
          <source>In: Natural language processing using very large corpora</source>
          , pp.
          <fpage>157</fpage>
          –
          <lpage>176</lpage>
          . Springer (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Reimers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation</article-title>
          .
          <source>In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing</source>
          . Association for Computational Linguistics (November
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>