<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Language Transfer for Identifying Diagnostic Paragraphs in Clinical Notes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Di Liello</string-name>
          <email>luca.diliello@unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Uryupina</string-name>
          <email>uryupina@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Moschitti</string-name>
          <email>moschitti@unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper aims at uncovering the structure of clinical documents, in particular, identifying paragraphs describing “diagnosis” or “procedures”. We present transformer-based architectures for approaching this task in a monolingual setting (English), exploring a weak supervision scheme. We further extend our contribution to a cross-lingual scenario, mitigating the need for expensive manual data annotation and taxonomy engineering for Italian.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Big Data approaches have been shown to yield
a breakthrough in a variety of healthcare-related
tasks, ranging from eHealth governance and
policy making to precision medicine and smart
solutions/suites for hospitals or individual doctors.
They rely on large-scale and reliable automatic
processing of vast amounts of heterogeneous data,
i.e., images, lab reports and, most importantly,
textual medical documentation.</p>
      <p>Copyright © 2021 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>The current paper focuses on Medical
Discourse Analysis: imposing structure on digitalized
health reports through document segmentation and
labeling of relevant segments (e.g., diagnoses).
Identifying and interpreting discourse fragments is
essential for accurate and robust Information
Extraction from medical documents. In terms of
doctor assistance, such a system could quickly and
reliably identify the most crucial parts of
voluminous health records, highlighting them
for improved visibility and thus reducing the
cognitive load on doctors. For example, a highlighted
problematic diagnosis can alert a doctor perusing
a large medical dossier. In terms of automated data
analytics, discourse structure is crucial for correct
interpretation of extracted information. For
example, if we want to study a possible correlation
between the use of a specific medicine and some
outcome, we should only consider documents where
this medicine is mentioned as a part of therapy,
but not as a part of allergies.</p>
      <p>Some medical documents are generated using
task-specific eHealth software imposing certain
discourse structure. In Italy, however, there is
no single software adopted at either national or
regional levels. While there is a general
agreement on the nature of information to be included,
there are no guidelines or programmatic
implementations for structuring it. In addition,
historical records, produced before the adoption of
recording software, follow the logic of individual
doctors and thus show even more variability. We
aim therefore at a statistical model that is able to
infer the discourse structure without making any
assumptions on the recording software.</p>
      <p>An important advantage of our approach is its
adaptability to new domains (e.g., radiology
reports) or languages as well as its robustness in
the (highly probable) scenario where new
report-generating systems appear on the market.</p>
      <p>Several recent studies (Sec. 2) focus on segment
labeling for medical records in English. To our
knowledge, no approach has been proposed so far
to analyze medical discourse structure
automatically in other languages, including, most
importantly, Italian. The required research is hampered
by the lack of resources in other languages,
ranging from no data annotated for discourse structure,
either for training or for benchmarking, to lack of
high-coverage resources, e.g., taxonomies. In our
study, we propose a language transfer approach to
the problem of medical discourse analysis in
Italian. We first investigate possibilities for training
robust monolingual models (Sec. 4) and then build
upon our monolingual results to transfer the model
to another language (Sec. 5).</p>
    </sec>
    <sec id="sec-2">
      <title>2 State of the Art</title>
      <p>In the past decade, a massive effort has been
invested into automatically analyzing textual
medical data (clinical notes). The notes’ internal
logic is crucial for interpreting their underlying
semantics, thus enabling better understanding and
interoperability. This has given rise to
empirical studies on the medical document structure:
reliable and interpretable annotation guidelines
and systems for automatically segmenting clinical
notes and annotating segments with labels such as
allergy or diagnosis.</p>
      <p>The most thorough attempt at defining
clinical records’ structure via a taxonomy of
section headers has been undertaken by Denny et al.
(2008). This study developed SecTag—a
hierarchical header terminology, supporting mappings
to LOINC and other taxonomies. Table 1 shows
some SecTag entries related to diagnosis and
their parameters relevant for the present study.
The SecTag concepts (column 1) are organized
hierarchically, with specific diagnoses (e.g.,
admission or discharge diagnoses) being subnodes
(column 2) of the main diagnosis concept (SecTag
node “5.22”). Different ways of expressing the
specific semantics via headers (column 3) are then
linked to the corresponding nodes. SecTag
advocates a practical data-driven approach, thus listing
headers that are not always grammatical (e.g.,
“admit diagnosis”), provided they are commonly used
by practicing clinicians. (SecTag entries contain 16
parameters, inheriting information from referenced
taxonomies such as LOINC; most of them are of no
practical relevance in our case and, moreover, are
typically set to NULL.) Most importantly,
SecTag goes beyond a superficial view of the task, not
only linking easily identifiable headers, (e.g., most
common spellings, headers containing important
key words), but also organising hierarchically
concepts that are normally expressed in very distinct
ways (e.g., linking “cause of death” or “gaf” to
diagnoses). In total, SecTag provides 94 entries just
for diagnosis. This shows that considerable
medical expertise is required for creating a similar
resource for other languages from scratch.</p>
      <p>
        The SecTag release has led to the development
of a related method for automatic identification of
sections in clinical notes
        <xref ref-type="bibr" rid="ref4">(Denny et al., 2009)</xref>
        , via
a combination of NLP techniques,
terminology-based rules, and naive Bayes classification.
      </p>
      <p>
        While the SecTag approach exhibits
remarkable performance, creation and maintenance of the
header taxonomy is a very expensive task
requiring considerable medical expertise. More
data-driven approaches have been proposed recently for
English
        <xref ref-type="bibr" rid="ref2 ref8">(Rosenthal et al., 2019; Dai et al., 2015)</xref>
        ,
among others. These systems, however, require
manually labeled data.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data for Identifying Diagnoses and</title>
    </sec>
    <sec id="sec-4">
      <title>Procedures Segments</title>
      <p>3.1</p>
      <sec id="sec-4-1">
        <title>English Data: MIMIC-III</title>
        <p>
          Several large collections of medical data, with
partial NLP annotations, have been released recently,
for example, MIMIC
          <xref ref-type="bibr" rid="ref6">(Johnson et al., 2016)</xref>
          or
I2B2 (https://www.i2b2.org/). Unfortunately, none of these resources
provide annotation for discourse structure. Our study
relies on the MIMIC-III dataset, extending it with
an extra layer to label diagnosis and procedure
fragments. Our choice follows practical
motivations: it is the largest available dataset, most
commonly used by the AI community. We only rely on
the textual data from MIMIC discharge notes (the
NOTEEVENTS table); however, future work
can explore possibilities of joint modeling of
textual and numeric data (e.g., lab measurements).
        </p>
        <p>We have built a rule-based algorithm for
annotating MIMIC with diagnosis/procedure
fragments. We segment a note into fragments and
label them based on the headers, looking them
up in SecTag (Section 2). For fragments with
no header, we propagate the label from the
previous fragment. Fragments with headers not
2https://www.i2b2.org/
concept taxonomy tree id header
diagnoses 5.22 diagnosis
principle diagnosis 5.22.39 primary diagnoses
diagnosis at death 5.22.41 cause of death
admission diagnosis 5.22.44 admit diagnosis
discharge diagnosis 5.22.45 discharge diagnosis
global assessment functioning 5.22.49.58.11 gaf
total documents
paragraphs per doc
diagnoses per doc
documents with no diagnosis
procedures per doc
documents with no procedure
found in SecTag are considered − diagnosis,
− procedure. The headers are then removed
from the document, thus forcing the model to learn
paragraph classification from the textual content,
relying on headers as a silver supervision signal.</p>
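        <p>The silver-annotation procedure above can be sketched as follows. This is our illustration of the described logic, not the authors’ code: the header pattern, the toy SecTag mapping, and the note text are all invented for the example.</p>

```python
# Sketch of the silver-annotation step described above: segment a note
# into header-led fragments, label each fragment by looking its header
# up in a SecTag-like mapping, keep the previous label for header-less
# fragments, and strip the headers themselves from the output.
import re

# Toy stand-in for the SecTag header -> concept mapping (illustrative).
SECTAG = {
    "discharge diagnosis": "diagnosis",
    "admit diagnosis": "diagnosis",
    "major surgical or invasive procedure": "procedure",
}

HEADER_RE = re.compile(r"^([A-Za-z /]+):\s*$")

def silver_annotate(note: str):
    """Return (text, label) pairs; label is 'diagnosis', 'procedure' or None."""
    fragments, current_label, buffer = [], None, []
    for line in note.splitlines():
        m = HEADER_RE.match(line.strip())
        if m:
            if buffer:  # close the previous fragment
                fragments.append(("\n".join(buffer), current_label))
                buffer = []
            # unknown headers yield a negative (None) label; known ones
            # set the label that persists until the next header
            current_label = SECTAG.get(m.group(1).strip().lower())
            # the header line itself is removed from the document
        else:
            buffer.append(line)
    if buffer:
        fragments.append(("\n".join(buffer), current_label))
    return fragments

note = "Discharge Diagnosis:\npneumonia\nMedications:\naspirin"
print(silver_annotate(note))  # [('pneumonia', 'diagnosis'), ('aspirin', None)]
```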
        <p>While a typical MIMIC note has a single
diagnostic paragraph, some contain multiple
diagnostic fragments: (i) some notes span multiple related
reports, where each report comes with its own
diagnosis; (ii) some notes contain semantically
different diagnostic sections (e.g., “admitting
diagnosis” and “discharge diagnosis”); (iii) some notes
cover complex cases and the diagnostic section is
expressed in several (consecutive) paragraphs.</p>
        <p>Since SecTag predates major MIMIC releases,
some popular headers are missing—we have
therefore manually extended the taxonomy (6.7k
headers) to cover another 75 of the most popular
headers. The expansion yielded a considerable
increase in procedure paragraphs, augmenting
drastically the number of positive examples for
training the procedure classifier. At the same time,
the overall precision improved, eliminating some
consistent errors with diagnosis paragraphs. In
what follows, we always rely on data preprocessed
with expanded SecTag.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2 Italian Data: Exprivia Datasets</title>
        <p>A large collection of discharge reports in Italian
has been provided by Exprivia S.p.a. The
documents show some similarity to MIMIC discharge
reports: they are typically 0.5-1 page long, they
can be split into paragraphs rather reliably, they
exhibit a considerable variability in terms of the
underlying discourse structure. Each document is
associated with a set of ICD-9 codes for discharge
diagnoses. Yet, similarly to MIMIC, no inline
manual annotation is provided for identifying
textual segments referring to diagnoses/procedures.</p>
        <p>To provide accurate test data for our
multilingual approach, a human expert has conducted a
manual annotation of the Italian set. We have
labeled a pilot of 10 notes and a random
sample of 100 notes. The annotation only covered
diagnosis, as our pilot phase revealed that
labeling procedures would require considerably more
elaborate guidelines and medical training.</p>
        <p>Table 2 compares document statistics for
discharge notes from MIMIC-III and Exprivia
datasets. It suggests that the pilot can only be used
as a very preliminary sample of the data: the notes
are rather small and with few diagnoses. The
Italian documents from exprivia-100 show a striking
similarity to MIMIC: there are on average around
25-30 paragraphs per document, 1.2-1.3 of which
are diagnostic. The major difference comes from
the documents with no diagnosis (27% in Italian,
14.5% in English). We believe that this
similarity reflects the fact that, despite differences in
national and local healthcare regulations as well as
individual practicing/recording approaches,
clinical notes share a common underlying semantics.
A language transfer model can therefore be
successful for our task, mitigating the need for the very
time-consuming and costly expert effort of
constructing taxonomies similar to SecTag in Italian.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Transformer-Based Architectures for</title>
    </sec>
    <sec id="sec-6">
      <title>Diagnosis and Procedure Extraction</title>
      <p>
        Transformer-based models have recently become
the standard in NLP. Models like BERT
        <xref ref-type="bibr" rid="ref5">(Devlin
et al., 2019)</xref>
        and ELECTRA
        <xref ref-type="bibr" rid="ref1">(Clark et al., 2020)</xref>
        showed impressive performance when compared
to previous state of the art. These models are based
on the Transformer block
        <xref ref-type="bibr" rid="ref10">(Vaswani et al., 2017)</xref>
        ,
which exploits the attention mechanism to find
relations between all pairs of tokens in the input text
and thus creates deep contextualized
representations. Transformer layers can be stacked to create
more powerful and refined models. For
computational efficiency, we focus on architectures with no
more than 12 layers.
      </p>
      <p>Tokenization. Raw text cannot be provided
directly to a transformer-based model: it is first
tokenized using a fixed-size vocabulary, created via a
segmentation algorithm, e.g., WordPiece. We
extended the BERT vocabulary to account for the
de-identification placeholders occurring in the
medical input.</p>
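      <p>The sub-word segmentation step can be illustrated with a minimal greedy longest-match tokenizer in the WordPiece style. This is an illustrative toy, not the actual BERT tokenizer, and the tiny vocabulary below is invented.</p>

```python
# Toy WordPiece-style tokenizer: greedily match the longest vocabulary
# piece from the current position; continuation pieces carry a "##" mark.
VOCAB = {"diag", "##nos", "##is", "un", "##known", "[UNK]"}

def wordpiece(word: str, vocab=VOCAB):
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        cur = None
        while start < end:  # try the longest candidate first
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # mark word-internal pieces
            if piece in vocab:
                cur = piece
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]  # no decomposition exists
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece("diagnosis"))  # ['diag', '##nos', '##is']
```

<p>A word absent from the vocabulary is thus still representable as a sequence of known sub-words, which is what lets the model pick up informative sub-word clues.</p>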
      <p>Pre-training and fine-tuning.
Transformer-based models are usually trained in a 2-step
fashion. The model is first pretrained on a huge
amount of artificially labelled text taken from
sources like Wikipedia or CommonCrawl. At
the fine-tuning stage, the model is adapted to a
specific task, e.g., Question Answering or
Diagnosis Extraction. Since the model is already able
to create good contextualized representations,
the fine-tuning requires only a small amount
of manually labelled examples. Following the
common transformer fine-tuning practices, we
classify paragraphs into ± diagnosis with a
binary classification head on top of the first token
output.
</p>
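      <p>The head wiring described above can be sketched as follows. The encoder is mocked, since the point is only how the first-token output feeds a binary classifier; the weights and hidden vectors are illustrative, not trained.</p>

```python
# Sketch of a binary classification head on top of the first-token
# ([CLS]) encoder output: a linear layer followed by a sigmoid.
import math

def classify_paragraph(token_vectors, w, b):
    """token_vectors: per-token hidden vectors from the encoder.
    Only the first token's vector feeds the binary head."""
    cls = token_vectors[0]                       # first-token output
    logit = sum(wi * xi for wi, xi in zip(w, cls)) + b
    prob = 1.0 / (1.0 + math.exp(-logit))        # sigmoid -> P(diagnosis)
    return prob

# Mocked 4-dimensional encoder outputs for a 3-token paragraph.
hidden = [[0.5, -1.0, 0.3, 0.8], [0.1, 0.0, 0.2, 0.4], [0.9, 0.2, -0.3, 0.1]]
w, b = [1.0, 0.5, -0.25, 2.0], -0.1
print(classify_paragraph(hidden, w, b))
```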
    </sec>
    <sec id="sec-7">
      <title>Language Transfer for Diagnosis</title>
    </sec>
    <sec id="sec-8">
      <title>Identification</title>
      <p>The main bottleneck for NLP on medical data in
Italian lies in the lack of annotated data and
professionally created resources, similar to SecTag. To
mitigate this issue, we advocate a language
transfer approach, combining our transformer models
(Section 4) with state-of-the-art machine
translation (MT).</p>
      <p>We investigate three cross-lingual settings. In
the baseline setup, we do not perform any
translation, relying on BERT’s tokenizer and
cross-lingual embeddings to learn informative sub-word
clues for diagnostic paragraphs.</p>
      <sec id="sec-8-1">
        <title>Cross-Lingual Pipelines</title>
        <table-wrap id="tab3">
          <label>Table 3</label>
          <caption><p>Transformer models used in our experiments: BERT-base-uncased, BERT-base-cased, ELECTRA-small, BERT-Ita, BERTino (parameter counts not recoverable from this version).</p></caption>
        </table-wrap>
        <p>Our second cross-lingual pipeline builds
directly upon the model presented in Section 4. We
use an MT component to translate test documents
from Italian into English, run our diagnosis
identification model and then port the results to the
Italian original via a trivial paragraph-level alignment.
Note that this model is trained on high-quality data
in English and tested on noisy automatically
translated data.</p>
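        <p>The translate-test pipeline can be sketched as follows. The functions <code>translate</code> and <code>score_diagnosis</code> are hypothetical stand-ins for the MT component and the English classifier; only the wiring, i.e., scoring translated paragraphs and porting scores back by position, reflects the pipeline described above.</p>

```python
# Sketch of the translate-test pipeline: translate each Italian
# paragraph, score it with the English classifier, then attach the
# scores back to the Italian originals via a trivial 1:1 paragraph
# alignment. Both components below are invented stand-ins.
def translate(paragraph_it: str) -> str:
    # stand-in for the it->en MT component
    return {"diagnosi: polmonite": "diagnosis: pneumonia",
            "terapia: aspirina": "therapy: aspirin"}.get(paragraph_it, paragraph_it)

def score_diagnosis(paragraph_en: str) -> float:
    # stand-in for the English paragraph classifier
    return 0.9 if "diagnosis" in paragraph_en else 0.1

def label_italian_document(paragraphs_it):
    """Return (Italian paragraph, diagnosis score) pairs."""
    scores = [score_diagnosis(translate(p)) for p in paragraphs_it]
    # paragraph-level alignment: the i-th score belongs to the i-th original
    return list(zip(paragraphs_it, scores))

doc = ["diagnosi: polmonite", "terapia: aspirina"]
print(label_italian_document(doc))
```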
        <p>For the third pipeline, we first translate the
whole training set from English into Italian, while
keeping paragraphs aligned. We follow the
methodology from Section 4 to train a new model,
operating on Italian directly. Note that, unlike the
second pipeline, this approach implies training on
noisy automatically translated data while testing
on high-quality Italian. The effect of this is
twofold: on one hand, the task becomes more difficult
to learn, on the other hand, the resulting classifier
should be more robust.</p>
        <p>
          To obtain a satisfactory translation using
open-source architectures, we rely on the transformer
encoder-decoder models
          <xref ref-type="bibr" rid="ref9">(Tiedemann and
Thottingal, 2020)</xref>
          trained on the OPUS corpus
(https://opus.nlpl.eu). While
the OPUS corpus is not tailored specifically to the
medical domain, its large size and generic nature
allow for training very robust MT models. We
exploit two such models, one translating from English
to Italian (https://huggingface.co/Helsinki-NLP/opus-mt-en-it)
and one from Italian to English
(https://huggingface.co/Helsinki-NLP/opus-mt-it-en).
Both are transformer encoder-decoder models trained
with the Causal Language Modeling objective.
        </p>
      </sec>
      <sec id="sec-8-2">
        <title>6 Experiments</title>
        <p>
          Data processing. We split the MIMIC-III
discharge dataset into training, development and
testing sets (60%, 20% and 20%, respectively). We
use the first for training all the models
presented in this study, while we use the other two
for checkpoint selection, hyper-parameter tuning
(batch size and learning rate) and evaluating the
monolingual model. We used the exprivia-10 set
for validation and exprivia-100 set for testing in
the cross-lingual (language transfer) experiments.
Transformer Models. We run most
experiments in two modes: (i) with powerful
transformer components comprising a large number of
parameters and providing top performance such
as BERT
          <xref ref-type="bibr" rid="ref5">(Devlin et al., 2019)</xref>
and BERT-Ita (https://huggingface.co/dbmdz/bert-base-italian-xxl-cased) and
(ii) with small and efficient transformer models
such as ELECTRA small
          <xref ref-type="bibr" rid="ref1">(Clark et al., 2020)</xref>
          and
BERTino
          <xref ref-type="bibr" rid="ref7 ref9">(Muffo and Bertino, 2020)</xref>
          . The
objective of this setup was to measure the
performance/efficiency trade-off.
        </p>
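        <p>The 60/20/20 document-level split can be sketched as below. This is our illustration; the authors’ actual shuffling and seed are not specified in the text.</p>

```python
# Shuffle the documents, then cut the list into 60% training,
# 20% development, and 20% test partitions.
import random

def split_dataset(docs, train=0.6, dev=0.2, seed=0):
    docs = list(docs)
    random.Random(seed).shuffle(docs)  # shuffle before splitting
    n = len(docs)
    n_train, n_dev = int(n * train), int(n * dev)
    return (docs[:n_train],                  # training
            docs[n_train:n_train + n_dev],   # development
            docs[n_train + n_dev:])          # test

train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))  # 60 20 20
```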
        <p>Table 3 presents all the used transformer models
with the respective number of parameters.
Evaluation metrics. The diagnosis/procedure
classification task shows a very skewed label
distribution. For this reason, we approach it from an
information retrieval viewpoint, i.e., we rank
paragraphs based on their probability of containing a
diagnosis. We use Mean Average Precision and
Precision@1 to evaluate the ranking quality. The
former takes into account the whole ranking and
is therefore the best indicator of the ranking
quality. The latter indicates how often a
correct diagnosis is returned in the first position. To
provide a better comparison, we report MAP and
P@1 averaging only over the documents that
contain at least one diagnosis. We also report model
accuracy in recognizing documents with no
diagnoses (Filtering Accuracy). This metric was
introduced because a relevant fraction of documents
did not contain a diagnosis, see Table 2.</p>
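        <p>The two ranking metrics can be computed as below; this is the standard IR formulation (our sketch, not the authors’ evaluation code), with paragraphs ranked by predicted diagnosis probability against binary gold labels.</p>

```python
# Average Precision over one document's paragraph ranking, and
# Precision@1 (is the top-ranked paragraph a gold diagnosis?).
def average_precision(scores, gold):
    """scores: predicted probabilities; gold: 1 for diagnosis paragraphs."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if gold[i] == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / len(precisions) if precisions else 0.0

def precision_at_1(scores, gold):
    top = max(range(len(scores)), key=lambda i: scores[i])
    return float(gold[top])

scores = [0.2, 0.9, 0.4, 0.7]
gold = [0, 1, 1, 0]  # two diagnostic paragraphs
print(average_precision(scores, gold), precision_at_1(scores, gold))
```

<p>MAP is then the mean of the per-document Average Precision values over documents containing at least one diagnosis, matching the averaging choice described above.</p>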
        <p>Monolingual results. Table 4 summarizes the
English results. The numbers refer to a
BERT-base-cased model fine-tuned with a batch size of
64 and a learning rate of 2 × 10−6. The model is
able to identify very accurately documents with no
diagnoses/procedures (92.4% and 97.1% accuracy
respectively). Moreover, the binary classification
of paragraphs into diagnoses (or not), and
procedures (or not) is very reliable: 95.9% and 98.4%
P@1 at document level.</p>
        <sec id="sec-8-2-1">
          <title>Cross-lingual experiments. Table 5 shows the</title>
          <p>results of our language transfer experiments. A
moderate performance (58.8% Filtering Accuracy,
49.2% P@1) can be achieved via a BERT model
trained on English MIMIC data and directly tested
on the Italian exprivia-100 set.
MultilingualBERT does slightly better as it was trained on
104 languages, English and Italian included. This
approach relies on joint multilingual embeddings
and fine tokenization. It can, for example, identify
and align stems of Latin origin for some disease
names. However, it cannot go much beyond: it is
not able to model deep semantics related to
medi</p>
          <p>The use of MT shows considerable
improvement over the baseline. The results suggest a better
performance for the setting where the training set
is translated into Italian and the diagnosis
extraction model is then learned on (noisy) Italian data.
Moreover, this approach is much faster when used
as a service, as it directly operates on Italian input.</p>
          <p>We performed all the MT-based experiments 5
times with different random seeds to enable a better
statistical assessment of the results. While in general
the standard deviation is rather small considering
the very small test set, the setting with a translated
test set leads to unstable benchmarking, especially
for the smaller ELECTRA transformer.</p>
          <p>Finally, smaller transformer models, especially
BERTino, exhibit very small performance drops
compared to larger transformers. This suggests
that they are robust enough to capture
paragraph-level diagnosis semantics. Therefore, it is possible
to run the extraction service with low
computational resources, e.g., using CPUs. Figure 1 shows
the stability of the learning with translated training
data. Small models are able to match the
performance of larger models, being also faster to
converge. We believe that smaller models overfit less
the MIMIC training data, thus providing a final
better performance on the Exprivia data. Note that
training was stopped after a fixed amount of time
for every experiment. BERTino, being smaller, is
able to do more steps in the same amount of time.
7</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>7 Conclusion</title>
      <p>We present a language transfer approach to
unraveling discourse structure of clinical notes,
focusing on diagnosis and procedure. We combine
transformer-based paragraph modeling with
state-of-the-art MT architectures in a novel application
that is essential for eHealth big data analytics.
Most importantly, our language transfer approach
helps mitigate the need for expensive and
time-consuming medical resource creation (annotated
training data as well as a header taxonomy) in Italian.</p>
      <p>We empirically investigate two
translation-based architectures, showing that both of them
outperform a generic cross-lingual pipeline. The
approach based on translating train data is more
robust and efficient (at runtime) compared to
translating the test data, yielding more stable
performance.</p>
      <p>In the future, we plan to expand our study to other
discourse segments, such as allergy or history.
However, our first experiments with procedure
segments show that, unlike diagnosis, modeling
and even annotating other headers require a
tighter collaboration with medical experts.</p>
    </sec>
    <sec id="sec-10">
      <title>8 Acknowledgements</title>
      <p>The research presented in this paper has been
supported by the Autonomous Province of Trento
(project CareGenius). The computational power
has been provided by the High Performance
Computing department of the CINECA Consortium
(ISCRA project CareGeni).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Minh-Thang</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Quoc V.</given-names>
            <surname>Le</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>ELECTRA: Pre-training text encoders as discriminators rather than generators</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Hong-Jie</given-names>
            <surname>Dai</surname>
          </string-name>
          , Shabbir Syed-Abdul,
          <string-name>
            <given-names>Chih-Wei</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chieh-Chen</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Recognition and evaluation of clinical section headings in clinical documents using token-based formulation with conditional random fields</article-title>
          . BioMed Research International,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Joshua</given-names>
            <surname>Denny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Randolph</given-names>
            <surname>Miller</surname>
          </string-name>
          , Kevin Johnson, and
          <string-name>
            <given-names>Anderson</given-names>
            <surname>Spickard</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Development and evaluation of a clinical note section header terminology</article-title>
          .
          <source>In Proceedings of the AMIA Annual Symposium</source>
          , pages
          <fpage>156</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Joshua</given-names>
            <surname>Denny</surname>
          </string-name>
          , Anderson Spickard, Kevin Johnson, Neeraja Peterson, Josh Peterson, and
          <string-name>
            <given-names>Randolph</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Evaluation of a method to identify and categorize section headers in clinical documents</article-title>
          .
          <source>Journal of the American Medical Informatics Association : JAMIA</source>
          ,
          <volume>16</volume>
          (
          <issue>6</issue>
          ):
          <fpage>806</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Alistair E.W.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , Tom J. Pollard, Lu Shen, Li-wei H. Lehman
          , Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark.
          <year>2016</year>
          .
          <article-title>MIMIC-III, a freely accessible critical care database</article-title>
          .
          <source>Scientific Data</source>
          ,
          <volume>3</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Matteo</given-names>
            <surname>Muffo</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Bertino</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Bertino: An italian distilbert model</article-title>
          . In CLiC-it.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Sara</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          , Ken Barker, and
          <string-name>
            <given-names>Zhicheng</given-names>
            <surname>Liang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Leveraging medical literature for section prediction in electronic health records</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , pages
          <fpage>4864</fpage>
          -
          <lpage>4873</lpage>
          ,
          Hong Kong, China, November. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name><given-names>Jörg</given-names> <surname>Tiedemann</surname></string-name> and <string-name><given-names>Santhosh</given-names> <surname>Thottingal</surname></string-name>
          .
          <year>2020</year>
          .
          <article-title>OPUS-MT - Building open translation services for the World</article-title>
          .
          <source>In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)</source>
          , Lisbon, Portugal.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
          <string-name>
            <given-names>Aidan N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Lukasz Kaiser, and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention is all you need</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>