<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A support for understanding medical notes: correcting spelling errors in Italian clinical records</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roger Ferrod</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Brunetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luigi Di Caro</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Di Francescomarino</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Dragoni</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Ghidini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renata Marinello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilio Sulis</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City of Health and Science</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Turin</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>19</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>In a context of digitalization and modernization of healthcare, automatic analysis of clinical data plays a leading role in improving the quality of care. Since much of the information lies in an unstructured form within clinical notes, it is necessary to make use of modern Natural Language Processing techniques to extract and build structured knowledge from the data. However, clinical texts pose unique challenges due to the extensive usage of i) acronyms, ii) non-standard medical jargon and iii) typos in technical terms. In this paper, we present a prototype spell-checker specifically designed for medical texts written in Italian.</p>
      </abstract>
      <kwd-group>
        <kwd>Clinical notes</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Spelling correction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The digitalization of healthcare produces a growing amount of traces regarding activities, patients, as well as medical notes. In particular, it is relevant to
consider both structured and unstructured data, i.e., clinical data and textual notes.</p>
      <p>Since a portion of the healthcare data is in textual form, it is increasingly of interest to
provide a Natural Language Processing (NLP) pipeline to extract and analyse useful information.
Unstructured texts are often noisy, with typing errors and extensive use of non-standard
acronyms and medical jargon, which are usually accompanied by a less rigorous structure of
the document itself. To overcome these problems, researchers must begin to address issues of
spelling correction, acronym disambiguation and entity normalization. However, in languages
other than English it is very difficult to find advanced models, data or other resources.</p>
      <p>
        In this paper we deal with the spelling correction task (i.e. the correction of typos) of notes
written by physicians, so as to provide the most correct text for the sophisticated Information
Extraction (IE) techniques that generally follow the initial data cleaning phase. Indeed, this work
is part of a larger project [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that involves Turin’s City of Health and Science2, the Bruno
Kessler Foundation3 and the University of Turin4. The project aims at supporting physicians in
making decisions in the context of home hospitalization services [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Specifically, with this work we introduce a prototype of a spell-checker designed to work
on Italian clinical texts. Although it is still a work in progress, to the best of our knowledge it
currently represents the first study specifically designed to correct medical texts in
Italian.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        The automatic spelling correction task is the first step to be taken in order to analyse clinical
texts, representing one of the most important open problems in Natural Language Processing.
The correction process can be divided, according to [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], into: 1) error detection; 2)
correction candidate generation; 3) suggestion ranking. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] also categorizes errors into two
types: non-word errors (errors that give rise to non-existent words in the vocabulary) and
real-word errors (when the typo is a meaningful word but not the intended word in that context).
In the latter case, particular attention is required and dedicated mechanisms must be developed, as done
for example in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        According to Claude Shannon’s noisy-channel framework, the problem is broken down
into error modelling (aka channel model) and language modelling. The former measures the “fitness”
of the correction candidate with respect to the misspelled string, while the latter
expresses the probability of the correct word occurring, possibly also considering the context.
Most of the works, such as [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], use edit distance (Damerau–Levenshtein or
Levenshtein distance or Longest Common Subsequence) for error modelling. Other,
more refined models instead make use of confusion matrices (calculated from a corpus of typical
errors) [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ], n-grams of characters [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] or rely on tools such as Aspell5 which includes
phonetic algorithms. In a similar way, it is possible to approach the language model with the
simplest n-grams [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], integrating POS tagging [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or word embeddings [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. State-of-the-art
works [
        <xref ref-type="bibr" rid="ref12 ref20">12, 20</xref>
        ] still rely on such techniques.
      </p>
      <p>
        Recently, it has also been shown that good results can be obtained through purely neural
approaches. For example, a state-of-the-art corrector for Italian ([
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]) uses a biLSTM network
to learn the error model and directly correct typos. Unfortunately, in addition to being the
only recent work for the Italian language, its errors are artificially generated and therefore
do not fully represent human-like typos. Diametrically opposite is the solution of [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ],
which uses a Denoising Transformer to learn real error patterns and generate a training set
that is as faithful as possible. Unfortunately, solutions of this type require large amounts of
data, which are difficult to find in languages other than English; the specificities of the medical
domain make the procurement of these resources even more difficult. Indeed, to the best of
our knowledge, there are still no publicly available solutions for correcting medical texts in Italian.
      </p>
      <p>2 https://www.cittadellasalute.to.it – 3 https://www.fbk.eu/en/ – 4 https://ch4i.di.unito.it – 5 http://aspell.net/ – 6 https://it.wikipedia.org/ (Apr 2021 dump) – 7 https://www.issalute.it/index.php/la-salute-dalla-a-alla-z (retrieved Jun 2021) – 8 https://www.dica33.it/ (retrieved Jun 2021)</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposal</title>
      <p>The proposed spell checker prototype is based on Shannon’s noisy channel framework described
by the equation:</p>
      <p>ŵ = arg max_{w ∈ V} P(w|x) (1)</p>
      <p>or, by applying Bayes’ rule:</p>
      <p>ŵ = arg max_{w ∈ V} P(x|w) P(w) (2)</p>
      <p>where ŵ indicates the best correction for the misspelled word x; word w is selected from a
given and finite vocabulary V.</p>
      <p>Since the prior probability P(w) carries too little information, we replace it with a Language
Model (LM) that involves context, P(w|w_{i−1}). Finally, we weight the LM with a parameter
λ and, for numerical stability reasons, we move to logarithmic space. The equation is
therefore:</p>
      <p>ŵ = arg max_{w ∈ V} log P(x|w) + λ log P(w|w_{i−1}) (3)</p>
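<p>As a minimal illustration of Equation 3, the correction search can be sketched as a scan over the vocabulary. The toy probability tables below are ours, for illustration only; they are not the models used in the paper:</p>

```python
import math

def best_correction(x, prev_word, vocab, error_model, lang_model, lam=1.0):
    """argmax over w in V of log P(x|w) + lambda * log P(w|w_{i-1}) (Equation 3)."""
    best, best_score = x, -math.inf
    for w in vocab:
        p_err = error_model(x, w)        # channel model P(x|w)
        p_lm = lang_model(w, prev_word)  # language model P(w|w_{i-1})
        if p_err == 0.0 or p_lm == 0.0:
            continue                     # impossible under either model
        score = math.log(p_err) + lam * math.log(p_lm)
        if score > best_score:
            best, best_score = w, score
    return best

# Toy models, for illustration only.
vocab = {"mammaria", "memoria"}
err = lambda x, w: 0.8 if w == "mammaria" else 0.1
lm = lambda w, prev: {"mammaria": 0.6, "memoria": 0.4}[w]
print(best_correction("mammamria", "ghiandola", vocab, err, lm))  # mammaria
```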
      <sec id="sec-3-data">
        <title>3.1. Data</title>
        <p>
          Unlike English, languages such as Italian are characterized by limited publicly available resources.
Furthermore, considering the specificity of clinical language, the availability of medical texts
is even more limited. Medical terms such as surgical procedures, drugs, anatomical parts etc.
constitute a very specific vocabulary that differs from what we can normally find in Italian public
corpora. It is therefore necessary to find a collection of suitable medical/scientific documents
and build new Language Models on them. Following the suggestions of [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] we collected 2M
sentences from Wikipedia scientific articles6, informative articles from the Ministry of Health’s
website7, pathologies, drugs and package inserts from Dica338 (a popular medical information
web-site) and – to build a more accurate model of the Italian language – a selection of newspaper
articles9. Finally, we integrated our corpus with personal medical resumes, which cover most of
the subjects studied during the university course. Details on the composition of the corpus are
shown in Table 1. A common feature of the corpora described above is the control over the
texts, which limits the presence of typing errors (contrary to what can happen in a hospital
environment).
        </p>
        <p>For computational reasons, and to avoid too rare expressions (a symptom of a possible error),
we only consider the elements that occur more than 8 times for the terms and 48 for the n-grams.
The resulting vocabulary has a total of 787,940 unique words.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Composition of the corpus.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Source</th><th>Sentences</th><th>Words</th></tr>
            </thead>
            <tbody>
              <tr><td>Wikipedia</td><td>1,096,672</td><td>25,605,524</td></tr>
              <tr><td>News</td><td>247,872</td><td>5,878,905</td></tr>
              <tr><td>Ministry of Health</td><td>39,838</td><td>1,151,371</td></tr>
              <tr><td>Dica33</td><td>1,059,063</td><td>37,333,844</td></tr>
              <tr><td>Notes</td><td>58,160</td><td>962,408</td></tr>
              <tr><td>TOTAL</td><td>2,501,605</td><td>70,932,052</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As regards clinical documents, we relied on a sample of 200 anamnesis notes that were
provided by the hospital. The texts, once anonymized, were manually corrected by physicians,
thus constituting the gold standard. Acronyms and abbreviations are excluded from the
correction process. Out of the total 9,374 words, 269 (2.87%) constitute typos, of which 28 (0.30%)
are real-word errors. Errors attributable to the purely medical context are 107 (40%); of these, 25
are names of drugs/active substances. 88% of typos have a Damerau–Levenshtein distance of 1
from the correction (e.g. mammamria → mammaria); only 10% have distance 2 (e.g. ematochici →
ematochimici) and less than 1% a higher distance (e.g. idrixixizima → idroxizina).</p>
      </sec>
      <sec id="sec-3-model">
        <title>3.2. Model</title>
        <p>For simplicity, and considering the rarity with which they occur, we have chosen to discard
real-word errors, thus focusing on the remaining 88% of typos. These errors are easily identifiable
by searching for terms that do not match the vocabulary. Potential acronyms and abbreviations
are excluded; in this regard we have built, in collaboration with domain experts, a blacklist of
terms not to be corrected.</p>
        <p>Once a potential error has been found, a list of candidate corrections is generated, considering
as candidates the words “similar” to the original one. Also in this case, the Damerau–Levenshtein
distance is used as the similarity metric between strings. Since the generation of candidates is a
very demanding task in computational terms, we rely on the optimized tool SymSpell10, which
can operate under the “CLOSEST” regime (i.e. finding the first word at the shortest distance) or
“ALL” (all words within a maximum distance d).</p>
        <p>With reference to Equation 3, we list below the solutions that have been implemented
and tested.</p>
        <p>9 https://webhose.io/free-datasets/italian-news-articles/ (crawled Oct 2015) – 10 https://github.com/wolfgarbe/SymSpell</p>
      </sec>
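<p>The vocabulary filtering and the candidate-generation step described above can be sketched as follows. This is a brute-force illustration of what SymSpell does far more efficiently; the frequency threshold mirrors the one reported for terms in Section 3.1, while the toy word lists are ours:</p>

```python
from collections import Counter

def damerau_levenshtein(a, b):
    """Damerau-Levenshtein distance: deletion, insertion, substitution
    and transposition of adjacent characters each count as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def build_vocabulary(tokens, min_count=8):
    """Keep only terms seen more than min_count times (rare terms may be errors)."""
    counts = Counter(tokens)
    return {t for t, c in counts.items() if c > min_count}

def candidates(word, vocab, max_distance=1):
    """All vocabulary words within max_distance of the misspelling (the "ALL" regime)."""
    return {w for w in vocab if damerau_levenshtein(word, w) <= max_distance}

vocab = {"mammaria", "ematochimici", "memoria"}
print(candidates("mammamria", vocab))  # {'mammaria'}: one deletion away
```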
      <sec id="sec-3-1">
        <title>3.2.1. Error model</title>
        <p>The simplest way to implement the channel model is to assign a probability α to the event
x = w (i.e. the word found is not an error even if it does not appear in the vocabulary) and use
the Damerau–Levenshtein distance d(x, w) to evaluate the other cases:</p>
        <p>P(x|w) = α if x = w; e^(−d(x, w)) otherwise</p>
        <p>
          In 1991, [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] proposed a simple, but effective, model that uniformly distributes the remaining 1 − α
probability over all generated candidates C(x). The formula, to which we have added a parameter ε
as lower bound, is therefore:</p>
        <p>P(x|w) = α if x = w; (1 − α) / |C(x)| if w ∈ C(x); ε otherwise</p>
        <p>
          Finally, we tested a slightly more sophisticated variant, proposed by [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], by replacing the
uniform distribution with a probability more informed by the characteristics of the language.
More specifically, [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] introduced confusion matrices, one for each transformation11, that list the
number of times one character was confused with another one. We can formulate the model as:</p>
        <p>P(x|w) = α if x = w; ∏ conf(x_i, w_i) / count(x_i, w_i) if w ∈ C(x); ε otherwise</p>
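<p>A sketch of the first two error models follows. The α and ε values here are placeholders, not the tuned values of Table 3, and the confusion-matrix variant is omitted since its counts require an error corpus:</p>

```python
import math

def uniform_error_model(x, w, cands, alpha=0.9, eps=1e-9):
    """P(x|w): alpha if the word is unchanged, the remaining 1 - alpha
    spread uniformly over the generated candidates, eps as lower bound."""
    if x == w:
        return alpha
    if w in cands:
        return (1 - alpha) / len(cands)
    return eps

def distance_error_model(x, w, dl_distance, alpha=0.9):
    """Simplest variant: probability decays exponentially with the
    Damerau-Levenshtein distance d(x, w)."""
    return alpha if x == w else math.exp(-dl_distance(x, w))

cands = {"mammaria", "memoria"}
print(uniform_error_model("mammamria", "mammaria", cands))  # about 0.05
```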
      </sec>
      <sec id="sec-3-2">
        <title>3.2.2. Language model</title>
        <p>
          We have chosen to approach the language model in two ways: through n-grams, either
monodirectional (w_{i−2}, w_{i−1}, w_i) or bidirectional (w_{i−1}, w_i, w_{i+1}), or through contextualized word
embeddings (Masked Language Model). The n-grams are calculated according to the “stupid
backoff” scheme [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], having a significant number of tokens available. With regard to the
embedding models, we have experimented with Italian pre-trained BERT-like models such as
ELECTRA [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], RoBERTa [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] and XLM-RoBERTa [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] (a multilingual RoBERTa that includes Italian).
        </p>
        <p>11 The transformations allowed by Damerau–Levenshtein are: deletion, insertion, substitution and transposition.</p>
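<p>A minimal sketch of the monodirectional n-gram scoring with “stupid backoff” on a toy corpus. The discount 0.4 is the value commonly suggested for the scheme; the corpus here is invented, whereas the real counts come from the 2M-sentence corpus of Section 3.1:</p>

```python
from collections import Counter

def ngram_counts(sentences, n=3):
    """Count all n-grams up to order n in a whitespace-tokenized corpus."""
    counts = Counter()
    for s in sentences:
        tokens = s.split()
        for size in range(1, n + 1):
            for i in range(len(tokens) - size + 1):
                counts[tuple(tokens[i:i + size])] += 1
    return counts

def stupid_backoff(counts, total, context, word, discount=0.4):
    """Stupid backoff score: relative frequency if the n-gram was seen,
    otherwise back off to a shorter context with a fixed discount."""
    if context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            return counts[ngram] / counts[context]
        return discount * stupid_backoff(counts, total, context[1:], word, discount)
    return counts[(word,)] / total

# Invented toy corpus, for illustration only.
corpus = ["il dolore mammario", "il dolore toracico", "dolore mammario acuto"]
counts = ngram_counts(corpus)
total = sum(len(s.split()) for s in corpus)
print(stupid_backoff(counts, total, ("il", "dolore"), "mammario"))  # 0.5
```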
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We tested the different models on the gold standard described above, comparing the results
with the state of the art, as shown in Table 2. Unfortunately, the absence of a clinical corpus, as
well as of publicly available models, makes the comparison between spell-checkers in the hospital
setting difficult, if not impossible. For this reason we have relied on commonly used general
purpose tools such as Hunspell15, LanguageTool16, Google Docs17 and Microsoft Office18. To
standardize the results we have chosen to correct all the possible typos suggested by the software,
replacing them with the first suggestion. It is also necessary to consider the specificity of the
medical vocabulary, which is usually absent in generic tools. For this reason we have excluded
all the terms that belong to the vocabulary in our possession and to the blacklist defined with the
experts. In Table 2 it is possible to distinguish the two cases (with or without the vocabulary
extension) by means of the label “+voc”.</p>
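<p>Token-level precision, recall and F1 against the gold standard can be computed along these lines (our formulation, for illustration; the exact evaluation protocol may differ):</p>

```python
def evaluate(original, corrected, gold):
    """Token-level evaluation of a spell-checker against a gold standard.
    A true positive is a changed token that matches the gold correction."""
    tp = fp = fn = 0
    for orig_tok, out_tok, ref_tok in zip(original, corrected, gold):
        if orig_tok != ref_tok:      # a real typo
            if out_tok == ref_tok:
                tp += 1              # fixed correctly
            else:
                fn += 1              # missed, or fixed wrongly
        elif out_tok != orig_tok:
            fp += 1                  # spurious correction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

orig = ["dolore", "mammamria", "acuto", "ematochici"]
out  = ["dolore", "mammaria",  "acuto", "ematochici"]
gold = ["dolore", "mammaria",  "acuto", "ematochimici"]
print(evaluate(orig, out, gold))  # precision 1.0, recall 0.5
```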
      <p>As regards the models developed by us, the optimal configuration of the parameters
(alpha, lambda, epsilon and the n-gram size) is shown in Table 3. Any change to them does not bring any benefit, as described below. Among the
proposed solutions, the best model is the combination of the uniform distribution, as error model,
and n-grams for the language model.</p>
      <p>The advantage of n-grams over word embeddings may also be due to the generic nature of
the embeddings used; unfortunately, however, the few data available did not allow us to train a
new model from scratch. By excluding Wikipedia and newspapers from the initial corpus, thus
focusing on purely medical texts, we obtain 1,157,061 sentences, useful – after subdivision
into a training set (96%) and a validation set (4%) – for continuing the training of the ELECTRA model.
However, the effort is vain, as it fails to differ much from the performance of ELECTRA without
fine-tuning; for this reason the results are omitted from the table. The ELECTRA model is
nevertheless better than other architectures such as RoBERTa and its multilingual variant. A possible
advantage in the use of the neural model, in addition to a slight improvement in Precision,
consists in the significant reduction of processing times (15 min for the n-grams case vs 25 sec
for ELECTRA).</p>
      <p>15 http://hunspell.github.io/ – 16 https://languagetool.org/it – 17 https://docs.google.com – 18 https://www.office.com</p>
      <p>
        Focusing instead on n-grams, a reduction in their size (passing from 5 to 3) slightly decreases
the performance (F1 from 66.31 to 66.19). Similarly, the monodirectional/bidirectional choice
is almost irrelevant in terms of score. On the contrary, the application of standardization
techniques (stemming and number masking) considerably worsens the scores, obtaining an F1 of
62.97. The result is not surprising, as a similar phenomenon has already been observed by [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
In that case the normalization consisted of lemmatization; but without reliable POS tagging,
lemmatization is de facto reduced to stemming, while the presence of errors, abbreviations and
technical terminology makes POS tagging unreliable.
      </p>
      <p>With regard to the error model, on the other hand, the use of the confusion matrix is
counterproductive, lowering the F1 score by 8 points on average, while using the distance alone as
a probability measure brings no benefit over the uniform distribution.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusions</title>
      <p>
        Most of the gaps highlighted in the previous section are probably attributable to the scarcity of
data. The examples collected by [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], for the Italian language, are still too few to be
exploited in a machine learning scenario. Meanwhile, synthetic datasets, such as the one used
in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], do not carry with them any useful information to characterize the typical errors of the
Italian language. For this reason we are working on the construction of a corpus of common
errors for Italian, with the aim of collecting a few thousand ⟨error, correction⟩ pairs. Such a
dataset would provide the basis for training more sophisticated error models to replace the
uniform distribution used in this work.
      </p>
      <p>The texts that characterize the medical domain are also particularly interesting. Consider,
for example, that the addition of medical notes (which weigh just over 1% of the corpus) made it possible to
improve performance, passing from an F1 of 64.37 to 66.31, despite the different
nature of the texts. Indeed, all the texts used in our corpus present linguistic characteristics that
are very different from those that appear in clinical documents. The Wikipedia entries, for
example, provide encyclopedic information, as do the medical information pages and the
personal resumes. The search for texts closer to clinical reality will be a fundamental objective
that we will pursue in future works.</p>
      <p>We also think that, in addition to a mere increase in the size of the datasets for statistical
learning purposes, the integration of syntactic parsing and POS tagging can improve the
results. This is especially true in low-resource languages, like Italian, where machine learning
is severely limited. For this reason we are conducting a study aimed at evaluating the reliability
of these techniques on medical texts.</p>
      <p>Finally, the abbreviations (standard and non-standard) commonly used in these texts remain to be
addressed. In this regard, it is difficult to devise a pipeline that orders the three tasks to be
performed: POS tagging, acronym disambiguation and spelling correction. More likely, the
three tasks will be carried out in parallel, as one can help the others. We will also evaluate this
possibility in future works.</p>
      <p>Not having reached the state of the art, represented by Google’s spell checker, we believe
there is still room for improvement. Moreover, the task is of fundamental importance in order
to continue with the analysis of the texts and, ultimately, for clinical decision support.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research has been partially carried out within the “Circular Health for Industry” project
funded by “Compagnia di San Paolo” under the call “IA, uomo e società”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Matarese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lommi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>De Marinis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Riegel</surname>
          </string-name>
          ,
          <article-title>A systematic review and integration of concept analyses of self-care and related concepts</article-title>
          ,
          <source>Journal of Nursing Scholarship</source>
          <volume>50</volume>
          (
          <year>2018</year>
          )
          <fpage>296</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. T.</given-names>
            <surname>Hickey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bakken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Byrne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C. E.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demiris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Docherty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Dorsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Guthrie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Heitkemper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Jacelon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Kelechi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Redeker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Renn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Resnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Starkweather</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>McCloskey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Austin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Grady</surname>
          </string-name>
          ,
          <article-title>Precision health: Advancing symptom and self-management science</article-title>
          ,
          <source>Nursing Outlook</source>
          <volume>67</volume>
          (
          <year>2019</year>
          )
          <fpage>462</fpage>
          -
          <lpage>475</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Alloatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Di</given-names>
            <surname>Caro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pieraccini</surname>
          </string-name>
          ,
          <article-title>Diabetes and conversational agents: the aida project case study</article-title>
          ,
          <source>Discover Artificial Intelligence</source>
          <volume>1</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H. M.</given-names>
            <surname>ter Hofstede</surname>
          </string-name>
          (Eds.),
          <source>Process-Aware Information Systems: Bridging People and Software Through Process Technology</source>
          , Wiley,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Munoz-Gama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sepúlveda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Capurro</surname>
          </string-name>
          ,
          <article-title>Process mining in healthcare: A literature review</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>61</volume>
          (
          <year>2016</year>
          )
          <fpage>224</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Amantea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Boella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marinello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bianca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brunetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fernández-Llatas</surname>
          </string-name>
          ,
          <article-title>A process mining application for the analysis of hospital-at-home admissions</article-title>
          , in:
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Pape-Haugaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lovis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. C.</given-names>
            <surname>Madsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scott</surname>
          </string-name>
          (Eds.),
          <source>Digital Personalized Health and Medicine - Proceedings of MIE</source>
          <year>2020</year>
          ,
          <article-title>Medical Informatics Europe</article-title>
          , Geneva, Switzerland, April 28 - May 1,
          <year>2020</year>
          , volume
          <volume>270</volume>
          of
          <article-title>Studies in Health Technology and Informatics</article-title>
          , IOS Press,
          <year>2020</year>
          , pp.
          <fpage>522</fpage>
          -
          <lpage>526</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Aringhieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Boella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brunetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Di Caro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ferrod</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marinello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ronzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulis</surname>
          </string-name>
          ,
          <article-title>Towards the application of process mining for supporting the home hospitalization service</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Marrella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. T.</given-names>
            <surname>Dupré</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 1st Italian Forum on Business Process Management co-located with the 19th International Conference on Business Process Management (BPM 2021)</source>
          , Rome, Italy, September 10th, 2021, volume
          <volume>2952</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2021</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Amantea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arnone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Di Leva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bianca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brunetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marinello</surname>
          </string-name>
          ,
          <article-title>Modeling and simulation of the hospital-at-home service admission process</article-title>
          , in:
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Obaidat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. I.</given-names>
            <surname>Ören</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Szczerbicka</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 9th International Conference on Simulation and Modeling Methodologies, Technologies and Applications, SIMULTECH 2019</source>
          , Prague, Czech Republic, July 29-31, 2019, SciTePress,
          <year>2019</year>
          , pp.
          <fpage>293</fpage>
          -
          <lpage>300</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kukich</surname>
          </string-name>
          ,
          <article-title>Techniques for automatically correcting words in text</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>24</volume>
          (
          <year>1992</year>
          )
          <fpage>377</fpage>
          -
          <lpage>439</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Pirinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lindén</surname>
          </string-name>
          ,
          <article-title>State-of-the-art in weighted finite-state spell-checking</article-title>
          ,
          <source>in: CICLing</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Deorowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciura</surname>
          </string-name>
          ,
          <article-title>Correcting spelling errors by modelling their causes</article-title>
          ,
          <source>International Journal of Applied Mathematics and Computer Science</source>
          <volume>15</volume>
          (
          <year>2005</year>
          )
          <fpage>275</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Whitelaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hutchinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ellis</surname>
          </string-name>
          ,
          <article-title>Using the web for language independent spellchecking and autocorrection</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Damnati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Auguste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nasr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Charlet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heinecke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bechet</surname>
          </string-name>
          ,
          <article-title>Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text</article-title>
          ,
          <source>in: Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</source>
          , Miyazaki, Japan,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dziadek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Henriksson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Duneld</surname>
          </string-name>
          ,
          <article-title>Improving terminology mapping in clinical text with context-sensitive spelling correction</article-title>
          ,
          <source>Studies in Health Technology and Informatics</source>
          <volume>235</volume>
          (
          <year>2017</year>
          )
          <fpage>241</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sorokin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shavrina</surname>
          </string-name>
          ,
          <article-title>Automatic spelling correction for Russian social media texts</article-title>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Automatic error checking and correction of electronic medical records</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shojafar</surname>
          </string-name>
          (Eds.),
          <source>Fuzzy System and Data Mining - Proceedings of FSDM 2015</source>
          , Shanghai, China, December 12-15, 2015, volume
          <volume>281</volume>
          of Frontiers in Artificial Intelligence and Applications, IOS Press,
          <year>2015</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Banko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brill</surname>
          </string-name>
          ,
          <article-title>Scaling to very very large corpora for natural language disambiguation</article-title>
          , in:
          <source>Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Toulouse, France,
          <year>2001</year>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vilares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Doval</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vilares</surname>
          </string-name>
          ,
          <article-title>Studying the effect and treatment of misspelled queries in cross-language information retrieval</article-title>
          ,
          <source>Inf. Process. Manage.</source>
          <volume>52</volume>
          (
          <year>2016</year>
          )
          <fpage>646</fpage>
          -
          <lpage>657</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Héja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Surján</surname>
          </string-name>
          ,
          <article-title>Using n-gram method in the decomposition of compound medical diagnoses</article-title>
          ,
          <source>International Journal of Medical Informatics</source>
          <volume>70</volume>
          (
          <year>2003</year>
          )
          <fpage>229</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bendersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <article-title>Personalized online spell correction for personal search</article-title>
          ,
          <source>in: The World Wide Web Conference</source>
          , WWW '19, Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>2785</fpage>
          -
          <lpage>2791</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sbattella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tedesco</surname>
          </string-name>
          ,
          <article-title>How to simplify human-machine interaction: A text complexity calculator and a smart spelling corrector</article-title>
          ,
          <source>in: Proceedings of the 4th EAI International Conference on Smart Objects and Technologies for Social Good</source>
          , Goodtechs '18, Association for Computing Machinery, New York, NY, USA,
          <year>2018</year>
          , pp.
          <fpage>304</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Urdiales</surname>
          </string-name>
          ,
          <article-title>Spelling correction with denoising transformer</article-title>
          ,
          <year>2021</year>
          . arXiv:2105.05977.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mensa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Marino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Colla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Delsanto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Radicioni</surname>
          </string-name>
          ,
          <article-title>A resource for detecting misspellings and denoising medical text data</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Monti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tamburini</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020</source>
          , Bologna, Italy, March 1-3, 2021, volume
          <volume>2769</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mays</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Damerau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Mercer</surname>
          </string-name>
          ,
          <article-title>Context based spelling correction</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>27</volume>
          (
          <year>1991</year>
          )
          <fpage>517</fpage>
          -
          <lpage>522</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Kernighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Church</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Gale</surname>
          </string-name>
          ,
          <article-title>A spelling correction program based on a noisy channel model</article-title>
          ,
          <source>in: COLING</source>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brants</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Popat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Och</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Large language models in machine translation</article-title>
          ,
          <source>in: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)</source>
          , Association for Computational Linguistics, Prague, Czech Republic,
          <year>2007</year>
          , pp.
          <fpage>858</fpage>
          -
          <lpage>867</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>ELECTRA: Pre-training text encoders as discriminators rather than generators</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <year>2019</year>
          . arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wenzek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          ,
          <year>2020</year>
          . arXiv:1911.02116.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagiwara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mita</surname>
          </string-name>
          ,
          <article-title>Github typo corpus: A large-scale multilingual dataset of misspellings and grammatical errors</article-title>
          ,
          <source>in: LREC</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>