A support for understanding medical notes: correcting spelling errors in Italian clinical records

Roger Ferrod¹, Enrico Brunetti¹,², Luigi Di Caro¹, Chiara Di Francescomarino³, Mauro Dragoni³, Chiara Ghidini³, Renata Marinello² and Emilio Sulis¹

¹ University of Turin, Torino, Italy
² City of Health and Science, Torino, Italy
³ Fondazione Bruno Kessler, Trento, Italy

Abstract
In a context of digitalization and modernization of healthcare, the automatic analysis of clinical data plays a leading role in improving the quality of care. Since much of the information lies in unstructured form within clinical notes, it is necessary to make use of modern Natural Language Processing techniques to extract and build structured knowledge from the data. However, clinical texts pose unique challenges due to the extensive use of i) acronyms, ii) non-standard medical jargon and iii) typos in technical terms. In this paper, we present a prototype spell-checker specifically designed for medical texts written in Italian.

Keywords
Clinical notes, Natural Language Processing, Spelling correction

AIxIA 2021 SMARTERCARE Workshop, November 29, 2021, Milan, IT
roger.ferrod@unito.it (R. Ferrod); enrico.brunetti@unito.it (E. Brunetti); luigi.dicaro@unito.it (L. Di Caro); dfmchira@fbk.eu (C. Di Francescomarino); dragoni@fbk.eu (M. Dragoni); ghidini@fbk.eu (C. Ghidini); rmarinello@cittadellasalute.to.it (R. Marinello); emilio.sulis@unito.it (E. Sulis)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Healthcare is more than ever a priority for every country. Post-COVID-19 healthcare will be characterized by a renewed interest in modernization. Indeed, healthcare is still one of the least digitized industries, although it alone generates 5% of all the data in the world (source: UBS Group SA, as of June 2020). For these reasons we are witnessing a revolution that sees data as the protagonist, as in the case of Electronic Health Records (EHRs), which systematically collect a wide range of patient data (e.g. medical history, medications, laboratory tests, vital signs). In this context, self-care processes are also undergoing modernization and automation, as shown for example in [1, 2, 3].

The digitization of healthcare involves the recent frontiers of the medical Internet of Things, with sensor applications that monitor both human behavior and the environment. In addition, healthcare organizations must pay attention to the role of information systems [4]. The most recent developments in organization management involve the automated analysis of information recorded in so-called event-log files [5, 6]. Health information systems collect data to leverage digital traces regarding activities and patients, as well as medical notes. In particular, it is relevant to consider both structured and unstructured data, i.e., clinical and textual. Since a portion of healthcare data is in textual form, it is increasingly of interest to provide a Natural Language Processing (NLP) pipeline to extract and analyse useful information. Unstructured texts are often noisy, with typing errors and extensive use of non-standard acronyms and medical jargon, usually accompanied by a less rigorous structure of the document itself.
To overcome these problems, researchers must begin to address issues of spelling correction, acronym disambiguation and entity normalization. However, in languages other than English it is very difficult to find advanced models, data or other resources. In this paper we deal with the spelling correction task (i.e. the correction of typos) in notes written by physicians, so as to provide the cleanest possible text to the sophisticated Information Extraction (IE) techniques that generally follow the initial data-cleaning phase. Indeed, this work is part of a larger project [7] that involves Turin's City of Health and Science (https://www.cittadellasalute.to.it), the Bruno Kessler Foundation (https://www.fbk.eu/en/) and the University of Turin (https://ch4i.di.unito.it). The project aims at supporting physicians in making decisions in the context of home hospitalization services [8]. Specifically, with this work we introduce a prototype of a spell-checker designed to work on Italian clinical texts. Although it is still a work in progress, to the best of our knowledge it currently represents the first study specifically designed to correct medical texts in Italian.

2. Related works

The automatic spelling correction task is the first step to be taken in order to analyse clinical texts, and it represents one of the most important open problems in Natural Language Processing. The correction process can be divided, according to [9] and [10], into: 1) error detection; 2) generation of correction candidates; 3) ranking of suggestions. [9] also categorizes errors into two types: non-word errors (errors that produce words that do not exist in the vocabulary) and real-word errors (where the typo is a meaningful word, but not the word intended in that context). The latter case requires particular attention and dedicated mechanisms, as done for example in [11] and [12].

According to Claude Shannon's noisy-channel framework, the problem is broken down into error modelling (a.k.a. the channel model) and language modelling. The former measures the "fitness" of a correction candidate with respect to the corrupted string, while the latter expresses the probability of occurrence of the correct word, possibly also considering the context. Most works, such as [13], [14] and [15], use an edit distance (Damerau–Levenshtein, Levenshtein or Longest Common Subsequence) for error modelling; a minimal implementation is sketched at the end of this section. Other, more refined models make use of word-confusion matrices calculated from a corpus of typical errors [16, 17] or of character n-grams [18], or rely on tools such as Aspell (http://aspell.net/), which includes phonetic algorithms. In a similar way, the language model can be approached with the simplest n-grams [19], possibly integrating POS tagging [14] or word embeddings [13]. State-of-the-art works [12, 20] still rely on such techniques.

Recently, it has also been shown that good results can be obtained through purely neural approaches. For example, a state-of-the-art corrector for Italian [21] uses a biLSTM network to learn the error model and directly correct typos. Unfortunately, besides being the only recent work for the Italian language, its errors are artificially generated and therefore do not fully represent human-like typos. Diametrically opposite is the solution of [22], which uses a Denoising Transformer to learn real error patterns and generate a training set that is as realistic as possible. Unfortunately, solutions of this type require large amounts of data, which are difficult to find in languages other than English, and the specificities of the medical domain make the procurement of such resources even more difficult. Indeed, to the best of our knowledge, there are still no publicly available solutions for correcting medical texts in Italian.
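As background for the edit-distance-based error models cited above, here is a minimal sketch (ours, not from the paper) of the restricted Damerau–Levenshtein distance, which counts the deletions, insertions, substitutions and adjacent transpositions needed to turn one string into another. The test words are taken from the gold standard described in Section 3.1.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i chars of `a` and the first j of `b`
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                # i deletions
    for j in range(n + 1):
        d[0][j] = j                                # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

assert damerau_levenshtein("mammamria", "mammaria") == 1       # one deletion
assert damerau_levenshtein("ematochici", "ematochimici") == 2  # two insertions
```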
3. Proposal

The proposed spell-checker prototype is based on Shannon's noisy-channel framework, described by the equation:

$$\hat{w} = \arg\max_{w \in V} P(w \mid x) \qquad (1)$$

or, by applying Bayes' rule:

$$\hat{w} = \arg\max_{w \in V} P(x \mid w)\, P(w) \qquad (2)$$

where $\hat{w}$ indicates the best correction for the misspelled word $x$, and the word $w$ is selected from a given, finite vocabulary $V$. Since the prior probability $P(w)$ carries too little information, we replace it with a Language Model (LM) that involves the context, $P(w \mid w_{i-1})$. Finally, we weight the LM with a parameter $\lambda$ and, for reasons of numerical stability, we move to logarithmic space. The equation therefore becomes:

$$\hat{w} = \arg\max_{w \in V} \log P(x \mid w) + \lambda \log P(w \mid w_{i-1}) \qquad (3)$$

3.1. Data

Unlike English, languages such as Italian are characterized by limited publicly available resources. Furthermore, considering the specificity of clinical language, the availability of medical texts is even scarcer. Medical terms such as surgical procedures, drugs, anatomical parts, etc. constitute a very specific vocabulary that differs from what can normally be found in Italian public corpora. It is therefore necessary to find a collection of suitable medical/scientific documents and to build new Language Models on them. Following the suggestions of [23], we collected about 2.5M sentences from: Wikipedia scientific articles (https://it.wikipedia.org/ – Apr 2021 dump); informative articles from the Ministry of Health's website (https://www.issalute.it/index.php/la-salute-dalla-a-alla-z – retrieved Jun 2021); pathologies, drugs and package inserts from Dica33, a popular medical information website (https://www.dica33.it/ – retrieved Jun 2021); and – to build a more accurate model of the Italian language – a selection of newspaper articles (https://webhose.io/free-datasets/italian-news-articles/ – crawled Oct 2015). Finally, we integrated the corpus with personal medical summaries, which cover most of the subjects studied during the university course. Details on the composition of the corpus are shown in Table 1.

Source               Sentences   Words        % of words
Wikipedia            1,096,672   25,605,524   36%
News                   247,872    5,878,905    8%
Ministry of Health      39,838    1,151,371    2%
Dica33               1,059,063   37,333,844   53%
Notes                   58,160      962,408    1%
TOTAL                2,501,605   70,932,052

Table 1: Composition of the constructed corpus.

A common feature of the corpora described above is the editorial control over the texts, which limits the presence of typing errors (contrary to what can happen in a hospital environment). For computational reasons, and to avoid overly rare expressions (a symptom of a possible error), we only keep terms that occur more than 8 times and n-grams that occur more than 48 times. The resulting vocabulary contains a total of 787,940 unique words.

As regards clinical documents, we relied on a sample of 200 anamnesis notes provided by the hospital. The texts, once anonymized, were manually corrected by physicians, thus constituting the gold standard. Acronyms and abbreviations are excluded from the correction process. Out of the total of 9,374 words, 269 (2.87%) are typos, of which 28 (0.30%) are real-word errors. Errors attributable to the purely medical context are 107 (40%); of these, 25 are names of drugs/active substances. 88% of the typos have a Damerau–Levenshtein distance of 1 from their correction (e.g. mammamria → mammaria); 10% have distance 2 (e.g. ematochici → ematochimici); and less than 1% have higher distances (e.g. idrixixizima → idroxizina).

3.2. Model

For simplicity, and considering the rarity with which real-word errors occur, we have chosen to discard them, thus focusing on the remaining non-word errors (about 90% of typos). These errors are easily identifiable by searching for terms that do not match the vocabulary. Potential acronyms and abbreviations are excluded; in this regard, we have built, in collaboration with domain experts, a blacklist of terms not to be corrected.

Once a potential error has been found, a list of candidate corrections is generated, taking as candidates the words "similar" to the original one. Also in this case, the Damerau–Levenshtein distance is used as the similarity metric between strings. Since the generation of candidates is computationally very demanding, we rely on the optimized SymSpell tool (https://github.com/wolfgarbe/SymSpell), which can operate under the "CLOSEST" regime (i.e. returning only the words at the shortest distance) or the "ALL" regime (all words within a maximum distance n), as sketched below.
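For concreteness, candidate generation with the Python port of SymSpell (the symspellpy package) might look like the following sketch. The paper does not specify which port or configuration was used, so the dictionary file name, the prefix length and the maximum edit distance are illustrative assumptions.

```python
from symspellpy import SymSpell, Verbosity

# Index the domain vocabulary (hypothetical file: one "term count"
# pair per line, built from the corpus of Section 3.1).
sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
sym_spell.load_dictionary("medical_vocab.txt", term_index=0, count_index=1)

def candidates(word, regime=Verbosity.ALL):
    """Candidate corrections within edit distance 2 of `word`.
    Verbosity.CLOSEST mirrors the "CLOSEST" regime (only the nearest
    candidates); Verbosity.ALL mirrors the "ALL" regime."""
    return [(s.term, s.distance, s.count)
            for s in sym_spell.lookup(word, regime, max_edit_distance=2)]

print(candidates("ematochici"))  # should include ("ematochimici", 2, <count>)
```

SymSpell owes its speed to precomputing deletions of the dictionary terms, which avoids scanning the whole vocabulary at lookup time.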
Always with reference to Equation 3, we list below the solutions that have been implemented and tested.

3.2.1. Error model

The simplest way to implement the channel model is to assign a fixed probability $\alpha$ to the event $x = w$ (i.e. the word found is not an error, even though it does not appear in the vocabulary) and to use the Damerau–Levenshtein distance $D(x, w)$ to evaluate the other cases:

$$P(x \mid w) = \begin{cases} \alpha & \text{if } x = w \\ -\log(D(x, w)) & \text{otherwise} \end{cases}$$

In 1991, [24] proposed a simple but effective model that distributes the remaining probability mass $1 - \alpha$ uniformly over all generated candidates $C(x)$. The formula, to which we have added a parameter $\epsilon$ as a lower bound, is therefore:

$$P(x \mid w) = \begin{cases} \alpha & \text{if } x = w \\ \frac{1 - \alpha}{|C(x)|} & \text{if } w \in C(x) \\ \epsilon & \text{otherwise} \end{cases}$$

Finally, we tested a slightly more sophisticated variant, proposed by [25], which replaces the uniform distribution with a probability better informed about the characteristics of the language. More specifically, [25] introduced confusion matrices – one for each transformation allowed by Damerau–Levenshtein (deletion, insertion, substitution and transposition) – that record the number of times one character was confused with another. We can formulate this model as:

$$P(x \mid w) = \begin{cases} \alpha & \text{if } x = w \\ \prod_{\text{edits}} \frac{edit(x_i, w_j)}{count(x_i, w_j)} & \text{if } w \in C(x) \\ \epsilon & \text{otherwise} \end{cases}$$

3.2.2. Language model

We approached the language model in two ways: through n-grams, either monodirectional $(w_{i-2}, w_{i-1}, w_i)$ or bidirectional $(w_{i-1}, w_i, w_{i+1})$, or through contextualized word embeddings (Masked Language Model). The n-grams are computed according to the "stupid backoff" scheme [26], given the significant number of tokens available. As for the embedding models, we experimented with pre-trained BERT-like models covering Italian: ELECTRA [27] (dbmdz/electra-base-italian-xxl-cased-generator), RoBERTa [28] (idb-ita/gilberto-uncased-from-camembert) and XLM-RoBERTa [29] (xlm-roberta-base, a multilingual RoBERTa that includes Italian). A sketch that combines the error model and the n-gram language model into the ranking of Equation 3 is shown below.
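As an illustration of how the pieces fit together, here is a minimal sketch of the ranking in Equation 3, combining the uniform error model of [24] with a bigram "stupid backoff" language model [26]. The counts and the example words are toy values of our own; the 0.4 backoff factor is the one suggested in [26]; the hyperparameters are those of Table 3. This is not the paper's code.

```python
import math
from collections import Counter

ALPHA, LAMBDA, EPS = 0.05, 1.0, 1e-15  # hyperparameters of Table 3

# Toy counts; the real system derives them from the 70M-word corpus of Table 1.
unigrams = Counter({"regione": 120, "mammaria": 35, "mammella": 28})
bigrams = Counter({("regione", "mammaria"): 12})
total = sum(unigrams.values())

def lm_logprob(word, prev, backoff=0.4):
    """Bigram "stupid backoff" [26]: a score (not a true probability) that
    falls back to a discounted unigram when the bigram is unseen."""
    if bigrams[(prev, word)] > 0:
        return math.log(bigrams[(prev, word)] / unigrams[prev])
    if unigrams[word] > 0:
        return math.log(backoff * unigrams[word] / total)
    return math.log(EPS)

def error_logprob(x, w, cands):
    """Uniform channel model of [24], with the epsilon lower bound."""
    if x == w:
        return math.log(ALPHA)                     # OOV word kept as-is
    if w in cands:
        return math.log((1 - ALPHA) / len(cands))  # uniform over candidates
    return math.log(EPS)

def correct(x, prev, cands):
    """Equation 3: argmax_w of log P(x|w) + lambda * log P(w|w_{i-1})."""
    return max(cands | {x},  # keeping `x` itself is also an option
               key=lambda w: error_logprob(x, w, cands)
                             + LAMBDA * lm_logprob(w, prev))

# For "regione mammamria", SymSpell would propose the candidate set below.
print(correct("mammamria", "regione", {"mammaria", "mammella"}))  # -> mammaria
```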
4. Results

We tested the different models on the gold standard described above, comparing the results with the state of the art, as shown in Table 2. Unfortunately, the absence of clinical corpora, as well as of publicly available models, makes a comparison between spell-checkers in the hospital setting difficult, if not impossible. For this reason we relied on commonly used general-purpose tools: Hunspell (http://hunspell.github.io/), LanguageTool (https://languagetool.org/it), Google Docs (https://docs.google.com) and Microsoft Office (https://www.office.com). To standardize the results, we accepted every typo flagged by each tool and replaced it with the tool's first suggestion. It is also necessary to consider the specificity of the medical vocabulary, which is usually absent from generic tools; for this reason, we excluded from correction all the terms that belong to our vocabulary or to the blacklist defined with the experts. In Table 2, the two cases (with and without this vocabulary extension) are distinguished by the label "+ voc".

Model                    Acc    F1     P      R      TN    FN   FP   TP
Microsoft Office         93.93  21.52  17.26  28.57  8727  195  374   78
Microsoft Office + voc   92.72  33.53  22.84  63.00  8520  101  581  172
Google Docs              98.45  74.96  70.91  79.49  9012   56   89  217
Google Docs + voc        98.57  75.90  74.56  77.29  9029   62   72  211
LanguageTool             95.71  10.27  13.14   8.42  8949  250  152   23
LanguageTool + voc       96.47  16.20  26.23  11.72  9011  241   90   32
Hunspell                 95.31   6.38   7.61   5.49  8919  258  182   15
Hunspell + voc           96.16   9.55  15.20   6.96  8995  254  106   19
N-Grams                  97.98  66.31  64.58  68.13  8999   87  102  186
ELECTRA                  97.91  62.31  65.59  59.34  9016  111   85  162
RoBERTa                  97.59  58.30  58.74  57.87  8990  115  111  158
XLMRoberta               97.55  57.09  58.17  56.04  8991  120  110  153

Table 2: Results obtained by existing tools (above) and by our proposals (below). TN: non-error words not corrected; FN: misspelled words not corrected; FP: non-error words erroneously corrected; TP: misspelled words corrected.

As regards the models developed by us, the optimal parameter configuration is shown in Table 3; as described below, deviations from these values bring no benefit. Among the proposed solutions, the best model is the combination of the uniform distribution as error model and n-grams as language model.

Parameter      Value
alpha          0.05
lambda         1.0
epsilon        1e-15
n-grams size   5

Table 3: Optimized hyperparameters for the proposed model.

The advantage of n-grams over word embeddings may also be due to the generic nature of the embeddings used; unfortunately, the limited data available did not allow us to train a new model from scratch. By excluding Wikipedia and the newspapers from the initial corpus, thus focusing on purely medical texts, we obtain 1,157,061 sentences which – after subdivision into a training set (96%) and a validation set (4%) – were used to continue the training of the ELECTRA model. However, the effort was in vain: the fine-tuned model does not differ much in performance from ELECTRA without fine-tuning, and its results are therefore omitted from the table. The ELECTRA model is nevertheless better than other architectures such as RoBERTa and its multilingual variant. A possible advantage of the neural model, in addition to a slight improvement in Precision, is a significant reduction in processing time (15 min for the n-grams case vs. 25 sec for ELECTRA).
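To make explicit how the scores in Table 2 follow from the four counts, this small sketch (ours, using the standard definitions) recomputes Accuracy, Precision, Recall and F1 from TP/FP/FN/TN; applied to the N-Grams row, it reproduces the published values.

```python
def scores(tp, fp, fn, tn):
    """Accuracy, F1, Precision and Recall from the counts of Table 2."""
    acc = (tp + tn) / (tp + tn + fp + fn)  # words handled correctly
    p = tp / (tp + fp)                     # corrections that fixed real typos
    r = tp / (tp + fn)                     # real typos that were corrected
    f1 = 2 * p * r / (p + r)
    return tuple(round(100 * v, 2) for v in (acc, f1, p, r))

# N-Grams row of Table 2: TP=186, FP=102, FN=87, TN=8999
print(scores(tp=186, fp=102, fn=87, tn=8999))  # (97.98, 66.31, 64.58, 68.13)
```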
Focusing instead on the n-grams, a reduction of their size (from 5 to 3) slightly decreases the performance (F1 from 66.31 to 66.19). Similarly, the monodirectional/bidirectional choice is almost irrelevant in terms of score. On the contrary, the application of standardization techniques (stemming and number masking) considerably worsens the scores, yielding an F1 of 62.97. The result is not surprising, as a similar phenomenon had already been observed by [23]. In that case the normalization consisted of lemmatization; but without reliable POS tagging, lemmatization is de facto reduced to stemming, and the presence of errors, abbreviations and technical terminology makes POS tagging unreliable. With regard to the error model, the use of the confusion matrix is counterproductive, lowering the F1 score by 8 points on average, while using the distance alone as a probability measure brings no benefit over the uniform distribution.

5. Discussion and Conclusions

Most of the gaps highlighted in the previous section are probably attributable to the scarcity of data. The examples collected by [23] and [30] for the Italian language are still too few to be exploited in a machine-learning scenario. Meanwhile, synthetic datasets, such as the one used in [21], do not carry any useful information for characterizing the typical errors of the Italian language. For this reason we are working on the construction of a corpus of common errors for Italian, with the aim of collecting a few thousand ⟨typo, correction⟩ pairs. Such a dataset would provide the basis for training more sophisticated error models to replace the uniform distribution used in this work.

The texts that characterize the medical domain are also particularly interesting. Consider, for example, that the addition of the medical notes (which weigh just over 1% of the corpus) improved performance by about 2 points, from F1 64.37 to F1 66.31, despite the different nature of those texts. Indeed, all the texts used in our corpus present linguistic characteristics that are very different from those appearing in clinical documents: the Wikipedia entries, for example, provide encyclopedic information, as do the medical-information pages and the personal summaries. The search for texts closer to clinical reality will be a fundamental objective of future work.

We also think that – in addition to a mere increase in the size of the datasets for statistical-learning purposes – the integration of syntactic parsing and POS tagging can improve the results. This is especially true for a low-resource language like Italian, where machine learning is severely limited. For this reason we are conducting a study aimed at evaluating the reliability of these techniques on medical texts.

Finally, the abbreviations (standard and non-standard) commonly used in clinical texts remain to be addressed. In this regard, it is difficult to conceive a pipeline that imposes an order on the three tasks to be performed: POS tagging, acronym disambiguation and spelling correction. More likely, the three tasks will have to be carried out in parallel, as each can help the others. We will also evaluate this possibility in future work.

Not having reached the state of the art, represented by Google's spell checker, we believe there is still room for improvement. The task, moreover, is of fundamental importance in order to continue with the analysis of the texts and, ultimately, for clinical decision support.
Acknowledgments

This research has been partially carried out within the "Circular Health for Industry" project funded by "Compagnia di San Paolo" under the call "IA, uomo e società".

References

[1] M. Matarese, M. Lommi, M. G. De Marinis, B. Riegel, A systematic review and integration of concept analyses of self-care and related concepts, Journal of Nursing Scholarship 50 (2018) 296–305.
[2] K. T. Hickey, S. Bakken, M. W. Byrne, D. C. E. Bailey, G. Demiris, S. L. Docherty, S. G. Dorsey, B. J. Guthrie, M. M. Heitkemper, C. S. Jacelon, T. J. Kelechi, S. M. Moore, N. S. Redeker, C. L. Renn, B. Resnick, A. Starkweather, H. Thompson, T. M. Ward, D. J. McCloskey, J. K. Austin, P. A. Grady, Precision health: Advancing symptom and self-management science, Nursing Outlook 67 (2019) 462–475.
[3] F. Alloatti, A. Bosca, L. Di Caro, F. Pieraccini, Diabetes and conversational agents: the AIDA project case study, Discover Artificial Intelligence 1 (2021).
[4] M. Dumas, W. M. P. van der Aalst, A. H. M. ter Hofstede (Eds.), Process-Aware Information Systems: Bridging People and Software Through Process Technology, Wiley, 2005.
[5] E. Rojas, J. Munoz-Gama, M. Sepúlveda, D. Capurro, Process mining in healthcare: A literature review, Journal of Biomedical Informatics 61 (2016) 224–236.
[6] I. A. Amantea, E. Sulis, G. Boella, R. Marinello, D. Bianca, E. Brunetti, M. Bo, C. Fernández-Llatas, A process mining application for the analysis of hospital-at-home admissions, in: L. B. Pape-Haugaard, C. Lovis, I. C. Madsen, P. Weber, P. H. Nielsen, P. Scott (Eds.), Digital Personalized Health and Medicine – Proceedings of MIE 2020, Medical Informatics Europe, Geneva, Switzerland, April 28 – May 1, 2020, volume 270 of Studies in Health Technology and Informatics, IOS Press, 2020, pp. 522–526.
[7] R. Aringhieri, G. Boella, E. Brunetti, L. D. Caro, C. D. Francescomarino, M. Dragoni, R. Ferrod, C. Ghidini, R. Marinello, M. Ronzani, E. Sulis, Towards the application of process mining for supporting the home hospitalization service, in: A. Marrella, D. T. Dupré (Eds.), Proceedings of the 1st Italian Forum on Business Process Management co-located with the 19th International Conference on Business Process Management (BPM 2021), Rome, Italy, September 10th, 2021, volume 2952 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 33–38.
[8] I. A. Amantea, M. Arnone, A. D. Leva, E. Sulis, D. Bianca, E. Brunetti, R. Marinello, Modeling and simulation of the hospital-at-home service admission process, in: M. S. Obaidat, T. I. Ören, H. Szczerbicka (Eds.), Proceedings of the 9th International Conference on Simulation and Modeling Methodologies, Technologies and Applications, SIMULTECH 2019, Prague, Czech Republic, July 29–31, 2019, SciTePress, 2019, pp. 293–300.
[9] K. Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys 24 (1992) 377–439.
[10] T. A. Pirinen, K. Lindén, State-of-the-art in weighted finite-state spell-checking, in: CICLing, 2014.
[11] S. Deorowicz, M. Ciura, Correcting spelling errors by modelling their causes, International Journal of Applied Mathematics and Computer Science 15 (2005) 275–285.
[12] C. Whitelaw, B. Hutchinson, G. Chung, G. Ellis, Using the web for language independent spellchecking and autocorrection, in: EMNLP, 2009.
[13] G. Damnati, J. Auguste, A. Nasr, D. Charlet, J. Heinecke, F. Bechet, Handling normalization issues for part-of-speech tagging of online conversational text, in: Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.
[14] J. Dziadek, A. Henriksson, M. Duneld, Improving terminology mapping in clinical text with context-sensitive spelling correction, Studies in Health Technology and Informatics 235 (2017) 241–245.
[15] A. Sorokin, T. Shavrina, Automatic spelling correction for Russian social media texts, 2016.
[16] Y. Lv, Y. Deng, M. Liu, Q. Lu, Automatic error checking and correction of electronic medical records, in: G. Chen, F. Liu, M. Shojafar (Eds.), Fuzzy System and Data Mining – Proceedings of FSDM 2015, Shanghai, China, December 12–15, 2015, volume 281 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2015, pp. 32–40.
[17] M. Banko, E. Brill, Scaling to very very large corpora for natural language disambiguation, in: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Toulouse, France, 2001, pp. 26–33.
[18] J. Vilares, M. A. Alonso, Y. Doval, M. Vilares, Studying the effect and treatment of misspelled queries in cross-language information retrieval, Information Processing & Management 52 (2016) 646–657.
[19] G. Héja, G. Surján, Using n-gram method in the decomposition of compound medical diagnoses, International Journal of Medical Informatics 70 (2003) 229–236.
[20] J. Gupta, Z. Qin, M. Bendersky, D. Metzler, Personalized online spell correction for personal search, in: The World Wide Web Conference, WWW '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 2785–2791.
[21] L. Sbattella, R. Tedesco, How to simplify human-machine interaction: A text complexity calculator and a smart spelling corrector, in: Proceedings of the 4th EAI International Conference on Smart Objects and Technologies for Social Good, Goodtechs '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 304–305.
[22] A. Kuznetsov, H. Urdiales, Spelling correction with denoising transformer, 2021. arXiv:2105.05977.
[23] E. Mensa, G. M. Marino, D. Colla, M. Delsanto, D. P. Radicioni, A resource for detecting misspellings and denoising medical text data, in: J. Monti, F. dell'Orletta, F. Tamburini (Eds.), Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1–3, 2021, volume 2769 of CEUR Workshop Proceedings, CEUR-WS.org, 2020.
[24] E. Mays, F. J. Damerau, R. L. Mercer, Context based spelling correction, Information Processing & Management 27 (1991) 517–522.
[25] M. D. Kernighan, K. W. Church, W. A. Gale, A spelling correction program based on a noisy channel model, in: COLING, 1990.
[26] T. Brants, A. C. Popat, P. Xu, F. J. Och, J. Dean, Large language models in machine translation, in: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Association for Computational Linguistics, Prague, Czech Republic, 2007, pp. 858–867.
[27] K. Clark, M.-T. Luong, Q. V. Le, C. D. Manning, ELECTRA: Pre-training text encoders as discriminators rather than generators, in: ICLR, 2020.
[28] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. arXiv:1907.11692.
[29] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, 2020. arXiv:1911.02116.
[30] M. Hagiwara, M. Mita, GitHub Typo Corpus: A large-scale multilingual dataset of misspellings and grammatical errors, in: LREC, 2020.