Towards SMT-Assisted Error Annotation of Learner Corpora

Nadezda Okinina                      Lionel Nicolas
Eurac Research                       Eurac Research
viale Druso 1, Bolzano, Italy        viale Druso 1, Bolzano, Italy
nadezda.okinina@eurac.edu            lionel.nicolas@eurac.edu

Abstract

English. We present the results of prototypical experiments conducted with the goal of designing a machine translation (MT) based system that assists the annotators of learner corpora in performing orthographic error annotation. When an annotator marks a span of text as erroneous, the system suggests a correction for the marked error. The presented experiments rely on word-level and character-level Statistical Machine Translation (SMT) systems.

Italian. We present the results of prototypical experiments conducted with the aim of creating a machine translation (MT) based system that assists the annotators of language learner corpora during the process of orthographic error annotation. When an annotator marks a segment of text as erroneous, the system suggests a correction for the marked error. The presented experiments use statistical machine translation (SMT) systems at the word and character levels.

1 Introduction

Manual error annotation of learner corpora is a time-consuming process which is often a bottleneck in learner corpus research. "Computer learner corpora are electronic collections of authentic FL/SL textual data assembled according to explicit design criteria for a particular SLA/FLT purpose. They are encoded in a standardised and homogeneous way and documented as to their origin and provenance" (Granger, 2002). (FL: foreign language, SL: second language, SLA: second language acquisition, FLT: foreign language teaching.)

Error-annotated learner corpora serve the needs of language acquisition studies and pedagogy development, and support the creation of natural language processing tools such as automatic language proficiency level checking systems (Hasan et al., 2008) or automatic error detection and correction systems (see Section 2). In this paper we present our first attempts at creating a system that assists annotators in performing orthographic error annotation by suggesting a correction for specific spans of text selected and marked as erroneous by the annotators. In the prototypical experiments, the suggestions are generated by word-level and character-level SMT systems.

This paper is organized as follows: we review existing approaches to automatic error correction (Section 2), introduce our experiments (Section 3), present the data we used (Section 4), describe and discuss the performed experiments (Section 5) and conclude the paper (Section 6).

2 Related Work

Orthographic errors are mistakes in spelling, hyphenation, capitalisation and word breaks (Abel et al., 2016). Automatic orthographic error correction can benefit from methods recently developed for grammatical error correction (GEC), such as methods relying on SMT and Neural Machine Translation (NMT) (Chollampatt et al., 2017; Ji et al., 2017; Junczys-Dowmunt et al., 2016; Napoles et al., 2017; Sakaguchi et al., 2017; Schmaltz et al., 2017; Yuan et al., 2016; etc.). These approaches treat error correction as an MT task from incorrect to correct language. In the case of orthographic error correction these "languages" are extremely close, which greatly facilitates the MT task. In that respect, error correction is similar to the task of translating closely related languages such as, for example, Macedonian and Bulgarian (Nakov et al., 2012). In our experiments, we rely on the implementation of SMT models provided by the Moses toolkit (Koehn et al., 2007).

SMT and NMT can be easily adapted to new languages, but their performance depends on the amount and quality of the training data. To make up for the lack of parallel corpora of texts containing language errors and their correct equivalents, various techniques for resource construction have been suggested, such as using the World Wide Web as a corpus (Whitelaw et al., 2009), parsing corrective Wikipedia edits (Grundkiewicz et al., 2014) or injecting errors into error-free text (Ehsan et al., 2013). For our prototypical experiments, we deliberately limit ourselves to the manually curated high-quality data at our disposal and use existing German error-annotated corpora as training data.

In recent years, learner corpora of German have been used for the creation of systems for the automatic correction of German children's spelling errors (Stüker et al., 2011; Laarmann-Quante, 2017), but no work has been done on automatic orthographic error correction of adult learner texts.

3 Objectives of the Experiments

The particularity of our work is that we focus on a specific use case where annotators are assisted in error-tagging newly created learner corpora. To ensure the relevance of our system and limit false positives that would hinder its adoption, the targeted use case is to only suggest corrections while leaving the task of selecting the error to the linguist. The aforementioned GEC systems take as input text containing language errors and produce corrected text. Thus, they may introduce changes in any part of the text, even where no errors are observed. In order to prevent such behavior, we only submit to our system spans of text marked as erroneous by annotators, while leaving out spans of text not containing errors. Therefore, our system is not directly comparable to existing GEC systems.

A given language error may have more than one possible correction, but in the presented research we limit ourselves to orthographic errors, which in most cases have only one correction (Nerius et al., 2007). Our system is meant to be used for the creation of new learner corpora at the Institute for Applied Linguistics, where learner corpora of German, Italian and English are created and studied (Abel et al., 2013; Abel et al., 2015; Abel et al., 2016; Abel et al., 2017; Zanasi et al., 2018). Preliminary experiments with the freely available vocabulary-based spell checking tool Hunspell (http://hunspell.github.io/) yielded unsatisfactory results (see Section 5.1) and incited us to try SMT in order to train an error-correction system and tune it to the specific nature of our data. We thus performed a series of experiments to obtain a preliminary evaluation of the range of performances of different n-gram models when trained on small-scale data (Section 5.1), studied the impact of the similarity between training data and test data to understand which datasets are the most suitable to train our models on (Sections 5.2 and 5.3), and finally made preliminary attempts to improve the performance by optimising the usage of the SMT systems (Section 5.4).

As our systems are not directly comparable to GEC systems, the usual metrics used to evaluate GEC systems are not fully adequate, because they target a similar but different use case. We thus evaluate our systems according to their accuracy, which we define as the ratio between the number of suggestions matching the target hypothesis present in the test data (TH) and the whole number of annotated errors. (The TH corresponds to a correction associated with each error; Reznicek et al., 2013.) However, accuracy is not the only criterion, as it is also important not to disturb the annotators with irrelevant suggestions: it is better not to suggest any TH than to suggest a wrong one. In order to control the ratio between right and wrong suggestions, we also evaluate our systems according to their precision. We define precision as the ratio between the number of suggestions matching the TH and the whole number of suggestions, correct and incorrect, thus excluding the errors for which the system was consulted but no correction was suggested. Precision is mainly used as a quality threshold which should remain high, whereas our main performance measure is accuracy.
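The two measures defined above can be stated concretely. The following sketch is our own illustration (the function and the data layout are hypothetical, not part of the described system): it computes accuracy and precision from a list of annotated errors, each paired with the system's suggestion, where a missing suggestion is represented as None.

```python
def evaluate(errors):
    """Compute (accuracy, precision) as defined in Section 3.

    errors: list of (target_hypothesis, suggestion) pairs;
    suggestion is None when the system returned no correction."""
    total = len(errors)  # all annotated errors the system was consulted for
    suggested = [(th, s) for th, s in errors if s is not None]
    matching = sum(1 for th, s in suggested if s == th)  # suggestions equal to the TH
    accuracy = matching / total if total else 0.0            # matches / all errors
    precision = matching / len(suggested) if suggested else 0.0  # matches / suggestions made
    return accuracy, precision

# Four annotated errors: two correct suggestions, one wrong one, one no-suggestion.
acc, prec = evaluate([("Sommerfest", "Sommerfest"),
                      ("Fahrrad", "Fahrrad"),
                      ("Hause", "Haus"),
                      ("morgen", None)])
# accuracy = 2/4 = 0.5, precision = 2/3
```

Note that the unanswered error lowers accuracy but not precision, which is exactly the asymmetry motivating the use of precision as a quality threshold.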
4 Corpora Used

Our experiments rely on three error-annotated learner corpora: KoKo, Falko and MERLIN.

KoKo is a corpus of 1.503 argumentative essays (811.330 tokens) of written German L1 (first language) from high school pupils, 83% of whom are native speakers of German (Abel et al., 2016). It relies on a very precise error annotation scheme with 29 types of orthographic errors.

The Falko corpus consists of six subcorpora (Reznicek et al., 2012), out of which we are using the subcorpus of 107 error-annotated written texts by advanced learners of L2 (second language) German (122.791 tokens).

The MERLIN corpus was compiled from standardized tests of L2 German, Italian and Czech related to the CEFR (Common European Framework of Reference for Languages) (Boyd et al., 2014).
We are using the German part of MERLIN, which contains 1033 learner texts (154.335 tokens): a little more than 200 texts for each of the covered CEFR levels (A1, A2, B1, B2 and C1).

Due to the differences in content and format, we do not use all three learner corpora in all the experiments. KoKo is our main corpus because of its larger size, easy-to-use format and detailed orthographic error annotation. We use it in the training, validation and testing of our SMT systems. Falko is smaller and its format does not allow an easy alignment of orthographic errors; we thus only use it in some experiments as part of the training corpus (Sections 5.1 and 5.2). MERLIN was annotated similarly to KoKo, therefore error-correction results obtained for these two corpora are easily comparable. Furthermore, MERLIN is representative of different levels of language mastery. We thus use it for testing some of our systems (Section 5.2).

As the language model for our character-based SMT systems cannot be generated from the limited amount of data provided by learner corpora, for that purpose we used 3.000.000 sentences of a German news subcorpus from the Leipzig Corpora Collection (http://hdl.handle.net/11022/0000-0000-2417-E).

5 Prototypical Experiments

5.1 Testing Different N-Gram Models

We started by testing SMT word-based and character-based language models with various n-gram orders in order to understand which one suffers least from data scarcity and thus best suits our data (Table 1). (The computational results presented have been achieved in part using the Vienna Scientific Cluster, VSC.) We used Moses default values for all the other parameters. The systems were trained on a parallel corpus composed of learner texts and their corrected versions from Falko and KoKo. In each fold of the 10-fold validation, 1/10 of KoKo is taken out of the training corpus and used as a validation corpus.

Since our objective was only to observe the overall adequateness of the SMT models, we only attempted to optimise the way the SMT models were used at a later stage (see Section 5.4). These prototypical experiments showed that all the SMT models have a rather high precision and that, for this amount of training data, the best-performing SMT model is the word 5-gram model. It yielded an encouraging result of 39% accuracy and 89% precision, which is far better than the 11% accuracy and 8% precision originally obtained with Hunspell. However, the 39% accuracy was obtained by training on Falko and 9/10 of KoKo and validating on 1/10 of KoKo, which would be the configuration we would have towards the end of the annotation of a new learner corpus. We thus proceeded with our experiments by testing how the SMT models would perform at an earlier stage.

              word-grams                  character-grams
          1      3      5      10       6      10      15
Prec.   84%    87%    89%    84%      83%    86%    87%
Acc.    32%    37%    39%    38%      16%    21%    29%

Table 1: 10-fold validation on KoKo of SMT models trained on KoKo and Falko.

5.2 Testing the Models on New Data

At an early stage of the annotation of a new learner corpus, an error-correction system could be trained on an already existing corpus. We thus tried to apply the different models trained on Falko, KoKo and the newspapers to MERLIN. However, none of the 7 models presented in the previous section achieved more than 13% accuracy and 70% precision on the whole MERLIN corpus. Despite that, these experiments highlighted an interesting aspect: all the models performed better on MERLIN texts of higher CEFR levels than on MERLIN texts of lower CEFR levels (Table 2). We suspect this phenomenon to be due to the fact that the level of language mastery of MERLIN texts of higher CEFR levels is closer to the level of language mastery of KoKo and Falko texts. This observation indicates that the training and test data must attest to the same level of language mastery, because mistakes made by beginner language learners tend to differ noticeably from mistakes made by advanced language learners. Therefore, using existing learner corpora as training data is a difficult task, as most of them target different types of learners with different profiles and bias towards specific kinds of errors.
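The character-based models above treat each character as a "word" for Moses, so learner texts have to be recast accordingly before training and decoding. The paper does not detail this step; the sketch below is our own illustration of the usual convention (cf. Nakov et al., 2012), with the underscore as an arbitrarily chosen placeholder for original spaces.

```python
def to_char_level(text, space="_"):
    """Recast a text span as a character 'sentence': each character
    becomes a separate token, original spaces become a reserved symbol."""
    return " ".join(space if c == " " else c for c in text)

def from_char_level(chars, space="_"):
    """Invert the transformation on the character-level SMT output."""
    return "".join(" " if t == space else t for t in chars.split(" "))

# A word-break error and its correction, as seen by the character-level system:
assert to_char_level("Sommer fest") == "S o m m e r _ f e s t"
assert from_char_level("S o m m e r f e s t") == "Sommerfest"
```

Because spaces survive as ordinary tokens, a character-level system can in principle merge or split words, which is relevant for the word-break errors discussed in Section 5.4.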
         A1     A2     B1     B2     C1
Prec.   60%    61%    77%    72%    78%
Acc.    15%     9%    12%    14%    17%

Table 2: precision and accuracy of the word 5-gram model trained on KoKo and Falko when tested on MERLIN texts of different CEFR levels.

5.3 Training and Testing on One Corpus

The results of the previous experiments incited us to train an SMT model on a small part of a corpus and test it on a bigger part of the same corpus, in order to observe how an SMT model would behave when trained on an already annotated part of a new learner corpus. We thus performed 3-fold validation experiments with a word 5-gram model, taking 1/3 of KoKo as training data and 2/3 of KoKo as test data, and obtained 30% accuracy. (We also calculated the BLEU score for this model and obtained 95%. This result shows that the BLEU score is irrelevant for the evaluation of error correction systems such as ours, which cannot introduce errors into error-free spans of text.) This result was much better than the 13% accuracy we had obtained by training SMT systems on KoKo and Falko and testing them on MERLIN. We thus decided to pursue our experiments with KoKo as both training and test data.

In order to observe the evolution of the system's performance with the growth of the corpus, we also trained it on 2/3 of KoKo and tested it on 1/3 of KoKo. Augmenting the training corpus size did not change the system's performance (Table 3, line 1). Such results tend to indicate that most of the performance can be obtained at an earlier stage of the annotation process.

5.4 Improving the Performance

After evaluating the impact of the training data on the system's performance, we switched our focus to optimising the way the SMT models are used. First of all, we tried to take into account not only the highest-ranked suggestion of Moses, which in many cases was equal to the error text (i.e. no correction was suggested), but also the lower-ranked suggestions, in order to find the highest-ranked suggestion that was different from the error text. This change considerably improved the accuracy for both corpus sizes and only slightly deteriorated the precision (Table 3, line 2).

In order to further improve the performance, we decided to combine the word-based and character-based systems. For this first experiment we chose the best-performing of the word-based systems, which is the word 5-gram model, and the second-best-performing of the character-based systems, which is the character 10-gram model. We chose the character 10-gram model for practical reasons: it is considerably less resource-consuming than the character 15-gram model. By applying both the word 5-gram and the character 10-gram models to the same data and comparing the overlap in their responses, we verified their degree of complementarity. This experiment showed that only in 18% of cases do the word-based and character-based models both suggest a correction (corresponding or not to the TH). In 39% of cases only the word-based system suggests a correction, and in 5% of cases only the character-based system suggests a correction. This means that by combining the two systems it is possible to improve the overall performance. We calculated the maximum theoretical accuracy of such a combined system and came to the conclusion that it cannot exceed 53% when trained on 1/3 of KoKo and 60% when trained on 2/3 of KoKo (Table 3, line 3). (The maximum theoretical accuracy would be achieved if it were possible to always choose the right system to consult for each precise error, word-based or character-based, and to never consult the system that gave a wrong result when the other system gave a correct one. In that case the maximum potential of both systems would be used.)

By simply giving preference to the word-based model before consulting the character-based model, we almost achieved the maximum theoretical accuracy (Table 3, line 4). However, we realised that by augmenting the training corpus size, we augmented the accuracy but slightly deteriorated the precision.

By analysing the performance of the different modules (word 5-gram highest-ranked suggestions, word 5-gram lower-ranked suggestions, character 10-gram) on different kinds of errors, we could observe that their performance differs according to the type of error. For example, the lower-ranked suggestions of the word-based model introduce a lot of mistakes in the correction of errors where one word was erroneously written as two separate words (e.g. Sommer fest instead of Sommerfest). We tried to prevent such false corrections by not consulting the lower-ranked suggestions of the word-based model for errors containing spaces. By introducing this rule we succeeded in improving the precision at the cost of losing some accuracy (Table 3, line 5). This experiment showed that ad-hoc rules might not be a workable solution and that a more sophisticated approach should be considered if we intend to dynamically combine several systems. In order to obtain better results by combining two or more word-based and character-based systems, further experiments should be conducted.
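The resulting decision procedure (prefer the word-based model, fall back to the character-based one, and do not consult the lower-ranked word-based suggestions for errors containing spaces) can be sketched as follows. This is our own reconstruction for illustration: the n-best list and the character-based suggestion would come from the respective Moses systems, and the function name is hypothetical.

```python
def combine(error, word_nbest, char_best):
    """Cascade combination of the word 5-gram and character 10-gram
    systems, including the rule on spaces (cf. Table 3, line 5).

    error:      the span of text marked as erroneous by the annotator
    word_nbest: ranked suggestions of the word-based model
    char_best:  best suggestion of the character-based model, or None
    Returns a correction, or None when neither model disagrees with the error."""
    for rank, suggestion in enumerate(word_nbest):
        if suggestion != error:            # a real correction, not a copy of the error
            return suggestion
        if rank == 0 and " " in error:     # rule on spaces: skip the unreliable
            break                          # lower-ranked word-based suggestions
    if char_best is not None and char_best != error:
        return char_best                   # fall back to the character-based model
    return None                            # no correction suggested

# 'Sommer fest': the top word-based suggestion merely copies the error and the
# spaces rule blocks its lower-ranked suggestions, so the character-based
# model provides the correction.
assert combine("Sommer fest", ["Sommer fest", "Sommer Fest"], "Sommerfest") == "Sommerfest"
```

An oracle that always consulted the right system for each error would realise the maximum theoretical accuracy mentioned above; this fixed cascade is a simple approximation of it.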
                                     train. 1/3    train. 2/3
                                     valid. 2/3    valid. 1/3
1 word highest-ranked corr.          30% (88%)     30% (88%)
2 word lower-ranked corr.            48% (84%)     55% (83%)
3 max. theoretical accuracy,
  word lower-ranked + character      53% (85%)     60% (84%)
4 word lower-ranked + character      53% (84%)     59% (83%)
5 word lower-ranked + character,
  with rule on spaces                52% (88%)     57% (88%)

Table 3: accuracy and precision (in brackets) of different systems according to training corpus size (3-fold validation on KoKo).

6 Conclusion

Our preliminary experiments brought us to the conclusion that an SMT system trained on a manually annotated part of a learner corpus can be helpful in error-tagging the remaining part of the same learner corpus: it is possible to train a system that proposes the right correction for half of the orthographic errors outlined by the annotators while proposing very few wrong corrections. Such results are satisfactory enough to start integrating the system into the annotation tool we use to create learner corpora (Okinina et al., 2018).

The combination of a word-based and a character-based system gave promising results; we therefore intend to continue experimenting with multiple combinations of word-based and character-based systems. We are also considering the possibility of relying on other technologies (Bryant, 2018). As in our experiments we only wanted to observe the range of performances we could expect, we trained our models with the default configuration provided with the Moses toolkit and did not perform any tuning of the parameters. Future efforts will focus on evaluating how relevant the tuning of parameters can be for such an MT task.

The choice of training data for our experiments was dictated by the availability of high-quality resources. In future experiments we would like to enlarge the spectrum of resources considered for our experiments and work with other languages, in particular Italian and English.

Acknowledgements

We would like to thank the reviewers as well as our colleagues Verena Lyding and Alexander König for their useful feedback and comments.

References

Abel, A., Glaznieks, A.: „Ich weiß zwar nicht, was mich noch erwartet, doch ...“ – Der Einsatz von Korpora zur Analyse textspezifischer Konstruktionen des konzessiven Argumentierens bei Schreibnovizen, Corpora in specialized communication, vol. 4, Bergamo, 2013, pp. 101-132.

Abel, A., Konecny, C., Autelli, E.: Annotation and error analysis of formulaic sequences in an L2 learner corpus of Italian, Third International Learner Corpus Research Conference, Book of abstracts, 2015, pp. 12-15.

Abel, A., Glaznieks, A., Nicolas, L., Stemle, E.: An extended version of the KoKo German L1 learner corpus, Proceedings of the Third Italian Conference on Computational Linguistics CLiC-it, Naples, Italy, 2016, pp. 13-18.

Abel, A., Vettori, C., Wisniewski, K.: KOLIPSI. Gli studenti altoatesini e la seconda lingua: indagine linguistica e psicosociale, vol. 2, Eurac Research, 2017.

Boyd, A., Hana, J., Nicolas, L., Meurers, D., Wisniewski, K., Abel, A., Schöne, K., Štindlová, B., Vettori, C.: The MERLIN corpus: Learner language and the CEFR, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC), 2014, pp. 1281-1288.

Bryant, C.: Language Model Based Grammatical Error Correction without Annotated Training Data, Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, 2018, pp. 247-253.

Chollampatt, S., Ng, H.: Connecting the Dots: Towards Human-Level Grammatical Error Correction, Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017, pp. 327-333.

Granger, S.: A Bird's Eye View of Learner Corpus Research. In Granger, S., Hung, J., Petch-Tyson, S. (eds.), Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, Amsterdam & Philadelphia: Benjamins, 2002, pp. 3-33.

Nerius, D. et al.: Deutsche Orthographie. 4., neu bearbeitete Auflage. Hildesheim/Zürich/New York: Olms Verlag, 2007.
Ehsan, N., Faili, H.: Grammatical and context-sensitive error correction using a statistical machine translation framework, Software – Practice and Experience, 2013, 43, pp. 187-206.

Grundkiewicz, R., Junczys-Dowmunt, M.: The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction. In Przepiórkowski, A., Ogrodniczuk, M. (eds.), Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol. 8686. Springer, Cham, 2014, pp. 478-490.

Hasan, M. M., Khaing, H. O.: Learner Corpus and its Application to Automatic Level Checking using Machine Learning Algorithms, Proceedings of ECTI-CON, 2008, pp. 25-28.

Ji, J., Wang, Q., Toutanova, K., Gong, Y., Truong, S., Gao, J.: A Nested Attention Neural Hybrid Model for Grammatical Error Correction, ArXiv e-prints, 2017.

Junczys-Dowmunt, M., Grundkiewicz, R.: Phrase-based machine translation is state-of-the-art for automatic grammatical error correction, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 2016, pp. 1546-1556.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E.: Moses: Open source toolkit for statistical machine translation, Proceedings of ACL '07, Prague, Czech Republic, 2007, pp. 177-180.

Laarmann-Quante, R.: Towards a Tool for Automatic Spelling Error Analysis and Feedback Generation for Freely Written German Texts Produced by Primary School Children, Proceedings of the Seventh ISCA Workshop on Speech and Language Technology in Education, 2017, pp. 36-41.

Nakov, P., Tiedemann, J.: Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), 2012, pp. 301-305.

Napoles, C., Sakaguchi, K., Tetreault, J.: JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol. 2, Short Papers. Association for Computational Linguistics, Valencia, Spain, 2017, pp. 229-234.

Okinina, N., Nicolas, L., Lyding, V.: Transc&Anno: A Graphical Tool for the Transcription and On-the-Fly Annotation of Handwritten Documents, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), 2018, pp. 701-705.

Reznicek, M., Lüdeling, A., Krummes, C., Schwantuschke, F.: Das Falko-Handbuch. Korpusaufbau und Annotationen, Version 2.0, 2012.

Reznicek, M., Lüdeling, A., Hirschmann, H.: Competing Target Hypotheses in the Falko Corpus: A Flexible Multi-Layer Corpus Architecture, Automatic Treatment and Analysis of Learner Corpus Data, John Benjamins Publishing Company, Amsterdam/Philadelphia, 2013, pp. 101-123.

Sakaguchi, K., Post, M., Van Durme, B.: Grammatical Error Correction with Neural Reinforcement Learning, Proceedings of the Eighth International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, Taipei, Taiwan, 2017, pp. 366-372.

Schmaltz, A., Kim, Y., Rush, A., Shieber, S.: Adapting Sequence Models for Sentence Correction, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2807-2813.

Stüker, S., Fay, J., Berkling, K.: Towards Context-dependent Phonetic Spelling Error Correction in Children's Freely Composed Text for Diagnostic and Pedagogical Purposes, Interspeech, 2011.

Whitelaw, C., Hutchinson, B., Chung, G., Ellis, G.: Using the Web for Language Independent Spellchecking and Autocorrection, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 2009, pp. 890-899.

Yuan, Z., Briscoe, T.: Grammatical Error Correction Using Neural Machine Translation, Proceedings of NAACL-HLT 2016, 2016, pp. 380-386.

Zanasi, L., Stopfner, M.: Rilevare, osservare, consultare. Metodi e strumenti per l'analisi del plurilinguismo nella scuola secondaria di primo grado. In Coonan, C., Bier, A., Ballarin, E., La didattica delle lingue nel nuovo millennio. Le sfide dell'internazionalizzazione, Edizioni Ca' Foscari, 2018, pp. 135-148.