=Paper=
{{Paper
|id=Vol-1749/paper1
|storemode=property
|title=An Extended Version of the KoKo German L1 Learner Corpus
|pdfUrl=https://ceur-ws.org/Vol-1749/paper1.pdf
|volume=Vol-1749
|authors=Andrea Abel,Aivars Glaznieks,Lionel Nicolas,Egon Stemle
|dblpUrl=https://dblp.org/rec/conf/clic-it/AbelGNS16
}}
==An Extended Version of the KoKo German L1 Learner Corpus==
<pdf width="1500px">https://ceur-ws.org/Vol-1749/paper1.pdf</pdf>
<pre>
     An extended version of the KoKo German L1 Learner corpus
           Andrea Abel, Aivars Glaznieks, Lionel Nicolas, Egon Stemle
            Institute for Specialised Communication and Multilingualism
                                   EURAC Research
                                  Bolzano/Bozen, Italy
        andrea.abel@eurac.edu, aivars.glaznieks@eurac.edu
         lionel.nicolas@eurac.edu, egon.stemle@eurac.edu


                 Abstract

This paper describes an extended version of the KoKo corpus (version
KoKo4, Dec 2015), a corpus of written German L1 learner texts from three
different German-speaking regions in three different countries. The KoKo
corpus is richly annotated with learner language features on different
linguistic levels, such as errors or other linguistic characteristics that
are not deficit-oriented, and is enriched with a wide range of metadata.
This paper complements a previous publication (Abel et al., 2014a) and
reports on new textual metadata and lexical annotations and on the methods
adopted for their manual annotation and linguistic analyses. It also
briefly presents some linguistic findings derived from the corpus.

1   Introduction

The study of linguistically annotated learner corpora has received growing
interest over the past 20 years (Granger et al., 2013). In learner corpus
linguistics, such corpora are usually defined as “systematic computerized
collections of texts produced by language learners” (Nesselhauf, 2005).
Unlike most learner corpora, which focus on L2/FL learners (i.e. learners
of a foreign language), the KoKo corpus focuses on advanced L1 speakers who
are still learning their mother tongue, which typically happens in
educational contexts.

This paper describes an extended version of the KoKo corpus (Abel et al.,
2014a), a corpus created for the KoKo project, which aims to investigate
the writing skills of German-speaking secondary school pupils. The creation
of the corpus was guided by two goals: on the one hand, to describe writing
skills at the end of secondary school; on the other hand, to take into
account external socio-linguistic factors (e.g. gender, socio-economic
background).

The previous description focused on the data collection, the data
processing, the annotation of orthographic and grammatical features, and
aspects of annotation quality (Abel et al., 2014a). This paper, in
contrast, introduces the new textual metadata and lexical annotations.

The paper is structured as follows. Section 2 briefly reports key facts
about the corpus, including references to related work. Section 3 describes
the new textual metadata and lexical annotations, together with the methods
adopted for their manual annotation and linguistic analyses and some
examples of linguistic findings. Section 4 discusses future work, and
Section 5 concludes.

2   Key Information about the Corpus

The KoKo corpus is a collection of 1,503 authentic argumentative essays,
together with survey information about their authors. The essays were
written in classrooms under standardized conditions by pupils from 85
classes in 66 schools in three German-speaking areas: South Tyrol in Italy,
North Tyrol in Austria and Thuringia in Germany.[1] These areas are
particularly suitable for comparative studies because they differ with
respect to the German standard varieties, the use of dialectal vs. standard
varieties, and monolingual vs. plurilingual environments (Abel et al.,
2014a).

The corpus is roughly equally distributed over the three regions and
amounts to 824,757 tokens (punctuation excluded). All writers were
attending secondary school one year before their school-leaving
examinations, and 83% of them were native speakers of German. The
corresponding L1 part of the corpus amounts to 726,247 tokens. The corpus
contains 52,605 metadata annotations and 117,422 manual annotations.
Furthermore, 366 features measuring linguistic complexity[2] (Hancke et
al., 2012; Hancke and Meurers, 2013) were calculated automatically for each
text (550,098 values in total) and added as metadata.

A previous evaluation showed high accuracy for the manual transcriptions
(> 99%) and for automatic tokenization (> 99%), sentence splitting (> 96%)
and POS tagging (> 96%) (Glaznieks et al., 2014).

As one of the first accessible, richly linguistically annotated German L1
learner corpora, the KoKo corpus is particularly relevant to researchers of
L1 learner language and to the didactics of German as L1. Other comparable
language resources are either not accessible (Berg et al., 2010;
DESI-Konsortium, 2006; Nussbaumer and Sieber, 1994), or, although
accessible, have not been enriched with linguistic information (Augst et
al., 2007; Fix and Melenk, 2002), or are only partly annotated (Thelen,
2010). Some other corpora include L1 data, but only as a reference for
L2/FL learner corpus research (Reznicek et al., 2010; Zinsmeister and
Breckle, 2012).

[1] We followed the privacy policy for such surveys and requested signed
    consent from all adult participants and from the parents of minors. In
    addition, all students participated anonymously: no names of students
    were collected, and the names of schools were coded and anonymized.
[2] E.g. syntactic features such as the average length of NPs, VPs and PPs
    and their number per sentence; morphological features such as the
    number of modal verbs per total number of verbs or the average compound
    depth of nouns; and lexical features such as lexical diversity, as
    captured by different measures.

3   New Metadata and Annotations

This section describes the main features of the latest corpus version,
KoKo4 (Dec. 2015), that were added to version KoKo3 (Dec. 2014). It
focuses on a new set of textual metadata and a new layer of lexical
annotations. Owing to the selected features and their degree of
granularity, this layer is a novelty in the (corpus-based) modeling of L1
writing competences for German.

3.1   Textual Metadata

The KoKo corpus contains two kinds of metadata: (1) non-linguistic, i.e.
person-related information, provided by each participant via an in-class
questionnaire survey and available for the whole sample; and (2)
linguistic, i.e. text-related information, provided for a subsample of the
corpus (569 texts, equally distributed over the three regions) via an
online evaluation form by three specially trained raters from the
participating regions.

While type (1) metadata allow for sociolinguistic analyses that relate
linguistic features (e.g. text length, sentence length, orthographic
errors, grammatical errors) to non-linguistic, person-related information,
type (2) metadata extend our analyses to textual features. Texts were
analysed holistically using an evaluation form and detailed guidelines,
which were developed on the basis of recent findings in writing research
and text analysis (Brinker, 2010; Feilke, 2010; Augst et al., 2007;
Böttcher and Becker-Mrotzek, 2006; Jechle, 1992; Augst and Faigel, 1986)
and of the curricula in the participating regions. The text evaluation
form distinguishes four categories: (A) formal completeness, (B) content,
(C) formal and linguistic means of text arrangement, and (D) overall
impression.

For category A, 10 questions of the online evaluation form focused on the
presence of obligatory text parts (introduction, main part, closing part)
and of explicitly requested constituents of argumentative essays (opinion
of the author, conclusion). The 25 questions of category B belong to
two subcategories: (B1) the topics of the essay (9 questions) and (B2)
patterns of topic development (16 questions). B1 comprises evaluations of,
e.g., the topics of each text part, gaps, and the overall coherence of the
text. B2 refers to the main pattern of topic development (argumentative,
etc.), the argumentation strategies (point of view, concessive or not), and
the motivation of arguments (objective vs. subjective stance, quality of
arguments). Formal and linguistic means of text arrangement (category C, 7
questions) concern the use of paragraphs, the explicit announcement of and
commitment to the function of the essay, and the use of linguistic means to
structure the content of the text. Finally, category D (20 questions)
targets an overall impression and therefore focuses on the completion of
the task (successful or not), the overall quality of the text, and the
overall consistency of both quality and coherence. Of the 62 questions of
the entire online evaluation form, we used 57 for each document of the
subcorpus (altogether 33,972 annotations).

The analyses revealed, among other things, that text quality was rated as
quite satisfactory on a 5-point Likert scale.[3] Moreover, text quality
ratings are significantly related to other linguistic variables: for
example, a lower number of lexical errors goes along with a higher text
quality score.[4] Finally, a variety of group differences were detected,
e.g. concerning school type: texts from vocational schools received lower
text quality scores than texts from general high schools.[5]

3.2   Lexical Annotations

As for the manual annotations of orthographic and grammatical features in
previous corpus versions (Abel et al., 2014a), a specifically crafted tag
set and annotation manual were used for annotating lexical features.
Trained annotators manually added 61,728 lexical annotations to a
subcorpus of 980 texts, almost equally distributed over the three regions.

   Category             Sub-category                   Annotations
   -------------------  -----------------------------  -----------
   Single words         Neologisms & occasionalisms          4,670
                        Argumentative adv. & conj.          14,345
   Phrasemes            Referential                         18,708
                        Communicative                        4,824
                        Structural                           2,704
   Particularities      Semantic                             8,397
                        Stylistic                              236
                        Form                                 1,923
                        Metalinguistic                       1,412
   Target hypotheses                                         4,509
   -------------------  -----------------------------  -----------
   Total                                                    61,728

   Table 1: Quantitative figures for the 980 documents annotated with the
   new lexical annotations.

The analyses of lexical features focus on lexical knowledge as a central
part of lexical competence, which includes the dimensions of lexical
breadth (quantitative aspect) and lexical depth (qualitative aspect)
(Steinhoff, 2009; Böttcher and Becker-Mrotzek, 2006; Mukherjee, 2005; Read
and Nation, 2004; Read, 2000; Nation, 2001). Whereas quantitative aspects
of lexical knowledge were analysed automatically with different measures
(e.g. lexical diversity measures such as MTLD and Yule's K, or lexical
frequency scores based on dlexDB (Hancke and Meurers, 2013)), qualitative
aspects were analysed by means of manual annotations. Hereafter, we focus
exclusively on the manual annotations, which allow us to model qualitative
aspects of lexical knowledge.

For annotating lexical features, we developed a new hierarchically
structured linguistic classification scheme inspired by previous work on
L2 learner language (Abel et al., 2014b; Konecny et al., 2016). The scheme
takes into account both occurrences of selected lexical phenomena and
defective as well as non-defective particularities of learner language,
along two dimensions: (1) the linguistic subcategory, e.g. collocations and
idioms, and (2) a target modification classification, e.g. omission or
addition (Díaz-Negrillo and Fernández Domínguez, 2006; Abel et al.,
2014b). Furthermore, we formulated target hypotheses for the categories
annotated as defective, in order to make the error interpretation
transparent (Lüdeling et al., 2005). The resulting annotation scheme
contains 77 different tags, including a set of further attributes.
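
Yule's K, one of the automatically computed lexical diversity measures
mentioned above, has a simple closed form. The sketch below is only an
illustration and is not the code used to compute the KoKo features. It
assumes the text is already tokenized (and lowercased, if desired), and
the function name yules_k is hypothetical. It uses the standard definition
K = 10^4 * (sum_i i^2 * V_i - N) / N^2, where N is the number of tokens and
V_i is the number of types occurring exactly i times. Higher values
indicate more repetition, i.e. lower lexical diversity.

```python
from collections import Counter


def yules_k(tokens):
    """Yule's K for a sequence of tokens (higher K = lower lexical diversity).

    K = 10**4 * (sum_i i**2 * V_i - N) / N**2, where N is the number of
    tokens and V_i is the number of types that occur exactly i times.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("Yule's K is undefined for an empty text")
    type_freqs = Counter(tokens)                  # type -> frequency
    freq_spectrum = Counter(type_freqs.values())  # i -> V_i
    s2 = sum(i * i * v_i for i, v_i in freq_spectrum.items())
    return 10_000 * (s2 - n) / (n * n)


if __name__ == "__main__":
    # Toy example (hypothetical input, not taken from the corpus).
    text = "der hund sieht den hund und der hund bellt"
    print(round(yules_k(text.split()), 2))  # prints 987.65
```

A text in which every token is a different type yields K = 0. MTLD, the
other measure named in this section, works differently: it segments the
text sequentially according to a type-token ratio threshold. It is not
sketched here.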

In a multi-stage annotation procedure, all occurrences of the phenomena on
both single words and formulaic sequences (FS) were annotated (Wray, 2005).
Annotations for particularities were subsequently added in order to
distinguish between errors concerning correctness, errors concerning
appropriateness of usage (Eisenberg, 2007; Schneider, 2013), non-defective
modifications (to capture, for example, creative use of language), and
diasystematic markedness. At the single word level, we treated all tokens
unknown to the part-of-speech tagger (Schmid, 1994), i.e. out-of-vocabulary
tokens, as candidate neologisms or occasionalisms. In addition, we captured
a variety of tokens relevant to the genre of the argumentative essay (i.e.
argumentative adverbs and conjunctions). At the level of FS, we applied a
function-based approach distinguishing between three main categories of
phrasemes (Burger, 2007), each with further subcategories (Abel et al.,
2014b; Konecny et al., 2016; Granger and Paquot, 2008; Burger, 2007; Stein,
2007; Steinhoff, 2007), as well as a “mixed classification” (Burger, 2007):

   Referential phrasemes include collocations[6] and idioms[7],
   distinguished, among other things, by their degree of idiomaticity.
   Communicative phrasemes are subdivided into those bound to specific
   situations[8] and those not bound to specific situations[9]. Finally,
   structural phrasemes comprise complex conjunctions and prepositions[10]
   and concessive constructions[11].

For particularities, we considered four main categories, each with further
subcategories:

   The semantic dimension distinguishes between denotative errors
   concerning correctness or appropriateness of use[12] and connotative
   markedness or appropriateness of use[13]. The stylistic dimension
   considers repetition and redundancy. The form dimension focuses on word
   formation errors (concerning single word units only[14]) and on
   omission, choice, position and addition errors as well as creative
   modifications (concerning FS). The metalinguistic dimension considers
   whether quotation marks are used appropriately to highlight units.

An overview of the number of annotations is provided in Table 1.

The analyses showed, among other things, that pupils use different types
of FS quite frequently, on average 5.12 constructions per 100 words.
Non-idiomatic referential phrasemes make up the largest share (62%),
followed by idiomatic referential phrasemes (19%) and, finally, structural
(10%) and communicative phrasemes (9%). However, lexical errors in general
affect FS more often than single words (10% of FS vs. 1.04% of single
words). FS errors are most frequently form errors (5.50% of FS affected),
especially choice errors (4.17%).

[3] Percentages: 1 (poor): 6.2; 2: 22.9; 3: 39.0; 4: 26.3;
    5 (excellent): 5.7.
[4] FS errors: Kruskal-Wallis H test, χ²(1) = 10.417, p = .036; single
    word errors: ANOVA, F(4, 338) = 2.805, p = .026.
[5] Kruskal-Wallis H test: χ²(1) = 49.147, p < .001.

4   Future Work

The KoKo project was completed and presented to the public in December
2015. We will start releasing the data via the corpus exploration
interface ANNIS3 (Krause and Zeldes, 2016) and for download on request,
after the signing of a license agreement.[15] In addition to the data
described above, future versions will also include further metadata about
the authors for future socio-linguistic analyses.

Agreement among annotators, as an indication of annotation reliability,
will be evaluated on subsets of texts that were annotated by more than one
annotator for this purpose.
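
Agreement on categorical labels among more than two annotators is usually
quantified with a chance-corrected coefficient. The function below is a
minimal sketch that computes Fleiss' kappa, which Artstein and Poesio
(2008) call multi-π. It assumes that the annotated units have already been
segmented and that every unit was labelled by the same number of
annotators. It is a stand-in, not the evaluation planned here: the Fleiss
multi-κ named in this section estimates expected agreement from each
annotator's own label distribution rather than from the pooled one. The
input layout (one row of per-category counts per item) and the function
name are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for items rated by the same number of annotators.

    counts[i][j] is the number of annotators who assigned item i to
    category j (hypothetical input layout chosen for this sketch).
    """
    if not counts:
        raise ValueError("need at least one item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_raters < 2 or any(
        len(row) != n_categories or sum(row) != n_raters for row in counts
    ):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Expected agreement from the pooled distribution of categories.
    total_ratings = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total_ratings
             for j in range(n_categories)]
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1:
        raise ValueError("undefined when every rating uses the same category")

    return (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    # Three items, three annotators, two categories (toy data, not KoKo data).
    ratings = [[3, 0], [0, 3], [2, 1]]
    print(round(fleiss_kappa(ratings), 2))  # prints 0.55
```

In the set-up described in this section, six annotators labelled the
lexical level of the same 27 texts. Under this sketch's assumption of
pre-segmented units, each row would therefore sum to six. Agreement on
segmentation itself needs a separate measure, such as boundary similarity
(Fournier, 2013).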
Three annotators independently annotated the text-level metadata on 27
texts, and six annotators independently annotated the lexical level on the
same 27 texts. Inter-annotator agreement will be calculated for annotation
(the decision which tag needs to be assigned to a word sequence) and for
segmentation (the decision which word sequence needs to be tagged). It will
be reported as Fleiss' multi-κ and boundary similarity, respectively
(Artstein and Poesio, 2008; Fournier, 2013).

[6]  Further divided into restricted and loose collocations, light verb
     constructions (called “Funktionsverbgefüge” in German), as well as
     special classes such as irreversible binomials and trinomials,
     similes, etc.
[7]  Further divided into nominative idioms and fixed phrases, and special
     classes such as irreversible binomials and trinomials, etc.
[8]  Further divided into general routine or speech act formulas, special
     classes such as commonplaces, slogans, proverbs, etc., and empty
     formulas.
[9]  Further divided into text-organising formulas and
     interaction-organising formulas.
[10] Further divided into phraseological connectors and syntactically
     complex connectors, and secondary prepositions.
[11] Further divided into constructions with “although” and a correlate of
     the “but”-class, and constructions with a modal word and a correlate
     of the “but”-class.
[12] Further divided into reference/function, contextual fitness, semantic
     compatibility, and precision.
[13] Further divided into speaker's attitude and diasystematic markedness
     concerning language usage, i.e. diaphasic markedness, diachronic
     markedness, diatopic markedness.
[14] Distinguishing between errors in derivation and errors in
     composition.
[15] We have been trying to make the data available for direct download,
     but further legal hurdles still have to be overcome.

Finally, thanks to its relatively large size and its rich annotation,
potential additional uses
of the KoKo corpus in Natural Language Processing and Corpus Linguistics
are being considered.

Regarding Natural Language Processing, the error annotations paired with
the target hypothesis annotations allow for the creation of an aligned
corpus. Such corpora can be used to improve machine translation approaches
to the automatic correction of learner texts (Ng et al., 2014). Regarding
Corpus Linguistics, machine learning methods can be used to guide
linguistic intuitions when performing annotations or analyses (as done,
e.g., in WebAnno; Yimam et al., 2014). Because of the richness of its
annotation schemes, the KoKo corpus is a challenging but promising dataset
for testing whether such methods can uncover relevant correlations that
have already been investigated, or even new ones worth considering in
future linguistic analyses.

5   Conclusion

This paper described the most recent version of the KoKo corpus, a
collection of richly annotated German L1 learner texts, focusing on the new
textual metadata and lexical annotations.

Other comparable language resources are either not accessible, not enriched
with linguistic information, or only partly annotated. The corpus is
therefore a valuable resource for research on L1 learner language, in
particular on writing skills, and for teachers of German as L1, in
particular for teaching L1 German writing skills.

References

Andrea Abel, Aivars Glaznieks, Lionel Nicolas, and Egon Stemle. 2014a.
  KoKo: An L1 learner corpus for German. In Proceedings of LREC 2014,
  pages 2414–2421.

Andrea Abel, Katrin Wisniewski, Lionel Nicolas, and Detmar Meurers. 2014b.
  A trilingual learner corpus illustrating European reference levels.
  RICOGNIZIONI – Rivista di Lingue, Letterature e Culture Moderne,
  1(2):111–126.

Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for
  computational linguistics. Computational Linguistics, 34(4):555–596.

Gerhard Augst and Peter Faigel. 1986. Von der Reihung zur Gestaltung:
  Untersuchungen zur Ontogenese der schriftsprachlichen Fähigkeiten von
  13–23 Jahren, volume 5 of Theorie und Vermittlung der Sprache. Peter
  Lang, Frankfurt.

Gerhard Augst, Katrin Disselhoff, Alexandra Henrich, Thorsten Pohl, and
  Paul Völzing. 2007. Text-Sorten-Kompetenz. Eine echte
  Longitudinalstudie zur Entwicklung der Textkompetenz im
  Grundschulalter. Peter Lang, Frankfurt.

Margit Berg, Anne Berkemeier, Reinold Funke, Christian Glück, Christiane
  Hofbauer, and Jordana Schneider, editors. 2010. Sprachliche
  Heterogenität in der Sprachheil- und der Regelschule. Abschlussbericht
  im Programm „Bildungsforschung“ der Landesstiftung Baden-Württemberg,
  Germany.

Ingrid Böttcher and Michael Becker-Mrotzek. 2006. Schreibkompetenz
  entwickeln und beurteilen. Cornelsen, Berlin.

Klaus Brinker. 2010. Linguistische Textanalyse. Eine Einführung in
  Grundbegriffe und Methoden. Bearbeitet von Sandra Ausborn-Brinker, 7.,
  durchgesehene Auflage, volume 29 of Grundlagen der Germanistik. Erich
  Schmidt Verlag, Berlin.

Harald Burger. 2007. Phraseologie: Eine Einführung am Beispiel des
  Deutschen, volume 36 of Grundlagen der Germanistik. Erich Schmidt
  Verlag, Berlin.

DESI-Konsortium, editor. 2006. Unterricht und Kompetenzerwerb in Deutsch
  und Englisch. Beltz Verlag, Weinheim – Bern.

Ana Díaz-Negrillo and Jesús Fernández Domínguez. 2006. Error tagging
  systems for learner corpora. Revista española de lingüística aplicada,
  (19):83–102.

Peter Eisenberg. 2007. Sprachliches Wissen im Wörterbuch der
  Zweifelsfälle. Über die Rekonstruktion einer Gebrauchsnorm. Aptum.
  Zeitschrift für Sprachkritik und Sprachkultur, 3(2007):209–228.

Helmuth Feilke. 2010. Schriftliches Argumentieren zwischen Nähe und
  Distanz am Beispiel wissenschaftlichen Schreibens. Nähe und Distanz im
  Kontext variationslinguistischer Forschung, pages 209–231.

Martin Fix and Hartmut Melenk. 2002. Schreiben zu Texten – Schreiben zu
  Bildimpulsen: das Ludwigsburger Aufsatzkorpus; mit 2300 Schülertexten,
  Befragungsdaten und Bewertungen auf CD-ROM. Schneider-Verlag,
  Hohengehren.

Chris Fournier. 2013. Evaluating Text Segmentation using Boundary Edit
  Distance. In Proceedings of the 51st Annual Meeting of the ACL, pages
  1702–1712. ACL.

Aivars Glaznieks, Lionel Nicolas, Egon Stemle, Andrea Abel, and Verena
  Lyding. 2014. Establishing a standardised procedure for building learner
  corpora. Apples – Journal of Applied Language Studies, 8(3):5–20.

Sylviane Granger, Gaëtanelle Gilquin, and Fanny Meunier. 2013. Twenty
  Years of Learner Corpus Research. Looking Back, Moving Ahead:
  Proceedings of the First Learner Corpus Research Conference (LCR 2011).

Sylviane Granger and Magali Paquot. 2008. Disentangling the phraseological
  web. Phraseology. An interdisciplinary perspective, pages 27–50.

Julia Hancke and Detmar Meurers. 2013. Exploring CEFR classification for
  German based on rich linguistic modeling. In Proceedings of the Learner
  Corpus Research Conference (LCR 2013), pages 54–56.

Julia Hancke, Sowmya Vajjala, and Detmar Meurers. 2012. Readability
  classification for German using lexical, syntactic, and morphological
  features. In Martin Kay and Christian Boitet, editors, Proceedings of
  COLING 2012, pages 1063–1080, Mumbai.

Thomas Jechle. 1992. Kommunikatives Schreiben: Prozess und Entwicklung
  aus der Sicht kognitiver Schreibforschung, volume 41 of ScriptOralia.
  Gunter Narr Verlag, Tübingen.

Christine Konecny, Andrea Abel, Erica Autelli, and Lorenzo Zanasi. 2016.
  Identification and Classification of Phrasemes in an L2 Learner Corpus
  of Italian. In Gloria Corpas Pastor, editor, Computerised and
  Corpus-based Approaches to Phraseology, pages 533–542. Editions
  Tradulex, Geneva.

Thomas Krause and Amir Zeldes. 2016. ANNIS3: A New Architecture for
  Generic Corpus Query and Visualization. Digital Scholarship in the
  Humanities, 31(1):118–139.

Anke Lüdeling, Maik Walter, Emil Kroymann, and Peter Adolphs. 2005.
  Multi-level error annotation in learner corpora. In Proceedings of
  Corpus Linguistics 2005, pages 15–17.

Joybrato Mukherjee. 2005. The native speaker is alive and kicking:
  Linguistic and language-pedagogical perspectives. Anglistik,
  16(2):7–23.

I. S. P. Nation. 2001. Learning Vocabulary in Another Language. Foreign
  Language Study. Cambridge University Press.

Nadja Nesselhauf. 2005. Collocations in a Learner Corpus, volume 14 of
  Studies in Corpus Linguistics. John Benjamins Publishing, Amsterdam.

Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond
  Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 Shared Task
  on Grammatical Error Correction. In Proceedings of the Eighteenth
  Conference on Computational Natural Language Learning: Shared Task,
  pages 1–14, Baltimore, Maryland. ACL.

Markus Nussbaumer and Peter Sieber. 1994. Texte analysieren mit dem
  Zürcher Textanalyseraster. In Peter Sieber, editor,
  Sprachfähigkeiten – Besser als ihr Ruf und nötiger denn je!, pages
  141–186. Verlag Sauerländer, Aarau.

John Read. 2000. Assessing vocabulary. Cambridge University Press.

John Read and Paul Nation. 2004. Measurement of formulaic sequences. In
  Norbert Schmitt, editor, Formulaic sequences: Acquisition, processing
  and use, Language Learning & Language Teaching, pages 23–35. John
  Benjamins Publishing, Amsterdam.

Marc Reznicek, Maik Walter, Karin Schmidt, Anke Lüdeling, Hagen
  Hirschmann, Cedric Krummes, and Torsten Andreas. 2010. Das
  Falko-Handbuch. Korpusaufbau und Annotationen. Technical report,
  Institut für deutsche Sprache und Linguistik, Humboldt-Universität zu
  Berlin.

Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision
  trees. In International Conference on New Methods in Language
  Processing, pages 44–49, Manchester, UK.

Jan Georg Schneider. 2013. Sprachliche ‚Fehler‘ aus
  sprachwissenschaftlicher Sicht. In Sprachreport, volume 1-2/2013, pages
  30–37. Institut für Deutsche Sprache, Mannheim.

Stephan Stein. 2007. Mündlichkeit und Schriftlichkeit aus
  phraseologischer Perspektive. In Harald Burger, Dmitrij Dobrovolskij,
  Peter Kühn, and Neal R. Norrick, editors, Phraseologie. Ein
  internationales Handbuch zeitgenössischer Forschung, volume 1, pages
  220–236. de Gruyter, Berlin – New York.

Torsten Steinhoff. 2007. Wissenschaftliche Textkompetenz: Sprachgebrauch
  und Schreibentwicklung in wissenschaftlichen Texten von Studenten und
  Experten, volume 280 of Reihe Germanistische Linguistik. de Gruyter,
  Berlin – New York.

Torsten Steinhoff. 2009. Wortschatz – eine Schaltstelle für den
  schulischen Spracherwerb?, volume 17/2009 of SPASS. Universität Siegen,
  FB 3.

Tobias Thelen. 2010. Automatische Analyse orthographischer Leistungen von
  Schreibanfängern. Ph.D. thesis, University of Osnabrück.

Alison Wray. 2005. Formulaic language and the lexicon. Cambridge
  University Press.

Seid Muhie Yimam, Richard Eckart de Castilho, Iryna Gurevych, and Chris
  Biemann. 2014. Automatic Annotation Suggestions and Custom Annotation
  Layers in WebAnno. In Kalina Bontcheva and Zhu Jingbo, editors,
  Proceedings of the 52nd Annual Meeting of the ACL. System
  Demonstrations, pages 91–96. ACL.

Heike Zinsmeister and Margit Breckle. 2012. The ALeSKo learner corpus:
  design – annotation – quantitative analyses. In Multilingual Corpora and
  Multilingual Corpus Analysis, pages 71–96. John Benjamins, Amsterdam.

</pre>