=Paper=
{{Paper
|id=Vol-1749/paper1
|storemode=property
|title=An Extended Version of the KoKo German L1 Learner Corpus
|pdfUrl=https://ceur-ws.org/Vol-1749/paper1.pdf
|volume=Vol-1749
|authors=Andrea Abel,Aivars Glaznieks,Lionel Nicolas,Egon Stemle
|dblpUrl=https://dblp.org/rec/conf/clic-it/AbelGNS16
}}
==An Extended Version of the KoKo German L1 Learner Corpus==
Andrea Abel, Aivars Glaznieks, Lionel Nicolas, Egon Stemle

Institute for Specialised Communication and Multilingualism, EURAC Research, Bolzano/Bozen, Italy

andrea.abel@eurac.edu, aivars.glaznieks@eurac.edu, lionel.nicolas@eurac.edu, egon.stemle@eurac.edu

===Abstract===
'''English.''' This paper describes an extended version of the KoKo corpus (version KoKo4, Dec 2015), a corpus of written German L1 learner texts from three different German-speaking regions in three different countries. The KoKo corpus is richly annotated with learner language features on different linguistic levels, such as errors or other linguistic characteristics that are not deficit-oriented, and is enriched with a wide range of metadata. This paper complements a previous publication (Abel et al., 2014a) and reports on new textual metadata and lexical annotations and on the methods adopted for their manual annotation and linguistic analyses. It also briefly introduces some linguistic findings that have been derived from the corpus.

'''Italiano.''' Il contributo descrive una versione estesa del corpus KoKo (versione KoKo4, Dic 2015), corpus che raccoglie produzioni scritte di apprendenti di tedesco L1, provenienti da tre distinte regioni germanofone, a loro volta situate in tre diversi paesi. Il corpus KoKo è annotato dettagliatamente su differenti livelli linguistici rilevanti, quali gli errori o altre caratteristiche linguistiche non direttamente ricollegabili a deficit individuali, ed arricchito da un'ampia gamma di metadati. Questo contributo integra una precedente pubblicazione (Abel et al., 2014a) e informa sui nuovi metadati testuali e sulle nuove annotazioni lessicali così come sui metodi adottati per la loro annotazione manuale e per le loro analisi linguistiche. Inoltre presenta brevemente alcuni risultati ricavati dal corpus.

===1 Introduction===
The study of linguistically annotated learner corpora has received growing interest over the past 20 years (Granger et al., 2013). In learner corpus linguistics, such corpora are usually defined as "systematic computerized collections of texts produced by language learners" (Nesselhauf, 2005). Unlike most learner corpora, which focus on L2/FL learners (i.e. learners of a second or foreign language), the KoKo corpus focuses on advanced L1 speakers who are still learning their mother tongue, which typically happens in educational contexts.

This paper describes an extended version of the KoKo corpus (Abel et al., 2014a), a corpus created for the purposes of the KoKo project, which aims at investigating the writing skills of German-speaking secondary school pupils. The creation of the corpus was guided by two goals: on the one hand to describe writing skills at the end of secondary school, on the other hand to consider external socio-linguistic factors (e.g. gender, socio-economic background etc.).

The previous description focused on the data collection, the data processing, the annotation of orthographic and grammatical features as well as on aspects regarding annotation quality (Abel et al., 2014a). This paper, in contrast, introduces the new textual metadata and lexical annotations.

The paper is structured as follows. In section 2, key facts are briefly reported, including references to related work. The new textual metadata and lexical annotations are then described in section 3, along with the methods adopted for their manual annotation and linguistic analyses and some examples of linguistic findings. In section 4, future work is discussed before concluding in section 5.

===2 Key Information about the Corpus===
The KoKo corpus is a collection of 1,503 authentic argumentative essays, and the corresponding survey information about their authors, produced in classrooms under standardized conditions by learners of 85 classes of 66 schools from three different German-speaking areas: South Tyrol in Italy, North Tyrol in Austria and Thuringia in Germany.<sup>1</sup> Such areas are particularly suitable for comparative studies because of differences regarding the German standard varieties, the use of dialectal vs. standard varieties and the monolingual vs. plurilingual environments (Abel et al., 2014a).

The corpus is roughly equally distributed over the three regions and amounts to 824,757 tokens (punctuation excluded). All writers were attending secondary schools one year before their school-leaving examinations. 83% of the pupils were native speakers of German. The corresponding L1 part of the corpus amounts to 726,247 tokens. Metadata annotations amount to 52,605 annotations, whereas manual annotations amount to 117,422 annotations. Furthermore, 366 features to measure linguistic complexity<sup>2</sup> (Hancke et al., 2012; Hancke and Meurers, 2013) were automatically calculated per text (550,098 in total) and added as metadata.

Previous evaluation showed high accuracy of manual transcriptions (> 99%), automatic tokenization (> 99%), sentence splitting (> 96%) and POS-tagging (> 96%) (Glaznieks et al., 2014).

As it is among the first accessible richly linguistically annotated German L1 learner corpora, the KoKo corpus is particularly relevant to L1 learner language researchers and to the field of didactics of German as L1. Other comparable language resources are either not accessible (Berg et al., 2010; DESI-Konsortium, 2006; Nussbaumer and Sieber, 1994), or, although accessible, have not been enriched with linguistic information (Augst et al., 2007; Fix and Melenk, 2002) or are only partly annotated (Thelen, 2010). Some other corpora include L1 data, but only as a reference for L2/FL learner corpus research (Reznicek et al., 2010; Zinsmeister and Breckle, 2012).

===3 New Metadata and Annotations===
This section describes the main features of the latest corpus version KoKo4 (Dec. 2015) that have been added to the version KoKo3 (Dec. 2014). It thus focuses on a new set of textual metadata and a new layer of lexical annotations which is, due to the selected features and the degree of granularity, a novelty in (corpus-based) modeling of L1 writing competences for German.

====3.1 Textual Metadata====
In the KoKo corpus, two kinds of metadata information are available: (1) non-linguistic, i.e. person-related information provided by each participant via a questionnaire survey in class, which is available for the whole sample, and (2) linguistic, i.e. text-related information provided for a subsample of the corpus (569 texts, equally distributed over the three regions involved) through an online evaluation form by three different specially trained raters originating from the different participating regions.

While type (1) metadata allow for sociolinguistic analyses in order to detect relations between linguistic features (e.g. text length, sentence length, orthographic errors, grammatical errors, etc.) and non-linguistic person-related information, type (2) metadata constitute a further expansion of our analysis by including textual features as well. Text analysis was done holistically using an evaluation form and detailed guidelines that were elaborated on the basis of recent findings in writing research and text analyses (Brinker, 2010; Feilke, 2010; Augst et al., 2007; Böttcher and Becker-Mrotzek, 2006; Jechle, 1992; Augst and Faigel, 1986) and the curricula in the participating regions. The text evaluation form distinguishes four categories: (A) formal completeness, (B) content, (C) formal and linguistic means of text arrangement and (D) overall impression.

For category A, 10 questions of the online evaluation form focused on the presence of obligatory text parts (introduction, main part, closing part) and explicitly requested constituents of argumentative essays (opinion of the author, conclusion). The 25 questions of category B belong to two subcategories: (B1) the topics of the essay (9 questions) and (B2) patterns of topic development (16 questions). B1 comprises evaluations on e.g. the topics of each text part, gaps, and the overall coherence of the text. B2 refers to the main pattern of topic development (argumentative, etc.), the argumentation strategies (point of view, concessive or not), and the motivation of arguments (objective vs. subjective stance, quality of arguments). Formal and linguistic means of text arrangement (category C, 7 questions) focus on the use of paragraphs, the explicit announcement of and commitment to the function of the essay, and the use of linguistic means to structure the text with regard to content. Finally, category D (20 questions) aims for an overall impression and therefore focuses on the completion of the task (successful or not), the overall quality of the text and the overall consistency of both the quality and coherence. Of all 62 questions of the entire online evaluation form, we used 57 for each document of the subcorpus (altogether 33,972 annotations).

The analyses revealed, among other things, that the text quality is classified as quite satisfactory on a 5-point Likert scale.<sup>3</sup> More specifically, there are significant correlations between text quality assessment and other linguistic variables: a lower number of e.g. lexical errors is connected to a higher text quality score.<sup>4</sup> Finally, a variety of group differences could be detected (e.g. concerning school type: lower text quality scores within vocational schools compared to general high schools<sup>5</sup>).

====3.2 Lexical Annotations====
As for the manual annotations of orthographic and grammatical features added to previous corpus versions (Abel et al., 2014a), a specifically crafted tag set and annotation manual were used for the annotation of lexical features. 61,728 lexical annotations were manually performed by trained annotators on a subcorpus of 980 texts, almost equally distributed over the three regions.

The analyses of lexical features focus on lexical knowledge as a central part of lexical competence, which includes the dimensions of lexical breadth (quantitative aspect) and lexical depth (qualitative aspect) (Steinhoff, 2009; Böttcher and Becker-Mrotzek, 2006; Mukherjee, 2005; Read and Nation, 2004; Read, 2000; Nation, 2001). Whereas the analyses of quantitative aspects of lexical knowledge were performed automatically by using different measures (e.g. lexical diversity measures such as MTLD and Yule's K, or lexical frequency scores based on dlexDB (Hancke and Meurers, 2013)), the analyses of qualitative aspects were done by means of manual annotations. We focus hereafter exclusively on the manual annotations, which allow us to model qualitative aspects of lexical knowledge.

For annotating lexical features, we developed a new hierarchically structured linguistic classification scheme inspired by previous work that focused on L2 learner languages (Abel et al., 2014b; Konecny et al., 2016). The classification scheme takes into account both occurrences of selected lexical phenomena and defective as well as non-defective particularities of learner languages, considering two dimensions: (1) the linguistic subcategory, e.g. collocations and idioms, and (2) a target modification classification, e.g. omission, addition (Díaz-Negrillo and Domínguez, 2006; Abel et al., 2014b). Furthermore, we formulated target hypotheses for those categories that we annotated as defective in order to make the error interpretation transparent (Lüdeling et al., 2005). The corresponding annotation scheme contains 77 different tags including a set of further attributes.

In a multi-stage annotation procedure, all occurrences of phenomena on both single words and formulaic sequences (FS) were annotated (Wray, 2005). Annotations for particularities were subsequently added in order to distinguish between errors concerning correctness, errors concerning appropriateness of usage (Eisenberg, 2007; Schneider, 2013), non-defective modifications (to capture, for example, creative use of language), and diasystematic markedness. At the single word level, we considered all out-of-vocabulary tokens of the part-of-speech tagger (Schmid, 1994) as candidates for neologisms or occasionalisms. In addition, we captured a variety of tokens relevant for the text genre of an argumentative essay (i.e. argumentative adverbs and conjunctions). At the level of FS, we applied a function-based approach distinguishing between three main categories of phrasemes (Burger, 2007), each of them with further subcategories (Abel et al., 2014b; Konecny et al., 2016; Granger and Paquot, 2008; Burger, 2007; Stein, 2007; Steinhoff, 2007), as well as a "mixed classification" (Burger, 2007):
* Referential phrasemes include collocations<sup>6</sup> and idioms<sup>7</sup>, distinguished among other things with respect to their degree of idiomaticity.
* Communicative phrasemes are subdivided into those bound to specific situations<sup>8</sup> and those not bound to specific situations<sup>9</sup>.
* Structural phrasemes comprise complex conjunctions and prepositions<sup>10</sup> and concessive constructions<sup>11</sup>.

For particularities, we considered four main categories, each with further subcategories:
* On the semantic dimension, a distinction is made between denotative errors concerning correctness or appropriateness of use<sup>12</sup>, and connotative markedness or appropriateness of use<sup>13</sup>.
* The stylistic dimension considers repetition and redundancy.
* The form dimension focuses on word formation errors (concerning single word units only<sup>14</sup>), and on omission, choice, position and addition errors as well as creative modifications (concerning FS).
* Concerning metalinguistic markers, the appropriateness of the use of quotation marks for highlighting units is considered.

An overview of the number of annotations is provided in Table 1.

{| class="wikitable"
|+ Table 1: Quantitative figures for the 980 documents annotated with the new lexical annotations.
! Category !! Sub-category !! Total
|-
| rowspan="2" | Single words || Neol. & occas. || 4,670
|-
| Arg. adv. & conj. || 14,345
|-
| rowspan="3" | Phrasemes || Referential || 18,708
|-
| Communicative || 4,824
|-
| Structural || 2,704
|-
| rowspan="4" | Particularities || Semantic || 8,397
|-
| Stylistic || 236
|-
| Form || 1,923
|-
| Metalinguistic || 1,412
|-
| Target hyp. || || 4,509
|}

Results of the analyses showed, among other things, that pupils use different types of FS quite frequently, on average 5.12 constructions per 100 words: with 62%, non-idiomatic referential phrasemes constitute the major part, followed by idiomatic referential phrasemes (19%), and, finally, structural (10%) and communicative phrasemes (9%). However, lexical errors in general affect FS more often than single word units (10% of the FS vs. 1.04% of the single words). Errors affecting FS are most frequently form errors (5.50% of FS affected, especially choice errors: 4.17%).

===4 Future Work===
The KoKo project was completed and presented to the public in December 2015. We will start releasing the data via the corpus exploration interface ANNIS3 (Krause and Zeldes, 2016) and for download on request, after signing a license agreement.<sup>15</sup> Aside from the aforementioned data, future versions will also include additional metadata information about the authors, integrated for the purposes of future socio-linguistic analyses.

Consensus among annotators, as an indication of the reliability of the annotations, will be evaluated on subsets of texts that were annotated for this purpose by more than one annotator. Three annotators independently annotated the text-level metadata annotations on 27 texts, and six annotators independently annotated the lexical-level annotations on the same 27 texts. Inter-annotator agreement will be calculated both for segmentation, i.e. the decision which word sequence needs to be tagged, and for annotations, i.e. the decision which annotation needs to be assigned to it, and will be reported in the form of Fleiss' multi-κ and boundary similarity (Artstein and Poesio, 2008; Fournier, 2013).

Finally, thanks to its relatively large size and its richly annotated nature, potential additional uses of the KoKo corpus in Natural Language Processing and Corpus Linguistics are being considered. Regarding Natural Language Processing, the error annotations paired with target hypothesis annotations allow for creating an aligned corpus. Such corpora can be used to improve machine translation for automatically correcting learner texts (Ng et al., 2014). Regarding Corpus Linguistics, machine learning methods can be used (e.g. as is done in WebAnno (Yimam et al., 2014)) to drive linguistic intuitions when performing annotations or analyses. Because of the richness of its annotation schemes, the KoKo corpus constitutes a challenging but at the same time promising dataset to test whether the developed methods are able to uncover relevant correlations that have already been investigated, or even to uncover new ones that are worth considering for future linguistic analyses.

===5 Conclusion===
This paper described the most recent version of the KoKo corpus, a collection of richly annotated German L1 learner texts, and focused on the new textual metadata and lexical annotations.

Because other comparable language resources are either not accessible, have not been enriched with linguistic information, or are only partly annotated, the corpus is a valuable resource for research on L1 learner language, in particular for research on writing skills, and for teachers of German as L1, in particular for the teaching of L1 German writing skills.

===Footnotes===
# We followed the privacy policy for such surveys and requested a signed consent from all adult participants and parents of minors. In addition, all students participated anonymously: no names of the students were collected, and names of schools were codified and made anonymous.
# E.g. syntactic features such as the average length of NPs, VPs and PPs as well as their number per sentence, morphological features such as the number of modal verbs per total number of verbs or the average compound depth of nouns, and lexical features such as lexical diversity described by means of different measures.
# Percentages: 1 (poor): 6.2; 2: 22.9; 3: 39.0; 4: 26.3; 5 (excellent): 5.7.
# FS errors: Kruskal-Wallis H test, χ<sup>2</sup>(1) = 10.417, p = .036; single word errors: ANOVA, F(4, 338) = 2.805, p = .026.
# Kruskal-Wallis H test: χ<sup>2</sup>(1) = 49.147, p < .001.
# Further divided into restricted and loose collocations, light verb constructions (called "Funktionsverbgefüge" in German) as well as special classes such as irreversible bi- and trinominals, similes etc.
# Further divided into nominative idioms and fixed phrases, and special classes such as irreversible bi- and trinominals etc.
# Further divided into general routine or speech act formulas, special classes such as commonplaces, slogans, proverbs etc., and empty formulas.
# Further divided into text organising formulas and interaction organising formulas.
# Further divided into phraseological connectors and syntactically complex connectors, and secondary prepositions.
# Further divided into constructions with "although" and a correlate of the "but"-class, and constructions with a modal word and a correlate of the "but"-class.
# Further divided into reference/function, contextual fitness, semantic compatibility, and precision.
# Further divided into speaker's attitude, and diasystematic markedness concerning language usage, i.e. diaphasic markedness, diachronic markedness, diatopic markedness.
# Distinguishing between errors with respect to derivation and to composition.
# We have been trying to make the data available for direct download, but still have to clear further legal hurdles.

===References===
* Andrea Abel, Aivars Glaznieks, Lionel Nicolas, and Egon Stemle. 2014a. KoKo: An L1 Learner Corpus for German. In Proceedings of LREC 2014, pages 2414–2421.
* Andrea Abel, Katrin Wisniewski, Lionel Nicolas, and Detmar Meurers. 2014b. A Trilingual Learner Corpus Illustrating European Reference Levels. RICOGNIZIONI. Rivista di Lingue, Letterature e Culture Moderne, 1(2):111–126.
* Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596.
* Gerhard Augst and Peter Faigel. 1986. Von der Reihung zur Gestaltung: Untersuchungen zur Ontogenese der schriftsprachlichen Fähigkeiten von 13-23 Jahren, volume 5 of Theorie und Vermittlung der Sprache. Peter Lang, Frankfurt.
* Gerhard Augst, Katrin Disselhoff, Alexandra Henrich, Thorsten Pohl, and Paul Völzing. 2007. Text-Sorten-Kompetenz. Eine echte Longitudinalstudie zur Entwicklung der Textkompetenz im Grundschulalter. Peter Lang, Frankfurt.
* Margit Berg, Anne Berkemeier, Reinold Funke, Christian Glück, Christiane Hofbauer, and Jordana Schneider, editors. 2010. Sprachliche Heterogenität in der Sprachheil- und der Regelschule. Abschlussbericht im Programm "Bildungsforschung" der Landesstiftung Baden-Württemberg, Germany.
* Ingrid Böttcher and Michael Becker-Mrotzek. 2006. Schreibkompetenz entwickeln und beurteilen. Cornelsen, Berlin.
* Klaus Brinker. 2010. Linguistische Textanalyse. Eine Einführung in Grundbegriffe und Methoden. Bearbeitet von Sandra Ausborn-Brinker, 7., durchgesehene Auflage, volume 29 of Grundlagen der Germanistik. Erich Schmidt Verlag, Berlin.
* Harald Burger. 2007. Phraseologie: Eine Einführung am Beispiel des Deutschen, volume 36 of Grundlagen der Germanistik. Erich Schmidt Verlag, Berlin.
* DESI-Konsortium, editor. 2006. Unterricht und Kompetenzerwerb in Deutsch und Englisch. Beltz Verlag, Weinheim – Bern.
* Ana Díaz-Negrillo and Jesús Fernández Domínguez. 2006. Error tagging systems for learner corpora. Revista española de lingüística aplicada, (19):83–102.
* Peter Eisenberg. 2007. Sprachliches Wissen im Wörterbuch der Zweifelsfälle. Über die Rekonstruktion einer Gebrauchsnorm. Aptum. Zeitschrift für Sprachkritik und Sprachkultur, 3(2007):209–228.
* Helmuth Feilke. 2010. Schriftliches Argumentieren zwischen Nähe und Distanz am Beispiel wissenschaftlichen Schreibens. Nähe und Distanz im Kontext variationslinguistischer Forschung, pages 209–231.
* Martin Fix and Hartmut Melenk. 2002. Schreiben zu Texten-Schreiben zu Bildimpulsen: das Ludwigsburger Aufsatzkorpus; mit 2300 Schülertexten, Befragungsdaten und Bewertungen auf CD-ROM. Schneider-Verlag, Hohengehren.
* Chris Fournier. 2013. Evaluating Text Segmentation using Boundary Edit Distance. In Proceedings of 51st Annual Meeting of the ACL, pages 1702–1712. ACL.
* Aivars Glaznieks, Lionel Nicolas, Egon Stemle, Andrea Abel, and Verena Lyding. 2014. Establishing a standardised procedure for building learner corpora. Apples - Journal of Applied Language Studies, 8(3):5–20.
* Sylviane Granger and Magali Paquot. 2008. Disentangling the phraseological web. Phraseology. An interdisciplinary perspective, pages 27–50.
* Sylviane Granger, Gaëtanelle Gilquin, and Fanny Meunier. 2013. Twenty Years of Learner Corpus Research. Looking Back, Moving Ahead: Proceedings of the First Learner Corpus Research Conference (LCR 2011).
* Julia Hancke and Detmar Meurers. 2013. Exploring CEFR classification for German based on rich linguistic modeling. In Proceedings of the Learner Corpus Research Conference (LCR 2013), pages 54–56.
* Julia Hancke, Sowmya Vajjala, and Detmar Meurers. 2012. Readability classification for German using lexical, syntactic, and morphological features. In Martin Kay and Christian Boitet, editors, Proceedings of COLING 2012, pages 1063–1080, Mumbai.
* Thomas Jechle. 1992. Kommunikatives Schreiben: Prozess und Entwicklung aus der Sicht kognitiver Schreibforschung, volume 41 of ScriptOralia. Gunter Narr Verlag, Tübingen.
* Christine Konecny, Andrea Abel, Erica Autelli, and Lorenzo Zanasi. 2016. Identification and Classification of Phrasemes in an L2 Learner Corpus of Italian. In Gloria Corpas Pastor, editor, Computerised and Corpus-based Approaches to Phraseology, pages 533–542. Editions Tradulex, Geneva.
* Thomas Krause and Amir Zeldes. 2016. ANNIS3: A New Architecture for Generic Corpus Query and Visualization. Digital Scholarship in the Humanities, 31(1):118–139.
* Anke Lüdeling, Maik Walter, Emil Kroymann, and Peter Adolphs. 2005. Multi-level error annotation in learner corpora. In Proceedings of Corpus Linguistics 2005, pages 15–17.
* Joybrato Mukherjee. 2005. The native speaker is alive and kicking: Linguistic and language-pedagogical perspectives. Anglistik, 16(2):7–23.
* I.S.P. Nation. 2001. Learning Vocabulary in Another Language. Foreign Language Study. Cambridge University Press.
* Nadja Nesselhauf. 2005. Collocations in a Learner Corpus, volume 14 of Studies in Corpus Linguistics. John Benjamins Publishing, Amsterdam.
* Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 Shared Task on Grammatical Error Correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–14, Baltimore, Maryland. ACL.
* Markus Nussbaumer and Peter Sieber. 1994. Texte analysieren mit dem Zürcher Textanalyseraster. In Peter Sieber, editor, Sprachfähigkeiten–Besser als ihr Ruf und nötiger denn je!, pages 141–186. Verlag Sauerländer, Aarau.
* John Read and Paul Nation. 2004. Measurement of formulaic sequences. In Norbert Schmitt, editor, Formulaic sequences: Acquisition, processing and use, Language Learning & Language Teaching, pages 23–35. John Benjamins Publishing, Amsterdam.
* John Read. 2000. Assessing vocabulary. Cambridge University Press.
* Marc Reznicek, Maik Walter, Karin Schmidt, Anke Lüdeling, Hagen Hirschmann, Cedric Krummes, and Torsten Andreas. 2010. Das Falko-Handbuch. Korpusaufbau und Annotationen. Technical report, Institut für deutsche Sprache und Linguistik, Humboldt-Universität zu Berlin.
* Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing, pages 44–49, Manchester, UK.
* Jan Georg Schneider. 2013. Sprachliche 'Fehler' aus sprachwissenschaftlicher Sicht. In Sprachreport, volume 1-2/2013, pages 30–37. Institut für Deutsche Sprache, Mannheim.
* Stephan Stein. 2007. Mündlichkeit und Schriftlichkeit aus phraseologischer Perspektive. In Harald Burger, Dmitrij Dobrovolskij, Peter Kühn, and Neal R. Norrick, editors, Phraseologie. Ein internationales Handbuch zeitgenössischer Forschung, volume 1, pages 220–236. de Gruyter, Berlin – New York.
* Torsten Steinhoff. 2007. Wissenschaftliche Textkompetenz: Sprachgebrauch und Schreibentwicklung in wissenschaftlichen Texten von Studenten und Experten, volume 280 of Reihe Germanistische Linguistik. de Gruyter, Berlin – New York.
* Torsten Steinhoff. 2009. Wortschatz–eine Schaltstelle für den schulischen Spracherwerb?, volume 17/2009 of SPASS. Universität Siegen, FB 3.
* Tobias Thelen. 2010. Automatische Analyse orthographischer Leistungen von Schreibanfängern. Ph.D. thesis, University of Osnabrück.
* Alison Wray. 2005. Formulaic language and the lexicon. Cambridge University Press.
* Seid Muhie Yimam, Richard Eckart de Castilho, Iryna Gurevych, and Chris Biemann. 2014. Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno. In Kalina Bontcheva and Zhu Jingbo, editors, Proceedings of the 52nd Annual Meeting of the ACL. System Demonstrations, pages 91–96. ACL, June.
* Heike Zinsmeister and Margit Breckle. 2012. The ALeSKo learner corpus: design–annotation–quantitative analyses. Multilingual Corpora and Multilingual Corpus Analysis. Amsterdam: John Benjamins, pages 71–96.
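
The agreement evaluation planned in Section 4 can be illustrated with a small sketch. It assumes that "Fleiss Multi-k" refers to the multi-κ coefficient as presented in Artstein and Poesio (2008), in which observed agreement is the share of agreeing annotator pairs per item and chance agreement is estimated from each annotator's own label distribution. The function name `multi_kappa`, the input layout and the toy ratings are illustrative assumptions and are not part of the KoKo tooling or data; boundary similarity for segmentation is not covered here.

```python
from collections import Counter
from itertools import combinations


def multi_kappa(ratings):
    """Multi-kappa for several annotators (sketch, see assumptions above).

    ratings: list of items; each item is a list with one label per
    annotator, where position j always belongs to annotator j.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two annotators")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs one label per annotator")

    pairs = list(combinations(range(n_raters), 2))

    # Observed agreement: proportion of agreeing annotator pairs,
    # averaged over all items.
    observed = sum(
        sum(item[a] == item[b] for a, b in pairs) / len(pairs)
        for item in ratings
    ) / n_items

    # Chance agreement: for each annotator pair, the probability that
    # both pick the same label given their individual label distributions.
    dists = [Counter(item[r] for item in ratings) for r in range(n_raters)]
    categories = {label for item in ratings for label in item}
    expected = sum(
        sum((dists[a][k] / n_items) * (dists[b][k] / n_items)
            for k in categories)
        for a, b in pairs
    ) / len(pairs)

    if expected == 1.0:
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical 5-point quality ratings by three raters for four texts.
    toy = [[3, 3, 4], [2, 2, 2], [4, 5, 4], [3, 3, 3]]
    print(round(multi_kappa(toy), 3))
```

In the setting described in Section 4, each inner list would hold the labels that the three (or six) annotators assigned to one unit of the 27 texts annotated by more than one annotator.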