Practice Report: A blended learning approach to teaching NLP for a DH public

Gertrud Faaß and Ulrich Heid
University of Hildesheim
Institute for Information Science and Natural Language Processing
gertrud.faass / ulrich.heid@uni-hildesheim.de

Abstract

This paper reports on current practice in a staged approach to introducing NLP principles and techniques to students of information science (IIM) and of international communication and translation (ICT) as part of their curricula. As most of these students are not familiar with computer science or, in the case of IIM students, with linguistics, we see them as comparable to students of the humanities. We follow a blended learning strategy with lectures, online materials, tutorials, and screencasts. In the first two terms, we focus on linguistics and its formalisation; NLP tools and applications are then introduced from the third term on. The lectures are combined with tutorials and, since the summer term 2017, with a set of screencasts.

1 Background

The Natural Language Processing (NLP) BA course programme at the University of Hildesheim is part of the curriculum of students of International Information Management (henceforth IIM, 1st year (2nd term)) and of the BA in International Communication and Translation (ICT, 2nd year (1st term)). Additionally, students of other fields of the Humanities, for example Political Science or Business Administration, may choose the courses as part of their minor subject.

One may assume that International Information Management is a field of study usually not included in the Humanities; however, the IIM curriculum focuses mainly on methods of social sciences and empirical studies (e.g., usability studies or qualitative surveys). The students of IIM are prepared for consulting jobs in industry and only to a much lesser extent for information retrieval, which relates more closely to NLP. The ICT students usually plan to work as professional translators; some specialize in the translation of technical texts. We find that our BA students are less informed about formal modelling, which we nevertheless consider a key issue in introducing NLP methods and applications. We therefore decided to deal with this challenge in several didactic steps: general introduction, formalisation, and application.

In line with the Blended Learning approach, we provide a mix of didactic devices: face-to-face (FtF) oral presentations (lectures and tutorials, some accompanied by slides) are complemented by written scripts and background literature provided on an online platform (moodle) and, since summer 2017, by downloadable mp4 files containing audio recordings describing additional slides. Attendance at the lectures and tutorials cannot be made mandatory for the students; however, we strongly encourage regular attendance in the FtF sessions, following the experience of Dickson and Stephens (2016, p. 7), who found that "the most significant predictor of mark [...] was lecture attendance".

We have not found a similar programme at a German university; however, as Neumeier (2005, p. 164) rightly states, one has to decide on a combination of modes given the particular conditions at hand.

This practice report begins in section 2 with a description of the course programme and its main addressees. In section 3, we focus on the contents and methods utilized in our teaching, and in section 4 we report on a first evaluation summarizing our experiences. Finally, in section 5, we summarize our programme and widen the scope, as we plan to offer our programme, or an adapted version thereof, to other potential addressees.

2 Course programme / addressees

The current curriculum concerning (computational) linguistics / NLP extends over 4 terms of 12-13 weeks each, addressing mainly two groups of students. IIM students (about 120) and ICT students (fewer than 200) attend a general lecture that provides an Introduction to Linguistics ("Einführung in die Sprachwissenschaft") in their first term. This lecture is offered by the colleagues of the ICT team. The ICT students afterwards focus on translation studies before returning to linguistics later. About half of them choose to focus on computational linguistics rather than on the translation of technical texts and attend the lecture called (Formal) Description of Language for the Purpose of NLP ("Sprachbeschreibung für die Sprachtechnologie (SBST)") in their 4th term. In subsequent terms, these students also attend one of the seminars offered by our team on Machine Translation, Electronic Lexicography, Wordnets or similar.

For students of IIM, the SBST lecture is mandatory in their second term. From their third term on, they can choose to focus their studies either on NLP or on Information Retrieval. Hence, only interested IIM students (about half of all) then continue with our lecture Introduction to Computational Linguistics ("Einführung in die Maschinelle Sprachverarbeitung"), though some also take courses on Information Retrieval or Man Machine Interaction. Lastly, seminars on Corpus Linguistics, Corpus Processing Tools, Wordnets, Electronic Dictionaries and similar are offered to them, supplemented by practical courses on scripting languages like perl or python or, again, on Corpus Based Analyses.

3 Methods

From the perspective of Blended Learning (BL), we follow the definition of BL as "a combination of face-to-face (FtF) and computer assisted learning (CAL) in a single teaching and learning environment", based on practice rather than on research (Neumeier, 2005, p. 164). Unlike O'Connor et al. (2011), we do not plan for a (partial) replacement of FtF methods; rather, we attempt to achieve that "the on-line component becomes a natural extension of traditional classroom learning" (O'Connor et al., 2011, p. 64). The courses are all organized "Module by Module", as described by Lisetskyi (2015, p. 34).

3.1 Lectures

Lecture (1) provides a general introduction to the history of linguistics, basic functions of language, and models of communication processes, followed by introductions to the basic linguistic disciplines, each related to questions of applied linguistics with a focus on translation and international communication. The lecture is based on (Graefen and Liedke, 2012). Students are to pass a written examination.

Lecture (2) is a three-part lecture describing

• Syntax (Constituency (A), Dependency including verb valency (B), Topological fields (C));
• Morphology (Inflection, Word Formation);
• Semantics (Lexical Semantics (including Word Nets), Compositional and Dialogue Semantics).

The focus of this lecture is on the formalisation of the basic description models. Linguistic issues are hence described by means of phrase structure (introducing the concept of attribute-value pairs), tagsets, item-and-arrangement morphology, word structure, dependency and subcategorization, etc. The theme of Dialogue Semantics is especially important, as up-to-date information systems make use of dialogue systems.

There are three written examinations during the term: Syntax A/B/C, Morphology (containing 25% questions on Syntax), and Semantics (containing 12-20% questions each on Syntax and Morphology). No specific book is prescribed; instead, a manually compiled lecture script is provided in parts during the term.

Lecture (3) moves forward from general linguistic models to tools and applications in which these models are implemented. Again, Syntax, Morphology, and Semantics are the main themes (feature-based grammars, finite state technology, probabilistic tagging, evaluation methodologies). The lecture also covers the differences between theoretical approaches and their possible implementations. The course structure is otherwise parallel to (2).

(4) In the seminars, the lecturers focus inter alia on machine translation or terminology extraction (both mainly directed at ICT students), e-Dictionaries, Word Nets, and Corpus Linguistics. The Practical Courses are of two kinds. First, there are courses on programming with scripting languages (perl/python), focussing on the manipulation of text data (e.g. boilerplate removal, automated metadata extraction) and on research on large amounts of text. Second, there is a regular practical course on Corpus Linguistic studies. For the latter, data from the web is used to compile the students' own corpora containing texts about specific phenomena, which are then examined via their linguistic features (for example, forum texts containing points of view on a product).

As prepared in the lectures of "Einführung in die Maschinelle Sprachverarbeitung" (3), in (4) we continually strengthen the links between the contents of other courses of the institute and the NLP issues discussed in our courses, such as e-Dictionaries and Usability; Corpus Linguistics and Man-Machine Interaction; and NLP technologies, evaluation methods and Information Retrieval.

3.2 Tutorials for (2) and (3)

Weekly tutorials complement the lectures (2) and (3). For (2), groups of about 20-30 students each are to prepare their own solutions to exercises handed out during the lectures. In (3), the students get access to NLP tools. Their training includes hands-on exercises in the use of WebLicht (https://weblicht.sfs.uni-tuebingen.de) and several online demos, for example the dependency parser Parzu (https://pub.cl.uzh.ch/demo/parzu/).

In all tutorial sessions, possible solutions to the exercises are presented and discussed with the students. According to the regulations of our university, we cannot enforce attendance at these tutorials; however, we regularly inform our students that "presence really works" (Schulmeister, 2017).

3.3 Screencasts for (2) and - in future - (3)

In our experience, a rather big issue for the students is practicing and rehearsing (computational) linguistic terminology with their lecturers, especially in the base lecture (2, Language Description for Language Technology). We thus decided to introduce a set of screencasts in the summer term of 2017, provided along with this lecture in order to give them "permanent access to the training material and therefore the opportunity for students [to] constantly review the material" (Lisetskyi, 2015, p. 32). The screencasts¹ re-introduce each of the linguistic fields from a new perspective (compared to the lectures). Each one consists of a set of slides which are provided as .pdf and as a downloadable .mp4 file showing the slides plus a recorded voice describing them. Literature mentioned in the screencast is summarised in a bibliography on the last slides.

¹ Our screencasts are produced with the commercial software Camtasia, https://www.techsmith.com/video-editor.html.

So far, there are no screencasts on NLP issues available, but for the upcoming winter term 17/18 we plan to extend the device to accompany lecture (3): for each topic (tagging, morphological analysis, parsing), we will provide two screencasts, one (slides) describing the theoretical model of a specific implementation and a second showing a demo, i.e. a step-by-step demonstration of the tool's usage. All tools described will be available on the internet to ensure that the students can watch the screencasts and try out the tools whenever and wherever they want. The effort required to produce a screencast is slightly higher than that of preparing a lecture; however, commercial software that allows for easy postproduction speeds up the process. On average, the production of two such screencasts takes one working day.

In blended learning, a key issue is to provide the knowledge to be transferred from several perspectives, so that at least most of the individual backgrounds and learning approaches of the students are covered. While in the lectures the order of presentation usually leads from a term via its definition to the respective examples, the screencasts always begin with example words or sentences, followed by a categorisation and, lastly, the term(s) describing them. A constant colour scheme is used to ease the identification of each item. The sections are kept short, and at the end of each one a summary slide repeats the terms and their meaning. The slides are kept as simple as possible, so that the listener can focus on the audio part describing them; hence animation is kept to a minimum. We try to keep each screencast shorter than 15 minutes.

As the knowledge transferred in lectures, scripts, slides and screencasts is based on scientific publications, we cite the background literature. We want our students to read this literature, too, so we work with a small number of books and choose books that are either available in the library or not too expensive to buy. A publication regularly referred to is, for example, (Bußmann, 2008), with the aim of directing students to use proper terminological dictionaries. Additionally, specific textbooks (like (Müller, 2009) or (Wöllstein-Leisten et al., 1997)) are cited and explained as further reading, in the hope of encouraging our students to read the original literature and to mention it in discussions.

4 Experiences / Evaluation

Generally, the NLP teaching concept is well accepted by the students, and we receive overall positive feedback on the lectures, both face-to-face and in the course evaluations, which are written anonymously. About half of the participants of (2) afterwards choose lecture (3) as well, though this is not mandatory.

Seminars and Practical Courses on Corpus Linguistics are usually overbooked (around 30 bookings by IIM students for 20 available places). We assume that this is because the theme of "Big Data" is mentioned in the description of several of the courses; this theme is of high interest to students of Information Management.

The programming courses on scripting languages are rather small (usually 8-12 participants), as most IIM students are hesitant to learn any form of programming. The courses nevertheless regularly receive very good evaluation results, and the students specifically mention that they prefer the low number of participants.

Concerning feedback on the newly introduced screencasts: so far, only three screencasts have been provided to support lecture (2), but the students frequently refer to them in the discussions, and they were mentioned very positively in the course evaluation (about 50% of the students took part in the written evaluation). This response is encouraging; hence we plan to extend the number of screencasts and to add them to the higher-level courses as well, as described in section 3.3. In order to reach a wider public, the screencasts will soon be downloadable for free from a University of Hildesheim website that is accessible from outside the University. This is planned because we consider such theme-introducing screencasts in particular to be usable for other, similar courses as well.

Additionally, we are currently developing a glossary of grammatical terms in a new format. The idea for this glossary stems from the evaluation of the beta version of an online project called "German as a (scientific) language" (eDaF, https://www.uni-hildesheim.de/eDaF). The glossary will add written definitions of terms to the screencasts. We hope that it will assist the students in finding definitions and even more examples for the terms they have to learn. This glossary will include graphical elements (like an ontology of terms, or dependency graphs), see (Faaß et al., 2017).

5 Summary and Future work

Teaching NLP should not begin before linguistics has been taught, first in a general introduction and then in an introduction to the methods of formal modelling. Such modelling forms the basis for understanding NLP applications or implementations. Our teaching methods follow the principle of blended learning, as the knowledge is made available on different paths from which the individual student can choose. One newly introduced instrument is the set of screencasts provided in addition to the live oral presentation. The screencasts present the same knowledge as was provided in the lecture, but from a different perspective, while referring to the same literature.

So far, we have not specifically evaluated the use of our screencasts, but the feedback we received during the past term was very encouraging. Hence we plan to extend their provision for the lecture on Language Description for Language Technology and additionally to produce a number of screencasts for the follow-up lecture Introduction to Natural Language Processing (here: application descriptions in theory and practice, and demos). At the end of term 17/18, we will ask our students in a survey about their use of the screencasts and their point of view (whether they considered them helpful). In order to enable other lecturers (also at other universities) to make use of them, all screencasts will be made freely available on the web from autumn 2017 on.

References

Bußmann, Hadumod. 2008. Lexikon der Sprachwissenschaft. 4. Auflage. Stuttgart: Kröner.

Dickson, K.A.; Stephens, B.W. 2016. Standing room only: faculty intervention increases voluntary lecture attendance and performance for disadvantaged year 1 Bioscience students. Higher Education Pedagogies, 1(1):1-15. DOI: 10.1080/23752696.2015.1134196

Faaß, Gertrud; Bohle, Ulrike; Quahlo, Jasmin. 2017. The Grammatical Glossary: a glossary of German grammar terms. Accepted as a poster at the eLex 2017 conference in Leiden, September 2017.

Graefen, Gabriele; Liedke, Martin. 2012. Germanistische Sprachwissenschaft. Deutsch als Erst-, Zweit- oder Fremdsprache. Stuttgart: UTB.

Lisetskyi, K.A. 2015. Blended Learning Model in the System of Higher Education. Advanced Education, 2015/4:32-35.

Müller, Horst M. (ed). 2009. Arbeitsbuch Linguistik. 2. Auflage. Paderborn/München/Wien/Zürich: Schöningh.

Neumeier, Petra. 2005. A closer look at blended learning - parameters for designing a blended learning environment for language teaching and learning. ReCALL, 17(2):163-178.

O'Connor, Christine; Mortimer, Dennis; Bond, Sue. 2011. Blended Learning: Issues, Benefits and Challenges. IJES, 19(2).

Schulmeister, R. 2017. Presence and Self-Study in Blended Learning. eleed, Issue 12. https://eleed.campussource.de/archive/12/4502 [23-08-17]

Wöllstein-Leisten, Angelika; Heilmann, Axel; Stepan, Peter; Vikner, Sten. 1997. Deutsche Satzstruktur. Grundlagen der syntaktischen Analyse. Tübingen: Stauffenburg Verlag Brigitte Narr GmbH.