=Paper=
{{Paper
|id=Vol-2535/paper_9
|storemode=property
|title=A data-driven platform for creating educational content in language learning
|pdfUrl=https://ceur-ws.org/Vol-2535/paper_9.pdf
|volume=Vol-2535
|authors=Konstantin Schulz,Andrea Beyer,Malte Dreyer,Stefan Kipf
|dblpUrl=https://dblp.org/rec/conf/qurator/SchulzBDK20
}}
==A data-driven platform for creating educational content in language learning==
Konstantin Schulz, Andrea Beyer, Malte Dreyer, and Stefan Kipf
Humboldt-Universität zu Berlin, Germany

Abstract. In times of increasingly personalized educational content, designing a data-driven platform that offers the opportunity to create content for different use cases is arguably the only way to handle the massive amount of information. We therefore developed the software "Machina Callida" (MC) in our project CALLIDUS (Computer-Aided Language Learning: Vocabulary Acquisition in Latin using Corpus-based Methods). The main focus of this research project is to optimize vocabulary acquisition in Latin by using a data-driven language learning approach for creating exercises. To achieve that goal, we faced problems concerning the quality of externally curated research data (e.g. annotated text corpora) while curating educational materials ourselves (e.g. predefined sequences of exercises). In addition, we needed to build a user-friendly interface for both teachers and students: while teachers would like to create an exercise or test and use it (even as printed copies) in class, students would like to learn on the fly and right away. As a result, we offer a repository, a file exporter for various formats and, above all, interactive exercises, so that learners are actively engaged in the learning process. In this paper we show the workflow of our software and explain the architecture, focusing on the integration of Artificial Intelligence (AI) and data curation. Ideally, we want to use AI technology to facilitate the process and increase the quality of content creation, dissemination and personalization for our end users.

Keywords: Educational content · Language learning · Data-driven · Exercise repository

1 Curating language exercises: the user's point of view

In German high schools, Latin is to this day the third most important foreign language, especially
in grades 7 to 10. For that reason, educational publishing companies are investing in teaching materials for Latin classes, but all these materials pose certain challenges for educational stakeholders: they are proprietary, hardly adaptable (or not even digital) for teachers, and split into a vast number of different items, such as textbook, exercise book and vocabulary book, that learners all have to buy separately, if needed [7, p. 194f.]. On top of that, most of the teaching materials only cover the initial stage of language acquisition, in which original Latin texts do not yet matter [21, p. 133]. Although the companies also provide teachers with reading books for intermediate learners containing sections of selected original Latin texts, teachers are still in continuous need of adaptable texts and exercises for these advanced stages. In addition, although the curricula offer a standardized canon of Latin authors [20, p. 45], that canon still includes a wide range of different texts compared to the time available in Latin classes. What is more, teachers prefer to use texts whose vocabulary is covered as much as possible by the basic vocabulary already acquired by the students, since the comprehensibility of a text can be considerably limited if less than 95% of its words are known [25, p. 352]. As a consequence, teachers may choose texts from a large pool of Latin authors, but without supporting material they rarely do, because they lack the time to prepare texts and exercises independently. Instead, they often fall back on ready-made materials that are quality-tested but rarely fit the needs of the learning group.

* This project is funded by the German Research Foundation (project number 316618374) and led by Malte Dreyer, Stefan Kipf and Anke Lüdeling. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
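The 95% comprehensibility threshold cited above [25] can be operationalized as a simple lemma-coverage ratio once a text has been lemmatized. The following sketch is illustrative only: the lemma lists are invented examples, and the lemmatization step itself is assumed to have happened elsewhere.

```python
def coverage(text_lemmata, known_lemmata):
    """Fraction of tokens in a text whose lemma the student already knows."""
    known = set(known_lemmata)
    if not text_lemmata:
        return 0.0
    hits = sum(1 for lemma in text_lemmata if lemma in known)
    return hits / len(text_lemmata)

# Hypothetical lemmatized passage and basic vocabulary (invented data)
passage = ["puella", "in", "horto", "ambulare", "et", "rosa", "videre"]
basic_vocabulary = ["puella", "in", "hortus", "ambulare", "et", "rosa", "videre", "esse"]

ratio = coverage(passage, basic_vocabulary)
print(f"{ratio:.0%} of tokens known")  # "horto" is an inflected form, so it does not match the lemma "hortus"
if ratio < 0.95:
    print("Below the 95% comprehensibility threshold [25]")
```

Note that matching inflected forms against a lemma list is exactly why lemmatization matters here: a naive surface-form comparison would undercount known words in a highly inflected language like Latin.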
This situation results in a kind of dilemma: many teachers would like to enrich their lessons with further authors and support their students individually in their language acquisition with (personalized) exercises, but they do not feel up to the challenge of selecting and adapting materials to their students' needs [24, p. 115/117]. This brief outline of the problem shows the need to develop a platform that allows teachers (and students) to create needs-based exercises for authentic Latin texts. Furthermore, a good user experience requires that the process of generation is fast and easy to handle, that the generated exercises are ready to use (in analog and digital form) or share, and that they are well curated for later reuse. These requirements are illustrated in three exemplary use cases (Table 1), modeled loosely following the guidelines of Cockburn [9].

Use Case 1: The teacher needs exercises based on authentic Latin texts.
- Primary actor: Teacher
- Stakeholders: Teacher, students
- Scope: An easy-to-handle exercise generator
- Level: Repetition and deepening of vocabulary knowledge in context
- User story: As a teacher, I want to select a section of the work to be read. I want to compare this section to the used core vocabulary to get an overview of the amount of unknown words. Then, I want to set the parameters of the intended exercise: type of exercise and linguistic focus (specific lemmata, syntactic structures, morphology, context-based meaning, word equivalents). After getting a preview, all selections can be easily changed if I think that, e.g., the exercise is too difficult.
- Precondition: Teachers are presented with an option to generate new exercises.
- Minimal guarantees: The generated exercise can be exported.
- Success guarantees: The generated exercise can be shared and is stored in a database that is easily accessible to end users.
- Trigger: The teacher invokes the exercise generation setup.
- Basic flow:
  1. The teacher picks the option of generating a new exercise.
  2. The teacher chooses a text passage from a wide range of Latin authors.
  3. The teacher compares the words of the text with the used core vocabulary and changes the section accordingly (go to step 2) or proceeds to set the parameters of the exercise.
  4. The teacher decides on the exercise format, the linguistic focus and the instruction statement.
  5. The system presents a preview. The teacher either exports the exercise to a printable format, shares it digitally, tries other parameters (go to step 4) or even changes the section (go to step 2).

Use Case 2: The teacher does not have enough time to prepare an exercise manually.
- Primary actor: Teacher
- Stakeholders: Teacher, students
- Scope: A database with well-curated exercises of different types
- Level: Repetition and deepening of vocabulary knowledge (individually)
- User story: As a teacher, I search the repository for at least one matching exercise. I want to combine different search terms in an extended search, e.g. Latin text passage, exercise type, linguistic focus, popular exercises, vocabulary. Then, I want to use the exercise in class (with smartphones, tablets or an interactive whiteboard), to embed it in a learning platform for later use or to send it to the students for their homework.
- Precondition: Teachers are presented with an option to browse exercises from an existing database.
- Minimal guarantees: The database contains exercises and can be searched.
- Success guarantees: The search for a matching exercise is supported by advanced filtering. Popular and well-curated exercises are marked.
- Trigger: The teacher decides to use a ready-made exercise.
- Basic flow:
  1. The teacher picks the option of searching the database.
  2. The teacher selects a single filter or multiple filters, or uses the extended search option.
  3. The teacher evaluates the results. Depending on the results, the teacher changes the search terms / filters (go to step 2) or decides to use one of the given exercises.
  4. The teacher uses the exercise in class or disseminates it using a link, so that students may use their own mobile devices.

Use Case 3: The teacher wants to support his/her students in a personalized way to enable individual learning.
- Primary actor: Teacher
- Stakeholders: Teacher, students
- Scope: Learning Analytics and recommendations for future exercises
- Level: Zone of proximal (linguistic) development of each student
- User story: As a teacher, I want an overview of how my students perform in an exercise. I want to be able to see at a glance which mistakes are made most often, so that I know what to focus on when creating the next exercise. I would also like a recommendation as to which exercise to select next if there already is a suitable exercise in the database.
- Precondition: Students generate data about their individual progress. The data can be tracked and analyzed automatically.
- Minimal guarantees: Many students have completed the same (or similar) exercises.
- Success guarantees: Teachers receive helpful suggestions for choosing the next exercise.
- Trigger: Students have just completed an exercise and now should attempt another one.
- Basic flow:
  1. The students register with the software and go through the given exercise.
  2. The teacher receives an evaluation of the performance (percentage, error types) of each student.
  3. The teacher also gets a recommendation which parameters to set for the next exercise or which exercise to select from the database.
  4. The students get their new exercise and work on it (go to step 1).

Table 1: Use Cases

2 Automatic parsing and evaluation: the developer's point of view

In order to help teachers create high-quality educational content, we provide support for each of the necessary steps in our software at https://korpling.org/mc.

2.1 Selection of text (Use Case 1)

Many Latin text editions are proprietary and thus do not comply with the FAIR data principles [32].
Additionally, such resources are not compatible with the requirements for projects funded by the German Research Foundation, which need to prefer open licenses to closed ones [16]. To solve this problem, we decided to rely solely on text editions in the public domain. This choice also considerably narrowed down the range of suitable text repositories. In the end, we settled on the Perseus Library [3] because it has a well-defined API (Canonical Text Services [30]) and a standardized citation model (URN [8]) for ancient text passages, works and authors. This repository, however, offers a vast amount of texts: several hundred works by dozens of authors can be explored, so our users need a way to prioritize them according to their specific needs. Currently, we support this by offering a vocabulary filter and measures of text complexity.

The vocabulary filter has to be targeted at one of several reference vocabularies. These are essentially lemmatized word frequency lists derived from textbooks [5], treebanks [4] or materials created by publishing houses [31]. The reference vocabularies can be used to estimate the students' previous knowledge by specifying that, e.g., they should know the 500 most frequent words from that list. This subset of words is then compared to the lemmata occurring in a given corpus. Thus, if teachers specify a large corpus and the desired size of the final text passage, the software will rank all possible subsets of the corpus according to their congruence with the reference vocabulary. The boundaries for each subset are chosen intelligently in order to maximize the number of known words. This enables teachers to always choose a text that supports their students' zone of proximal development [27, p. 238].

Text complexity, on the other hand, does not directly relate to a student's previous knowledge, but to an intrinsic comparison between multiple Latin texts.
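The passage ranking by vocabulary congruence described above can be illustrated with a simple sliding window over lemmatized sentences. This is only a sketch: Machina Callida's actual boundary selection is more elaborate, and all names and data here are invented.

```python
def rank_passages(sentences, reference_lemmata, passage_len=3):
    """Rank contiguous windows of sentences by their overlap with a
    reference vocabulary (e.g. the 500 most frequent lemmata of a
    textbook word list). Each sentence is a list of lemmata.
    Returns (share_of_known_tokens, start, end) tuples, best first."""
    known = set(reference_lemmata)
    windows = []
    for start in range(len(sentences) - passage_len + 1):
        window = sentences[start:start + passage_len]
        tokens = [lemma for sent in window for lemma in sent]
        share = sum(t in known for t in tokens) / len(tokens)
        windows.append((share, start, start + passage_len))
    return sorted(windows, reverse=True)

# Invented mini-corpus of lemmatized sentences
corpus = [
    ["gallia", "esse", "omnis", "dividere"],
    ["in", "pars", "tres"],
    ["unus", "incolere", "belgae"],
]
reference = ["esse", "in", "pars", "tres", "omnis", "unus"]
best_share, start, end = rank_passages(corpus, reference, passage_len=2)[0]
```

Sorting by the share of known lemmata puts the most comprehensible window first; a real implementation would additionally respect the desired passage size in tokens rather than whole sentences.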
In our case, it is a combination of well-known operationalizations of the presumed degree of difficulty that readers may face when approaching a text, e.g. lexical density [19, p. 61]. This helps teachers determine the suitability of a given text passage (or corpus) with regard to their students' linguistic competence. The major strength of such measures does not reside in their inherently flawed approximation of actual complexity, but in enabling a formalized linguistic comparison that goes beyond mere counting of words and integrates syntax, morphology and semantics [11, p. 607]. By combining information about vocabulary and text complexity, teachers can significantly accelerate and improve their choice of texts, thus curating better educational content for their students.

2.2 Focus on specific linguistic phenomena (Use Case 1)

Once teachers have committed to a suitable text passage, they may still not know the exact target of a potential exercise. Therefore, we offer a keyword in context (KWIC) view to explore collocations and the specific usage of a particular word [18, p. 97]. The superficial token-based display is enriched with morpho-syntactic information, e.g. part of speech and dependency links, so teachers can qualitatively inspect usage patterns on multiple linguistic levels as needed. A major problem with this approach is that most Latin texts are not curated as treebanks with scientific annotations, but rather just as plain text. In other words, we lack the key prerequisite for providing a rich KWIC view. To compensate for this shortcoming, we use an AI-driven dependency parser [29] to process plain Latin text in a fully automatic manner. It was trained as a multi-task classifier using representation learning on existing curated treebanks [28, p. 4291].
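Given tokenized and tagged parser output, a minimal KWIC display enriched with part-of-speech tags can be assembled in a few lines. The `Token` structure and the example data below are illustrative, not the software's actual data model; the field names follow the CoNLL-U convention (`form`, `lemma`, `upos`).

```python
from dataclasses import dataclass

@dataclass
class Token:
    form: str   # surface form as it appears in the text
    lemma: str  # dictionary form
    upos: str   # universal part-of-speech tag, as in CoNLL-U

def kwic(tokens, target_lemma, context=3):
    """Return keyword-in-context lines for every occurrence of a lemma,
    with the part-of-speech tag attached to the keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lemma == target_lemma:
            left = " ".join(t.form for t in tokens[max(0, i - context):i])
            mid = f"[{tok.form}/{tok.upos}]"
            right = " ".join(t.form for t in tokens[i + 1:i + 1 + context])
            lines.append(" ".join(part for part in (left, mid, right) if part))
    return lines

# Invented example: the opening of the Aeneid, pre-parsed
tokens = [
    Token("arma", "arma", "NOUN"),
    Token("virumque", "vir", "NOUN"),
    Token("cano", "cano", "VERB"),
]
for line in kwic(tokens, "cano"):
    print(line)  # → arma virumque [cano/VERB]
```

Searching by lemma rather than surface form is what makes the concordance useful for an inflected language: one query for "vir" finds "virumque", "viri", "viro" and so on, provided the lemmatization is correct.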
The parser is very reliable for basic tasks like tokenization, segmentation, lemmatization and part-of-speech tagging (>95% accuracy), but rather error-prone (∼80% accuracy) for dependency links. Thus, the syntactic visualization in the KWIC view may not always be entirely correct, but the basic concordance function and the information about parts of speech are highly accurate, thereby enabling teachers to create educational content in a much more well-informed manner. Besides, the lack of performance on the syntactic level may be alleviated by accessing and linking further resources to the existing parser output [22, p. 75].

2.3 Design of interaction / learning setting (Use Case 1)

Now that the basic content (i.e. texts and phenomena) of a new exercise has been established, it is time to look at the layout. Depending on the chosen phenomenon, but also on a student's personal preferences, certain types of interaction may be more appropriate than others in order to reach a specific educational goal (see Fig. 1). In general, a systematic variation of interaction types can support more learning styles [26, p. 169], make the learning process more multifaceted [17, p. 1] and lead to a higher degree of motivation [17, p. 5] and engagement [26, p. 165]. On the other hand, the exclusive usage of ready-made exercises in various formats can also cause mental overload for students [26, p. 161].

Fig. 1. Setting parameters for a new exercise

Therefore, we offer teachers the possibility to choose from a range of existing exercises with the same type of interaction, so it is easier for them to maintain a certain level of consistency, even in longer learning sequences. Furthermore, some of the exercise formats may be considered part of the same line of progression, e.g. clozes can be solved with a visible pool of boxes using Drag and Drop (easy, see Fig.
2) or by typing characters into blank text fields (more difficult). Besides, the same basic technology and layout can be used to produce different exercises, e.g. Drag and Drop works for both the cloze and the matching format. In this regard, the usage of a large common framework (H5P [2]) allows for a diverse but consistent learning experience. As an inspiration for longer sequences of exercises, we offer the so-called Vocabulary Unit, which roughly corresponds to the length of an average lesson in school (about 45 minutes).

Fig. 2. Drag-and-Drop-based cloze exercise with visible pool and binary feedback

2.4 Dissemination (Use Case 2)

When teachers are satisfied with their created content, they typically want to distribute it to their students in order to employ it in a didactic context. To that end, every exercise is labeled with a unique identifier, so it can be saved in a database and shared via deep links to the software server (e.g. https://korpling.org/mc/exercise?eid={EXERCISE_ID}). When creating an exercise, as well as at any later point in time, users may also export a given exercise to specific file formats: PDF and DOCX for printing, XML for integration into a learning management system. That way, teachers and students are able to build their own collections of useful exercises over time and, in the case of XML, derive additional benefit from the features offered by learning management systems like Moodle [1]: structured online courses, user management, learning analytics and so on. If, on the other hand, teachers do not have the time to curate their own content, we provide access to public exercises that can be filtered and searched using an extensive metadata schema, including the author, work, text passage, interaction type, popularity, vocabulary and text complexity (see Fig. 3).

Fig. 3.
Exercise Repository with keyword search and options for sorting/filtering

2.5 Evaluation (Use Case 3)

Moodle already offers summative evaluation for created exercises, but teachers usually refrain from using it because they have not been trained [6, p. 160] to deal with the technological complexity during setup, maintenance and everyday usage [10, p. 342]. This also applies to digital media in general [14, p. 18]. Therefore, in the long run, we need to provide such evaluation ourselves. A basic prototype that goes beyond single-exercise binary feedback (correct/incorrect) has been implemented in our Vocabulary Unit. It shows the overall performance for the given exercises, the student's development from beginning to end, and how many words from the target vocabulary are already known (see Fig. 4). In the future, we would like to add further analyses pertaining to the preferred type of interaction, problematic performance on certain linguistic phenomena and the speed of problem solving. These goals are in line with the recent trend of focusing on the learner's perspective in computer-assisted evaluation [15, p. 313]: Where are my strengths and weaknesses? How did I develop during the last weeks? What can I do to improve specific skills?

Fig. 4. Summative evaluation of a student's performance in the Vocabulary Unit

However, user-specific quantitative evaluation is not enough. In order to increase students' learning success, they also need adaptive qualitative feedback. A prerequisite for that is the detection and classification of errors: the integrated binary evaluation of H5P can be used as a basis to categorize various error types, e.g.: Did the student fail to give any answer at all? Did the student actually provide the correct answer, but with minor typing mistakes? Did the student make obvious grammatical mistakes?
If so, are they related to morphology, vocabulary or syntax? Depending on the specific type of error, suitable feedback needs to be generated. Our main objective here is to provide deeper support for teachers and students in order to optimize learning progress towards a specific goal, e.g. being able to read texts from a specific corpus. A good approach in that case may be to create exercises for this corpus and use the students' performance as an objective for reinforcement learning [13, p. 2094]. The AI model should then learn to utilize suitable pedagogical actions (e.g. distributing exercises for learning) to maximize a student's performance on the test exercise dataset for a corpus.

3 Next steps: Learning Analytics and semantic analysis

For the future integration of Learning Analytics in our software, we have already built a prototype that evaluates a learner group's performance across multiple dimensions, e.g. working speed, interaction type, accuracy and performance gain over time. A large part of this analysis is most suitable for groups, which is why it is probably most useful for teachers. Individuals, on the other hand, would need a stronger emphasis on their development over time, which is harder to track because it would require them to use the software as their main source of language learning. Therefore, specific milestones are to be reached in the next months:

– summarize group performances as an indicator that helps teachers readjust their general didactic strategy, e.g. by focusing more heavily on certain linguistic phenomena
– analyze results for individual students over time and suggest the most suitable exercises for them, considering their personal characteristics, i.e. learning style, thematic priority and particular weaknesses

Apart from improving the quality of the existing workflow, we also consider increasing its quantity, e.g.
by adding new linguistic phenomena: semantics is currently underrepresented in our automatic analyses, which makes it hard for teachers to group their educational content around a certain topic. This could be alleviated by integrating representation learning as an independent feature: unsupervised machine learning, in the form of contextual word embeddings like those provided by BERT [12], may be used to distinguish different usages of the same word in different sentences, thereby highlighting fine-grained semantic differences between authors or even within the same work. While we have already used Word2Vec [23] to perform simple vector-based analyses on existing Latin treebanks, it remains a challenge to generalize the calculation, visualization and interpretation in this workflow while maintaining a sufficient level of quality. A well-founded evaluation of representation learning for the purposes of language acquisition is arguably the most important goal in this respect.

References

1. Moodle: A learning platform designed to provide educators, administrators and learners with a single robust, secure and integrated system to create personalised learning environments. Moodle Pty Ltd
2. H5P: Create, share and reuse interactive HTML5 content in your browser. Joubel AS (Jun 2018)
3. Almas, B., Babeu, A., Krohn, A.: Linked Data in the Perseus Digital Library. ISAW Papers 7(3) (2014)
4. Bamman, D., Crane, G.: The Ancient Greek and Latin Dependency Treebanks [AGLDT]. In: Language Technology for Cultural Heritage, pp. 79–98. Springer (2011)
5. Bartoszek, V., Datené, V., Lösch, S., Mosebach-Kaufmann, I., Nagengast, G., Schöffel, C., Scholz, B., Schröttel, W.: VIVA 1 Lehrerband, vol. 1. Vandenhoeck & Ruprecht (2013)
6. Bäsler, S.A.: Lernen und Lehren mit Medien und über Medien. Ph.D. thesis, Technische Universität Berlin (2019). https://doi.org/10.14279/depositonce-7833
7. Beyer, A.: Das Lateinlehrbuch aus fachdidaktischer Perspektive: Theorie – Analyse – Konzeption. Universitätsverlag Winter GmbH, Heidelberg (2018)
8. Blackwell, C., Smith, N.: The Canonical Text Services URN Specification, Version 2.0.rc.1 [CITE / URN] (2015)
9. Cockburn, A.: Writing Effective Use Cases. The Agile Software Development Series, Addison-Wesley, Boston, 16th printing (2006)
10. Costa, C., Alvelos, H., Teixeira, L.: The use of Moodle e-learning platform: A study in a Portuguese University. Procedia Technology 5, 334–343 (2012)
11. Dascalu, M., Gutu, G., Ruseti, S., Paraschiv, I.C., Dessus, P., McNamara, D.S., Crossley, S.A., Trausan-Matu, S.: ReaderBench: A multi-lingual framework for analyzing text complexity. In: Lavoué, É., Drachsler, H., Verbert, K., Broisin, J., Pérez-Sanagustín, M. (eds.) Data Driven Approaches in Digital Education: Proceedings of the 12th European Conference on Technology Enhanced Learning, EC-TEL 2017, Tallinn, Estonia, September 12–15, 2017, pp. 606–609. Springer (2017)
12. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
13. Dorça, F.A., Lima, L.V., Fernandes, M.A., Lopes, C.R.: Comparing strategies for modeling students learning styles through reinforcement learning in adaptive and intelligent educational systems: An experimental analysis. Expert Systems with Applications 40(6), 2092–2101 (May 2013). https://doi.org/10.1016/j.eswa.2012.10.014
14. Eickelmann, B., Bos, W., Labusch, A.: Die Studie ICILS 2018 im Überblick – Zentrale Ergebnisse und mögliche Entwicklungsperspektiven. In: Gerick, J., Goldhammer, F., Schaumburg, H., Schwippert, K., Senkbeil, M., Vahrenhold, J., Eickelmann, B., Bos, W. (eds.) ICILS 2018 #Deutschland. Computer- und informationsbezogene Kompetenzen von Schülerinnen und Schülern im zweiten internationalen Vergleich und Kompetenzen im Bereich Computational Thinking, pp. 7–31. Waxmann (2019). OCLC: 1124310958
15. Ferguson, R.: Learning analytics: Drivers, developments and challenges. International Journal of Technology Enhanced Learning 4(5/6), 304–317 (2012)
16. Forschungsgemeinschaft, D.: Appell zur Nutzung offener Lizenzen in der Wissenschaft. Tech. Rep. 68, Deutsche Forschungsgemeinschaft (Nov 2014)
17. Harecker, G., Lehner-Wieternik, A.: Computer-based language learning with interactive web exercises. ICT for Language Learning, pp. 1–5 (2011)
18. Helm, F.: Language and culture in an online context: What can learner diaries tell us about intercultural competence? Language and Intercultural Communication 9(2), 91–104 (May 2009). https://doi.org/10.1080/14708470802140260
19. Johansson, V.: Lexical diversity and lexical density in speech and writing: A developmental perspective. Working Papers 53, 61–79 (2008)
20. Kipf, S.: Geschichte des altsprachlichen Literaturunterrichts. In: Lütge, C. (ed.) Grundthemen der Literaturwissenschaft, pp. 15–46. De Gruyter, Berlin and Boston (2019)
21. König, J.: Die Lektürephase. In: Janka, M. (ed.) Lateindidaktik, pp. 133–155. Cornelsen Scriptor, Berlin (2017)
22. Mambrini, F., Passarotti, M.: Linked open treebanks. Interlinking syntactically annotated corpora in the LiLa knowledge base of linguistic resources for Latin. In: Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019), pp. 74–81. Paris, France (Aug 2019)
23. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
24. Munser-Kiefer, M., Martschinke, S., Hartinger, A.: Subjektive Arbeitsbelastung von Lehrkräften in jahrgangsgemischten dritten und vierten Klassen. In: Miller, S., Holler-Nowitzki, B., Kottmann, B., Lesemann, S., Letmathe-Henkel, B., Meyer, N., Schroeder, R., Velten, K. (eds.) Profession und Disziplin: Grundschulpädagogik im Diskurs, pp. 114–120. Jahrbuch Grundschulforschung, Springer Fachmedien, Wiesbaden (2018)
25. Nation, I.S.P.: Learning Vocabulary in Another Language. Cambridge University Press, 2nd edn. (2013)
26. Schmid, E.C.: Developing competencies for using the interactive whiteboard to implement communicative language teaching in the English as a Foreign Language classroom. Technology, Pedagogy and Education 19(2), 159–172 (2010)
27. Shabani, K., Khatib, M., Ebadi, S.: Vygotsky's zone of proximal development: Instructional implications and teachers' professional development. English Language Teaching 3(4), 237–248 (2010)
28. Straka, M., Hajič, J., Straková, J.: UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. In: LREC, pp. 4290–4297 (2016)
29. Straka, M., Straková, J.: UDPipe. A LINDAT/CLARIN project
30. Tiepmar, J., Teichmann, C., Heyer, G., Berti, M., Crane, G.: A new implementation for canonical text services [CTS]. In: Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pp. 1–8 (2014)
31. Utz, C.: Mutter Latein und unsere Schüler – Überlegungen zu Umfang und Aufbau des Wortschatzes [BWS]. Antike Literatur – Mensch, Sprache, Welt. Dialog Schule und Wissenschaft 34, 146–172 (2000)
32. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., 't Hoen, P.A.C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., Mons, B.: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (Mar 2016). https://doi.org/10.1038/sdata.2016.18