Implementing Language Games with NLP Tools: The Greek Case

Christos Tsalidis, Neurocom S.A., Athens, Greece, tsalidis@neurocom.gr
Maria Fountana, CTI&P, Athens, Greece, fountana@cti.gr
Monica Gavrielidou, CTI&P, Athens, Greece, monica@cti.gr
John Stamatopoulos, Neurocom S.A., Athens, Greece, stamatop@neurocom.gr
Aristides Vagelatos, CTI&P, Athens, Greece, vagelat@cti.gr

GAITECUS0, September 02–04, 2020, Athens, Greece. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

Digital games, as a popular technology in youth entertainment, constitute a fast-growing field which has been affecting various aspects of education for several years now. The research project “Lexipaignio” focuses on the development of an innovative and state-of-the-art NLP (Natural Language Processing) environment for the creation of digital educational games for Greek students. A variety of simple and easy-to-play mini-games has been specified, aiming to improve students’ linguistic competence by developing a better understanding of various grammatical, morphological and vocabulary-related phenomena in general, but also in the context of specific subjects (e.g. geology – geography, biology, etc.). In this paper, the main functionalities of the NLP environment are presented, with a view to the implementation of mini-games for the Greek language.

CCS CONCEPTS

• Computer games • Natural Language Processing

KEYWORDS

Educational Games, NLP, Game-based Learning

1 Introduction

Natural Language Processing (NLP) is not really a new research field, since the first efforts started in the 1950s with the so-called “Turing test”. Nevertheless, it took more than three decades of research work to achieve real progress with substantial results. Nowadays NLP (which is in fact part of AI) is a research area that attracts great interest, mainly due to the enormous amount of data produced every single minute in digital format: the ability to process information and transform it into knowledge is of great value in today’s “information jungle” [1].

On the other hand, the use of digital games to support learning (game-based learning) in an alternative, more attractive way is developing rapidly at both the European and the worldwide level. Digital games are a fast-developing field, as they are amongst the most popular technologies young people use to amuse themselves. The educational potential of digital games is correlated with the properties of motivation, amusement and the triggering of interest, which are considered consistent with positive learning results [6].

In this paper, we examine the NLP infrastructure needed to support the dynamic compilation of educational games for the Greek language, within the research project “Lexipaignio”.

2 Educational Games

Focusing on education, the “Lexipaignio” project aims at the utilization and further development of a series of Natural Language Processing tools (Morphological Lexicon, Lemmatizer, Mnemosyne language editing system, corpus of Greek school subjects, etc.) for the implementation of dynamically created gamified educational material. The paper highlights the creation of mini-games related to the subject of Greek language in schools. Being part of an ongoing project, the development of language mini-games will provide us with useful feedback regarding the use of NLP for the development of dynamic gamified materials in many school subjects.

According to relevant research [2, 8], computer games provide a quick and interesting learning pace in contrast to conventional teaching methods, and in this respect they can affect the dynamics of digital learning. The purpose of the ongoing project is the development of an innovative and state-of-the-art computational environment through the creation of digital educational games for students (primary and secondary level) in order to: a) improve language competence and the overall level of students’ knowledge and b) develop various vocabulary and linguistic skills, while understanding the context of specific school subjects (biology, geography etc.).

The new environment will support the automated production of questions at different levels of competency regarding the structure and use of the Greek language, in terms of spelling, morphology and vocabulary, as well as terminology found in school textbooks, which is integrated into the overall environment and narration of digital educational games. It will also enable teachers to automatically create a large volume of questions and insert them into educational games (crosswords, match games, multiple choice, scrabble games, word search puzzles, etc.). At the same time, teachers will be able to control parameters such as: a) school subject (biology, geography, literature etc.), b) grade, c) grammatical phenomena (conjugation, spelling, syntax, vocabulary). We believe that the proposed environment can become a successful tool in supporting and enriching the educational process in an appealing and attractive way.

As far as language teaching is concerned, the traditional approach, which is mainly restricted to the teaching of rules and is exhausted in monotonous exercises for the students on the assumption that language is a one-dimensional teaching object, does not seem to deliver the expected results and should be redefined on the basis of modern functional and communicative teaching approaches.

3 NLP Infrastructure

For the needs of the “Lexipaignio” project, various language resources and NLP technologies are used to create a Web Services API that supports the operational requirements of educational games [10, 11]. In the backend, the “Mnemosyne” platform [3] is utilized in both the pre-processing and the runtime phase (see Fig. 1). Mnemosyne incorporates a vast number of language resources and technologies, including a) many different dictionaries, e.g. spelling vocabularies, morphology, thesaurus, gazetteers, and b) “classic” NLP technologies like fuzzy matching engines, stemmers, taggers and syntax checkers. Besides the “standard” technologies, the environment offers “modern” NLP and machine learning functionality, such as clustering mechanisms (K-Means and hierarchical clustering algorithms), keyword extraction and indexing using TF/IDF and BM25 algorithms [7], and text production using n-gram language models [9]. At the top of the stack, Mnemosyne implements several supervised and unsupervised machine learning algorithms such as Naïve-Bayes and Multinomial Linear Regression, as well as word embeddings using the CBOW and SKIPGRAM algorithms [4] and the GLOVE algorithm [5].

The above models have been applied to a corpus of more than 1.5 billion words collected from electronic news, movie subtitles, literature books, legislation documents, etc. The NLP infrastructure is used in two consecutive phases: 1) the preparation phase, where the teacher prepares the data for the gamified lessons, and 2) the runtime phase, when the games run and request data.

In the preparation phase the functionality supported includes:

● queries to language resources, e.g. the morphosyntactic dictionary for adjectives ending in “-ης” and “-ες” (see following section)
● the incorporation of a document collection with educational material and extraction of:
   o n-gram language models,
   o clusters of similar documents (application of K-Means and hierarchical clustering algorithms),
   o keywords (application of TF/IDF algorithms),
   o candidate terms (application of the morphosyntactic patterns that terms follow).

At runtime, the system, having modelled the functionality and the data needed by a rich set of games useful in education, offers several NLP functions that can feed the games through a web services API. Examples of supported services are:

● predetermined word lists from the preparation phase,
● fuzzy matching using the spelling checker engine,
● synonyms and antonyms using the thesaurus engine,
● inflection of nouns, adjectives, verbs, …
● morphology of words, with decomposition into hyphenation and formation using morphemes,
● grammatical checking using the grammar checker, etc.

4 Language Games

The Lexipaignio educational mini-games focus on improving the language competency level and linguistic abilities of upper primary and lower secondary Greek students. To this end, an initial study categorizing grammatical phenomena and common linguistic errors was conducted.

A common mistake in Modern Greek relates to the application of inflection rules to adjectives ending in “-ης” and “-ες” (πλήρης – πλήρες). These adjectives are of increased difficulty due to a particularity in the formation of some masculine and feminine forms. Such difficulties are noted in terms of spelling, word formation, as well as word use in sentences.

As a next step, a Greek Language corpus was compiled, which, along with the NLP components, served as a basis for the creation of the educational mini-games. The corpus comprises all the material included in the Greek Language books studied in upper primary and lower secondary Greek school in the context of the Modern Greek language course.

The use of adjectives in “-ης” and “-ες” in NLP educational mini-games resulted from a thorough study of grammar exercise typologies and their possible applications to the suggested grammatical phenomena and common linguistic mistakes. Next, these typologies were examined with respect to alternative mini-game solutions.

In the case of adjectives in “-ης” and “-ες”, this led to the construction of a series of mini-games of the following types: True/False, multiple choice, word creation, list creation, gap filling, text processing and text checking.

Based on the produced infrastructure, the teacher can create his/her own games by selecting a) the grammatical phenomenon of interest, b) the level of difficulty and c) a certain text that he/she is willing to use.

Figure 1: Mnemosyne platform, with morphosyntactic analysis of a text from the corpus.

5 Conclusions

The aim of the project is to deploy Natural Language Processing infrastructure for the creation of educational games in a variety of school subjects (Geography, Modern Greek Language, Biology). So far, the language processing techniques applied in the “Lexipaignio” project provide encouraging results. Regarding the implementation of the appropriate infrastructure for dynamic educational games, we hope that educators will soon be able to easily create mini-games according to their students’ needs, by regulating the game content.

ACKNOWLEDGMENTS

This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: T1EDK-05094).

REFERENCES

[1] Alhawiti, KM.: Natural language processing and its use in education. International Journal of Advanced Computer Science and Applications, 5(12), (2014).
[2] Gregory, S., Torsten, R., Wood, L., Henderson, M.: Gamification and digital games-based learning in the classroom. In: Teaching and Digital Technologies: Big Issues and Critical Questions. Henderson, M., Romeo, G. Editors, Cambridge University Press (2015).
[3] Kokkinos, Th., Gakis, P., Iordanidou, A., Tsalidis, Ch.: Utilising Grammar Checking Software within the Framework of Differentiated Language Teaching. In Proceedings of the 2020 9th International Conference on Educational and Information Technology, Oxford, United Kingdom, (2020).
[4] Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient Estimation of Word Representations in Vector Space. CoRR, abs/1301.3781 (2013).
[5] Pennington, J., Socher, R., Manning, C.D.: GloVe: Global Vectors for Word Representation. EMNLP (2014).
[6] Picca, D., Jaccard, D., Eberle, G.: Natural Language Processing in Serious Games: A state of the art. International Journal of Serious Games, 2(3), 77-97 (2015).
[7] Ramos, J.: Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning, 242, 133-142, (2003).
[8] Reinders, H.: Digital Games in Language Learning and Teaching. Palgrave Macmillan Publishing (2012).
[9] Singla, A., Karambir, M.: Comparative analysis & evaluation of euclidean distance function and manhattan distance function using k-means algorithm. International Journal, 2(7), (2012).
[10] Tsalidis, C., Vagelatos, A., Orphanos, G.: An electronic dictionary as a basis for NLP tools: The Greek case. In Proc. of 11th Conference on Natural Language Processing, Fez, Morocco (2004).
[11] Vagelatos, A., Mantzari, E., Pantazara, M., Tsalidis, Ch., Kalamara, C.: Developing tools and resources for the biomedical domain of the Greek language. Health Informatics Journal, 17(2), 127-139 (2011).
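
As a supplementary illustration of the keyword-extraction step listed under the preparation phase in Section 3, the following is a minimal Python sketch of TF/IDF ranking over a small document collection. It is not the Mnemosyne implementation: the function names (tokenize, tfidf_keywords), the toy English sentences and the simple regular-expression tokenization are all assumptions made for illustration, and a real pipeline for Greek would presumably lemmatize the words first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"\w+", text.lower())


def tfidf_keywords(documents, top_k=3):
    """Return, for each document, its top_k terms ranked by TF-IDF score.

    TF is the relative frequency of a term inside one document;
    IDF is log(N / df), where df is the number of documents containing the term.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results


# Invented toy collection standing in for school-subject texts.
documents = [
    "the volcano erupts and the lava flows",
    "the cell divides and the cell grows",
    "the river flows to the sea",
]

for doc, keywords in zip(documents, tfidf_keywords(documents)):
    print(doc, "->", keywords)
```

Words that occur in every document (here "the") receive a score of zero and drop out of the ranking, which is the property that makes TF/IDF a plausible source of subject-specific vocabulary for word lists in games such as crosswords or word search puzzles.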