<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Creating vocabulary exercises through NLP</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Manex</forename><surname>Agirrezabal</surname></persName>
							<email>manex.aguirrezabal@hum.ku.dk</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Nordic Studies and Linguistics</orgName>
								<orgName type="institution">University of Copenhagen</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Begoña</forename><surname>Altuna</surname></persName>
							<email>begona.altuna@ehu.eus</email>
							<affiliation key="aff1">
								<orgName type="laboratory">Ixa Group</orgName>
								<orgName type="institution">University of the Basque Country</orgName>
								<address>
									<addrLine>Manuel Lardizabal 1</addrLine>
									<postCode>20018</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lara</forename><surname>Gil-Vallejo</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Universitat Oberta de Catalunya</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Josu</forename><surname>Goikoetxea</surname></persName>
							<email>josu.goikoetxea@ehu.eus</email>
							<affiliation key="aff1">
								<orgName type="laboratory">Ixa Group</orgName>
								<orgName type="institution">University of the Basque Country</orgName>
								<address>
									<addrLine>Manuel Lardizabal 1</addrLine>
									<postCode>20018</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Itziar</forename><surname>Gonzalez-Dios</surname></persName>
							<email>itziar.gonzalezd@ehu.eus</email>
							<affiliation key="aff1">
								<orgName type="laboratory">Ixa Group</orgName>
								<orgName type="institution">University of the Basque Country</orgName>
								<address>
									<addrLine>Manuel Lardizabal 1</addrLine>
									<postCode>20018</postCode>
									<settlement>Donostia</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Creating vocabulary exercises through NLP</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8B0935217728383434978465C97C8DF4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:31+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Natural Language Processing</term>
					<term>Text transformation</term>
					<term>Vocabulary learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The use of technology in the Humanities opens new research opportunities, as it gives access to vast amounts of data such as textual corpora. Since a considerable amount of research in the Digital Humanities is done on digitised corpora, Natural Language Processing tools, which help extract linguistic information, can be of much use in exploiting them. We present a series of experiments in which we propose text transformations, based on Natural Language Processing, to generate vocabulary learning exercises. We describe the corpus, databases and tools we have employed in our approach and offer an overview of a multilingual language processing pipeline. We then present the experiments and their output. Finally, we discuss the strengths and shortcomings of our approach.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The increasing use of Information and Communication Technologies has opened new research possibilities, and the growing amount of digital data together with the development of processing tools have changed the paradigms of many research fields. In the case of the Humanities, the so-called Digital Humanities (DH) aim at exploiting the vast amounts of digitised corpora with the help of, among others, Natural Language Processing (NLP) tools. In fact, DH and NLP can be considered closely related fields, as the Humanities are often based on textual data and language knowledge. NLP, whose aim is to provide computers with human language knowledge, encompasses a wide range of research interests and approaches that can be useful in DH.</p><p>Since the 1960s, NLP's main challenges have been Speech Recognition, Natural Language Understanding and Natural Language Generation, and some of its main applications are spell checking, parsing, machine translation, information retrieval and question answering. However, it also addresses more advanced applications, such as assistance for human creativity and uses in education. Our work can be placed among those advanced uses of NLP.</p><p>Concerning the enhancement of human creativity, the so-called computational creativity is a notably prominent field. Computational creativity aims at modelling, simulating or replicating human creativity using computational models and, hence, it can help enrich human productiveness by providing suggestions, as in the case of automatic poetry generation <ref type="bibr" target="#b15">[16]</ref>. For example, Agirrezabal et al. <ref type="bibr" target="#b3">[4]</ref> describe word substitutions based on Part-of-Speech and meaning in order to create new poems from existing ones while preserving the coherence of the texts. 
Although that approach is focused on poetry, the substitutions it presents can be of much help in other tasks dealing with vocabulary substitution.</p><p>Regarding educational usage, one should consider that NLP has also been used extensively for exercise generation and student assessment <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b17">18]</ref>. In fact, automating exercise generation can be considered a hybridisation of computational creativity and the educational use of NLP, as automatic exercise generation may offer suitable options where human creativity struggles: it can save time for teachers and textbook designers, and it can be useful for self-learning.</p><p>Our experimentation is thus centred on reusing NLP tools for computational creativity and educational purposes. In this paper we propose a set of NLP resources and tools for text adaptations to be used in the areas of language teaching and inclusion. More precisely, we focus on text transformations that can be used in the language acquisition field, e.g. creating vocabulary exercises.</p><p>We strongly aim at the reusability of our proposals and, hence, all the resources and tools used in our work (corpora, databases and tools) are freely available. Moreover, we provide the code we have created <ref type="foot" target="#foot_0">4</ref> under the Creative Commons Attribution-NonCommercial 4.0 International licence (CC BY-NC 4.0).</p><p>This paper is structured as follows: in Sections 2 and 3 we describe the resources and tools we use in our experiments; in Section 4 we present the text transformations we propose; and in Section 5 we discuss certain issues arising from our experiments. Finally, we conclude and outline future work in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Linguistic Resources: Corpus and Databases</head><p>Our work is primarily based on text and, precisely, our two main resources are a narrative corpus collected by us (Section 2.1) and the WordNet <ref type="bibr" target="#b13">[14]</ref> database (Section 2.2). In addition, we have also taken advantage of the ImageNet <ref type="bibr" target="#b12">[13]</ref> database, in which images are linked to concepts and organised accordingly (Section 2.3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Fairy tales Corpus</head><p>For our experiment, we have opted for children's literature to build our corpus; more precisely, fairy tales extracted from Project Gutenberg and Wikipedia. Project Gutenberg offers over 57,000 freely available e-books in 67 languages. In the case of Wikipedia, each tale is indexed as an individual Wikipedia entry which offers the background of the story as well as a version of the tale. The reasons for opting for well-known fairy tales are the following:</p><p>-Folktales have been widely employed in education <ref type="bibr" target="#b18">[19]</ref>.</p><p>-They are optimal for language learning, since they are commonly told in simple language and are widely known <ref type="bibr" target="#b18">[19]</ref>. -There is a wide range of fairy tales freely available.</p><p>-Versions of those tales can easily be found for a myriad of languages, so multilingual approaches can easily be implemented. -Texts from Project Gutenberg and Wikipedia can easily be obtained as plain unformatted text, which simplifies the textual preprocessing stage.</p><p>To give an idea of the size of the corpus we have compiled, we list the token counts for each tale and language in Table <ref type="table" target="#tab_0">1</ref>. As can be noticed, we have tried to gather texts of similar length in order to achieve comparable results across languages. In this paper, we illustrate our work through selected passages from the Little Red Riding Hood tale by the Grimm brothers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">WordNet</head><p>WordNet<ref type="foot" target="#foot_2">6</ref> is a large lexical database of English <ref type="bibr" target="#b13">[14]</ref> in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets). For example, the words car, auto, automobile, machine, and motorcar are grouped in a synset which denotes the concept a motor vehicle with four wheels; usually propelled by an internal combustion engine. Moreover, synsets are related to one another. The most important semantic relations are hypernymy-hyponymy, meronymy, troponymy (for verbs) and antonymy. In Figure <ref type="figure" target="#fig_0">1</ref> we present the synset car, auto, automobile, machine, motorcar and its related words. However, it should be taken into account that a word may have more than one meaning. When querying a certain word, we find all of that word's senses listed. E.g. if we look up the noun hood, we may find these three senses among others: i) a headdress that protects the head and face, ii) a protective covering consisting of a metal part that covers the engine, and iii) the folding roof of a carriage. Hence, a word may appear in more than one synset, depending on which sense is considered.</p><p>Following the English WordNet philosophy, WordNets for many languages have been developed. For example, during the EuroWordNet (EWN) project <ref type="foot" target="#foot_3">7</ref> WordNets for several European languages (English, Dutch, Italian, Spanish, German, French, Czech and Estonian) were created. Nevertheless, these WordNets for different languages are not isolated databases. The Inter-Lingual-Index (ILI) was created to provide an efficient mapping across the autonomous WordNets. 
Via this index, languages are interconnected so that it is possible to go from the words in one language to the equivalent words in any other listed language.</p><p>The list of available WordNets for different languages has kept growing since then. For example, the Open Multilingual WordNet <ref type="bibr" target="#b8">[9]</ref> (OMW)<ref type="foot" target="#foot_4">8</ref> -a product of the Global WordNet Association<ref type="foot" target="#foot_5">9</ref> -provides access to open WordNets in over 150 languages, all linked to the English WordNet.</p><p>In this work, we have chosen WordNet for our experimentation because i) it is a well-known multilingual resource, ii) it is freely available, and iii) it has been used in many NLP applications. Specifically, we have used the OMW included in NLTK <ref type="bibr" target="#b7">[8]</ref>, a suite of Python libraries and software for symbolic and statistical NLP, and we have used it to substitute words (more exactly, their lemmas) with semantically related concepts or with their equivalents in other languages.</p></div>
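To make the Inter-Lingual-Index idea concrete, the toy sketch below treats synset identifiers as language-independent pivots so that a lemma in one language can be mapped to its equivalents in another. The mini-database and the synset identifier are illustrative only; in the actual system this look-up goes through NLTK's OMW corpus reader.

```python
# Toy sketch of the ILI idea: a synset id is a language-independent pivot.
# The entries below are illustrative, not the real WordNet/OMW contents.
ILI = {
    "02958343-n": {  # 'a motor vehicle with four wheels...'
        "eng": ["car", "auto", "automobile", "machine", "motorcar"],
        "spa": ["coche", "auto", "automóvil"],
        "dan": ["bil"],
    },
}

def equivalents(lemma, src_lang, tgt_lang, ili=ILI):
    """Return target-language lemmas of every synset containing `lemma`."""
    found = []
    for synset_id, langs in ili.items():
        if lemma in langs.get(src_lang, []):
            found.extend(langs.get(tgt_lang, []))
    return found
```

With the real OMW in NLTK, the same pivoting is done by collecting a synset's `lemma_names` for the desired language code.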
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">ImageNet</head><p>ImageNet <ref type="bibr" target="#b12">[13]</ref> is a large-scale image database arranged according to the WordNet hierarchy. It contains images for nouns, and each node of the hierarchy is represented by thousands of images. That is to say, nouns in the English WordNet are given images that represent them. More precisely, the ImageNet project aims at offering 500-1000 images for each synset. Like WordNet, ImageNet is freely available and ready to use.</p><p>Although ImageNet was initially developed for visual information processing and tasks such as non-parametric object recognition, tree-based image classification and automatic object localisation, it has proven very useful in our experiment. As a matter of fact, in this work we have employed ImageNet to substitute nouns in texts with images. By means of that, we have been able to create texts with images, similar to texts with pictograms. We further describe that approach in Section 4.2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Preprocessing with NLP tools</head><p>Textual processing has been carried out with existing off-the-shelf NLP tools. The required processing for most of the languages (Basque, Dutch, English, French, German, Italian and Spanish) has been done through Ixa-pipes<ref type="foot" target="#foot_7">11</ref>  <ref type="bibr" target="#b0">[1]</ref>, whereas the processing for the rest of the languages (Galician, Catalan and Portuguese) has been conducted through FreeLing<ref type="foot" target="#foot_8">12</ref>  <ref type="bibr" target="#b10">[11]</ref>. Although we have employed two different processing pipelines, the steps are comparable. As a consequence, we will describe the processing modules relevant to this work based on Ixa-pipes.</p><p>Ixa-pipes is a modular chain of NLP tools (or pipes) which provides easy access to NLP technology for several languages. Modular means that the different processing modules (for specific linguistic analysis tasks) can be chosen according to the needs of each experiment and that new modules can be added to address new needs. We present the processes we have carried out below:</p><p>-Tokenisation: it is the process of splitting sequences of characters into minimal meaningful units. In the tokenisation process, texts are divided into words, numbers, acronyms or punctuation marks. As can be seen in Figure <ref type="figure">2</ref>, the sentence (sent="3") has been split into tokens (wf ) and each token has been assigned an identifier (id). Tokenisation parameters are defined for each language so as to take into account the special characters and the singular orthography and punctuation rules each language may have. It is also to be pointed out that punctuation and hyphenation exceptions have been taken into account, as can be seen for the Red-Cap token, which has been treated as a single unit despite the hyphen. 
-Lemmatisation: it consists in removing word inflection to return the dictionary form or lemma of a word. For example, from the verb form is we obtain the lemma be after lemmatisation. In Ixa-pipes, lemmatisation is performed by lexical look-up methods in which each word in the text is checked against a dictionary. The lemma is the basis of our experimentation, since we create the transformations based on it. Consequently, an unknown or incorrect lemma can lead to errors in subsequent processes. -Part-of-Speech tagging: it consists in assigning a grammatical category to each of the tokens. In Ixa-pipes this is a two-step procedure: first, all the possible analyses are assigned to each token and then, the most suitable one is selected. In this process both linguistic knowledge (rules) and statistical methods are combined.</p><p>In Figure <ref type="figure">3</ref>, we present the complete annotation of a segment of the Little Red Riding Hood tale in English. One may notice that each token is presented in blue, lemmas are expressed by the lemma attribute and Part-of-Speech (PoS) information is given in the pos attribute. -Word Sense Disambiguation (WSD) <ref type="bibr" target="#b1">[2]</ref>: it is an NLP task that aims to identify the sense of a word in a sentence when that word has more than one sense. For example, given the sentence We took off the hood, the goal is to assess whether the word hood refers to the headdress or to the car cover. In order to perform WSD in this work, we have used the state-of-the-art tool UKB <ref type="bibr" target="#b2">[3]</ref>, which is also integrated in Ixa-pipes and works with English, Basque, Bulgarian, Portuguese and Spanish. 
As output, UKB provides all the possible WordNet synsets (reference attribute) with a confidence value (confidence attribute), as we show in Figure <ref type="figure">4</ref> for the word hood.</p><p>This is the preprocessing needed to carry out the text transformations presented in Section 4.</p></div>
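The first two pipeline steps above can be sketched as follows. This is a minimal illustration, not Ixa-pipes or FreeLing: the tokeniser keeps hyphenated forms such as Red-Cap as single units, and the lemmatiser does a lexical look-up in a tiny hand-made dictionary (real pipelines use full lexicons and statistical models).

```python
import re

# Tokens are either word runs (hyphenated units kept whole) or punctuation.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")

# Illustrative look-up dictionary; real lemmatisers use full lexicons.
LEMMA_DICT = {"is": "be", "was": "be", "said": "say"}

def tokenise(sentence):
    """Split a sentence into words (hyphenated units intact) and punctuation."""
    return TOKEN_RE.findall(sentence)

def lemmatise(tokens):
    """Return the dictionary form of each token via lexical look-up."""
    return [LEMMA_DICT.get(t.lower(), t.lower()) for t in tokens]
```

For example, `tokenise("Red-Cap was happy.")` keeps Red-Cap as one token, and `lemmatise` maps was to be.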
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Text Adaptations</head><p>Once textual processing has been done, we have profited from the extracted linguistic information to alter texts and generate reading and vocabulary activities automatically. In the following subsections we describe the kinds of exercises we have created to help with language acquisition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Word clouds: working on vocabulary with frequent words</head><p>Word frequency shows great potential for sketching the information in a text, as high-frequency items cover a large proportion of the words in a text.</p><p>Hence, frequent words have received attention from both language teachers and learners <ref type="bibr" target="#b22">[23]</ref>. Furthermore, word clouds are a widely used teaching resource <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b28">29]</ref>.</p><p>Taking all that into account, word clouds are convenient for a first introduction to texts, as they offer a global view of the plot and can be a good tool to deal with the most relevant or specific vocabulary from a visual and playful approach. Additionally, word clouds can also be useful to compare two different texts, for instance two authors on the same topic.</p><p>As for the word cloud generation system, we have developed a prototype word cloud generator by combining the Matplotlib and Numpy packages in the Python programming language. All words from the original text are shown in the word cloud, except digits, punctuation marks and the so-called stop words (prepositions, determiners, conjunctions, etc.), which do not convey relevant information on the topic and are removed in order to focus on meaningful words. 
We generate word clouds with the shape of an input image relevant to the tale we want to deal with so as to make the word clouds more appealing to the language learner.</p><p>In Figure <ref type="figure" target="#fig_2">5</ref> we present the final word cloud we have obtained from the Little Red Riding Hood tale. As can be seen, the word cloud acquires the shape of Little Red Riding Hood and the most frequent words are grandmother, little, red-cap and good. Dealing with those words is useful to start working with the vocabulary and the concepts the readers will find in the text. </p></div>
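The frequency step behind the word cloud can be sketched as below: count every word in the text except digits, punctuation and stop words. The stop-word list here is a small illustrative sample, not a full English list, and the shaping and rendering with Matplotlib/Numpy is left out.

```python
from collections import Counter
import string

# Illustrative stop-word sample; the real system uses a full list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "was", "she", "her", "in"}

def cloud_frequencies(text):
    """Return a Counter of meaningful lower-cased words for the cloud."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return Counter(w for w in words
                   if w and w not in STOP_WORDS and not w.isdigit())
```

The resulting counts can be fed directly to a word cloud renderer, which sizes each word in proportion to its frequency.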
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Pictotales: working on vocabulary with images</head><p>Pictotales are tales in which nouns have been replaced with images from ImageNet. This is a single-language textual transformation approach in which some lemmas are replaced with pictures.</p><p>In order to create the Pictotales, the lemmas of some of the concrete nouns have been automatically looked up in ImageNet and an image corresponding to the lemma's synset has been randomly chosen. In order to combine text and images, the narratives have been converted to HyperText Mark-up Language (HTML), which allows displaying text and images in web browsers and other visual interfaces.</p><p>As can be seen in Figure <ref type="figure" target="#fig_3">6</ref>, we have created a Pictotale from the English version of the Little Red Riding Hood tale. As one can see, concepts such as mother, grandmother, and bottle have been replaced with relevant images.</p><p>This kind of exercise can be useful for learning vocabulary, using the images as contextualised flashcards or helping to evoke the target words in a first reading of the tales. Furthermore, Pictotales can also be employed for vocabulary revision and memory exercises, naming the elements that appear in the images, all within the context of the tale. </p></div>
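A minimal sketch of the Pictotale conversion: tokens whose lemma is a concrete noun with an available picture are rendered as an HTML img element, the rest as escaped text. The lemma-to-URL mapping is a hypothetical stand-in for a real ImageNet look-up.

```python
import html

# Hypothetical lemma-to-image mapping; the real system queries ImageNet
# and picks one of the images linked to the lemma's synset.
IMAGE_URLS = {"mother": "img/mother.jpg", "bottle": "img/bottle.jpg"}

def to_pictotale(tagged_tokens):
    """Render (token, lemma) pairs as HTML, swapping known nouns for images."""
    parts = []
    for token, lemma in tagged_tokens:
        if lemma in IMAGE_URLS:
            parts.append(
                f'<img src="{IMAGE_URLS[lemma]}" alt="{html.escape(lemma)}"/>')
        else:
            parts.append(html.escape(token))
    return " ".join(parts)
```

The alt attribute keeps the original lemma recoverable, which is convenient when the exercise asks learners to name the pictured element.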
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Story revolution: working on vocabulary with related words</head><p>In the Story Revolution experiment, we have aimed at vocabulary learning through meaning-based substitutions. That is to say, we have substituted nouns and adjectives in texts with their antonyms, hyponyms or hypernyms.</p><p>Since we work with lemmatised texts, we have used the WordNet information for those. As WordNet can be understood as a net of semantically-based relations in which concepts are arranged from the most generic to the most specific, and opposition relations are also included, obtaining the antonyms, hyponyms or hypernyms of the selected lemmas in the text has been a straightforward process. Figure <ref type="figure" target="#fig_4">7</ref> displays a piece of Little Red Riding Hood where some words have been replaced with their antonyms (in red), e.g. few, large, ignore. Replacing words with their hypernyms may help second language learners understand a text. In fact, removing the most obscure terms and offering more generic alternatives may lead, at least, to getting the global sense of the text. We present a sentence of Little Red Riding Hood's tale in <ref type="bibr" target="#b0">(1)</ref>. As can be seen, some of the words have been highlighted in different colours. In example (2), we present a sentence formed by the hypernyms of those highlighted words. In this second sentence more generic vocabulary is used and the sentence could presumably be better understood by non-proficient speakers of English.</p><p>(1) When the wolf had appeased his appetite.</p><p>(2) When the canine had calmed his craving.</p><p>Conversely, when substituting words with their hyponyms, we help enlarge the available vocabulary and support the learning of more specific words. 
In example (3) we present the outcome of replacing the highlighted words in (1) with their hyponyms.</p><p>(3) When the coyote bear had appeased his stomach.</p><p>Word substitution is a common technique in lexical substitution and text simplification tasks, but word replacements are generally done with synonyms <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b26">27,</ref><ref type="bibr" target="#b14">15]</ref>. Nonetheless, hypernym and hyponym substitution can also be suitable for the task, since replacing all the words referring to the same concept with a single term makes us comply with one of the easy-reading principles: "use the same term consistently for a specific thought or object" <ref type="bibr" target="#b24">[25]</ref>.</p></div>
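The Story Revolution substitutions can be sketched with a hand-made fragment of WordNet-style relations; the entries below are illustrative, and the actual system obtains them by querying WordNet through NLTK.

```python
# Illustrative relation fragment; the real relations come from WordNet.
RELATIONS = {
    "wolf": {"hypernym": "canine", "hyponym": "coyote"},
    "good": {"antonym": "bad"},
}

def substitute(lemmas, relation):
    """Replace each lemma with its `relation` counterpart when one exists."""
    return [RELATIONS.get(l, {}).get(relation, l) for l in lemmas]
```

Lemmas without the requested relation are left untouched, which mirrors the autohyponymy limitation discussed in Section 5.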
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Uncovering words: discovering unknown words</head><p>In this experiment we wanted to automatically create one of the typical vocabulary learning exercises. Given a text in the target language, we have substituted some words with their translations into the learner's mother tongue. However, rather than using individual word-for-word translations, we have used WordNet to select each term, in order to guarantee that the terms share the same meaning.</p><p>For example, in Figure <ref type="figure" target="#fig_5">8</ref> we have taken the English version of the Little Red Riding Hood tale as a basis and translated some words into Danish (in red: bedstemor, rum...). Consequently, we have obtained a traditional "fill in the gaps" exercise with mother-tongue clues. Conversely, this technique can be applied to present the text in a language the learner speaks and translate several words into the target language. For example, the text in Figure <ref type="figure" target="#fig_5">8</ref> could be used by English speakers learning Danish. In this way, learners find a comfortable context in which they can focus on the vocabulary. One possible exercise is encouraging learners to make hypotheses about what a word means given its surrounding context, in order to uncover the story in the text.</p></div>
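The uncovering-words transformation can be sketched as follows: a fraction of the translatable tokens is replaced with its equivalent in the other language, here through an illustrative English-Danish lexicon standing in for the synset-aligned WordNet look-up.

```python
import random

# Illustrative English-Danish lexicon; the real mapping goes through
# synset-aligned WordNets.
EN_DA = {"grandmother": "bedstemor", "room": "rum", "wolf": "ulv"}

def uncover(tokens, ratio=0.5, seed=0):
    """Replace roughly `ratio` of the translatable tokens with Danish words."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        translation = EN_DA.get(tok.lower())
        out.append(translation if translation and rng.random() < ratio else tok)
    return out
```

The `ratio` parameter controls how dense the exercise is: a low value yields occasional clues, while 1.0 translates every word found in the lexicon.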
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion</head><p>In this paper we have presented four text transformations that can not only help teachers create exercises but also help writers or editors create new texts. In fact, our approach may shorten exercise generation time and can also support professionals' creativity, for instance when looking for a suitable translation. Nevertheless, we have to underline that these texts need revision before they are used. We detail the three main types of shortcomings below.</p><p>1. Sometimes the absence of a linguistic item prevents us from addressing the transformation query to generate a variant in the text. An illustration of this can be seen in examples (<ref type="formula">1</ref>) and (<ref type="formula">3</ref>) in Section 4.3, where the same form (had appeased) is offered for both the target term and its hyponym (autohyponymy). According to the resources used, there is no more specific way of referring to the event, and it would probably also be difficult for an expert to come up with one. 2. Additionally, as pictures from ImageNet are chosen randomly among all the images linked to a certain synset, the picture selected might not be the most suitable one in context. For example, in Figure <ref type="figure" target="#fig_3">6</ref> the word bottle is represented by an empty plastic bottle, but it would be better represented by a full glass bottle of wine. We reckon an optimal approach to Pictotale generation would be a system that takes the whole narrative context into account for image selection. 3. Other issues, instead, arise from processing errors. In some cases, an incorrect Part-of-Speech tag assignment may lead to the substitution of a wrong word.</p><p>In other cases, incorrect word sense disambiguation may be a source of errors. 
For example, in (4) we present a passage of Little Red Riding Hood in Spanish where some words have been translated into Danish (as in the uncovering-words experiment, Section 4.4). As depicted in the example, we have found that the Spanish word chica (girl) has been substituted with the Danish word dreng (boy). This error seems to be due to the fact that the Spanish lemmatiser gives the masculine form chico (boy) as the lemma. UKB disambiguates the masculine lemma as boy, and our script relies on UKB's disambiguation to look up the Danish word.</p><p>Moreover, in the case of verbs, as our tool substitutes lemmas, it is necessary to fix the conjugation. An example of this is the Danish verb give (give) in (<ref type="formula" target="#formula_0">4</ref>), which should be corrected to the participle givet (given). In order to overcome this problem, natural language generation techniques that take the syntax of the target language into account should be used. Nonetheless, offering only the lemma does not invalidate the proposed exercise, as guessing or generating the right verbal tense from the context is also a possible exercise.</p><p>Despite these shortcomings, this method offers great capacity for text adaptation. In this work we have applied our modifications to all the words, but that can easily be customised. Possible customisations include substituting less frequent words, longer words, complex words and keywords/keyphrases, among others.</p><p>For example, regarding frequent and infrequent words, word frequency lists such as Ogden's Basic English word list or lists that take word distributions into account <ref type="bibr" target="#b6">[7]</ref> can be employed to set the threshold of interest (very frequent, frequent, normal, infrequent, not frequent).</p><p>In what concerns word length, the Plain Language guidelines recommend the use of short words (a summary of guidelines can be found in <ref type="bibr" target="#b21">[22]</ref>). 
Shorter words can be easier to learn; however, word length does not seem to be the only factor affecting memorability <ref type="bibr" target="#b16">[17]</ref>.</p><p>These text adaptation processes may also be enhanced with other current NLP techniques, such as complex word identification <ref type="bibr" target="#b23">[24]</ref>. In this preprocessing step for lexical simplification, complex words and expressions are identified in order to replace them later with simpler equivalent alternatives <ref type="bibr" target="#b25">[26]</ref>. Our approach may be integrated into this step so as to replace complex words and expressions with simpler equivalent alternatives.</p><p>Besides, keyphrase detection, which deals with finding the most important words in texts, is another NLP task that has been approached in the context of information extraction <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b27">28]</ref> but that can also support the creation of text adaptations for students with special needs, by helping readers grasp the main ideas of the text.</p><p>Taking all that into account, we reckon that, although we have centred our experimentation on vocabulary-learning exercise generation and our system admits some improvements, the tools and resources proposed in our experimentation can be of much help in other DH research trends in which text transformations play a significant role. Additionally, even if we have presented our transformation proposals as isolated experiments, they can all be combined to address specific needs.</p></div>
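One of the customisations mentioned above, selecting only infrequent words as substitution candidates, can be sketched as below. Here the threshold is set on in-text counts; an external frequency list (e.g. Ogden's Basic English word list) could be swapped in instead.

```python
from collections import Counter

def infrequent_lemmas(lemmas, max_count=1):
    """Return the set of lemmas occurring at most `max_count` times in the text."""
    counts = Counter(lemmas)
    return {lemma for lemma, c in counts.items() if c <= max_count}
```

The resulting set can then be passed to any of the substitution functions so that frequent, presumably known, words are left untouched.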
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion and Future Works</head><p>We have conducted a series of experiments in which we have altered texts through NLP methods in order to create resources for vocabulary learning. Specifically, the text adaptations have been i) creating word clouds, ii) texts with images, iii) texts with different but semantically related words and iv) texts with translations. Our main objective was to show a possible way of supporting teachers/educators when creating vocabulary learning or reading comprehension exercises by means of NLP applications. This approach can also be seen as a computational creativity exercise, as the transformations and suggestions have been generated automatically.</p><p>Our experiments offer just the first insights into what NLP can offer for textual modification. Automatic methods are far from perfect and still need human supervision, but it is undeniable that they ease the burden of coming up with suitable ideas in certain contexts. Further, taking into account that our approach is an NLP-based one, we consider that the next steps should include conducting both qualitative and quantitative evaluations in real scenarios, in order to measure the actual performance of our implementations and to integrate our preliminary proposal within the scientific framework.</p><p>In the case of educational purposes, a first step in improving our work should be evaluating the text adaptations with target audiences in order to better understand their needs, e.g. which words should be adapted or what new approaches they may require. That is why we encourage collaboration with other experts in the area of the humanities and, especially, within the domain of education.</p><p>Moreover, we think our proposal can be adapted to address more educational needs. 
We foresee the following applications: i) adapting texts with pictograms, ii) going beyond single words and substituting phrases, iii) creating games in which words are removed from the text, or only their definitions are given, so that learners must guess them, or iv) giving two words and asking learners to guess their relation. Creating an online text adaptation application, where users can customise their own texts, is also one of our future goals.</p><p>Outside the education domain, we believe our approach may also have other uses. For example, it might be useful for museums or other cultural institutions when creating adapted, multimedia or interactive material. In the artistic creation of word clouds, automatically extracting the most relevant words can also be of great help. Finally, we want to reinforce the idea that NLP methods might spur computational creativity towards new kinds of artistic expression.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Main semantic relations in WordNet</figDesc><graphic coords="4,181.15,115.84,253.05,101.06" type="bitmap" /></figure>
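Application iii) above, a cloze-style guessing game, can be sketched in a few lines: blank out chosen target words and keep an answer key. The gap marker, the target list and the example sentence are our own illustrative choices, not part of the system described in this paper.

```python
# Sketch of a cloze-exercise generator: every occurrence of a target
# word is replaced by a gap, and the removed words are collected as
# the answer key for the learner.
import re

def make_cloze(text: str, targets: set[str],
               gap: str = "____") -> tuple[str, list[str]]:
    """Replace each target word with a gap; return (exercise, answers)."""
    answers: list[str] = []

    def blank(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in targets:
            answers.append(word)
            return gap
        return word

    return re.sub(r"[A-Za-z]+", blank, text), answers

exercise, key = make_cloze("The wolf ate the grandmother.",
                           {"wolf", "grandmother"})
print(exercise)  # → The ____ ate the ____.
print(key)       # → ['wolf', 'grandmother']
```

In a fuller version the target words could come from the same WordNet-based selection used for the other adaptations, rather than from a hand-written set.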
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 .Fig. 4 .</head><label>34</label><figDesc>Fig. 3. Lemmatisation and PoS tagging example for Little Red Riding Hood</figDesc><graphic coords="7,180.48,115.83,254.40,264.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. Word cloud for Little Red Riding Hood</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 6 .</head><label>6</label><figDesc>Fig. 6. Little Red Riding Hood with images</figDesc><graphic coords="9,172.08,354.02,271.20,192.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 7 .</head><label>7</label><figDesc>Fig. 7. Passage of the English Version of Little Red Riding Hood with Antonyms</figDesc><graphic coords="10,170.20,221.59,274.95,82.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 8 .</head><label>8</label><figDesc>Fig. 8. Passage of Little Red Riding Hood in English with some words in Danish</figDesc><graphic coords="11,152.68,222.28,310.00,89.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Tokens per document and language</figDesc><table><row><cell>Language</cell><cell>Little Red Riding Hood</cell><cell>Hansel and Gretel</cell><cell>Total</cell></row><row><cell>English</cell><cell>1384</cell><cell>2870</cell><cell>4254</cell></row><row><cell>Spanish</cell><cell>564</cell><cell>2637</cell><cell>3201</cell></row><row><cell>Catalan</cell><cell>563</cell><cell>647</cell><cell>1210</cell></row><row><cell>Galician</cell><cell>497</cell><cell>93</cell><cell>590</cell></row><row><cell>French</cell><cell>1432</cell><cell>2680</cell><cell>4112</cell></row><row><cell>German</cell><cell>1257</cell><cell>2663</cell><cell>3920</cell></row><row><cell>Italian</cell><cell>1199</cell><cell>1946</cell><cell>3145</cell></row><row><cell>Portuguese</cell><cell>662</cell><cell>2610</cell><cell>3272</cell></row><row><cell>Dutch</cell><cell>1311</cell><cell>2726</cell><cell>4037</cell></row><row><cell>Basque</cell><cell>274</cell><cell>174</cell><cell>448</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_0">https://github.com/dss2016eu/codefest/tree/master/nlp lac</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_1">http://www.gutenberg.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_2">http://wordnetweb.princeton.edu/perl/webwn</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_3">http://projects.illc.uva.nl/EuroWordNet/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_4">http://compling.hss.ntu.edu.sg/omw/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_5">http://globalwordnet.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_6">http://www.image-net.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_7">http://ixa2.si.ehu.es/ixa-pipes</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_8">http://nlp.lsi.upc.edu/freeling/node/1</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Acknowledgements</head><p>We thank Codefest summer school for providing the infrastructure to begin this work. We also thank Larraitz Uria for her contribution to the early steps of this work.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">IXA pipeline: Efficient and Ready to Use Multilingual NLP Tools</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agerri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bermúdez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rigau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC&apos;14)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Loftsson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Ninth International Conference on Language Resources and Evaluation (LREC&apos;14)<address><addrLine>Reykjavik, Iceland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3823" to="3828" />
		</imprint>
	</monogr>
	<note>European Language Resources Association (ELRA)</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Agirre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Edmonds</surname></persName>
		</author>
		<title level="m">Word sense disambiguation: Algorithms and applications</title>
				<imprint>
			<publisher>Springer Science &amp; Business Media</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">33</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The risk of sub-optimal use of Open Source NLP Software: UKB is inadvertently state-of-the-art in knowledge-based WSD</title>
		<author>
			<persName><forename type="first">E</forename><surname>Agirre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>López De Lacalle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Soroa</surname></persName>
		</author>
		<ptr target="http://aclweb.org/anthology/W18-2505" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of Workshop for NLP Open Source Software (NLP-OSS)</title>
				<meeting>Workshop for NLP Open Source Software (NLP-OSS)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="29" to="33" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">POS-Tag Based Poetry Generation with WordNet</title>
		<author>
			<persName><forename type="first">M</forename><surname>Agirrezabal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Arrieta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Astigarraga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hulden</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">14th European Workshop on Natural Language Generation (ENLG 2013)</title>
				<imprint>
			<date type="published" when="2013">2013. 2013</date>
			<biblScope unit="page" from="162" to="166" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Natural Language Processing and its Use in Education</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Alhawiti</surname></persName>
		</author>
		<ptr target="http://thesai.org/Downloads/Volume5No12/Paper10-NaturalLanguageProcessing.pdf" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Computer Science and Applications</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="72" to="76" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Using wordles to teach foreign language writing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Baralt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pennestri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Selvandin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language Learning &amp; Technology</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="12" to="22" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Variation in word frequency distributions: Definitions, measures and implications for a corpus-based language typology</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bentz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Alikaniotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Samardžić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Buttery</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Quantitative Linguistics</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">2-3</biblScope>
			<biblScope unit="page" from="128" to="162" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">NLTK: the Natural Language Toolkit</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the COLING/ACL on Interactive presentation sessions</title>
				<meeting>the COLING/ACL on Interactive presentation sessions</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="69" to="72" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Linking and Extending an Open Multilingual Wordnet</title>
		<author>
			<persName><forename type="first">F</forename><surname>Bond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Foster</surname></persName>
		</author>
		<ptr target="http://aclweb.org/anthology/P13-1133" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 51st Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1352" to="1362" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Can Spanish Be Simpler? LexSiS: Lexical Simplification for Spanish</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Drndarevic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2012: Technical Papers</title>
				<meeting>COLING 2012: Technical Papers</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="357" to="374" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">FreeLing: An Open-Source Suite of Language Analyzers</title>
		<author>
			<persName><forename type="first">X</forename><surname>Carreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Chao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Padró</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Padró</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC&apos;04)</title>
				<meeting>the 4th International Conference on Language Resources and Evaluation (LREC&apos;04)</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="239" to="242" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">eVoc Strategies: 10 Ways to Use Technology to Build Vocabulary</title>
		<author>
			<persName><forename type="first">B</forename><surname>Dalton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Grisham</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The reading teacher</title>
		<imprint>
			<biblScope unit="volume">64</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="306" to="317" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Imagenet: A Large-scale Hierarchical Image Database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2009.5206848</idno>
		<ptr target="https://doi.org/10.1109/CVPR.2009.5206848" />
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009. 2009</date>
			<biblScope unit="page" from="248" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Wordnet: An Electronic Lexical Database</title>
		<author>
			<persName><forename type="first">C</forename><surname>Fellbaum</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>MIT Press Cambridge</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">An Adaptable Lexical Simplification Architecture for Major Ibero-Romance Languages</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ferrés</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">G</forename><surname>Guinovart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems</title>
				<meeting>the First Workshop on Building Linguistically Generalizable NLP Systems</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="40" to="47" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A Survey on Intelligent Poetry Generation: Languages, Features, Techniques, Reutilisation and Evaluation</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gonçalo Oliveira</surname></persName>
		</author>
		<ptr target="http://aclweb.org/anthology/W17-3502" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Conference on Natural Language Generation</title>
				<meeting>the 10th International Conference on Natural Language Generation</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="11" to="20" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Word Length, Set Size, and Lexical Factors: What Causes the Word Length Effect?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Guitard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Gabel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Saint-Aubin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Surprenant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Neath</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Experimental Psychology: Learning, Memory, and Cognition</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page">1824</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Natural Language Processing for Enhancing Teaching and Learning</title>
		<author>
			<persName><forename type="first">D</forename><surname>Litman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)</title>
				<meeting>the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="4170" to="4176" />
		</imprint>
	</monogr>
	<note>Association for the Advancement of Artificial Intelligence</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Using Folktales for Language Teaching</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lwin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The English Teacher</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="74" to="83" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings</title>
		<author>
			<persName><forename type="first">D</forename><surname>Mahata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kuriakose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zimmermann</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N18-2100</idno>
		<ptr target="http://aclweb.org/anthology/N18-2100" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="634" to="639" />
		</imprint>
	</monogr>
	<note>Short Papers</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Using Wordle as a supplementary research tool</title>
		<author>
			<persName><forename type="first">C</forename><surname>Mcnaught</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The qualitative report</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="630" to="643" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">The fewer, the better? A Contrastive Study about Ways to Simplify</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mitkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Štajner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Automatic Text Simplification-Methods and Applications in the Multilingual Society</title>
				<meeting>the Workshop on Automatic Text Simplification-Methods and Applications in the Multilingual Society<address><addrLine>ATS-MA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014. 2014</date>
			<biblScope unit="page" from="30" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Learning vocabulary in another language</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">S</forename><surname>Nation</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>Ernst Klett Sprachen</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Semeval 2016 Task 11: Complex Word Identification</title>
		<author>
			<persName><forename type="first">G</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</title>
				<meeting>the 10th International Workshop on Semantic Evaluation (SemEval-2016)</meeting>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
			<biblScope unit="page" from="560" to="569" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Plainlanguage.gov: Federal Plain Language Guidelines</title>
		<ptr target="https://plainlanguage.gov/media/FederalPLGuidelines.pdf" />
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Out in the Open: Finding and Categorising Errors in the Lexical Simplification Pipeline</title>
		<author>
			<persName><forename type="first">M</forename><surname>Shardlow</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC&apos;14)</title>
				<meeting>the Ninth International Conference on Language Resources and Evaluation (LREC&apos;14)</meeting>
		<imprint>
			<date type="published" when="2014">2014. 2014</date>
			<biblScope unit="page" from="1583" to="1590" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Semeval-2012 task 1: English Lexical Simplification</title>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Jauhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mihalcea</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task</title>
				<meeting>the First Joint Conference on Lexical and Computational Semantics-Volume 1: the main conference and the shared task</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="347" to="355" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Using Word Embeddings to Enhance Keyword Identification for Scientific Publications</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mcdonald</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Australasian Database Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="257" to="268" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Wordle: A method for analysing MBA student induction experience</title>
		<author>
			<persName><forename type="first">W</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">L</forename><surname>Parkes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Davies</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The International Journal of Management Education</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="44" to="53" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
