<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Hulat - ALexS CWI Task - CWI for Language and Learning Disabilities Applied to University Educational Texts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rodrigo</forename><surname>Alarcon</surname></persName>
							<email>ralarcon@inf.uc3m.es</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid</orgName>
								<address>
									<settlement>Leganés, Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lourdes</forename><surname>Moreno</surname></persName>
							<email>lmoreno@inf.uc3m.es</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid</orgName>
								<address>
									<settlement>Leganés, Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paloma</forename><surname>Martínez</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="institution">Universidad Carlos III de Madrid</orgName>
								<address>
									<settlement>Leganés, Madrid</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Hulat - ALexS CWI Task - CWI for Language and Learning Disabilities Applied to University Educational Texts</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B17A3DEDC4BEEBCCD07458F379717833</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:19+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Lexical simplification</term>
					<term>CWI</term>
					<term>Easy to read</term>
					<term>BERT</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The number of citizens who face difficulties in reading and understanding written texts is growing. For people with cognitive, language and learning disabilities, unusual words in a text are one possible cognitive accessibility barrier. A range of techniques can be used to deal with this issue. Complex Word Identification (CWI), which aims to identify unusual words for a target audience, is one such technique. In this paper, a supervised architecture is described for the identification of complex words in university educational texts provided by the ALexS workshop. This architecture is composed of a Linear SVM with context-aware embedding features provided by a BERT model. Moreover, easy-to-read and plain language resources were used. Our system participated in the ALexS CWI task, obtaining the second-best recall of 67%. According to the analysis performed, however, its low precision was due to the system having been trained with resources aimed at improving cognitive accessibility regardless of the domain. The results indicate that the level of readability and understanding is more demanding in informative fields, such as Wikipedia pages, than in the specific domain of university educational texts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In the current era of information technology, individuals have access to abundant information (education, news, social, health, government, etc.). However, this information is not accessible to all people. Certain individuals face accessibility barriers when reading texts that contain long sentences, unusual words, complex linguistic structures, etc. Although people with intellectual and learning disabilities are most directly affected, cognitive accessibility barriers also affect other user groups such as the deaf, deafblind, elderly, illiterate and immigrants with a different native language <ref type="bibr" target="#b0">[1]</ref>  <ref type="bibr" target="#b1">[2]</ref>. People with reading disabilities can be found even among highly educated users with specialized knowledge of the subject matter, such as university students. It may be possible to accommodate these users by making texts more readable.</p><p>In order to provide universal access to information and make texts more accessible, certain resources provide helpful documentation, such as Easy-to-Read and plain language guidelines <ref type="bibr" target="#b2">[3]</ref>. However, systematic compliance with these guidelines is complicated, and simplification processes thus become essential. Simplified versions are normally created manually. Manual simplification of written documents is quite expensive, particularly considering that information is continually being produced.</p><p>As a solution, Natural Language Processing (NLP) methods, such as text simplification, have been developed to provide systematic support and promote compliance with these cognitive accessibility guidelines, improving the readability and understandability of texts. 
There are many approaches to this goal; one is Complex Word Identification (CWI), which aims to identify words that are perceived as difficult by a given target audience.</p><p>With this in mind, this paper proposes a supervised CWI approach, submitted to the ALexS workshop, which aims to identify complex words in university educational texts. The remainder of the paper is organized as follows. Section 2 briefly describes our training/test dataset and the ALexS dataset. Section 3 describes our system. Section 4 presents the results obtained by our system, both in the training/test stage and in the ALexS task. Finally, Section 5 offers conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Datasets</head><p>In order to follow a supervised approach, annotated data are necessary to identify whether a word is complex or simple. Therefore, our system was trained and tested with the following dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Training/Test data</head><p>The data used were the annotated corpus of Spanish Wikipedia pages proposed for the Complex Word Identification (CWI) shared task of the BEA Workshop 2018 (google.com/view/cwisharedtask2018). As shown in Table <ref type="table" target="#tab_0">1</ref>, 17603 instances were annotated by 54 Spanish speakers, most of whom were native speakers <ref type="bibr" target="#b3">[4]</ref>. Each instance contains a target word or multiword expression selected by the annotators. The target is marked as complex if at least one annotator designates it as complex. Each instance is represented by 11 columns that provide different kinds of information, covering both the binary and the probabilistic subtasks. For the development of this system, we focus on the binary classification subtask and use the following columns:</p><p>• The second column contains the sentence in which the annotated target appears.</p><p>• The third column contains the start position of the target word in the sentence.</p><p>• The fourth column contains the end position of the target word in the sentence.</p><p>• The fifth column contains the target word.</p><p>• The tenth column contains the gold-standard label for the binary task (0: simple and 1: complex).</p></div>
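<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the column layout listed above, the following minimal Python sketch reads one instance. It assumes, although the description above does not state it, that each instance is a tab-separated line and that the start and end values are character offsets with an exclusive end. The names CwiInstance and parse_cwi_line and the example row are hypothetical and are not taken from the corpus.</p>
<p>
```python
from typing import NamedTuple


class CwiInstance(NamedTuple):
    """Fields of one instance that are used for the binary subtask."""
    sentence: str
    start: int
    end: int
    target: str
    is_complex: bool


def parse_cwi_line(line: str) -> CwiInstance:
    """Parse one line, assuming 11 tab-separated columns (an assumption)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 11:
        raise ValueError(f"expected 11 columns, got {len(cols)}")
    return CwiInstance(
        sentence=cols[1],            # second column: sentence
        start=int(cols[2]),          # third column: start of the target
        end=int(cols[3]),            # fourth column: end of the target
        target=cols[4],              # fifth column: target word
        is_complex=cols[9] == "1",   # tenth column: gold binary label
    )


# Made-up example row; only columns 2, 3, 4, 5 and 10 are meaningful here.
example = "\t".join([
    "ID1", "La biodiversidad es importante.", "3", "16", "biodiversidad",
    "10", "10", "2", "1", "1", "0.15",
])
instance = parse_cwi_line(example)
print(instance.target, instance.is_complex)
```
</p>
<p>Under the offset assumption stated above, slicing the sentence from start to end recovers the target, which gives a simple consistency check when loading the data.</p></div>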
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">ALexS Dataset</head><p>As shown in Table <ref type="table" target="#tab_1">2</ref>, the VYTEDU-CW corpus provided by the ALexS workshop (alexs-sepln-2020.org/) consists of 55 text files containing the video transcripts of classes given at the University of Guayaquil (Ecuador). The corpus contains more than 68000 words, with more than 1200 words per transcription on average, and 723 words were designated as complex.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods and system description</head><p>A supervised approach was proposed which aimed to identify complex words in educational texts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Pre-processing</head><p>The VYTEDU-CW corpus texts were pre-processed in a series of steps. First, the texts were split into sentences and tokens using spaCy (www.spacy.io/), an open-source library that supports texts in different languages, including Spanish. Then, these tokens were filtered according to the following POS tags:</p><formula xml:id="formula_0">• ADJ: Adjective • ADV: Adverb • NOUN: Noun • PROPN: Proper noun</formula><p>The filtered text was then converted into the same format as that used during the training stage, preparing it for the next step of the process.</p></div>
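<div xmlns="http://www.tei-c.org/ns/1.0"><p>The filtering step can be sketched as follows. This is a minimal illustration, not the system's code: it assumes that the filter keeps the tokens carrying one of the four listed tags, and it works on tokens that have already been tagged (with spaCy, each token exposes its coarse tag through the pos_ attribute). The function name filter_by_pos and the hand-tagged sentence are hypothetical.</p>
<p>
```python
# POS tags listed in the pre-processing step.
KEPT_POS = {"ADJ", "ADV", "NOUN", "PROPN"}


def filter_by_pos(tagged_tokens):
    """Keep only the tokens whose coarse POS tag is in KEPT_POS."""
    return [token for token, pos in tagged_tokens if pos in KEPT_POS]


# Hand-tagged example sentence (tags written by hand, not by a tagger).
tagged = [
    ("La", "DET"), ("biodiversidad", "NOUN"), ("es", "AUX"),
    ("muy", "ADV"), ("importante", "ADJ"), (".", "PUNCT"),
]
candidates = filter_by_pos(tagged)
print(candidates)
```
</p>
<p>Keeping the set of tags in a single constant makes it easy to test other choices, for example adding verbs, without touching the rest of the pipeline.</p></div>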
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Supervised classification approach</head><p>To process the text from the previous stage, a supervised approach was followed by training an SVM algorithm, chosen for its successful performance in text classification tasks. Moreover, SVM was also one of the most used algorithms for this task in SemEval 2016 <ref type="bibr" target="#b4">[5]</ref>. Specifically, a Linear SVC was chosen because it is much faster <ref type="bibr" target="#b5">[6]</ref>, because SVM has shown good performance in classifying sparse instances <ref type="bibr" target="#b6">[7]</ref> and, finally, because it obtained better results than other kernel types in previous tests <ref type="bibr" target="#b7">[8]</ref>.</p><p>In order to train the algorithm with the dataset described in Section 2.1, each word (instance) needed to be represented as a set of features that help distinguish between complex and simple words. The proposed features are described below:</p><p>• Length feature: word length.</p><p>• Boolean feature: whether the word is composed of capital letters.</p><p>• E2R feature: a new feature established by creating an Easy-to-Read (E2R) dictionary.</p><p>• Word2vec feature: pre-trained Word2Vec model vectors.</p><p>• BERT feature: Pytorch pre-trained BERT model vectors.</p><p>In relation to the E2R feature, we proposed a new feature by creating an E2R dictionary that follows E2R guidelines. The goal of this feature was to optimize the detection of simple words. If a target word exists in the E2R dictionary, it receives a 0; otherwise, it is marked with a 1. The dictionary is fed from different sources that provide E2R texts drafted by experts with the support of the "Plena Inclusion" organization (www.plenainclusion.org/). Some of these sources were the Noticias fácil news page (www.noticiasfacil.es/) and the Easy Reading Association (www.lecturafacil.net/es/). 
Subsequently, this text was "cleaned" in order to preserve only the content words (nouns, verbs, adjectives, adverbs). Currently, this dictionary contains 13400 simple words.</p><p>For the Word2vec feature, vectors were extracted for each word, with the support of the gensim library, from a 300-dimension Word2vec model trained on the Spanish Billion Words Corpus <ref type="bibr" target="#b8">[9]</ref>.</p><p>The BERT feature operated in the following manner. Vectors were extracted for each word from a BERT (Bidirectional Encoder Representations from Transformers) <ref type="bibr" target="#b9">[10]</ref> model (www.github.com/shehzaadzd/pytorch-pretrained-BERT). A 12-layer multilingual pre-trained BERT model was used, and word vectors were extracted by summing the last four layers and keeping the first 480 dimensions. To do this, we first use the stored hidden states of the model, which have four dimensions: the layer number, the batch number (one sentence per instance), the token number and the feature number (768 features). The word vector for each word of the sentence is then created by summing the last four layers. These layers were selected because they showed better results in our tests; the best choice of layers may differ depending on the task.</p><p>These embeddings are useful for semantic searches and information retrieval. The main difference between this type of embedding and others, such as Word2Vec or FastText, is that BERT produces word representations that are dynamically informed by the words around them, whereas in Word2Vec each word is represented by a single fixed vector. Common word embedding models thus represent each word with one single vector, ignoring polysemy. With contextual embeddings, in contrast, each word can have several vectors, one for each of its possible meanings. Therefore, these models allow us to deal with word sense disambiguation when we identify complex words.</p></div>
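<div xmlns="http://www.tei-c.org/ns/1.0"><p>The feature construction described in this section can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the system's implementation: the hidden states are given as nested lists for a single sentence (the batch dimension is dropped), the Boolean feature is read as "the word is written entirely in capital letters", and the E2R lookup lowercases the word before checking the dictionary. The function names, the toy dictionary and the toy hidden states are hypothetical.</p>
<p>
```python
def sum_last_four_layers(hidden_states):
    """Sum the last four layers element-wise for every token.

    hidden_states[layer][token] is a list of floats for one sentence.
    Returns one vector per token.
    """
    last_four = hidden_states[-4:]
    n_tokens = len(last_four[0])
    return [
        [sum(values) for values in zip(*(layer[t] for layer in last_four))]
        for t in range(n_tokens)
    ]


def handcrafted_features(word, e2r_dictionary):
    """Length, capital-letter flag and E2R flag (0 if in the dictionary, else 1)."""
    return [
        float(len(word)),
        1.0 if word.isupper() else 0.0,                     # assumed reading
        0.0 if word.lower() in e2r_dictionary else 1.0,     # assumed lowercasing
    ]


# Toy inputs: 5 layers, 2 tokens, 2 features per token (real BERT: 768).
e2r = {"casa", "agua"}
hidden = [[[float(layer), 1.0] for _token in range(2)] for layer in range(5)]

token_vectors = sum_last_four_layers(hidden)
features = handcrafted_features("biodiversidad", e2r) + token_vectors[0]
print(features)
```
</p>
<p>In a full pipeline, the summed vector would be truncated to its first 480 dimensions and concatenated with the Word2vec vector as well, and the resulting feature vectors could then be passed to a linear SVM implementation such as the LinearSVC class of scikit-learn.</p></div>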
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>Because the workshop did not provide a validation dataset, the test dataset of the BEA workshop was used to validate our features and make adjustments. Table <ref type="table" target="#tab_2">3</ref> shows the results obtained with the Train and Train+Development datasets, which were validated with the test dataset. The results in both cases outperformed the results obtained by other systems from the abovementioned workshop <ref type="bibr" target="#b10">[11]</ref> <ref type="bibr" target="#b11">[12]</ref> <ref type="bibr" target="#b12">[13]</ref> <ref type="bibr" target="#b13">[14]</ref>. Subsequently, a model was trained with the Train+Development+Test data to process the ALexS dataset; when evaluated with the test dataset, it obtained a result of 0.81. To complement this information, Table <ref type="table" target="#tab_3">4</ref> shows the scores of some combinations of these features (L: length, B: Boolean, E: E2R, W: Word2vec, BT: BERT), helping us determine which features are more discriminatory. One of the best scores is reached with the help of the vectors of the embedding models: using the Word2Vec and BERT features, an F1-score of 0.752 is obtained. When each feature is evaluated independently, the BERT feature shows an F1-score of 0.727, the best score among all individual features. Likewise, the W2V feature yields a score of 0.70, proving to be a valuable resource for this task. Table <ref type="table" target="#tab_4">5</ref> shows the results of the CWI task in the ALexS workshop. Although its precision is low (0.09), the system ranked second in recall (0.67), obtaining good coverage in the detection of complex words. The generalization of the system seems to be good. However, it needs to improve on specific domains when dealing with technical words. 
In addition, Table <ref type="table" target="#tab_5">6</ref> supports this observation by showing the results on some of the texts of the VYTEDU-CW corpus. For example, in the text of video 41, the system showed good recall by predicting all the complex words. However, at the same time, it produced several false positives due to the generalization issue.</p><p>To illustrate this, take the word "biodiversidad" (biodiversity), which would be labeled as complex in a generic domain. Nevertheless, this same word would be labeled as simple in a university educational domain. This can be confirmed by comparing the ALexS annotations with the dataset from the BEA Workshop, which is composed of annotated Wikipedia pages with generic content. For instance, the word "investigaciones" (investigations) is labeled as simple in the ALexS dataset but complex in the BEA dataset. Many such examples explain why the system shows good recall but low precision.</p><p>Based on the outcomes, it can be seen that the university educational texts are not easily readable for university students with language and learning disabilities.</p></div>
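<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interplay between high recall and low precision discussed above can be made concrete with a small Python sketch of the standard binary definitions of precision, recall and F1. The helper names and the toy labels are hypothetical; the final lines only check that the F1 values reported for our system in Tables 5 and 6 are approximately consistent with the reported (rounded) precision and recall.</p>
<p>
```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(gold, predicted):
    """Binary scores for the complex class (label 1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy text: one complex word, found by the system along with 3 false positives.
gold = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p, r, f = precision_recall_f1(gold, pred)
print(p, r, round(f, 2))

# Reported precision and recall from Table 5 (HULAT) and Table 6 (video 41).
f1_task = round(f1_from(0.09, 0.67), 2)
f1_video_41 = round(f1_from(0.05, 1.0), 2)
print(f1_task, f1_video_41)
```
</p>
<p>In the toy text the single complex word is detected, so recall is perfect, but the three false positives reduce precision to 0.25. This is the same pattern as in videos 41, 43 and 48 of Table 6.</p></div>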
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and future work</head><p>The main objective of this work is to improve cognitive accessibility by increasing the understandability and readability of texts. To accomplish this objective, a supervised algorithm was trained that uses a more refined, context-aware embedding model and Easy-to-Read resources. The experiments showed that the combination of these features with a Linear SVM outperforms previous systems. However, the system also had difficulties when dealing with specific domains with less of a demand for readability, such as educational texts at a university level. To improve the precision of our system and obtain a better classification result, resources from the university educational domain should be used; for example, BERT and Word2Vec models could be trained on university educational texts. Additionally, although students with language and learning disabilities may not be considered the target audience of such texts, there are students with language and learning disabilities at the university.</p><p>Regarding the approach followed in our complex word detection system, one of the main contributions of this research work has been the use of BERT embeddings in the prediction. For future work, we plan to explore more features of BERT models. With the extracted vectors, we can evaluate the cosine distance between the target word and the surrounding words in the sentence. With this additional information, provided by more detailed embeddings, a better score in the CWI task could be achieved. 
At the same time, we can evaluate the synergy between a wider variety of embeddings, such as Sense2Vec <ref type="bibr" target="#b14">[15]</ref> and Char2Vec.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Spanish CWI datasets distribution</figDesc><table><row><cell></cell><cell># Instances</cell><cell># Complex</cell><cell># Simple</cell><cell># Uniwords</cell></row><row><cell>Training set</cell><cell>13748</cell><cell>5455</cell><cell>8293</cell><cell>11931</cell></row><row><cell>Development set</cell><cell>1622</cell><cell>653</cell><cell>969</cell><cell>1408</cell></row><row><cell>Test set</cell><cell>2233</cell><cell>907</cell><cell>1326</cell><cell>1955</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Spanish CWI datasets distribution</figDesc><table><row><cell></cell><cell>Number of words</cell><cell>Number of Paragraphs</cell></row><row><cell>Min</cell><cell>465</cell><cell>5</cell></row><row><cell>Max</cell><cell>2646</cell><cell>18</cell></row><row><cell>Average</cell><cell>1241</cell><cell>907</cell></row><row><cell>Total</cell><cell>68248</cell><cell>613</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Results for the test dataset</figDesc><table><row><cell></cell><cell>Accuracy</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 Score</cell></row><row><cell>TRAIN</cell><cell>0.80</cell><cell>0.79</cell><cell>0.78</cell><cell>0.792</cell></row><row><cell>TRAIN+DEV</cell><cell>0.80</cell><cell>0.80</cell><cell>0.79</cell><cell>0.794</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Results for the test dataset</figDesc><table><row><cell></cell><cell>Accuracy</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 Score</cell></row><row><cell>L+BT</cell><cell>0.79</cell><cell>0.78</cell><cell>0.77</cell><cell>0.778</cell></row><row><cell>L+B+BT</cell><cell>0.79</cell><cell>0.79</cell><cell>0.78</cell><cell>0.783</cell></row><row><cell>L+B+E+BT</cell><cell>0.80</cell><cell>0.80</cell><cell>0.78</cell><cell>0.787</cell></row><row><cell>L+B+E+W+BT</cell><cell>0.80</cell><cell>0.80</cell><cell>0.79</cell><cell>0.794</cell></row><row><cell>W+BT</cell><cell>0.77</cell><cell>0.76</cell><cell>0.75</cell><cell>0.752</cell></row><row><cell>L</cell><cell>0.73</cell><cell>0.74</cell><cell>0.70</cell><cell>0.702</cell></row><row><cell>BT</cell><cell>0.74</cell><cell>0.74</cell><cell>0.72</cell><cell>0.727</cell></row><row><cell>W</cell><cell>0.72</cell><cell>0.71</cell><cell>0.70</cell><cell>0.700</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>Results on ALexS task</figDesc><table><row><cell>Participants</cell><cell>Accuracy</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 Score</cell></row><row><cell>Antonio Rico -Method 1</cell><cell>0.98</cell><cell>0.33</cell><cell>0.22</cell><cell>0.26</cell></row><row><cell>Antonio Rico -Method 2</cell><cell>0.98</cell><cell>0.34</cell><cell>0.23</cell><cell>0.27</cell></row><row><cell>Antonio Rico -Method 3</cell><cell>0.98</cell><cell>0.33</cell><cell>0.22</cell><cell>0.26</cell></row><row><cell>Elena Zotova -Method 1</cell><cell>0.91</cell><cell>0.10</cell><cell>0.60</cell><cell>0.17</cell></row><row><cell>Elena Zotova -Method 2</cell><cell>0.89</cell><cell>0.09</cell><cell>0.69</cell><cell>0.16</cell></row><row><cell>Elena Zotova -Method 3</cell><cell>0.91</cell><cell>0.10</cell><cell>0.59</cell><cell>0.17</cell></row><row><cell>George Zaharia</cell><cell>0.91</cell><cell>0.02</cell><cell>0.08</cell><cell>0.03</cell></row><row><cell>(*) Rodrigo Alarcón (HULAT)</cell><cell>0.90</cell><cell>0.09</cell><cell>0.67</cell><cell>0.16</cell></row><row><cell>AlexS 2020 Organizers</cell><cell>0.92</cell><cell>0.12</cell><cell>0.66</cell><cell>0.20</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 .</head><label>6</label><figDesc>Specific results on the ALexS task (Participant: Rodrigo Alarcón)</figDesc><table><row><cell></cell><cell>Accuracy</cell><cell>Precision</cell><cell>Recall</cell><cell>F1 Score</cell></row><row><cell>Video 5</cell><cell>0.92</cell><cell>0.56</cell><cell>0.43</cell><cell>0.49</cell></row><row><cell>Video 41</cell><cell>0.92</cell><cell>0.05</cell><cell>1</cell><cell>0.10</cell></row><row><cell>Video 43</cell><cell>0.86</cell><cell>0.02</cell><cell>1</cell><cell>0.04</cell></row><row><cell>Video 48</cell><cell>0.97</cell><cell>0.04</cell><cell>1</cell><cell>0.08</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Acknowledgements</head><p>This work was supported by the Research Program of the Ministry of Economy and Competitiveness, Government of Spain (DeepEMR project TIN2017-87548-C2-1-R).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An adaptable lexical simplification architecture for major ibero-romance languages</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ferrés</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">G</forename><surname>Guinovart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the first workshop on building linguistically generalizable NLP systems</title>
				<meeting>the first workshop on building linguistically generalizable NLP systems</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="40" to="47" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Exploring language technologies to provide support to WCAG 2.0 and E2R guidelines</title>
		<author>
			<persName><forename type="first">L</forename><surname>Moreno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Revert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. XVI Int. Conf. Hum. Comput. Interact. -Interacción &apos;15</title>
				<meeting>XVI Int. Conf. Hum. Comput. Interact. -Interacción &apos;15</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Lexical simplification approach to support the accessibility guidelines</title>
		<author>
			<persName><forename type="first">L</forename><surname>Moreno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Alarcon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martínez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the XX International Conference on Human Computer Interaction (Interacción &apos;19)</title>
				<meeting>the XX International Conference on Human Computer Interaction (Interacción &apos;19)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Multilingual and Cross Lingual Complex Word Identification</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Yimam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Stajner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riedl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Biemann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Recent Adv. Nat. Lang. Process</title>
		<imprint>
			<biblScope unit="page" from="813" to="822" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">SemEval 2016 Task 11: Complex Word Identification</title>
		<author>
			<persName><forename type="first">G</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 10th Int. Work. Semant. Eval</title>
				<meeting>10th Int. Work. Semant. Eval</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="560" to="569" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Predicting of anaphylaxis in big data EMR by exploring machine learning approaches</title>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Colón-Ruíz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Á</forename><surname>Tejedor-Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Moro-Moro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Biomed. Inform</title>
		<imprint>
			<biblScope unit="volume">87</biblScope>
			<biblScope unit="page" from="50" to="59" />
			<date type="published" when="2018-01">January. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">The Perceptron algorithm versus Winnow: linear versus logarithmic mistake bounds when few input variables are relevant</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kivinen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Warmuth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Auer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="page" from="325" to="343" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Lexical simplification approach using easy-to-read resources</title>
		<author>
			<persName><forename type="first">R</forename><surname>Alarcon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Moreno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Segura-Bedmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Martínez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proces. del Leng. Nat</title>
		<imprint>
			<biblScope unit="page" from="95" to="102" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Spanish Billion Words Corpus and Embeddings</title>
		<author>
			<persName><forename type="first">C</forename><surname>Cardellino</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NAACL-HLT 2019</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">A Report on the Complex Word Identification Shared Task</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Yimam</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Deep Learning Architecture for Complex Word Identification</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">De</forename><surname>Hertog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">U</forename><surname>Leuven</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="328" to="334" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">SB @ GU at the Complex Word Identification 2018 Shared Task</title>
		<author>
			<persName><forename type="first">D</forename><surname>Alfter</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="315" to="321" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Complex Word Identification Based on Frequency in a Learner Corpus</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kajiwara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Komachi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="195" to="199" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">sense2vec -A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings</title>
		<author>
			<persName><forename type="first">A</forename><surname>Trask</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Michalak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
