<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Assembly Models for SimpleText Task 2: Results from Wuhan University Research Group</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jianfei</forename><surname>Huang</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Information Management</orgName>
								<orgName type="institution">Wuhan University</orgName>
								<address>
									<addrLine>Bayi Rd 299</addrLine>
									<postCode>430072</postCode>
									<settlement>Wuhan</settlement>
									<region>Hubei</region>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jin</forename><surname>Mao</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Center for Studies of Information Resources</orgName>
								<orgName type="institution">Wuhan University</orgName>
								<address>
									<addrLine>Bayi Rd 299</addrLine>
									<postCode>430072</postCode>
									<settlement>Wuhan</settlement>
									<region>Hubei</region>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Evaluation Forum</orgName>
								<address>
									<addrLine>September 5-8</addrLine>
									<postCode>2022</postCode>
									<settlement>Bologna</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Assembly Models for SimpleText Task 2: Results from Wuhan University Research Group</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">F9A02AE8AAC6879FA4DAFB4D55C94795</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T03:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>term recognition</term>
					<term>lexical features</term>
					<term>syntactic features</term>
					<term>semantic features</term>
					<term>text complexity</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The goal of SimpleText Task 2 is to identify and rank the complex terms that need to be explained, given a passage and a query. To this end, our group applied a pipeline of term recognition and complexity evaluation. Candidate terms are extracted and then filtered according to their similarity to the query and a few rules. We formulate the evaluation of complexity as a classification task. We compile three groups of features for terms (lexical, syntactic, and semantic) and then apply ensemble machine learning models with a soft voting strategy to classify the complexity of the terms. Results of cross-validation on the training set are reported, and potential improvements to the approach are discussed.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>SimpleText Task 2 involves identifying which terms are unclear and ranking the terms that are crucial for readers to understand scientific text, given a passage and a query. To rank the terms that trouble readers without prior domain knowledge, we first need to know which terms should be extracted and explained. Further, evaluating term complexity can serve as a preliminary step for text simplification, according to the approaches described by Shardlow <ref type="bibr" target="#b0">[1]</ref>, which is the subject of SimpleText Task 3.</p><p>Readers who do not understand the background of news articles often need to start with some technical terms. A term may consist of one or many words. It could be an unfamiliar word, an uncommon abbreviation, or a phrase. The complexity of a term cannot be judged just by its counts in a specific corpus: its meaning relies on many features and differs according to context. To remove such barriers to understanding, the goal of SimpleText Task 2 is to decide which terms in a passage need explanation with respect to a query and to rank them on a three-level scale and a five-level scale <ref type="bibr" target="#b1">[2]</ref>. Taking these factors into account, the task can be divided into two subtasks. One is extracting complex terms from a combination of passage and query. The other is evaluating complexity by considering as many valid influencing factors as possible.</p><p>In this paper, we extract key phrases and words based on similarity measures and rules, and present our submission, which uses two ensemble models to complete the complexity classification tasks. The first model considers a large set of linguistic features, including lexical, syntactic, and semantic features. The second differs only in that it adds the prediction of the first as an extra feature. Section 2 introduces relevant previous work. Section 3 presents our term recognition, feature engineering, and model design. Section 4 reports the results. Finally, Section 5 discusses limitations and future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related works</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Term Recognition</head><p>Term recognition methods can mainly be divided into traditional algorithms, classical machine learning, and deep learning models. Different methods suit different application scenarios, depending on the task and data. Robertson examined the theoretical basis of the TF-IDF term-weighting function, which is one of the most commonly used baselines for term recognition in information retrieval models <ref type="bibr" target="#b2">[3]</ref>. Some studies have applied PageRank to keyword extraction and achieved good performance <ref type="bibr" target="#b3">[4]</ref>. In addition, some studies focused on clustering approaches and classical machine learning classifiers, such as Bayesian and support vector machine approaches <ref type="bibr" target="#b4">[5]</ref>. More recently, many works have turned to deep learning, for example using pre-trained models such as BERT, and these approaches have shown promising results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Term Complexity</head><p>Term complexity is closely related to the study of text complexity. In early studies, computational measures of text complexity were restricted to heuristic readability formulas, which mainly focus on shallow features <ref type="bibr" target="#b5">[6]</ref>. These shallow features are usually traditional readability metrics obtained by simply counting words and characters <ref type="bibr" target="#b6">[7]</ref>, such as the average number of syllables per word, the average number of words per sentence, the Automated Readability Index <ref type="bibr" target="#b7">[8]</ref>, and the Flesch-Kincaid score <ref type="bibr" target="#b8">[9]</ref>. Later, some studies attempted to find deeper and more general features to supplement these shallow features.</p><p>In recent years, adopting machine learning or deep learning methods for feature learning in text complexity has become a trend. Gooding and Kochmar presented CAMB, a system based on ensemble voting that brings together 27 lexical, morphological, and psycholinguistic features <ref type="bibr" target="#b9">[10]</ref>. Although it achieved state-of-the-art results in the 2018 CWI shared task <ref type="bibr" target="#b10">[11]</ref>, it ignored the context of the target words. In SemEval-2021 Task 1 <ref type="bibr" target="#b11">[12]</ref>, most studies tended to capture extensive information about the target word and its context. Morphosyntactic features and pre-trained embeddings were applied to obtain better feature representations. The model that attained the best performance in that task used both token and context features derived from pre-trained models <ref type="bibr" target="#b12">[13]</ref>. However, an expanded version of the CAMB system obtained similar performance <ref type="bibr" target="#b13">[14]</ref>. It ranked third, less than a percentage point below the best result on lexical complexity prediction for single words, which shows that some models based on feature engineering can outperform most of their deep learning-based counterparts. Nonetheless, combining various features and machine learning models seems to be a consensus in recent studies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Term Recognition</head><p>To obtain candidate terms, we first extracted keywords and phrases from the passages via KeyBERT<ref type="foot" target="#foot_0">1</ref>. A few similar algorithms can extract candidate terms, including TF-IDF, RAKE, and YAKE!. However, KeyBERT internally computes the cosine similarity between sub-phrases and the passage, which is more in line with the task description. Then, the candidate terms were filtered by calculating the similarity scores between the terms and the query with PhraseSimilarity<ref type="foot" target="#foot_1">2</ref>. We also excluded candidate terms starting with a, an, the, or a digit. We checked the capitalization of terms to extract acronyms. The resulting terms include words, compound words, and phrases. Finally, we removed punctuation and converted the terms to lowercase, except for acronyms.</p></div>
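<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rule-based part of this step can be sketched as follows. This is a minimal illustration, not the code used for the submission: the function names are hypothetical, the similarity filtering with KeyBERT and PhraseSimilarity is not reproduced, and the acronym test (at least two letters, all uppercase) is an assumption.</p>

```python
import string

LEADING_ARTICLES = {"a", "an", "the"}


def is_acronym(term):
    """Return True when the term has at least two letters and all are uppercase (assumed rule)."""
    letters = [ch for ch in term if ch.isalpha()]
    return len(letters) >= 2 and all(ch.isupper() for ch in letters)


def normalize_term(term):
    """Remove ASCII punctuation, then lowercase the term unless it is an acronym."""
    stripped = "".join(ch for ch in term if ch not in string.punctuation).strip()
    return stripped if is_acronym(stripped) else stripped.lower()


def filter_candidates(candidates):
    """Drop terms starting with an article or a digit and normalize the rest."""
    kept = []
    for term in candidates:
        words = term.split()
        if not words:
            continue
        first = words[0]
        if first.lower() in LEADING_ARTICLES or first[0].isdigit():
            continue
        normalized = normalize_term(term)
        if normalized:
            kept.append(normalized)
    return kept


candidates = ["the neural network", "BERT", "Deep learning,", "3D printing", "an encoder"]
print(filter_candidates(candidates))
```

<p>On this toy input, only BERT (kept in uppercase) and deep learning (lowercased, with the trailing comma removed) survive the filters. Note that removing all ASCII punctuation also removes hyphens inside compound words.</p></div>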
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Feature Extraction</head><p>We designed a few lexical features, syntactic features, and semantic features for the terms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Lexical Features</head><p>These are features based on lexical information about the term:</p><p>• length: Length of the term.</p><p>• zipf frequency<ref type="foot" target="#foot_2">3</ref>: To make word frequency norms comparable, Brysbaert et al. proposed the Zipf scale, which is independent of corpus size <ref type="bibr" target="#b14">[15]</ref>. The Zipf frequency expresses the term's frequency on this human-friendly logarithmic scale.</p><p>• tf-idf score: We calculated the tf-idf score based on PhraseFinder, a search engine for the Google Books Ngram Dataset (version 2) that features a wildcard-supporting query language and high retrieval performance.</p><p>• acronym: Whether all letters of the term are uppercase, because acronyms are often difficult to understand.</p><p>• number of subwords<ref type="foot" target="#foot_3">4</ref>, phonemes<ref type="foot" target="#foot_4">5</ref>, syllables<ref type="foot" target="#foot_5">6</ref>: Morphological awareness is an understanding of how words can be broken down into smaller units <ref type="bibr" target="#b15">[16]</ref>. The number of subwords is intended as a complement to the length of the term; we obtain it via BPEmb, which is trained on Wikipedia using the Byte-Pair Encoding algorithm. Similarly, the other two features are well established in speech synthesis and are widely used in measures and feature sets in other studies of lexical complexity.</p></div>
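<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the Zipf scale concrete, the sketch below computes a Zipf value from a raw count, using the definition of the scale as the base-10 logarithm of the number of occurrences per billion tokens (equivalently, the logarithm of the frequency per million words plus 3). It is only an illustration under stated assumptions: the function names and the example count are hypothetical, term length is assumed to be measured in characters, and the submission itself relied on the wordfreq package rather than on raw counts.</p>

```python
import math


def zipf_value(count, corpus_tokens):
    """Zipf value: base-10 logarithm of the occurrences per billion tokens."""
    if count == 0:
        raise ValueError("the Zipf value is undefined for a zero count")
    per_billion = count * 1e9 / corpus_tokens
    return math.log10(per_billion)


def lexical_features(term, count, corpus_tokens):
    """Collect three of the lexical features for one term."""
    letters = [ch for ch in term if ch.isalpha()]
    return {
        "length": len(term),  # assumption: length measured in characters
        "zipf": zipf_value(count, corpus_tokens),
        "acronym": int(bool(letters) and all(ch.isupper() for ch in letters)),
    }


# Hypothetical count: 1000 occurrences in a corpus of one billion tokens.
features = lexical_features("BERT", count=1000, corpus_tokens=1_000_000_000)
print(features)
```

<p>A term occurring once per million tokens therefore has a Zipf value of 3, regardless of the size of the corpus from which the count was taken.</p></div>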
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Syntactic Features</head><p>Complex terms may have special syntactic roles in their sentences. We derived a few syntactic features from the syntactic structure of a term's context. We used Stanza<ref type="foot" target="#foot_6">7</ref> for part-of-speech tagging and dependency parsing.</p><p>• depth of the term: The distance between the term and the root of the parse tree.</p><p>• number of dependencies: The number of words that depend on the term or on which the term depends.</p><p>• part-of-speech: We use a 17-dimensional one-hot vector, in which each dimension represents one part-of-speech tag. Some words have simple meanings, but when combined into phrases their meanings become elusive. Prepositional phrases, verb phrases, noun phrases, and adjective phrases differ subtly in how their meaning is understood. For phrases, we add the vectors of the individual words together, so that single words and phrases are both represented in the same 17-dimensional space and can be compared.</p></div>
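<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three syntactic features can be illustrated on a hand-written parse. The sketch below does not call Stanza; it assumes that a parser has already produced, for each token, a 1-based head index (0 for the root) and a Universal Dependencies UPOS tag. The function names, the toy sentence, and the choice to measure depth at a single token are assumptions made for illustration.</p>

```python
# The 17 Universal Dependencies UPOS tags, one vector dimension per tag.
UPOS_TAGS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]


def depth(heads, idx):
    """Number of head links from token idx (1-based) up to the root (head 0)."""
    steps = 0
    while heads[idx - 1] != 0:
        idx = heads[idx - 1]
        steps += 1
    return steps


def dependency_count(heads, idx):
    """Count the token's own head (if any) plus its direct dependents."""
    has_head = 1 if heads[idx - 1] != 0 else 0
    dependents = sum(1 for h in heads if h == idx)
    return has_head + dependents


def pos_vector(tags):
    """Sum the one-hot vectors of the tags of all words in a term."""
    vec = [0] * len(UPOS_TAGS)
    for tag in tags:
        vec[UPOS_TAGS.index(tag)] += 1
    return vec


# Toy parse of "Deep learning improves retrieval" (hand-written, not parser output).
tokens = ["Deep", "learning", "improves", "retrieval"]
heads = [2, 3, 0, 3]
tags = ["ADJ", "NOUN", "VERB", "NOUN"]

print(depth(heads, 2), dependency_count(heads, 2), pos_vector(tags[0:2]))
```

<p>For the phrase "Deep learning", the summed vector has a 1 in the ADJ and NOUN dimensions, which is how a phrase and a single word end up in the same 17-dimensional space.</p></div>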
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">Semantic Features</head><p>• GloVe embedding<ref type="foot" target="#foot_7">8</ref>: We extract 300-dimensional embeddings pre-trained on Common Crawl, fill missing values with the zero vector, and reduce the dimensionality to 30 with PCA.</p><p>• fastText embedding<ref type="foot" target="#foot_8">9</ref>: fastText embeddings are used as an alternative semantic feature. Their dimensionality is likewise reduced to 30 with PCA.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Model Design</head><p>We formulated the complexity evaluation of terms as two classification tasks with 3 classes and 5 classes, respectively. For the former, we concatenate all features into an 86-dimensional input vector. For the latter, we append the predicted label of the three-class model to all features. Considering the large number of features and the small training set, we trained several widely used base models (LightGBM, CatBoost, XGBoost, Random Forest, and Support Vector Machine) and then combined them using a soft voting strategy. On the one hand, the ensemble combines multiple classifiers, which can improve classification accuracy. On the other hand, it reduces the occurrence of extreme errors, such as classifying difficult terms as simple ones. Figure <ref type="figure" target="#fig_0">1</ref> gives an overview of the model design. Hyperparameters were either tuned by grid search or left at their default values.</p></div>
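<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sketch below illustrates two points from this section: how the input dimensions add up, and how soft voting differs from majority voting. The breakdown of the 86 dimensions is one reading that is consistent with the features listed in Section 3.2 (seven lexical features, two syntactic scalars plus a 17-dimensional part-of-speech vector, and two 30-dimensional embeddings); the paper states only the total. The class probabilities are invented for illustration and the function name is hypothetical.</p>

```python
# Assumed breakdown of the input vector; only the total of 86 is stated in the text.
FEATURE_DIMS = {
    "lexical": 7,          # length, zipf, tf-idf, acronym, subwords, syllables, phonemes
    "syntactic": 2 + 17,   # depth, number of dependencies, part-of-speech vector
    "semantic": 30 + 30,   # GloVe and fastText embeddings after PCA
}


def soft_vote(model_probs):
    """Average the class-probability vectors of several models and pick the argmax."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    averaged = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged


three_class_dim = sum(FEATURE_DIMS.values())
five_class_dim = three_class_dim + 1  # plus the predicted three-class label

# Invented probabilities from three base models over three classes.
model_probs = [
    [0.9, 0.1, 0.0],
    [0.4, 0.6, 0.0],
    [0.4, 0.6, 0.0],
]
label, averaged = soft_vote(model_probs)
print(three_class_dim, five_class_dim, label)
```

<p>In this example two of the three models prefer class 1, so majority voting would return class 1, whereas soft voting returns class 0 because the first model is far more confident. Soft voting thus takes the confidence of each base model into account.</p></div>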
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>The terms provided in the training samples are not independent; in other words, a term can correspond to multiple passages. We deduplicated the records and obtained 250 independent sentence-term pairs as the final dataset. We then performed five-fold cross-validation on this dataset. Following the model design, we first evaluated the models for the three-class task; the results are shown in Table <ref type="table" target="#tab_0">1</ref>. The star represents our proposed integrated model. The integrated model is superior to the base models in terms of accuracy and AUC. Intuitively, the five-level scale is more difficult, as it requires a more precise assessment of complexity. We therefore take the predictions of the three-class model as an additional input feature, which can improve the performance. We also obtained the accuracy, F1 score, and AUC for the ensemble models of the five-class task, as shown in Table <ref type="table">2</ref>.</p><p>The two ensemble models we designed outperform the base models in accuracy. On F1 score and AUC, they also achieved close to the best performance in the experiment. Furthermore, on the subset of the test set consisting of 592 manually annotated sentences, our submissions ranked second (2/4) on the 1-3 scale and first (1/4) on the 1-5 scale, based on the proportion of successful matches among all participants. On the subset consisting of 167 common sentences, we ranked second in both tasks <ref type="bibr" target="#b1">[2]</ref>. However, all participating teams obtained low evaluation scores. One reason for this could be that the term extraction process is not adequate. Many terms were manually annotated during the evaluation as requiring no explanation and assigned a new difficulty score of 0, whereas our submissions assigned them a difficulty score of 1, implying that they belonged to the easiest terms. Admittedly, the values of all these metrics are not high, indicating that identifying terms and predicting term complexity are difficult tasks.</p></div>
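<div xmlns="http://www.tei-c.org/ns/1.0"><p>The evaluation protocol can be sketched as follows: the 250 pairs are split into five folds, a score is computed on each held-out fold, and the mean and standard deviation over the folds are reported. The sketch makes several assumptions: the folds are contiguous and unshuffled, the standard deviation is the population form (the paper does not say which form was used), and the fold accuracies are invented values, not results from our experiments.</p>

```python
import statistics


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if n_samples % k > i else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def summarize(scores):
    """Return the mean and the population standard deviation of the fold scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


folds = k_fold_indices(250, k=5)
print([len(fold) for fold in folds])

# Invented fold accuracies, for illustration only.
fold_accuracies = [0.60, 0.70, 0.65, 0.75, 0.70]
mean_acc, std_acc = summarize(fold_accuracies)
print(round(mean_acc, 3), round(std_acc, 3))
```

<p>With only 50 pairs in each held-out fold, a single misclassified pair changes the fold accuracy by two percentage points, which is one reason why the standard deviations in Table 1 are relatively large.</p></div>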
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>In this paper, we applied a pipeline for the term complexity prediction tasks, which consists of term recognition, feature extraction, model training, and model ensembling. The ensemble models outperform the base models.</p><p>As this is a preliminary study, we identified a few limitations that can guide future refinement of our approach. The pre-trained embeddings we chose were trained on Common Crawl, a general-domain corpus. Pre-trained word embeddings may exist for the technology and medical fields, which are the domains covered by the task corpus. Thus, one direction for future work is to fine-tune a pre-trained Transformer-based model on a corpus from the target domain and to extract the learned embeddings as a complement to the semantic features. Furthermore, our method includes some insignificant features, and some important features may not yet have been identified. Evaluating feature importance and emphasizing significant features in the learning models could further improve the approach.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overall model design: term complexity assessment at SimpleText Task 2.</figDesc><graphic coords="5,127.56,84.19,340.25,99.79" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Cross-validation results of the three-class task.</figDesc><table><row><cell>Three-classification</cell><cell cols="2">Accuracy</cell><cell cols="2">F1 Score</cell><cell cols="2">AUC</cell></row><row><cell>Model</cell><cell>mean</cell><cell>std</cell><cell>mean</cell><cell>std</cell><cell>mean</cell><cell>std</cell></row><row><cell>* (Integrated Model)</cell><cell>0.684</cell><cell>0.062</cell><cell>0.583</cell><cell>0.093</cell><cell>0.635</cell><cell>0.059</cell></row><row><cell>LightGBM</cell><cell>0.652</cell><cell>0.063</cell><cell>0.586</cell><cell>0.089</cell><cell>0.624</cell><cell>0.062</cell></row><row><cell>* - LightGBM</cell><cell>0.660</cell><cell>0.083</cell><cell>0.565</cell><cell>0.089</cell><cell>0.607</cell><cell>0.069</cell></row><row><cell>CatBoost</cell><cell>0.636</cell><cell>0.069</cell><cell>0.551</cell><cell>0.064</cell><cell>0.615</cell><cell>0.058</cell></row><row><cell>* - CatBoost</cell><cell>0.672</cell><cell>0.079</cell><cell>0.583</cell><cell>0.093</cell><cell>0.611</cell><cell>0.069</cell></row><row><cell>XGBoost</cell><cell>0.656</cell><cell>0.093</cell><cell>0.593</cell><cell>0.112</cell><cell>0.591</cell><cell>0.061</cell></row><row><cell>* - XGBoost</cell><cell>0.668</cell><cell>0.084</cell><cell>0.576</cell><cell>0.096</cell><cell>0.631</cell><cell>0.064</cell></row><row><cell>RandomForest</cell><cell>0.656</cell><cell>0.097</cell><cell>0.556</cell><cell>0.080</cell><cell>0.590</cell><cell>0.098</cell></row><row><cell>* - RandomForest</cell><cell>0.664</cell><cell>0.066</cell><cell>0.581</cell><cell>0.090</cell><cell>0.626</cell><cell>0.055</cell></row><row><cell>SVM</cell><cell>0.672</cell><cell>0.079</cell><cell>0.557</cell><cell>0.080</cell><cell>0.576</cell><cell>0.098</cell></row><row><cell>* - SVM</cell><cell>0.660</cell><cell>0.072</cell><cell>0.582</cell><cell>0.101</cell><cell>0.621</cell><cell>0.062</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/MaartenGr/KeyBERT</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/franplk/PhraseSimilarity</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://pypi.org/project/wordfreq/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://github.com/bheinzerling/bpemb</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://github.com/Kyubyong/g2p</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://pypi.org/project/syllables/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://stanfordnlp.github.io/stanza/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://nlp.stanford.edu/projects/glove/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">https://fasttext.cc/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A survey of automated text simplification</title>
		<author>
			<persName><forename type="first">M</forename><surname>Shardlow</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Computer Science and Applications</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="58" to="70" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the CLEF 2022 SimpleText Lab: Automatic Simplification of Scientific Texts</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ermakova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bellot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nurbakova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ovchinnikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sanjuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mathurin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hannachi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Araujo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Thirteenth International Conference of the CLEF Association</title>
				<meeting><address><addrLine>CLEF</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page">13390</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Understanding inverse document frequency: on theoretical arguments for idf</title>
		<author>
			<persName><forename type="first">S</forename><surname>Robertson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of documentation</title>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Keyword extraction based on pagerank</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Pacific-Asia Conference on Knowledge Discovery and Data Mining</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="857" to="864" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Text document preprocessing with the bayes formula for classification using the support vector machine</title>
		<author>
			<persName><forename type="first">D</forename><surname>Isa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kallimani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rajkumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data engineering</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="1264" to="1272" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Validating coh-metrix</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Mcnamara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ozuru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Graesser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Louwerse</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th annual conference of the cognitive science society</title>
				<meeting>the 28th annual conference of the cognitive science society</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="573" to="578" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A component based approach to measuring text complexity</title>
		<author>
			<persName><forename type="first">S</forename><surname>Jönsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rennes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Falkenjack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jönsson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Seventh Swedish Language Technology Conference (SLTC-18)</title>
				<meeting><address><addrLine>Stockholm, Sweden</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-11-09">7-9 November 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Automated readability index</title>
		<author>
			<persName><forename type="first">R</forename><surname>Senter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1967">1967</date>
			<pubPlace>Cincinnati Univ OH</pubPlace>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Assessing readability</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">R</forename><surname>Klare</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Reading research quarterly</title>
		<imprint>
			<biblScope unit="page" from="62" to="102" />
			<date type="published" when="1974">1974</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">CAMB at CWI shared task 2018: Complex word identification with ensemble-based voting</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gooding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kochmar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics</title>
				<meeting>the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics<address><addrLine>New Orleans, Louisiana</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="184" to="194" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Yimam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Biemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">H</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Specia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Štajner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1804.09132</idno>
		<title level="m">A report on the complex word identification shared task</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Shardlow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">H</forename><surname>Paetzold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.00473</idno>
		<title level="m">Semeval-2021 task 1: Lexical complexity prediction</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">DeepBlueAI at SemEval-2021 task 1: Lexical complexity prediction with a deep ensemble approach</title>
		<author>
			<persName><forename type="first">C</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Luo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics</title>
				<meeting>the 15th International Workshop on Semantic Evaluation (SemEval-2021), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="578" to="584" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Alejandro mosquera at semeval-2021 task 1: Exploring sentence and word features for lexical complexity prediction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mosquera</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th International Workshop on Semantic Evaluation</title>
				<meeting>the 15th International Workshop on Semantic Evaluation<address><addrLine>SemEval-</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="554" to="559" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">The zipf-scale: A better standardized measure of word frequency</title>
		<author>
			<persName><forename type="first">M</forename><surname>Brysbaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Keuleers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stevens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Haegen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Verma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Callens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Tops</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Khare</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mandera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Vander Beken</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Update</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The relation between morphological awareness and reading comprehension: Evidence from mediation and longitudinal models</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Deacon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Kieffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Laroche</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific Studies of Reading</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="432" to="451" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
