<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Key Environmental Lexicon Extraction Using Generative Transformer (Short Paper)</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tomara</forename><surname>Gotkova</surname></persName>
							<email>tomara.gotkova@univ-lorraine.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Université de Lorraine</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">ATILF</orgName>
								<address>
									<settlement>Nancy</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alexander</forename><surname>Shvets</surname></persName>
							<email>alexander.shvets@upf.edu</email>
							<affiliation key="aff2">
								<orgName type="institution" key="instit1">Pompeu Fabra University</orgName>
								<orgName type="institution" key="instit2">NLP Group, Barcelona</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Key Environmental Lexicon Extraction Using Generative Transformer (Short Paper)</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">EA7A192333D762487304DF685C333F23</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-12-29T05:44+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>environmental terminology</term>
					<term>deep generative models</term>
					<term>keyword extraction</term>
					<term>specialized corpus</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents a study of the core environmental lexicon at the intersection of the fields of terminology and natural language processing. The goal was to find a way of automating the expansion of a preselected keyword list and, in particular, to evaluate the ability of generative transformers to extract keywords unseen during the training phase. As a starting point, we collected keywords pertinent to environmental discourse. Additionally, we compiled a corpus of texts on current and emerging environmental issues. These materials were used to train deep generative models of two types: a T5 transformer and, as a baseline, a pointer-generator network pretrained for concept extraction. We show that T5 significantly outperforms the baseline in detecting unseen keywords. We further provide a qualitative analysis of the outcome of the resulting model applied to weakly annotated texts and confirm that the model helps to discover more keywords pertinent to the environmental topic.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Our primary objective is rooted in terminology: we aim to identify the core environmental terminology, which we see as the set of central terms that shape modern environmental discourse. As a first step towards this objective, we opt for a supervised machine learning approach that consists in training deep generative models with preselected lexical material and a specialized corpus of environmental texts. In the following sections, we comment on the theoretical framework that underlies our terminological tasks, describe the dataset and the selection of generative models, and present preliminary extraction results and directions for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Theoretical framework</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">The notion of "environmental coreness"</head><p>Environmental terminology is a patchwork of terms which belong to different disciplines (anthropology, chemistry, biology, ecology, physics) and topics (renewable energy, ocean pollution, biodiversity). Owing to such heterogeneity, environmental terminology defies clear-cut segmentation for certain tasks. While it is relatively easy to discern terms specific to a given environmental subtopic or subdiscipline, detecting terms which are relevant to most of the subtopics or subdisciplines at once remains a challenging (but feasible) task. For instance, <ref type="bibr" target="#b0">[1]</ref> proposes a method for identifying general environmental lexicon which "cuts across the entire field of the environment", e.g., biologist, ecosystem, green.</p><p>Previous research has explored the notion of "coreness" as applied to both general and specialized lexicon. Depending on the purpose of a given core wordlist, core words can be defined by properties such as frequency, commonness, universality, semantic primitiveness, etc. <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>. We focus on the notion of environmental coreness in specialized texts on current and emerging environmental issues, e.g., air pollution, loss of biodiversity, waste management, etc. A given environmental term is considered core if it meets the following criteria: (i) it refers to an essential environmental concept (sustainable), (ii) it is pertinent to several environmental subtopics at once (ecosystem), (iii) it exhibits strong semantic connections with other environment-related terms, and (iv) it is not confined to specialized environmental discourse, as it is also diffused in general language discourse (mass media texts, general public communication, etc.).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Term vs. keyword</head><p>We advocate the lexico-semantic approach to terminology, which treats terms as lexical units <ref type="bibr" target="#b4">[5]</ref>. According to Explanatory Combinatorial Lexicology, an integral component of the Meaning-Text theory, a lexical unit is a word which corresponds to one specific sense <ref type="bibr" target="#b5">[6]</ref>. Hence, the term carbon implies a pointer to a specific sense, e.g., 'chemical element C'. As regards machine learning tasks, however, we deliberately refrain from using the notion of "term" to indicate that we stay at the level of abstract units with no clear terminological status. Instead, we use the notion of keyword: a wordform devoid of clear semantic features, as there is no direct reference to a specific sense. For instance, the keyword carbon<ref type="foot" target="#foot_0">1</ref> per se does not refer to any specific sense, but it can acquire semantic features in context. It should be noted that our notion of keyword differs from uses which refer to concepts rather than semantically ambiguous wordforms, e.g., the keywords of a scientific paper.</p><p>The remaining 164 keywords were categorized as supplementary <ref type="foot" target="#foot_2">3</ref>. We considered the following keywords as supplementary: complex keywords built with Ckeywords (air pollution, anthropogenic carbon dioxide, atmospheric warming) and keywords which would not satisfy the criteria of coreness but are nevertheless important for environmental discourse (ice, Earth). Specialized corpus. Our specialized corpus is a monolingual English domain-specific corpus composed of 44 reports issued by international environmental organizations such as the European Environment Agency, the Intergovernmental Panel on Climate Change, the United Nations Environment Programme and the World Meteorological Organization. 
These reports give a comprehensive overview of the current and emerging environmental issues.</p><p>We converted the documents to plain text, excluding figures and tables, and manually cleaned the artifacts that remained after the conversion. Consider an example of a sentence with Ckeywords given in bold and Skeywords underlined:</p><p>Moreover, the degradation of wetlands releases stored carbon, fuelling climate change.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Automatic and gold-standard annotation</head><p>We designed a simple procedure to annotate the entire corpus of about 30K sentences in order to have enough data samples for training a neural network. The first step is to parse the corpus using UDPipe <ref type="foot" target="#foot_3">4</ref>. The second is to consider all sequences of tokens of length one to six (i.e., up to the maximum number of words in the keywords in our lists), normalize the lexical items using their lemmas, and look them up (with conditions on part-of-speech tags) in the lists of keywords, which we automatically expanded with alternatives beforehand (e.g., for biodiversity conservation we added conservation of the biodiversity; for bio-based, biobased; etc.). Finally, each sentence with the corresponding found items formed a single data sample. The obtained samples cover 103 out of 104 Ckeywords (carbon-free was not found in this corpus) and, in total, 255 out of 268 keywords. The search procedure took into account many possible occurrences, including overlapping and discontinuous keywords such as soil pollution and air pollution in the phrase "soil and air pollution".</p><p>The resulting samples were shuffled and split into training, development (dev), and test subsets in the proportion 80/10/10. We performed the shuffling several times until the examples were distributed among the subsets in such a way that only 80% of the keywords are used for training (they also appear in the two other subsets), while the remaining 10% and 10% are used exclusively in the dev and test subsets, without intersections <ref type="foot" target="#foot_4">5</ref>. We reserve these 20% of keywords to assess the ability of the model to extract "new" keywords unseen during training. We leverage the dev set to select the most prominent intermediate states of the model obtained during training, and the test set for the final evaluation. 
Sentences without keywords were also added proportionally to the subsets to teach the model when it should not extract anything.</p><p>In addition to our simple automatic annotation, we manually selected and examined 200 sentences from the corpus (excluded from the subsets) and created fully annotated samples (with some keywords beyond the existing lists), which we refer to as a gold standard. The size of the overall dataset is shown in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
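The lookup step above can be sketched as follows. This is a simplified re-implementation for illustration (function and variable names are ours): it matches contiguous lemma n-grams only and omits the part-of-speech constraints and the handling of discontinuous keywords described in the text.

```python
def annotate(lemmas, keyword_lemmas, max_len=6):
    """Return (start, end) spans whose lemma sequence is a known keyword.

    lemmas         -- lemmatized tokens of one sentence (e.g., from UDPipe)
    keyword_lemmas -- set of keyword lemma tuples, pre-expanded with
                      alternatives such as 'conservation of the biodiversity'
    Overlapping matches are kept; discontinuous keywords (as in
    "soil and air pollution") would need extra logic not shown here.
    """
    spans = []
    for start in range(len(lemmas)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(lemmas):
                break
            if tuple(lemmas[start:end]) in keyword_lemmas:
                spans.append((start, end))
    return spans

keywords = {("climate", "change"), ("carbon",), ("wetland",)}
sentence = ["the", "degradation", "of", "wetland", "release", "store",
            "carbon", ",", "fuel", "climate", "change"]
print(annotate(sentence, keywords))  # [(3, 4), (6, 7), (9, 11)]
```

Each sentence paired with the spans found in it would then form one data sample, as described above.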
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Generative extraction models</head><p>The overlapping and discontinuous keywords in environmental texts create a problem for traditional sequence labelling-based extractors. Instead, in this work, we opt for deep neural generative models that are capable of translating a sentence into an arbitrary sequence of words (not necessarily coherently connected), such as T5 <ref type="bibr" target="#b7">[8]</ref>, since we would like a model to output keywords in the form in which they appear in a sentence, one after another, separated by a reserved symbol <ref type="foot" target="#foot_5">6</ref>.</p><p>In our experiments, we worked with two versions of the pretrained transformer T5: T5-small and T5-large <ref type="foot" target="#foot_6">7</ref>. We also chose a pretrained pointer-generator-based concept extraction model (CE-PGN) <ref type="bibr" target="#b9">[10]</ref> as an alternative, which we had successfully applied to public discourse analysis in the domain of interior and urban design <ref type="bibr" target="#b10">[11]</ref>. Originally, this model was designed to extract concepts mainly in the form of nominal phrases, which is not the only form of the keywords considered in this work. Still, we assumed that tuning it on our data could change its behaviour.</p></div>
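A minimal sketch of the sequence-to-sequence data format just described: keywords are emitted in their order of appearance, separated by a reserved symbol (here '*', as in the footnote example). The function names are our own assumptions; actual fine-tuning would feed these target strings to a T5 model, e.g., via a sequence-to-sequence training library.

```python
SEP = " * "  # the reserved separator symbol

def to_target(keywords):
    """Serialize the gold keywords of one sentence into a target string."""
    return SEP.join(keywords)

def from_output(text):
    """Parse a generated string back into a list of keywords."""
    return [kw.strip() for kw in text.split("*") if kw.strip()]

keywords = ["Sustainable", "Sustainable forest management", "forest"]
target = to_target(keywords)
print(target)  # Sustainable * Sustainable forest management * forest
assert from_output(target) == keywords
```

Serializing and parsing through the same separator keeps the model output trivially decodable, even when the extracted keywords overlap in the source sentence.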
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>We report the precision P = TP/(TP + FP) and recall R = TP/NP scores for different types of keywords (Skeywords, Ckeywords, and Ckeywords new, i.e., Ckeywords unseen during training) in Table <ref type="table">2</ref>, where TP is the number of correctly extracted mentions of the scored type, FP is the number of extracted mentions that do not appear among the ground-truth mentions (FP does not depend on the type under scoring), and NP is the number of ground-truth mentions of the scored type.</p><p>As expected, the original CE-PGN model extracts a small number of keywords with very low precision, as it tends to find all concepts regardless of the domain. Fine-tuning re-oriented it towards the environmental domain: both scores improved significantly. However, it performed poorly on extracting unseen keywords. T5-large outperformed the other models, except that the small version achieved a slightly higher recall on the unseen Ckeywords of the test set. Interestingly, the annotation with Skeywords helped to detect Ckeywords better. The model that was trained only to extract Ckeywords (T5-large c-tuned <ref type="foot" target="#foot_7">8</ref>) generalized more poorly and missed many more unseen Ckeywords.</p><p>As a quality check of the extraction results, we manually examined 171 non-annotated keywords extracted from the dev set using T5-large. Of these, 70 (41%) were novel keywords, another 32 (19%) corresponded to existing keywords missed by the automatic annotation due to parser mistakes, and only the remaining 69 (40%) were false positives, i.e., not keywords. Of the 70 novel keywords, 45 were combinations of keywords already in our lists (ecological drought, biomass contaminant), while the remaining 25 were new to us (smog, renewable electricity, biomethane). This result is linguistically valuable for us: all 25 new keywords are pertinent to the environmental topic. 
Although some keywords are too specific (cryosphere), all of them are considered an important addition to our list.</p></div>
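The typed scoring scheme can be sketched as follows (our own re-implementation for illustration; function names are assumptions). TP counts extracted mentions matching the ground truth of the scored type; FP counts extracted mentions absent from the ground truth of any type, so it is shared across types; NP is the ground-truth mention count of the scored type.

```python
from collections import Counter

def score(extracted, gold_by_type, scored_type):
    """Return (precision, recall) for one keyword type.

    extracted    -- all mentions extracted by the model (any type)
    gold_by_type -- dict mapping a type name to its ground-truth mentions
    """
    gold_all = Counter(m for ms in gold_by_type.values() for m in ms)
    gold_typed = Counter(gold_by_type[scored_type])
    ext = Counter(extracted)
    tp = sum(min(c, gold_typed[m]) for m, c in ext.items())
    # FP: extracted mentions beyond ANY ground-truth count (type-independent)
    fp = sum(max(c - gold_all[m], 0) for m, c in ext.items())
    np_total = sum(gold_typed.values())
    return tp / (tp + fp), tp / np_total

gold = {"Ckey": ["carbon", "climate change"], "Skey": ["ice"]}
extracted = ["carbon", "ice", "greenhouse"]  # 'greenhouse' is spurious
print(score(extracted, gold, "Ckey"))  # (0.5, 0.5)
```

Because FP is shared, a spurious extraction lowers precision for every scored type, matching the definition in the text.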
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and Future Work</head><p>The results of our experiments provided several valuable insights for both linguistics and information extraction. First, the preselected keywords proved pertinent to the environmental topic and, in particular, to the vocabulary of current and emerging environmental issues: only 13 out of 268 keywords (5%) were not present in our corpus. Second, the tests performed with T5-large demonstrated that supplementary lexical material (Skeywords) enhanced the model's ability to detect Ckeywords. Therefore, as the list of Ckeywords used to train the model grows, the list of Skeywords should be extended as well. Third, we now consider it important to increase the number of manually annotated samples to improve the gold-standard dataset; this will allow us to train the model on high-quality annotated data in addition to the automatically annotated sets. Fourth, the T5-large model proved effective for extracting unseen keywords: it detected 50-70% of them per set (62% across all the evaluation sets). Finally, we extracted 70 novel keywords pertinent to the topic of current and emerging environmental issues which were not present in our preselected keyword list.</p><p>The ultimate goal of our research, which goes beyond this study, is manifold. The finalized keyword list will be used to scrape data from social networks, namely Twitter and Reddit, to monitor the general public's perception of core environmental terms. As a parallel task, both preselected and extracted keywords will be subjected to a lexicographic analysis in order to convert them into meaningful lexical units and describe them in a lexicographic resource. In some cases, a given complex keyword may be decomposed into several terms. 
For example, the keyword climate pollutant should be converted to, and lexicographically described as, the two separate terms climate and pollutant, for phraseological reasons. Additionally, the obtained list of terms will be analyzed according to the criteria of environmental coreness discussed in Section 2.1. If a given term satisfies all four criteria, it can be validated as a core environmental term. We would also like to explore the differences in terminology in multilingual material with mT5 and to study the transferability of the obtained extractive models to other domains.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Statistics over the dataset. Ckey+Skey/Ckey/Ckey new: number of keyword occurrences in a subset (number of unique keywords in parentheses); # pos, # neg: number of samples with and without keywords</figDesc><table><row><cell></cell><cell>Ckey+Skey</cell><cell>Ckey</cell><cell>Ckey new</cell><cell># pos</cell><cell># neg</cell></row><row><cell>Training</cell><cell>24,565 (206)</cell><cell>17,618 (80)</cell><cell>-</cell><cell>10,301</cell><cell>8,803</cell></row><row><cell>Dev</cell><cell>3,711 (184)</cell><cell>2,723 (83)</cell><cell>183 (9)</cell><cell>1,449</cell><cell>1,149</cell></row><row><cell>Test</cell><cell>3,703 (172)</cell><cell>2,737 (80)</cell><cell>183 (9)</cell><cell>1,505</cell><cell>1,117</cell></row><row><cell>Gold</cell><cell>592 (238)</cell><cell>192 (37)</cell><cell>35 (9)</cell><cell>100</cell><cell>100</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Terms are written in italics; keywords are written in teletype.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The process included both manual and automatic selection and was partly done in collaboration with an expert in green chemistry and an expert in lexicology<ref type="bibr" target="#b6">[7]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Further in the text, core-candidate keywords and supplementary keywords are called Ckeyword and Skeyword respectively.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://ufal.mff.cuni.cz/udpipe</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">A couple of thousand samples were removed from the dataset to meet the condition of exclusiveness.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">E.g., Sustainable forest management can maintain... → Sustainable * Sustainable forest management * forest</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">For languages other than English, mT5 shall be used as it allows for cross-lingual transfer learning<ref type="bibr" target="#b8">[9]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">The model was tuned on the same training set but annotated only with Ckeywords.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This research was funded by the EC-funded research and innovation programme Horizon Europe under the grant agreement number 101070278 and by the French PIA project "Lorraine Université d'Excellence", reference ANR-15-IDEX-04-LUE.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Lexical profiling of environmental corpora</title>
		<author>
			<persName><forename type="first">P</forename><surname>Drouin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-C</forename><surname>L'homme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Robichaud</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Cieri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Goggi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Hasida</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Isahara</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Mazo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Tokunaga</surname></persName>
		</editor>
		<meeting>the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA)<address><addrLine>Paris, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="3419" to="3425" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Is there a Core Vocabulary? Some Implications for Language Teaching</title>
		<author>
			<persName><forename type="first">R</forename><surname>Carter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Linguistics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="178" to="193" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Core vocabulary, borrowability and entrenchment: A usage-based onomasiological approach</title>
		<author>
			<persName><forename type="first">E</forename><surname>Zenner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Speelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Geeraerts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Diachronica</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="74" to="105" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Is There a Core General Vocabulary? Introducing the New General Service List</title>
		<author>
			<persName><forename type="first">V</forename><surname>Brezina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gablasova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Linguistics</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="1" to="22" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Lexical semantics for terminology: an introduction, Terminology and lexicography research and practice</title>
		<author>
			<persName><forename type="first">M.-C</forename><surname>L'homme</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>John Benjamins Publishing Company</publisher>
			<pubPlace>Amsterdam Philadelphia</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">A</forename><surname>Mel'čuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Clas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Polguère</surname></persName>
		</author>
		<title level="m">Introduction à la lexicologie explicative et combinatoire (Universités francophones)</title>
		<imprint>
			<publisher>Duculot</publisher>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Public perception and usage of the term : Linguistic analysis in an environmental social media corpus</title>
		<author>
			<persName><forename type="first">T</forename><surname>Gotkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Chepurnykh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Psychology of Language and Communication</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="297" to="312" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Exploring the limits of transfer learning with a unified text-to-text transformer</title>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Matena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="1" to="67" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">mT5: A massively multilingual pre-trained text-to-text transformer</title>
		<author>
			<persName><forename type="first">L</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Constant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Siddhant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="483" to="498" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Concept extraction using pointer-generator networks and distant supervision for data augmentation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shvets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wanner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Knowledge Engineering and Knowledge Management</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="120" to="135" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Social media and web sensing on interior and urban design</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Stathopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shvets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Carlini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Diplaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vrochidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wanner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kompatsiaris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Computers and Communications (ISCC), IEEE</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
