<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Philo of Alexandria at Touché: A Cascade Model Approach to Human Value Detection Notebook for the Touché Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Víctor</forename><surname>Yeste</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">PRHLT Research Center</orgName>
								<orgName type="institution">Universitat Politècnica de València</orgName>
								<address>
									<postCode>46022</postCode>
									<settlement>Valencia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Universidad Europea de Valencia</orgName>
								<address>
									<postCode>46010</postCode>
									<settlement>Valencia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mariona</forename><surname>Coll-Ardanuy</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">PRHLT Research Center</orgName>
								<orgName type="institution">Universitat Politècnica de València</orgName>
								<address>
									<postCode>46022</postCode>
									<settlement>Valencia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
							<email>prosso@dsic.upv.es</email>
							<affiliation key="aff0">
								<orgName type="department">PRHLT Research Center</orgName>
								<orgName type="institution">Universitat Politècnica de València</orgName>
								<address>
									<postCode>46022</postCode>
									<settlement>Valencia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">Valencian Graduate School and Research Network of Artificial Intelligence (ValgrAI)</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Philo of Alexandria at Touché: A Cascade Model Approach to Human Value Detection Notebook for the Touché Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">32BD4860402E89AF60F725B5914CDEA2</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>human value detection</term>
					<term>text classification</term>
					<term>multi-label classification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes our contribution to the Human Value Detection shared task at CLEF 2024. Our submitted system approaches the task of human value detection and attainment using a sequence of two models: a multi-label text classifier based on DeBERTa is applied first to predict the human values present in the text; a follow-up natural language inference binary classifier, also based on DeBERTa, then discerns whether the values present in the text are attained or constrained. This cascade model approach improves the granularity of text classification. Our approach outperforms all baselines, achieving a Macro F1-score of 0.28 on sub-task 1 (human value detection) and a Macro F1-score of 0.82 on sub-task 2 (value attainment prediction).</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The task of human value detection involves applying natural language processing to identify whether human values are present in texts, and to determine whether such values appear as attained or constrained. These values were ordered in a circular motivational continuum by Schwartz et al. (2012) <ref type="bibr" target="#b0">[1]</ref>, in which 19 values were defined based on their compatible and conflicting motivations, their expression of self-protection vs. growth, and their personal vs. social focus.</p><p>The Human Value Detection task at CLEF 2024 (ValueEval'24) <ref type="bibr" target="#b1">[2]</ref> consists of two sub-tasks: the first is to detect the presence or absence of each of these 19 values, while the second is to detect whether each value is attained or constrained. The dataset provided for both tasks consists of approximately 3,000 human-annotated texts of between 400 and 800 words, created by the ValuesML project <ref type="bibr" target="#b2">[3]</ref>. The data is provided at the sentence level (44,758 sentences for training, 14,904 for validation, and 14,569 held out for testing); each sentence is annotated in a multi-label setting using a single-level taxonomy of 38 labels, expressing the attained and constrained version of each human value. 
As the original dataset is multilingual, an automatic English translation of the training, validation, and test datasets was provided for teams that wished to follow an approach without a multilingual perspective.</p><p>The present work proposes a cascade model approach consisting of two consecutive models: a multi-label text classifier used to predict which of the 19 human values are present in the text, followed by a binary classifier that treats the attainment decision as a stance classification problem, in which both the text and the value are passed as input and the expected output is whether the value appears as attained or constrained. Our approach outperforms all the baselines provided by the organizers, including one based on BERT. This paper includes a detailed system overview, the experiments we performed, the results and discussion, and some conclusions and directions for future work. The code for the proposed system, as well as for all our experiments, is available on GitHub. <ref type="foot" target="#foot_0">1</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">System Overview</head><p>This section presents our cascade model approach, in which one model is dedicated to each of the proposed sub-tasks and the two are combined to produce predictions in the required format. Our approach uses the texts automatically translated into English. The system consists of two subsystems: one for detecting the presence of each human value, and another for establishing the stance (whether the sentence attains or constrains) towards each human value. Each subsystem is fine-tuned separately, in both cases using a DeBERTa model<ref type="foot" target="#foot_1">2</ref> <ref type="bibr" target="#b3">[4]</ref> as base, for the task of sequence classification using the HuggingFace implementation.<ref type="foot" target="#foot_2">3</ref></p><p>• Subsystem 1: Its primary function is to identify the presence of human values within sentences. By combining the 'attained' and 'constrained' labels into an overall presence label, it streamlines the multi-label classification task, reducing it to a binary decision for each of the 19 human values (presence vs. absence). The model for this subsystem is available at HuggingFace.<ref type="foot" target="#foot_3">4</ref></p><p>• Subsystem 2: It receives the outputs of subsystem 1 and classifies the stance towards each present human value in a binary classification (attained vs. constrained). This subsystem transforms the sentence dataset into premise-hypothesis pairs, where each sentence is the premise, a value is the hypothesis, and the 'attained' and 'constrained' labels are the stance. The model for this subsystem is available at HuggingFace.<ref type="foot" target="#foot_4">5</ref></p><p>Given that subsystem 1 focuses on detecting the presence of human values in the text and subsystem 2 on the stances towards each detected human value, this cascade model approach improves the granularity of text classification. As can be seen in the Results section, it also enhances the performance of the final predictions.</p></div>
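Subsystem 1's label preprocessing described above can be sketched as follows (a minimal illustration; the value names shown and the dictionary layout are our assumptions, not the authors' exact data schema):

```python
# Sketch of subsystem 1's label collapse: each sentence carries an 'attained'
# and a 'constrained' flag per value, and presence = attained OR constrained.

# Three of the 19 values, for brevity (illustrative subset).
example = {
    "self-direction: thought": (0, 0),  # neither attained nor constrained
    "hedonism": (1, 0),                 # attained
    "tradition": (0, 1),                # constrained
}

def to_presence_labels(stance_labels):
    """Collapse {value: (attained, constrained)} pairs into presence flags."""
    return {v: int(att or con) for v, (att, con) in stance_labels.items()}

print(to_presence_labels(example))
# {'self-direction: thought': 0, 'hedonism': 1, 'tradition': 1}
```

This is what reduces the 38-dimensional stance label space to the 19-dimensional presence space that subsystem 1 is trained on.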
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments</head><p>Experiments were carried out on Google Colab with Python 3.10.12, an Nvidia Tesla GPU, 12.7 GB of system RAM, and 15 GB of GPU RAM. HuggingFace Transformers <ref type="bibr" target="#b4">[5]</ref> was used as the framework for all the experiments in this study. Training was designed for flexibility and performance, and evaluation metrics were calculated upon training completion, validating with the task validation dataset. The F1-score for each label and a macro-averaged F1-score were used to evaluate each experiment, enabling a comprehensive analysis of individual and overall effectiveness.</p></div>
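The per-label and macro-averaged evaluation just described can be sketched as follows (an illustrative example using scikit-learn rather than the authors' evaluation code; the toy label matrices are ours):

```python
# Toy multi-label evaluation: rows are sentences, columns are value labels.
# f1_score with average=None yields one F1 per label; average="macro" yields
# the unweighted mean over labels, as used to compare models in this study.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

per_label = f1_score(y_true, y_pred, average=None, zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(per_label, macro)
```

Reporting both views makes it possible to see that a model with the best macro score can still lose to another model on individual values, as observed in the Results section.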
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Preliminary Experiments</head><p>Our initial experiments used a single-model approach to classify each text into the predefined set of human value stance labels (i.e., the 38 labels determining whether the sentence attains or constrains each of the 19 human values). The objective was to leverage the powerful features of well-known transformer models for this purpose, and to determine which was best suited for the task. We experimented with the following pretrained models: google-bert/bert-base-uncased <ref type="bibr" target="#b5">[6]</ref>, <ref type="foot" target="#foot_5">6</ref> FacebookAI/roberta-base<ref type="foot" target="#foot_6">7</ref> <ref type="bibr" target="#b6">[7]</ref>, microsoft/deberta-base 8 <ref type="bibr" target="#b3">[4]</ref>, google/electra-base-discriminator 9 <ref type="bibr" target="#b7">[8]</ref> and xlnet-base-cased 10 <ref type="bibr" target="#b8">[9]</ref>. These pretrained models were initialized for sequence classification and, for sub-task 1, configured for the multi-label classification setting. Each selected model was fine-tuned on the task training dataset and validated on the task validation dataset. The sentences were tokenized using each model's specific tokenizer from HuggingFace Transformers. All models were fine-tuned with a batch size of 8, for 5 training epochs, a learning rate of 2e-5, and a weight decay of 0.01. A linear learning rate scheduler with 0 warmup steps was implemented on BERT and RoBERTa, using Adam as the optimizer and incorporating weight decay directly to improve regularization and prevent overfitting. DeBERTa was finally selected because it produced the highest macro F1-score on the training and validation datasets.</p></div>
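The shared fine-tuning hyperparameters listed above map directly onto HuggingFace `TrainingArguments` fields; a sketch (the exact Trainer setup is not shown in the paper, so the surrounding code is our assumption):

```python
# Hyperparameters reported for the fine-tuning runs, expressed as the
# corresponding TrainingArguments fields (a sketch, not the authors' script).
FINE_TUNING_CONFIG = {
    "per_device_train_batch_size": 8,
    "num_train_epochs": 5,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "lr_scheduler_type": "linear",  # linear schedule with 0 warmup steps
    "warmup_steps": 0,
}

# For sub-task 1 the classification head would be configured for the
# multi-label setting, e.g.:
#   AutoModelForSequenceClassification.from_pretrained(
#       "microsoft/deberta-base", num_labels=19,
#       problem_type="multi_label_classification")
print(FINE_TUNING_CONFIG)
```

With `problem_type="multi_label_classification"`, the HuggingFace sequence classification head uses a sigmoid per label instead of a softmax over labels, which is what allows several values to be predicted for the same sentence.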
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">System Experiments</head><p>Our cascade model approach was developed by fine-tuning two DeBERTa models in sequence, thereby realizing the division of the challenge into two sub-tasks. In both cases, we used the same experimental settings as described in the preliminary experiments section.</p><p>First, we transformed each pair of attained and constrained labels into a presence label, understanding presence as an OR operation between the two labels. The DeBERTa model was fine-tuned for multi-label classification of the 19 available human values, trained on the task training dataset and validated on the task validation dataset to evaluate the effectiveness of this subsystem alone in detecting the presence of human values. This step answers the first sub-task of the challenge with significantly reduced complexity, as the output space is 19-dimensional instead of 38-dimensional, which translates into a smaller number of possible label combinations.</p><p>Second, subsystem 2 receives the results of subsystem 1 as inputs and applies a natural language inference approach, where each sentence is considered a premise, the human value labels are considered different hypotheses, and 'attained' and 'constrained' are the labels. With this technique, the model tries to determine a logical entailment relationship between the pair of sequences. This inference establishes the stance of the sentence toward each human value, which answers sub-task 2 of the proposed challenge.</p><p>Finally, it is important to note that, in order to adjust the predictions of our cascade approach to the format required by the shared task, we had to make one small modification to our system. 
While our system is conceived to apply the second model only to those values found to be present in the text, the format required to participate in both tasks 11 meant that, to produce our results file, we applied the subsystem 2 model to every sentence-value pair, not only to those values predicted to be present in the sentence. To ensure that values detected as absent remain below the 0.5 threshold used by the evaluator to determine that a value is not present, whenever a value was not predicted by the first model we multiplied the second model's prediction score by the first model's prediction score and divided the result by two.</p></div>
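The score adjustment just described can be sketched as follows (our reading of the description; the function and variable names are ours):

```python
def combined_score(presence_score, stance_score):
    """Score emitted for one sentence-value pair in the submission file.

    presence_score: subsystem 1 probability that the value is present.
    stance_score:   subsystem 2 probability for the stance column
                    ('attained' or 'constrained').
    """
    if presence_score >= 0.5:
        # Value predicted present: the stance score is used as-is.
        return stance_score
    # Value predicted absent: damp the stance score so the pair stays
    # below the evaluator's 0.5 presence threshold.
    return stance_score * presence_score / 2.0

print(combined_score(0.9, 0.8))  # value present
print(combined_score(0.4, 0.8))  # value absent: score is damped
```

Because the stance score is at most 1 and the presence score is below 0.5 in the damped branch, the combined score is always below 0.25, comfortably under the evaluator's threshold.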
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>In our preliminary experiments, our models were trained and evaluated with the provided training and validation datasets, generating an individual F1-score for every human value label and an overall Macro F1-score, which were used to compare the effectiveness of the different models. The model with</p></div>			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/VictorMYeste/touche-human-value-detection</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/microsoft/deberta-base</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForSequenceClassification</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://huggingface.co/VictorYeste/deberta-based-human-value-detection</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://huggingface.co/VictorYeste/deberta-based-human-value-stance-detection</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://huggingface.co/google-bert/bert-base-uncased</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://huggingface.co/FacebookAI/roberta-base</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://huggingface.co/microsoft/deberta-base</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">https://huggingface.co/google/electra-base-discriminator</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">https://huggingface.co/xlnet/xlnet-base-cased</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10">Only one file had to be submitted for both tasks, with one column for each of the 38 labels (i.e., the 19 human values in their attained and constrained versions). Task 1 was evaluated based on the sum of a value's attained and constrained columns (which should be larger than 0.5 if the value is present), and task 2 was evaluated based on which of the two columns ('attained' or 'constrained') had the larger value. The organizers recommended avoiding setting the same number for both attained and constrained, even if the system predicted that the value was not referenced in the text.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Work for this paper was conducted as part of the PhD Program in Computer Science at the Universitat Politècnica de València. The work of Mariona Coll Ardanuy and Paolo Rosso was funded by the research project FairTransNLP, grant PID2021-124361OB-C31, funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU A way of making Europe.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Achieved F1-score of each submission on the test dataset for sub-task 1. A ✓ indicates that the submission used the automatic translation to English. Baseline submissions are shown in gray. the highest effectiveness was found to be DeBERTa, with a Macro F1-score of 0.20. However, while DeBERTa presented the highest Macro F1-score, some models achieved higher individual F1-scores for some human values: BERT was better on 'tradition attained'; RoBERTa on 'achievement attained', 'security: societal constrained', 'universalism: concern attained', and 'universalism: nature attained'; Electra on 'power: dominance attained', 'power: resources constrained', 'security: societal attained', 'universalism: concern attained', and 'universalism: concern constrained'; and XLNet on 'power: resources attained', 'power: resources constrained', 'security: societal attained', 'conformity: rules constrained', 'benevolence: dependability attained', 'universalism: concern attained', and 'universalism: concern constrained'. These results indicate that using a different model for each human value could be an interesting approach. As DeBERTa was selected as the best overall model, our system was developed using two cascaded DeBERTa models. Table <ref type="table">1</ref> shows the results of our system for sub-task 1. As can be seen, our system outperforms all baselines, including the BERT-based baseline, which it beats by 0.04 in terms of F1-score. It is interesting to note that both our approach and the BERT baseline generally perform similarly well on the same values (such as 'security: societal', 'tradition', 'conformity: rules', and 'universalism: nature') and similarly poorly on others (such as 'self-direction: thought', 'conformity: interpersonal', and 'humility'), while some other values show significant improvements with our approach (such as 'universalism: tolerance' and 'face'). 
Overall, our approach matches or outperforms the BERT baseline for all values except 'power: resources'.</p><p>Table <ref type="table">2</ref> shows the results of our system for sub-task 2. While our approach outperforms the BERT baseline, its F1-score is only slightly higher (0.82 vs. 0.81), and it outperforms the baseline on only 12 of the 19 possible values. Our model is best at predicting 'hedonism' and 'benevolence: caring', and significantly worse than the baseline at predicting 'humility', on which our first model also failed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>This work proposes a system to address the challenge's sub-tasks related to human value detection. Our approach uses cascaded DeBERTa models, where the first detects the presence of each human value and the second detects whether the sentence attains or constrains each human value found to be present. This approach improves on the effectiveness of the baseline on the test dataset by 0.04 in Macro F1-score on sub-task 1 and by 0.01 on sub-task 2. These models were trained on a subset of 44,758 sentences in English, validated on a subset of 14,904 sentences, and tested on a separate subset of 14,569 sentences. Future work could involve implementing a separate detection model for each human value, adapting each model to its characteristics depending on which model performs better in each case. Considering the complexity and subtlety of this task, adding linguistic and statistical characteristics to the texts could enrich their context and improve the effectiveness of the models.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Refining the theory of basic individual values</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Schwartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cieciuch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vecchione</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Davidov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Beierlein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Verkasalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-E</forename><surname>Lönnqvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Demirutku</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of personality and social psychology</title>
		<imprint>
			<biblScope unit="volume">103</biblScope>
			<biblScope unit="page">663</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of Touché 2024: Argumentation Systems</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kiesel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ç</forename><surname>Çöltekin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Heinrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alshomary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">D</forename><surname>Longueville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Erjavec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Handke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kopp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ljubešić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Meden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mirzakhmedova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Morkevičius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Reitis-Münstermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Scharfbillig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Stefanovitch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wachsmuth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mulhem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Quénot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schwab</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">M D</forename><surname>Nunzio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>De Herrera</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<idno type="DOI">10.5281/zenodo.10663363</idno>
		<title level="m">The ValuesML Team</title>
				<imprint>
			<publisher>Touché24-ValueEval</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Deberta: Decoding-enhanced bert with disentangled attention</title>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 conference on Empirical Methods in Natural Language Processing: system demonstrations</title>
				<meeting>the 2020 conference on Empirical Methods in Natural Language Processing: system demonstrations</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Doran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</editor>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Roberta: A robustly optimized bert pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Electra: Pre-training text encoders as discriminators rather than generators</title>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-T</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Xlnet: Generalized autoregressive pretraining for language understanding</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carbonell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
