<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Edward Said at Touché: Human Value Detection Using Transformers and Upsampling. Notebook for the Touché Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Aisha</forename><forename type="middle">Nur</forename><surname>Aydin</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Cornell University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Shaden</forename><surname>Shaar</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Cornell University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Claire</forename><surname>Cardie</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Cornell University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Edward Said at Touché: Human Value Detection Using Transformers and Upsampling. Notebook for the Touché Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">30AA4AE32E90468EF43545C35CA8DCFC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>human-value-classification</term>
					<term>Touché</term>
					<term>CLEF</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we tackle both subtasks of the Human Value Classification shared task at Touché, which aims to classify dialogue speech into one of the 19 human values defined by Schwartz's Refined Theory of Basic Individual Values. We fine-tune models such as DeBERTa and RoBERTa with an F1-loss to handle the multi-label setting, and we additionally test different sampling strategies to address the data imbalance. We find that by training on the English-translated utterances, we beat the baselines by at least 2 F1 points across both subtasks.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The Human Value Detection task aims to identify the values that humans express through words. The task covers eight languages, so values must be identified in a multilingual context. Once a value is identified, it can further be classified as attained, constrained, or neither. The first subtask asks whether a sentence expresses a specific human value; the second subtask determines whether the sentence attains or constrains a given set of human values <ref type="bibr" target="#b0">[1]</ref>.</p><p>We fine-tuned pre-trained RoBERTa and DeBERTa models on the ValuesML dataset. We observed that the class imbalance in the data could hurt the classification of human values, so we tried various upsampling configurations to address this issue and found that upsampling the lowest-performing categories fourfold yields the best results. As in related tasks (e.g., emotion detection <ref type="bibr" target="#b1">[2]</ref>), we use only English models and train on the original English utterances together with the English translations. Using the same model trained for both tasks, we were able to beat all the proposed baselines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>Schwartz's Refined Theory of Basic Individual Values defines nineteen human values <ref type="bibr" target="#b2">[3]</ref>, covering a wide range of values. Since human values can be expressed implicitly, it can be challenging to identify which value a given text appeals to <ref type="bibr" target="#b3">[4]</ref>. In the 2023 human value detection task, each argument consisted of three components: a conclusion, a stance, and a premise.</p><p>Here is an example argument from the 2023 Human Values Task:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Conclusion: We should ban human cloning. Stance: in favor of. Premise: We should ban human cloning as it will only cause huge issues when you have a bunch of the same humans running around all acting the same.</p><p>CLEF 2024: Conference and Labs of the Evaluation Forum, September 09-12, 2024, Grenoble, France. ana72@cornell.edu (A. N. Aydin); ss2753@cornell.edu (S. Shaar); ctc9@cornell.edu (C. Cardie)</p><p>This year, the task takes a sentence as input and outputs its human values, along with whether each is (even partially) attained, constrained, or neither. A value is attained if the text supports it and constrained if the text hinders it. We therefore treat the problem as a multi-class, multi-label classification problem with 38 labels: each of the 19 human values in its attained and constrained form.</p><p>An example sentence with a human value is the following. The input is:</p><p>"Young women examining their options for third level education have been urged to consider careers in science, technology, engineering or maths (STEM)."</p><p>The value expressed in this sentence (the output) is Achievement Attained. These sentences are provided by the ValuesML dataset, which contains texts and their respective human values from news articles and political text. The data consists of over 3,000 texts in eight languages; 20% of the data is used for validation, 20% for testing, and the remaining 60% for training.</p></div>
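The 38-label space described above can be made concrete as a short sketch. The label-string format and function name below are ours for illustration, not necessarily the official task format:

```python
# The task's 19 Schwartz values, each with an "attained" and a
# "constrained" variant, giving 38 binary labels in total.
VALUES = [
    "Self-direction: thought", "Self-direction: action", "Stimulation",
    "Hedonism", "Achievement", "Power: dominance", "Power: resources",
    "Face", "Security: personal", "Security: societal", "Tradition",
    "Conformity: rules", "Conformity: interpersonal", "Humility",
    "Benevolence: caring", "Benevolence: dependability",
    "Universalism: concern", "Universalism: nature", "Universalism: tolerance",
]
LABELS = [f"{v} {d}" for v in VALUES for d in ("attained", "constrained")]

def encode(gold):
    """Map a set of (value, direction) annotations to a 38-dim 0/1 vector."""
    vec = [0] * len(LABELS)
    for value, direction in gold:
        vec[LABELS.index(f"{value} {direction}")] = 1
    return vec
```

A sentence can carry several values at once, which is why the target is a multi-hot vector rather than a single class index.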
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">System Overview</head><p>For all our experiments, we collapsed the multilingual dataset into a single English dataset by keeping only the English (original or translated) utterances. We use RoBERTa-large, specifically "FacebookAI/roberta-large", as the base model for our submission. We first experimented with the original RoBERTa and DeBERTa models but found their F1 scores on the validation dataset lower than the BERT baseline provided by the task organizers. We then fine-tuned RoBERTa-large and DeBERTa-large and observed improved results. Despite this improvement, the overall F1 score appeared to suffer from the class imbalance: human values that appeared less often in the dataset had noticeably lower F1 scores than those that appeared more often.</p><p>We tried multiple upsampling configurations and settled on one in which the human values with the lowest performance metrics were upsampled by a factor of four.</p><p>To determine which human values to upsample, we fine-tuned the RoBERTa-large model on the training data and analyzed the evaluation results. If a value's F1 score on subtask 1 was 0.15 or less, we upsampled both its attained and constrained forms. We also examined the subtask 2 metrics to decide whether to upsample only one of the two forms: if the recall of one form was 50% or less of its counterpart's, we upsampled only the underperforming form. For example, if the recall of the attained version of a human value was 50% or less of the constrained version's, we upsampled only the attained version, and vice versa. Based on these metrics and the unevenness of the dataset, we upsampled the following values by a factor of four:</p></div>
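The selection rule described above can be sketched as follows. This is a simplified illustration; the function name and the dictionary formats are ours, not our actual pipeline code:

```python
def select_for_upsampling(subtask1_f1, subtask2_recall, f1_cutoff=0.15):
    """Decide which (value, direction) labels to upsample fourfold.

    Rule: if a value's subtask-1 F1 is at or below the cutoff, upsample it;
    if the subtask-2 recall of one direction is 50% or less of its
    counterpart's, upsample only the underperforming direction, otherwise
    upsample both.

    subtask1_f1:     {value: f1}
    subtask2_recall: {value: {"attained": recall, "constrained": recall}}
    """
    selected = []
    for value, f1 in subtask1_f1.items():
        if f1 > f1_cutoff:
            continue  # value performs well enough; leave it alone
        r_att = subtask2_recall[value]["attained"]
        r_con = subtask2_recall[value]["constrained"]
        if r_att <= 0.5 * r_con:
            selected.append((value, "attained"))
        elif r_con <= 0.5 * r_att:
            selected.append((value, "constrained"))
        else:
            selected.extend([(value, "attained"), (value, "constrained")])
    return selected
```

The selected labels' training sentences are then duplicated four times before fine-tuning.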
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Setup</head><p>To fine-tune the models, we set the learning rate to 2e-5, used a warm-up ratio of 0.2, set the batch size to 8, and trained for four epochs. We set the random seed to 42, used the AdamW optimizer, and used a linear scheduler. We ran the experiments for the final submitted models on one A100 GPU. Our experiments used pre-trained RoBERTa <ref type="bibr" target="#b4">[5]</ref> and DeBERTa <ref type="bibr" target="#b5">[6]</ref> models. For evaluation, we use Precision, Recall, and the macro F1-score. We fine-tuned four models: RoBERTa-large and DeBERTa-large on the upsampled data, and RoBERTa-large and DeBERTa-large on the regular data with no upsampling. For our final submission, we chose the model fine-tuned on the upsampled data with RoBERTa-large as the base, which had the best metrics among all four models on the validation dataset. All four models can be found on Hugging Face<ref type="foot" target="#foot_0">2</ref>.</p></div>
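For reference, the hyperparameters reported above correspond roughly to the following Hugging Face `TrainingArguments` configuration. This is a sketch, not our exact training script; the output directory is a placeholder:

```python
from transformers import TrainingArguments

# Hyperparameters as reported in the text; output_dir is a hypothetical path.
args = TrainingArguments(
    output_dir="./roberta-large-upsampled",  # placeholder
    learning_rate=2e-5,
    warmup_ratio=0.2,
    per_device_train_batch_size=8,
    num_train_epochs=4,
    seed=42,
    optim="adamw_torch",         # AdamW optimizer
    lr_scheduler_type="linear",  # linear decay after warm-up
)
```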
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Achieved F1-score of each submission on the test dataset for subtask 1. A ✓ indicates that the submission used the automatic translation to English. Baseline submissions shown in gray.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head><p>The upsampling resulted in a 4% increase in the macro F1 score for subtask 1 and a 2% increase for subtask 2. Of the upsampled human values, "Benevolence: caring" and "Tradition" were the only ones whose F1-score did not improve on subtask 1. The other upsampled values saw improved F1-scores, some marginal and others substantial. For subtask 2, the upsampled values performed only marginally better, except for "Self-direction: thought" and "Humility", both of which had worse metrics, and "Benevolence: dependability" and "Universalism: tolerance", which stayed the same.</p><p>After the task deadline, we evaluated the other three models on the test data: DeBERTa-large fine-tuned on the upsampled data, and DeBERTa-large and RoBERTa-large fine-tuned on the original training data. The results comparing all of these models are shown in Table <ref type="table">1</ref> and Table <ref type="table" target="#tab_1">2</ref>. The RoBERTa-large model with upsampling performs better than the other three models on subtask 1, but is the worst-performing model on subtask 2; the best-performing model for subtask 2 is DeBERTa-large without upsampling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Future Work</head><p>We beat the BERT baselines by incorporating an F1-loss function and upsampling lower-performing categories. This suggests that the model benefits from knowledge transfer across the different categories. In the future, we would also like to model the other languages directly, as relying solely on English translations could miss cultural nuances expressed in the original language.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>•</head><label></label><figDesc>Figure 1 illustrates the impact of upsampling on the dataset, showing the distribution of data before and after upsampling.</figDesc></figure>
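The F1-loss we incorporate can be understood as a differentiable "soft" macro-F1 computed from predicted probabilities rather than thresholded decisions. The following is a minimal NumPy sketch of this idea; the function name and exact formulation are illustrative, not our training code:

```python
import numpy as np

def soft_macro_f1_loss(probs, labels, eps=1e-8):
    """Soft (differentiable) macro-F1 loss for multi-label classification.

    probs:  (batch, n_labels) array of predicted probabilities in [0, 1]
    labels: (batch, n_labels) binary ground-truth matrix
    """
    tp = (probs * labels).sum(axis=0)        # soft true positives per label
    fp = (probs * (1 - labels)).sum(axis=0)  # soft false positives per label
    fn = ((1 - probs) * labels).sum(axis=0)  # soft false negatives per label
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - soft_f1.mean()              # minimize 1 - macro soft-F1
```

Perfect predictions drive the loss toward 0, and because the expression is smooth in the probabilities, gradients flow to the model, unlike with the thresholded F1 metric itself.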
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Comparison of data distribution before and after upsampling. Left: Before Upsampling, Right: After Upsampling.</figDesc><graphic coords="3,72.00,65.60,216.60,186.93" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>F1-score Submission EN All Self-direction: thought Self-direction: action Stimulation Hedonism Achievement Power: dominance Power: resources Face Security: personal Security: societal Tradition Conformity: rules Conformity: interpersonal Humility Benevolence: caring Benevolence: dependability Universalism: concern Universalism: nature Universalism: tolerance</head><label></label><figDesc></figDesc><table><row><cell>muted-glacier-2024-05-07-02-06-56 (RoBERTa large with Upsampling)</cell><cell>✓</cell><cell>28 05 17 11 15 25 31 34 16 32 41 45 44 06 05 10 23 41 57 27</cell></row><row><cell>other-models-2024-07-05-04-47-24 (DeBERTa large with Upsampling)</cell><cell>✓</cell><cell>26 03 14 26 18 30 12 20 21 16 43 52 46 06 07 09 13 36 58 19</cell></row><row><cell>other-models-2024-07-05-04-47-48 (DeBERTa large no Upsampling)</cell><cell>✓</cell><cell>26 00 17 07 09 38 29 27 19 24 44 48 45 00 00 18 11 42 58 11</cell></row><row><cell>other-models-2024-07-05-04-48-29 (RoBERTa large no Upsampling)</cell><cell>✓</cell><cell>23 00 12 04 26 27 26 18 07 18 41 39 44 00 00 16 04 39 57 06</cell></row><row><cell>valueeval24-bert-baseline-en</cell><cell>✓</cell><cell>24 00 13 24 16 32 27 35 08 24 40 46 42 00 00 18 22 37 55 02</cell></row><row><cell>valueeval24-random-baseline</cell><cell></cell><cell>06 02 07 05 02 11 08 10 04 05 13 03 11 03 00 04 04 09 04 02</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Achieved F1-score of each submission on the test dataset for subtask 2. A ✓ indicates that the submission used the automatic translation to English. Baseline submissions shown in gray.</figDesc><table><row><cell>F1-score</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://huggingface.co/collections/aishanur/human-value-detection-668c4548607e863cc5cebd58</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of Touché 2024: Argumentation Systems</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kiesel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ç</forename><surname>Çöltekin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Heinrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alshomary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">D</forename><surname>Longueville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Erjavec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Handke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kopp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ljubešić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Meden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mirzakhmedova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Morkevičius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Reitis-Münstermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Scharfbillig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Stefanovitch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wachsmuth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mulhem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Quénot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schwab</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">M D</forename><surname>Nunzio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>De Herrera</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Cross-lingual emotion detection</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hassan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Darwish</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.lrec-1.751" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Béchet</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Blache</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Cieri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Goggi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Isahara</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Mazo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="6948" to="6958" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Refining the Theory of Basic Individual Values</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">H</forename><surname>Schwartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cieciuch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vecchione</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Davidov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fischer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Beierlein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Verkasalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-E</forename><surname>Lönnqvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Demirutku</surname></persName>
		</author>
		<idno type="DOI">10.1037/a0029393</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of personality and social psychology</title>
		<imprint>
			<biblScope unit="volume">103</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">SemEval-2023 Task 4: ValueEval: Identification of Human Values behind Arguments</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kiesel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alshomary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mirzakhmedova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Heinrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Handke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wachsmuth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.semeval-1.313</idno>
	</analytic>
	<monogr>
		<title level="m">17th International Workshop on Semantic Evaluation (SemEval 2023), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Ojha</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Doğruöz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">D S</forename><surname>Martino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Madabushi</surname></persName>
		</editor>
		<meeting><address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="2287" to="2303" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<title level="m">RoBERTa: A robustly optimized BERT pretraining approach</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2006.03654</idno>
		<title level="m">DeBERTa: Decoding-enhanced BERT with disentangled attention</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
