<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Team OpenFact at PAN 2024: Fine-Tuning BERT Models with Stylometric Enhancements Notebook for the PAN Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ewelina</forename><surname>Księżniak</surname></persName>
							<email>ewelina.ksiezniak@ue.poznan.pl</email>
							<affiliation key="aff0">
								<orgName type="institution">Poznań University of Economics and Business</orgName>
								<address>
									<addrLine>Al. Niepodległości 10</addrLine>
									<postCode>61-875</postCode>
									<settlement>Poznań</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Krzysztof</forename><surname>Węcel</surname></persName>
							<email>krzysztof.wecel@ue.poznan.pl</email>
							<affiliation key="aff0">
								<orgName type="institution">Poznań University of Economics and Business</orgName>
								<address>
									<addrLine>Al. Niepodległości 10</addrLine>
									<postCode>61-875</postCode>
									<settlement>Poznań</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marcin</forename><surname>Sawiński</surname></persName>
							<email>marcin.sawinski@ue.poznan.pl</email>
							<affiliation key="aff0">
								<orgName type="institution">Poznań University of Economics and Business</orgName>
								<address>
									<addrLine>Al. Niepodległości 10</addrLine>
									<postCode>61-875</postCode>
									<settlement>Poznań</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Team OpenFact at PAN 2024: Fine-Tuning BERT Models with Stylometric Enhancements Notebook for the PAN Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">EA806AE9A48C02A86C3E8CB88A16C8A5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:54+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Stylometric Analysis</term>
					<term>Style Change Detection</term>
					<term>BERT Models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents our solution for the Multi-Author Style Change Detection task at PAN 2024. The task involves detecting paragraph-level writing style changes in texts, with datasets classified into easy, medium, and hard difficulty levels. We incorporated stylometric tags directly into the text to enhance the sensitivity of BERT-family models to stylistic features. By adding these tags to the training dataset, our approach aimed to improve the model's detection of authorship changes and its sensitivity to stylometric features. The results showed F1 improvements when training on smaller datasets, indicating the method's potential for hard-to-obtain data types.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Multi-author style change detection has been a task organized by PAN since 2016 <ref type="bibr" target="#b0">[1]</ref>. Prior to the advent of BERT models, style-change detection techniques predominantly relied on traditional stylometric features, including lexical elements (n-grams), word frequencies, and syntactic characteristics such as parts of speech or syntactic trees. For instance, in 2018, the leading approach to the cross-domain authorship attribution task involved text distortion and extraction of character n-grams to emphasize punctuation, numbers, and diacritic characters <ref type="bibr" target="#b1">[2]</ref>, as demonstrated by <ref type="bibr" target="#b2">[3]</ref>. The top performance in the 2018 style detection task was achieved by <ref type="bibr" target="#b3">[4]</ref>, who utilized features such as repetition, contracted wordforms, frequent words, quotation marks, vocabulary richness, and readability to train an ensemble classifier. However, starting from 2020, most participants have submitted solutions based on fine-tuning pre-trained models <ref type="bibr" target="#b4">[5]</ref>. For example, in the 2023 edition, the highest accuracy on the easy and medium datasets was achieved using BERT, RoBERTa, and ELECTRA combined with a binary classification layer <ref type="bibr" target="#b5">[6]</ref>.</p><p>This paper describes the solution submitted for the Multi-Author Style Change Detection task, part of the PAN 2024 workshop series. The task aims to identify paragraph-level writing style changes between consecutive paragraphs of a given text. It includes three levels of difficulty: easy, medium, and hard <ref type="bibr" target="#b0">[1]</ref>. 
For each subtask, distinct datasets were provided: 1) Easy -paragraphs cover various topics, allowing topic information to aid in detecting authorship changes; 2) Medium -there is limited topical variety, requiring a greater emphasis on stylistic differences; 3) Hard -all paragraphs cover the same topic, so detection must rely solely on stylistic cues. The entire dataset was in English and sourced from comments on Reddit. It included metadata indicating the points of author change between paragraphs, as well as the total number of authors within each set <ref type="bibr" target="#b6">[7]</ref>.</p><p>Given the stylometric nature of the task and the importance of stylistics in detecting authorship changes, we decided to employ a method that adds stylometric tags directly to the texts in the dataset used for training BERT-family models. Our approach aims to enhance the model's sensitivity to stylistic features, acknowledging that in authorship change detection, semantic content alone can be insufficient. This paper presents our methodology and findings, offering insights into the effectiveness of the proposed stylometric enhancements in training language models for authorship change detection. Additionally, it presents background studies aimed at determining whether the proposed method enhances the sensitivity of BERT-family language models to stylometric features.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Background studies</head><p>To explain the rationale behind the proposed method, we conducted an additional experiment to determine which stylometric features are important in the author style change detection task. We calculated stylometric features that describe text complexity, text formality, text fogginess, and patterns related to punctuation and grammar. Subsequently, we computed the absolute values of differences for specific features between text pairs created from consecutive paragraphs. We hypothesized that the absolute differences between pairs of texts would be smaller when there is no change in authorship and larger in the case of an authorship change.</p><p>To assess the statistical significance of the differences in data distribution for each specific feature, we employed the Mann-Whitney U test. This is a non-parametric test used to determine whether there is a difference between two groups (i.e., those with and without an authorship change) by comparing the rank sums of the two samples rather than their means. The null hypothesis (H0) states that the distributions of the two groups are identical, while the alternative hypothesis (H1) suggests that the distributions differ <ref type="bibr" target="#b7">[8]</ref>. We chose this test because our data did not follow a normal distribution, which is required for the t-test for independent samples. 
Furthermore, we analyzed differences across two dimensions: real labels (0 and 1) and predicted labels (0 and 1).</p><p>The background experiment was conducted on the validation datasets for the specific subtasks, using fine-tuned RoBERTa models that served as our internal baselines. It was based on the following features: the number of sentences in the text, the average number of words per sentence, the count of punctuation marks, the count of personal words, the count of reported speech, the formality score, the Flesch Reading Ease score, the SMOG index, the Flesch-Kincaid Grade level, the Coleman-Liau index, the Automated Readability Index, the count of difficult words, and the frequencies of nouns, verbs, adjectives, adverbs, and prepositions.</p><p>Table <ref type="table" target="#tab_0">1</ref> presents the normalized mean absolute differences obtained for the easy, medium, and hard validation datasets for features that showed statistically significant differences between the authorship change (label: 1) and no authorship change (label: 0) groups within actual and predicted labels. Most measures consistently exhibited higher absolute differences when there was a change in authorship, evident in both the real and predicted label distributions. Surprisingly, for the medium and hard datasets, higher diversity was observed for some features without author changes. For the medium dataset, this was seen in sentence complexity and the frequency of nouns and verbs. For the hard dataset, it was noted in the frequency of nouns and adverbs, as well as the Coleman-Liau readability index. </p></div>
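The grouping and significance test described above can be sketched as follows. The feature values are toy numbers, not the real PAN 2024 data, and the Mann-Whitney U statistic is implemented directly for illustration (in practice a library routine such as `scipy.stats.mannwhitneyu` would be used, since it also supplies the p-value).

```python
def mann_whitney_u(x, y):
    """U statistic for sample x vs. y: count of (x_i, y_j) pairs where
    x_i exceeds y_j, with ties counting 0.5 (no p-value computed here)."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)

def split_diffs(features, change_labels):
    """Absolute feature differences between consecutive paragraphs,
    grouped by whether authorship changed between them."""
    same, changed = [], []
    for i, label in enumerate(change_labels):
        d = abs(features[i] - features[i + 1])
        (changed if label == 1 else same).append(d)
    return same, changed

# Toy sentence-complexity values (avg. words per sentence) for six paragraphs
# and the change labels between consecutive pairs -- purely illustrative.
features = [12.0, 12.5, 21.0, 11.8, 22.5, 12.1]
labels = [0, 1, 1, 1, 1]

same, changed = split_diffs(features, labels)
u = mann_whitney_u(changed, same)  # large U: "changed" diffs tend to be bigger
```

Under the paper's hypothesis, the "changed" group should rank consistently above the "same" group, which is exactly what the rank-based U statistic measures.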
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">System Overview</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Methodology</head><p>Based on the results presented in <ref type="bibr" target="#b8">[9]</ref>, there are indications that BERT-family models capture certain stylometric features. Building on these observations, we developed a method that enriches the text by directly incorporating stylometric tags. This approach aims to determine whether such enhancement can improve model classification and make BERT-family models more sensitive to stylometric characteristics. The experiment was carried out in four phases:</p><p>1. Fine-tuning of models and selection of baseline models.</p><p>2. Feature engineering.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3.</head><p>Training models with data augmented by stylometric tags on entire dataset and subsamples. 4. Conducting experiments by combining multiple tags within a single text.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Baseline models selection</head><p>For each dataset variant (easy, medium, hard), we fine-tuned the RoBERTa base and DeBERTa v3 base models in the initial phase and selected baseline models based on the results obtained. We experimented with hyperparameters, including learning rates of 1e-5, 2e-5, and 3e-5, and seed values of 42, 100, and 1111. The default AdamW optimizer and a batch size of 4 were used. Additionally, we tested an approach implementing layer-wise decay in the RoBERTa base model, applying different learning rates to each layer to capture either general language information or task-specific details. Each model was trained for 5 epochs using the original dataset provided by the task organizers. The data preparation involved concatenating two consecutive paragraphs with a separator token. After completing this phase, we chose the fine-tuned RoBERTa-base model as the baseline for the second part of the experiment. The training was conducted on a server equipped with four NVIDIA GeForce RTX 2080 Ti GPU cards, each with 11 GB of memory.</p></div>
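The data preparation step above can be sketched as follows: every pair of consecutive paragraphs becomes one binary classification example. The literal separator string shown ("</s>") follows the RoBERTa convention; in an actual pipeline the tokenizer's text-pair API would insert separators itself, so this helper is an illustrative assumption, not the submitted code.

```python
def build_pairs(paragraphs, change_labels, sep="</s>"):
    """Return (text, label) examples for consecutive paragraph pairs.

    change_labels[i] is 1 if authorship changes between paragraph i
    and paragraph i + 1, else 0.
    """
    return [
        (f"{paragraphs[i]} {sep} {paragraphs[i + 1]}", label)
        for i, label in enumerate(change_labels)
    ]

# A document with three paragraphs yields two pair-wise examples.
doc = ["First paragraph.", "Second paragraph.", "Third paragraph."]
pairs = build_pairs(doc, [0, 1])
```

Each example is then fed to the sequence classifier with a binary head, one prediction per consecutive pair.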
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Feature Engineering</head><p>In the second iteration, we enhanced the original datasets by integrating stylometric feature tags. The features were chosen based on a manual review of prediction errors from the baseline models.</p><p>We decided to augment the dataset by adding tags related to the following stylometric dimensions:</p><p>• text complexity • text formality • punctuation.</p><p>To quantify text complexity, we developed two metrics: text length, determined by counting the number of sentences, and sentence complexity, calculated as the average number of words per sentence.</p><p>Text formality was evaluated using the method proposed by <ref type="bibr" target="#b8">[9]</ref>. This method introduced "seed words" with the same semantic meaning but different levels of formality. The seed pairs included, among others: my gosh -Jesus, breathing -respiratory, yeah -yes, ten years -decade, first of all -foremost, a whole bunch -full, and my dad -father. The method involved calculating the mean difference between the word embeddings of each pair to create a "stylometric embedding". The formality level of a text was then determined by measuring the cosine similarity between the input text embedding and the "stylometric embedding". We also created features to measure text formality by analyzing reported speech occurrences and personal style. The degree of personal style was measured by the frequency of words used to express personal opinions or experiences (e.g., I, me, my).</p><p>We also analyzed punctuation patterns, focusing specifically on infrequently used punctuation marks: the ampersand, ellipsis, question mark, quotation mark, and semicolon.</p><p>To generate tags for the dataset based on text length, sentence complexity, and formality, we computed descriptive statistics: mean, standard deviation, and quantiles across the entire training dataset. 
These statistics were then used to establish thresholds for embedding stylometric tags into the original text.</p><p>For text length, we prefixed the original text with The text is long. if it contained at least three sentences and with The text is short. if it contained only one sentence. For sentence complexity, we added the phrase The text contains long sentences. at the beginning if the average sentence length exceeded 21 words, and The text contains short sentences. if it was below 15 words. For formality, we used the phrase The text is highly informal. if the formality measure exceeded 0.2, and The text is formal. if it was below 0.05. Additionally, we added the tag This text contains reported speech if any reported speech patterns were detected.</p><p>To generate tags related to punctuation and personal style, we added a specific tag whenever a designated word or punctuation mark occurred in the text. For personal style, we identified words such as "I", "me", and "my". For punctuation, we looked for marks including the ampersand, ellipsis, question mark, quotation mark, and semicolon.</p></div>
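The threshold-based prefix tagging described above can be sketched as a single helper. The tag phrases and thresholds (three sentences / one sentence, 21 / 15 words, formality 0.2 / 0.05) come directly from the text; how the sentence count, average sentence length, and formality score are computed upstream is assumed here.

```python
def prefix_tags(text, n_sentences, avg_words_per_sentence, formality):
    """Prepend the stylometric tag phrases whose thresholds the text meets."""
    tags = []
    if n_sentences >= 3:
        tags.append("The text is long.")
    elif n_sentences == 1:
        tags.append("The text is short.")
    if avg_words_per_sentence > 21:
        tags.append("The text contains long sentences.")
    elif avg_words_per_sentence < 15:
        tags.append("The text contains short sentences.")
    if formality > 0.2:
        tags.append("The text is highly informal.")
    elif formality < 0.05:
        tags.append("The text is formal.")
    return " ".join(tags + [text])

tagged = prefix_tags("Example text.", n_sentences=1,
                     avg_words_per_sentence=10.0, formality=0.3)
```

Texts falling between the two thresholds of a dimension receive no tag for it, which keeps the added token overhead small.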
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3.">Training models with data augmented by stylometric tags on entire dataset and subsamples</head><p>In the subsequent phase, we trained models using datasets augmented with specific tags. To evaluate the impact of these tags, we used the same hyperparameters as those employed for the baseline models.</p><p>To expedite the training process, initial experiments were conducted on a randomly reduced dataset comprising 10,000 observations. Following these preliminary experiments, we proceeded to train on the entire dataset, exploring several variants: datasets modified by the addition of a single tag (number of sentences, average words per sentence, punctuation, personal style, reported speech, or formality level), each tested separately. Additionally, models were trained on data incorporating various combinations of these tags.</p><p>Initially, we combined all engineered tags; however, this approach introduced excessive noise into the data. Given the RoBERTa base model's token limit of 512, we then tested combinations of tags related to personal words and punctuation, which were relatively short. As a result, three additional models were trained for each task level (easy, medium, and hard) in this phase.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Text Augmentation</head><p>Using TIRA <ref type="bibr" target="#b9">[10]</ref>, we opted to make only the final submission for models that demonstrated the best performance during the internal testing phase. The software used for the final submission processes a pair of texts as input and adds different tags depending on the dataset level:</p><p>• Hard dataset: The input text pairs are modified by adding a punctuation tag.</p><p>• Medium dataset: The input text pairs are modified by adding a tag related to sentence complexity (average words per sentence). • Easy dataset: The input text pairs are modified by adding both the punctuation tag and the tag related to personal words.</p><p>Here are examples of original and tagged text using the proposed approach for the easy, medium, and hard datasets:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Modification by adding punctuation and personal style tag</head><p>Original text: Why did I think hemp production started earlier? I wonder if it was just government controlled back then and the 2014 farm bill opened it up more for private business...</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Tagged text: Why did I (personal style) think hemp production started earlier? (question mark) I (personal style) wonder if it was just government controlled back then and the 2014 farm bill opened it up more for private business. . . (ellipse mark)</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Modification by adding only a sentence complexity tag</head><p>Original text: If the Russian soldiers (or their wives back in Russia) hear this, it could keep the already low morale of the Russian solders low. . . .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Tagged text:</head><p>The text contains long sentences. If the Russian soldiers (or their wives back in Russia) hear this, it could keep the already low morale of the Russian solders low. . . .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Modification by adding only a punctuation tag</head><p>Original text:</p><p>The tu quoque defense (Latin for 'you too') asserts that the authority trying a defendant has committed the same crimes of which they are accused. It is related to the legal principle of clean hands, reprisal, and "an eye for an eye". The tu quoque defense does not exist in international criminal law and has never been accepted by an international court.</p><p>Tagged text:</p><p>The tu quoque defense (Latin for 'you too') asserts that the authority trying a defendant has committed the same crimes of which they are accused. It is related to the legal principle of clean hands, reprisal, and (quotation mark) "an eye for an eye" (quotation mark). The tu quoque defense does not exist in international criminal law and has never been accepted by an international court.</p><p>After preprocessing the text pairs according to a specific schema, the system makes predictions using models trained on tagged versions of the dataset. Each subtask utilizes a separate model, and the training methodology is detailed in Section 3.1.</p></div>
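The inline tagging shown in the examples above can be sketched as follows: a marker is inserted after each personal-style word and after each tracked punctuation mark. The tag wording mirrors the examples; the exact word list and mark list used by the submitted system are assumptions.

```python
import re

# Personal-style words and tracked punctuation marks (illustrative subsets).
PERSONAL = re.compile(r"\b(I|me|my)\b")
PUNCT_TAGS = {"?": "question mark", ";": "semicolon", "&": "ampersand"}

def tag_inline(text):
    """Insert stylometric markers after personal words and punctuation."""
    text = PERSONAL.sub(lambda m: m.group(0) + " (personal style)", text)
    for mark, name in PUNCT_TAGS.items():
        text = text.replace(mark, f"{mark} ({name})")
    return text

example = tag_inline("Why did I think so?")
```

Because the markers are plain text, the tagged output can be fed to the tokenizer unchanged, at the cost of consuming part of the 512-token budget.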
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Internal testing phase</head><p>During our internal testing phase, we trained the models on subsamples and on the entire dataset (separately for the easy, medium, and hard tasks) using original and tagged data. Table <ref type="table" target="#tab_1">2</ref> presents the macro F1 scores for the hard, medium, and easy datasets obtained by training on a randomly selected subsample (10,000 observations). The results indicate that incorporating stylometric tags -in most cases -led to an improvement in the macro F1 score, with the best results achieved by adding the reported speech tag for the hard dataset and the text length tag for the easy dataset. Augmenting the dataset with stylometric tags improved the macro F1 score by 2% and 1% for the hard and easy datasets, respectively. Adding tags to the medium dataset did not affect the results, except for the sentence complexity tag, which decreased the macro F1 score by approximately 1%. Table <ref type="table" target="#tab_2">3</ref> presents the classification results for the hard, medium, and easy validation datasets trained on the entire tagged dataset, alongside the baseline model outcomes. Notably, the impact of adding tags was significantly lower than in the subsample experiments. The best results for the hard dataset were obtained by adding the punctuation tag, achieving a macro F1 score of 0.813 on the validation dataset, while adding other tags surprisingly worsened the baseline results. For the medium dataset, the most significant improvement over the baseline was observed, with the macro F1 score rising from 0.819 to 0.836; each tag or combination of tags improved over the baseline. For the easy dataset, the best results were achieved by adding the personal style, formality level, and punctuation tags, as well as a combination of the punctuation and personal style tags. 
However, the improvement over the baseline was only 0.001.</p><p>Building on previous studies, we aimed to determine whether incorporating stylometric tags directly into the text enhances model sensitivity to stylometric features. Our objective was to establish whether the observed improvements from adding stylometric features were genuinely due to increased model sensitivity to specific stylometric aspects (e.g., a better understanding of punctuation marks and their significance in detecting author style changes) or whether other factors, such as randomness, played a role.</p><p>To test this, we compared the mean absolute feature differences in the no-authorship-change group, computed once over the real labels and once over the correct predictions, for both the baseline models and those trained on the tagged dataset. The assumption was that if adding tags enhances the model's sensitivity to stylometric features, the gap between the trend observed for actual labels and that for model predictions would be smaller for the model trained with tags. Table <ref type="table" target="#tab_3">4</ref> presents the mean absolute differences across the two dimensions: real labels (0) and correct predictions (0), for baseline models and models trained on tagged data. Surprisingly, models trained on tagged datasets showed larger gaps between predictions and real labels than baseline models. This suggests that the observed improvement in macro F1 scores may not be due to increased sensitivity to stylometric features, but rather to other factors. </p></div>
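The Table 4 comparison can be sketched with toy numbers: the mean absolute feature difference over the "no change" group is computed once from the real labels and once from the correctly predicted labels, and the gap between the two means is the quantity reported. All values below are illustrative, not the paper's data.

```python
def mean_over_zero_group(diffs, labels):
    """Mean absolute difference over pairs labelled 0 (no authorship change)."""
    vals = [d for d, lab in zip(diffs, labels) if lab == 0]
    return sum(vals) / len(vals)

diffs = [0.1, 0.9, 0.2, 0.8, 0.3]  # |feature(p_i) - feature(p_i+1)| per pair
real = [0, 1, 0, 1, 0]             # actual change labels
pred = [0, 1, 0, 0, 1]             # model predictions

# "Correct prediction 0" group: pairs where both real and predicted label are 0.
correct0 = [0 if r == 0 and p == 0 else 1 for r, p in zip(real, pred)]
gap = abs(mean_over_zero_group(diffs, real) - mean_over_zero_group(diffs, correct0))
```

A small gap means the model's no-change predictions trace the same feature distribution as the true no-change pairs, which is what increased stylometric sensitivity would predict.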
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Final submission</head><p>Table <ref type="table" target="#tab_4">5</ref> presents the results of the wary-pita system, our final submission via TIRA, evaluated on the training, validation, and test datasets. The results on the unseen test dataset are slightly lower than on the validation set. However, our internal testing indicates that the method has potential for further improvement. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>This study introduced a method for directly integrating text with stylometric information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Main Findings:</head><p>• The method yielded the most significant F1 improvements when training on smaller datasets, suggesting its potential use for data types that are difficult to obtain (e.g., authorship for insurance claims). While there were tags that improved F1 when training on the entire dataset, the improvements over the baseline were minimal, especially for the hard and easy datasets. • An attempt was made to determine if adding tags with stylometric information genuinely increased the model's sensitivity to specific stylometric features. However, the analysis did not confirm this hypothesis, indicating the need for further exploration in this area.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Future Work:</head><p>• Given the observed improvements on the medium dataset and the training on subsamples, we see potential in the proposed method. Future research should focus on a detailed analysis of which stylometric features are significant. We hypothesize that adding tags may be particularly beneficial for stylometric features that BERT-family models cannot "learn" on their own. • Additionally, BERT-family models typically have a limited number of tokens they can process.</p><p>Therefore, future work should focus on constructing tags in a concise or implicit manner to accommodate this limitation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Comparison of mean absolute difference for groups: no authorship change (label: 0), authorship change (label: 1)</figDesc><table><row><cell>Dataset</cell><cell>Features</cell><cell cols="2">Real label</cell><cell>Prediction</cell></row><row><cell></cell><cell></cell><cell>0</cell><cell>1</cell><cell>0</cell><cell>1</cell></row><row><cell>Easy</cell><cell>sentence complexity</cell><cell cols="3">0.047 0.090 0.047 0.090</cell></row><row><cell></cell><cell>punctuation</cell><cell cols="3">0.018 0.027 0.017 0.027</cell></row><row><cell></cell><cell>nouns frequency</cell><cell cols="3">0.132 0.191 0.133 0.191</cell></row><row><cell></cell><cell>difficult words frequency</cell><cell cols="3">0.079 0.140 0.074 0.140</cell></row><row><cell></cell><cell>verbs frequency</cell><cell cols="3">0.175 0.242 0.177 0.242</cell></row><row><cell></cell><cell>adverbs frequency</cell><cell cols="3">0.132 0.149 0.131 0.149</cell></row><row><cell></cell><cell>coleman liau index</cell><cell cols="3">0.003 0.008 0.003 0.008</cell></row><row><cell></cell><cell cols="4">automated readability index 0.006 0.019 0.006 0.019</cell></row><row><cell></cell><cell>text length</cell><cell cols="3">0.028 0.065 0.027 
0.065</cell></row><row><cell></cell><cell>personal words</cell><cell cols="3">0.021 0.052 0.020 0.052</cell></row><row><cell></cell><cell>adjectives frequency</cell><cell cols="3">0.096 0.114 0.098 0.114</cell></row><row><cell cols="2">Medium sentence complexity</cell><cell cols="3">0.086 0.078 0.085 0.078</cell></row><row><cell></cell><cell>punctuation</cell><cell cols="3">0.041 0.062 0.042 0.064</cell></row><row><cell></cell><cell>nouns frequency</cell><cell cols="3">0.226 0.250 0.225 0.254</cell></row><row><cell></cell><cell>verbs frequency</cell><cell cols="3">0.276 0.208 0.262 0.213</cell></row><row><cell></cell><cell>adverbs frequency</cell><cell cols="3">0.136 0.182 0.146 0.178</cell></row><row><cell></cell><cell>coleman liau index</cell><cell cols="3">0.134 0.186 0.140 0.187</cell></row><row><cell></cell><cell>personal words</cell><cell cols="3">0.020 0.038 0.021 0.039</cell></row><row><cell></cell><cell>formality</cell><cell cols="3">0.198 0.230 0.194 0.239</cell></row><row><cell></cell><cell>prepositions frequency</cell><cell cols="3">0.205 0.228 0.209 0.227</cell></row><row><cell>Hard</cell><cell>sentence complexity</cell><cell cols="3">0.121 0.111 0.225 0.231</cell></row><row><cell></cell><cell>punctuation</cell><cell cols="3">0.055 0.063 0.055 0.062</cell></row><row><cell></cell><cell>nouns frequency</cell><cell cols="3">0.167 0.161 0.166 0.163</cell></row><row><cell></cell><cell>verbs frequency</cell><cell cols="3">0.202 0.210 0.150 0.131</cell></row><row><cell></cell><cell>adverbs frequency</cell><cell cols="3">0.182 0.180 0.180 0.182</cell></row><row><cell></cell><cell>coleman liau index</cell><cell cols="3">0.176 0.157 0.175 0.161</cell></row><row><cell></cell><cell>personal words</cell><cell cols="3">0.049 0.050 0.050 0.049</cell></row><row><cell></cell><cell>formality</cell><cell cols="3">0.203 0.220 0.202 0.218</cell></row><row><cell></cell><cell>prepositions frequency</cell><cell cols="3">0.175 0.181 0.175 0.180</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Comparison of macro F1 scores achieved on the validation datasets by training on subsamples of the training datasets for the easy, medium, and hard subtasks</figDesc><table><row><cell></cell><cell cols="3">hard medium easy</cell></row><row><cell>baseline</cell><cell>0.75</cell><cell>0.82</cell><cell>0.97</cell></row><row><cell>personal</cell><cell>0.76</cell><cell>0.82</cell><cell>0.97</cell></row><row><cell cols="2">sentence complexity 0.76</cell><cell>0.81</cell><cell>0.96</cell></row><row><cell>text length</cell><cell>0.74</cell><cell>0.82</cell><cell>0.98</cell></row><row><cell>formality</cell><cell>0.76</cell><cell>0.82</cell><cell>0.96</cell></row><row><cell>reported speech</cell><cell>0.77</cell><cell>0.82</cell><cell>0.97</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Comparison of macro F1 scores achieved on the validation datasets by training on the entire training datasets for the easy, medium, and hard subtasks</figDesc><table><row><cell>hard medium</cell><cell>easy</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Comparison of Mean Absolute Differences between Correct Predictions and Real Labels for Tagged and Standard Datasets</figDesc><table><row><cell>Dataset/Feature</cell><cell cols="3">0 (actual label) 0 (correct prediction) Difference</cell></row><row><cell>Hard/punctuation tagged</cell><cell>0.663</cell><cell>0.666</cell><cell>0.003</cell></row><row><cell>Hard/punctuation standard</cell><cell>0.663</cell><cell>0.661</cell><cell>0.002</cell></row><row><cell>Medium/sentence complexity tagged</cell><cell>11.050</cell><cell>11.463</cell><cell>0.414</cell></row><row><cell>Medium/sentence complexity standard</cell><cell>11.050</cell><cell>11.078</cell><cell>0.028</cell></row><row><cell>Easy/personal style tagged</cell><cell>0.413</cell><cell>0.375</cell><cell>0.038</cell></row><row><cell>Easy/personal style standard</cell><cell>0.413</cell><cell>0.388</cell><cell>0.024</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Performance metrics across different datasets and difficulty levels</figDesc><table><row><cell>Dataset</cell><cell cols="3">Easy Medium Hard</cell></row><row><cell>Train dataset</cell><cell>0.999</cell><cell>0.896</cell><cell>0.922</cell></row><row><cell cols="2">Validation dataset 0.983</cell><cell>0.836</cell><cell>0.813</cell></row><row><cell>Test dataset</cell><cell>0.981</cell><cell>0.821</cell><cell>0.805</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The research is supported by the project "OpenFact -artificial intelligence tools for verification of veracity of information sources and fake news detection" (INFOSTRATEG-I/0035/2021-00), granted within the INFOSTRATEG I program of the National Center for Research and Development, under the topic: Verifying information sources and detecting fake news.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of PAN 2024: Multi-Author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bevendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">B</forename><surname>Casals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dementieva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Elnagar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Freitag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korenčić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Panchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Smirnova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taulé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ustalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mulhem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Quénot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schwab</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">M D</forename><surname>Nunzio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>De Herrera</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page">3</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tschuggnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Specht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF 2018 Evaluation Labs</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Cappellato</surname></persName>
		</editor>
		<meeting><address><addrLine>Avignon, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">September 10-14, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">EACH-USP ensemble cross-domain authorship attribution</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Custódio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Paraboni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An ensemble-rich multi-aspect approach for robust style change detection</title>
		<author>
			<persName><forename type="first">D</forename><surname>Zlatkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kopev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mitov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Atanasov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hardalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Koychev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nakov</surname></persName>
		</author>
		<idno>CEUR-WS.org</idno>
	</analytic>
	<monogr>
		<title level="m">CLEF 2018 Evaluation Labs and Workshop-Working Notes Papers</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page">3</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Overview of the multi-author writing style analysis task at PAN 2023</title>
		<title level="s">Working Notes of CLEF</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Enhancing writing style change detection using transformer-based models and data augmentation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hashemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Shi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Overview of the Multi-Author Writing Style Analysis Task at PAN 2024</title>
		<author>
			<persName><forename type="first">E</forename><surname>Zangerle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mayerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>De Herrera</surname></persName>
		</editor>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page">2</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Mann-Whitney U test</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">E</forename><surname>McKnight</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Najab</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Corsini encyclopedia of psychology</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1" to="1" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">Q</forename><surname>Lyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Apidianaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Callison-Burch</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.18657</idno>
		<title level="m">Representation of lexical stylistic features in language models&apos; embedding space</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Continuous Integration for Reproducible Shared Tasks with TIRA</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fröbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wiegmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kolyada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Grahm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Elstner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Loebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hagen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-28241-6_20</idno>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval. 45th European Conference on IR Research (ECIR 2023)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Maistro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Joho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Davis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Gurrin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Kruschwitz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Caputo</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="236" to="241" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
