<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Impact of Data Augmentation on Hate Speech Detection in Roman Urdu</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Fariha</forename><surname>Maqbool</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica, Sistemistica e Comunicazione</orgName>
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<addrLine>Viale Sarca 336</addrLine>
									<postCode>20126</postCode>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Blerina</forename><surname>Spahiu</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica, Sistemistica e Comunicazione</orgName>
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<addrLine>Viale Sarca 336</addrLine>
									<postCode>20126</postCode>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Maurino</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica, Sistemistica e Comunicazione</orgName>
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<addrLine>Viale Sarca 336</addrLine>
									<postCode>20126</postCode>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Impact of Data Augmentation on Hate Speech Detection in Roman Urdu</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">96359B4B12C0B56A07A31BBB8A3AE365</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:06+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The prevalence of hate speech leads to an increase in hate crimes and online violence, and causes serious harm to social safety, physical security, and cyberspace. To address this issue, several studies have been conducted on hate speech detection in European languages, whereas little attention has been paid to low-resource South Asian languages, leaving millions of social media users vulnerable. Given the scarcity of datasets and of the samples they contain, strategies are needed to increase the number of data samples. In this paper, we improve the performance of an already fine-tuned m-BERT model by applying data augmentation techniques to a dataset of hate speech in Roman Urdu tweets. F1-score and accuracy metrics are used to compare the results. We also run experiments to determine the optimal percentage of augmented data to include and the percentage of words to augment in each data instance. The new RUHSOLD++ dataset containing the augmented data has also been published publicly. The improvement in the model's hate speech detection shows that model performance can be improved by applying data augmentation techniques to datasets with a limited number of instances.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The exponential growth of social media platforms like Facebook, X (formerly Twitter), and YouTube has provided a global stage for individuals from diverse cultures and social backgrounds to communicate and share their opinions on a myriad of topics. While these platforms uphold the principle of freedom of speech, some users exploit this freedom to abuse other users on the basis of gender, religion, or race. This surge in harmful content has underscored the need for increased research in natural language processing (NLP) to effectively detect instances of hate speech. The consequences faced by victims of targeted hate speech are not limited to physical harm; they also experience a profound sense of dread and rejection within their communities. Recognizing the urgency of creating online spaces free of racism and hate speech, researchers emphasize the importance of early detection mechanisms to mitigate the pervasive harm caused by such content <ref type="bibr" target="#b0">[1]</ref>. This challenge extends beyond the English language, as millions of users worldwide employ diverse languages as vehicles for spreading hate. Despite extensive research in English, there is a noticeable dearth of datasets and studies on languages like Urdu. Urdu, spoken by over 170 million people globally, faces unique challenges in written communication: its alphabet comprises 40 characters, while English keyboards accommodate only 26 letters, so the Urdu alphabet cannot be mapped directly onto an English keyboard. 
Consequently, the predominant approach among Urdu speakers, particularly on social media platforms, is to use Roman Urdu, a transliteration of Urdu into English letters. The use of Roman Urdu has expanded sharply as a result of social media's rising adoption <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. Users regularly utilize these platforms to share their opinions about a variety of products, services, politics, and other topics. However, despite its widespread use, Roman Urdu (RU) suffers from a lack of linguistic resources, annotated datasets, and dedicated language models <ref type="bibr" target="#b3">[4]</ref>. To address these limitations and enhance model performance, researchers have explored data augmentation techniques <ref type="bibr" target="#b2">[3]</ref>. In a notable application, simple data augmentation techniques were applied to a low-resource language dataset with a limited number of samples; the results demonstrated a significant improvement in model performance, underscoring the potential of augmentation to mitigate the challenges posed by scarce linguistic resources. Further experiments aimed to identify the optimal percentage of augmented data to integrate with the original dataset, boosting model performance while minimizing training time. This multifaceted approach contributes to ongoing efforts to combat hate speech across various languages, fostering inclusivity and positive discourse in online spaces. In this paper we make the following contributions: (i) we enrich the RUHSOLD dataset <ref type="bibr" target="#b2">[3]</ref> and create a new RUHSOLD++ dataset; (ii) we provide a new custom function in Python to dynamically alter the spelling of selected words within a sentence; and (iii) we provide an empirical analysis of the different data augmentation methods. 
The paper is structured as follows: Section 2 discusses approaches to detect hate speech in the Roman Urdu language. In Section 3 we provide the methodology to augment our initial dataset. Section 4 provides the analysis and findings by applying different methods for data augmentation while conclusions and future work end the paper in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The issue of abusive speech has been a longstanding focus within the research community. In earlier investigations, attempts to identify abusive users primarily relied on lexical and syntactic features extracted from their posts <ref type="bibr" target="#b4">[5]</ref>. However, the field of automated hate speech detection has recently witnessed substantial growth. The availability of large datasets has prompted a shift in academic research towards more data-intensive and sophisticated models, notably leveraging deep learning techniques <ref type="bibr" target="#b5">[6]</ref> and graph embedding methods <ref type="bibr" target="#b6">[7]</ref>. Notably, transformer-based language models like BERT <ref type="bibr" target="#b7">[8]</ref> have gained immense popularity in various downstream tasks, proving to be particularly effective in surpassing traditional deep learning models such as CNN-GRU <ref type="bibr" target="#b8">[9]</ref>, LSTM <ref type="bibr" target="#b9">[10]</ref>, etc., for the detection of abusive language <ref type="bibr" target="#b10">[11]</ref>, <ref type="bibr" target="#b11">[12]</ref>. This evolution highlights the dynamic nature of research in addressing the complexities of abusive speech detection. Detecting hate and abusive speech in low-resourced languages presents a formidable challenge, as exemplified in the context of Roman Urdu. The scarcity of linguistic resources for Roman Urdu spurred the creation of the RUHSOLD dataset by H. Rizwan et al. <ref type="bibr" target="#b2">[3]</ref>. Comprising 10,012 tweets, this dataset stands out for its dual approach, offering both coarse-grained and fine-grained labeling of hate speech instances. In their research, Rizwan and colleagues not only curated this valuable dataset but also proposed a deep learning-based architecture specifically tailored for hate speech detection in Roman Urdu. 
Addressing the broader landscape of multilingual abusive speech, M. Das et al. <ref type="bibr" target="#b12">[13]</ref> conducted an in-depth investigation into the performance of multilingual models across eight distinct Indic languages. In a noteworthy application, they employed m-BERT <ref type="bibr" target="#b7">[8]</ref> and MuRIL <ref type="bibr" target="#b13">[14]</ref> models on the RUHSOLD dataset <ref type="bibr" target="#b2">[3]</ref> to gauge their efficacy in detecting abusive speech. Through a series of meticulously designed experiments, encompassing various settings, Das and team explored the nuances of multilingual hate speech detection. Their findings underscored the effectiveness of model transfers, revealing that transferring knowledge from one language to another enhances the overall performance of the models. This body of research not only contributes to the evolving field of hate speech detection but also illuminates the specific challenges associated with low-resourced languages like Roman Urdu. By providing a robust dataset and proposing dedicated architectures, these studies lay essential groundwork for future endeavors aimed at combating hate speech across diverse linguistic landscapes. The insights gained from these investigations, especially regarding the transferability of models, offer valuable guidance for the development of more inclusive and effective hate speech detection systems in multilingual contexts. In a meticulous analysis, M. M. Khan et al. <ref type="bibr" target="#b14">[15]</ref> delved into the complexities of hate speech detection in Roman Urdu, manually examining over 90,000 tweets to curate a substantial corpus of 5,000 Roman Urdu tweets. Their significant contribution extended beyond dataset creation, as they systematically employed five supervised learning approaches, including a sophisticated deep learning technique, to rigorously evaluate and compare their effectiveness in hate speech detection. 
The results of their comprehensive study revealed that, across two levels of categorization, logistic regression outperformed all other techniques, opening up a viable path for robust hate speech detection in Roman Urdu. Recognizing the challenges posed by the low resources of Roman Urdu, Azam et al. <ref type="bibr" target="#b15">[16]</ref> undertook a proactive exploration of data augmentation strategies. Leveraging both easy data augmentation and transformer-based augmentation approaches, they aimed to enhance hate speech detection capabilities in Roman Urdu. The researchers conducted experiments using existing datasets in Roman Urdu and baseline models to meticulously assess the impact of augmentation techniques. Their findings demonstrated that the performance of hate speech detection models could indeed be significantly improved by the strategic application of augmentation techniques to the dataset. This research not only contributes to the optimization of hate speech detection in low-resourced languages like Roman Urdu but also highlights the potential of augmentation strategies as valuable tools in mitigating the impact of resource constraints, providing valuable insights for the ongoing evolution of hate speech detection methodologies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset</head><p>We employed the RUHSOLD dataset, a comprehensive collection of tweets in Roman Urdu created by H. Rizwan et al. <ref type="bibr" target="#b2">[3]</ref>. The authors meticulously established a gold standard for two distinct sub-tasks. Our focus centered on the first sub-task, which involves binary labels categorizing content as either Hate-Offensive (labeled as 0) or Normal (labeled as 1), representing inoffensive language. The dataset comprises a total of 10,012 tweets, partitioned into training, testing, and validation sets in a ratio of 70%, 20%, and 10%, respectively.</p></div>
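The 70%/20%/10% partition can be reproduced with a short stdlib sketch (a hypothetical illustration using synthetic tweets; RUHSOLD itself ships with fixed splits):

```python
import random

def split_dataset(examples, train_frac=0.7, test_frac=0.2, seed=42):
    """Shuffle a list of (text, label) pairs and partition it into
    train/test/validation sets (70%/20%/10% by default)."""
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)
    n_train = int(len(data) * train_frac)
    n_test = int(len(data) * test_frac)
    train = data[:n_train]
    test = data[n_train:n_train + n_test]
    val = data[n_train + n_test:]  # remaining ~10%
    return train, test, val

# synthetic stand-in for the tweet corpus
corpus = [(f"tweet {i}", i % 2) for i in range(10000)]
train, test, val = split_dataset(corpus)
```

Fixing the seed keeps the partition reproducible across runs, which matters when comparing augmented and unaugmented training regimes on the same test set.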
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Data Augmentation</head><p>To increase the dataset's size and improve overall performance, we applied noise-based data augmentation techniques to the training data. We explored different percentages of augmented data to strike the right balance, and we also varied the percentage of words in each tweet that underwent augmentation. This approach was not just about expanding the dataset; it was about tuning the augmentation's impact and finding the balance between quantity and quality that strengthens the model's resilience. Through methodical experimentation, we aimed to identify the configurations that most effectively enhance the overall performance of our model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Random Swapping</head><p>Random swapping is an effective technique in the realm of noise-based data augmentation. It randomly swaps words or tokens within a tweet. For example: "hum kisi se km nhi" becomes "km kisi se hum nhi".</p><p>This operation adds variance to the dataset without changing the overall sentiment or context of the text. The model is thereby exposed to various word configurations, helping it generalize and perform well across a variety of linguistic patterns. The percentage of words swapped directly controls the degree of variability injected into the dataset, so it is essential to find the optimal percentage of words to swap during augmentation.</p></div>
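A minimal stdlib sketch of the swap operation (a hypothetical illustration assuming whitespace tokenization; the experiments in Section 4 use the NLPAug library instead):

```python
import random

def random_swap(sentence, swap_rate=0.3, seed=None):
    """Swap randomly chosen pairs of words; roughly `swap_rate` of
    the words take part in a swap (each swap moves two words)."""
    rng = random.Random(seed)
    words = sentence.split()
    if len(words) < 2:
        return sentence  # nothing to swap
    n_swaps = max(1, int(len(words) * swap_rate / 2))
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)  # two distinct positions
        words[i], words[j] = words[j], words[i]
    return " ".join(words)
```

Because only word order changes, the bag of words, and hence most of the sentiment-bearing vocabulary, is preserved.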
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Random Deletion</head><p>Another noise-based data augmentation is Random Deletion. This strategy involves the deliberate and random removal of words or tokens from a given sentence, introducing an element of unpredictability and variability. We designate a specific percentage of words within the sentence for potential removal, aiming to strike a careful balance between introducing noise for robustness and preserving the coherence of the text. By implementing this intentional randomness, we infuse the dataset with a dynamic quality, fortifying the model's adaptability to diverse linguistic nuances. This method serves as a potent instrument, enriching our model's adaptability and efficacy across a wide range of textual inputs.</p></div>
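A stdlib sketch of the idea (hypothetical; the reported experiments use NLPAug): each word is dropped independently with probability `delete_rate`, and at least one word is always kept so an instance never becomes empty.

```python
import random

def random_delete(sentence, delete_rate=0.2, seed=None):
    """Drop each word independently with probability `delete_rate`,
    always keeping at least one word."""
    rng = random.Random(seed)
    words = sentence.split()
    if len(words) <= 1:
        return sentence
    kept = [w for w in words if rng.random() > delete_rate]
    if not kept:  # guard against deleting everything
        kept = [rng.choice(words)]
    return " ".join(kept)
```

The surviving words keep their original order, so local context around the remaining tokens is preserved even as the sentence shrinks.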
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">Spelling Augmentation</head><p>Spelling Augmentation adds a layer of complexity by altering the spelling of words within a sentence. This process entails replacing one or more characters in a word with randomly chosen alternatives and deliberately introducing a controlled amount of noise into the data. The purpose here is to diversify the linguistic patterns in the dataset, enhancing the model's ability to handle variations in spelling and promoting resilience against potential inconsistencies in user-generated content. This meticulous introduction of noise through character substitution serves as a strategic maneuver, refining our model's capacity to adapt to a wide array of spelling idiosyncrasies. For example: "chal ja tujhy maaf kia" becomes "chal aa tujha maaf kia".</p></div>
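A stdlib sketch of such a character-substitution function (a hypothetical illustration; the paper's actual custom function may differ): it picks roughly `word_rate` of the words and replaces one character in each with a random lowercase letter.

```python
import random
import string

def spelling_augment(sentence, word_rate=0.3, seed=None):
    """Replace one randomly chosen character in roughly `word_rate`
    of the words with a random lowercase letter."""
    rng = random.Random(seed)
    words = sentence.split()
    n_aug = max(1, int(len(words) * word_rate))
    for idx in rng.sample(range(len(words)), min(n_aug, len(words))):
        w = words[idx]
        if len(w) < 2:
            continue  # leave single-character words untouched
        pos = rng.randrange(len(w))
        words[idx] = w[:pos] + rng.choice(string.ascii_lowercase) + w[pos + 1:]
    return " ".join(words)
```

Since substitutions never introduce spaces, word count and word lengths are preserved; only the surface spelling is perturbed, mimicking the spelling variation common in Roman Urdu.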
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">The new RUHSOLD++ Dataset</head><p>After applying the data augmentation techniques, we created the new RUHSOLD++ dataset<ref type="foot" target="#foot_0">4</ref>, which is publicly accessible to promote future work. It consists of three variants of the data, produced with swap, delete, and spelling augmentation respectively. The augmented data is distributed uniformly across the train and validation sets, while the test data is left unaltered.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Model</head><p>The m-BERT model has garnered significant attention in the realm of abusive speech research. Its efficacy is attributed to being pre-trained on a comprehensive dataset, comprising the extensive content of Wikipedia<ref type="foot" target="#foot_1">5</ref> , employing a masked language modeling (MLM) <ref type="bibr" target="#b16">[17]</ref> objective across 104 languages. This pre-training involves 12 fully connected transformer encoder layers, incorporating a self-attention mechanism to efficiently process contextual information. It is worth noting that m-BERT, while powerful, has a token limit of 512, necessitating the use of a fine-tuned variant introduced by Das, M. et al. <ref type="bibr" target="#b6">[7]</ref>. Das and colleagues enhanced the original m-BERT by incorporating a fully connected layer, aligning its output with the CLS (classification) token in the input. This added layer introduces a level of specificity, with the output reflecting the model's interpretation of the input sentence, often represented by the CLS token output. This nuanced modification allows the model to capture and interpret complex contextual nuances within the given token limit, contributing to its efficacy in understanding and classifying abusive speech patterns.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimentation and Results</head><p>In this section, we describe the experimentation conducted on the RUHSOLD <ref type="bibr" target="#b2">[3]</ref> dataset using the fine-tuned m-BERT model, a refinement proposed by M. Das et al. <ref type="bibr" target="#b6">[7]</ref>. In our implementation, we use the PyTorch library <ref type="foot" target="#foot_2">6</ref> in Python, configuring each model to run for 10 epochs with an Adam optimizer and a batch size of 16.</p><p>The experimentation extended to the exploration of data augmentation techniques, aiming to strike an optimal balance between computational efficiency and accuracy. The primary focus was on determining the most suitable percentage of the dataset for applying augmentation. To achieve this, we selected a portion of the original training dataset that underwent Random Swap Augmentation. Leveraging the NLPAug<ref type="foot" target="#foot_3">7</ref> library in Python, a word-swap rate of 30% was chosen, indicating that 30% of the words in a tweet would be swapped with each other. Following the generation of augmented data, it was integrated with the original dataset and subsequently shuffled to mitigate overfitting concerns. The outcomes of this experiment are presented in Table <ref type="table" target="#tab_0">1</ref>, showcasing the impact of Random Swap at varying percentages of the dataset. This experimentation highlights the strategic choices made in augmenting the data to achieve an optimal trade-off between efficiency and accuracy. We employed the Macro F1 score (mF1-score) as a performance metric, along with other evaluation measures. The Macro F1 score allows us to assess the performance of each class individually while giving equal weight to all classes. Examining the results, we observe a consistent improvement in both accuracy and mF1-score as the model is trained on augmented data. 
However, a notable finding emerges: the highest accuracy is attained when augmentation is applied to 50% of the data. Beyond this threshold, further increasing the size of augmented data leads to diminishing returns, resulting in a decline in accuracy and mF1-score. This suggests that the model tends to overfit the training data when subjected to an excessive amount of augmented information. While the accuracy of the validation data may show promising signs, the model's performance on unseen data, specifically the test data, begins to decrease.</p><p>With the optimal augmented data percentage identified, our exploration extends to varying the percentage of words swapped in each iteration. The results, as depicted in Table <ref type="table" target="#tab_2">2</ref>, indicate that the overall accuracy and mF1-score exhibit minimal fluctuations with changes in the word augmentation percentage. However, precision and recall values do showcase variations corresponding to alterations in the word augmentation percentage. This nuanced observation underscores the importance of fine-tuning not only the quantity of augmented data but also the specific aspects of augmentation. Subsequently, we implemented the Delete Data Augmentation on 50% of the original training dataset. Leveraging the NLPAug library in Python, we conducted experiments to generate new data by selectively removing certain words in each row. This augmented dataset was seamlessly integrated with the original data, effectively amplifying the training set by 50%. Our exploration further extended to varying the percentage of words designated for deletion in each iteration. The outcomes of this experiment are presented in Table <ref type="table" target="#tab_3">3</ref>. The results demonstrate that this approach not only diversifies the training data but also involves fine-tuning the augmentation to achieve improvements in model performance. 
In implementing spelling augmentation, we crafted a custom function in Python to dynamically alter the spelling of selected words within each sentence. This function provides the flexibility to adjust the percentage of words in each row subject to augmentation. The outcomes of this experiment are presented in Table <ref type="table" target="#tab_5">4</ref>. Notably, the results reveal an improvement in accuracy as we incrementally raise the percentage of augmented words in each row. However, a cautious approach was adopted, refraining from further increasing the percentage to prevent potential distortion of the sentence's meaning. Beyond a certain threshold, excessive alterations could compromise the contextual integrity of the sentence, potentially undermining the model's overall performance.</p><p>After conducting a comprehensive array of experiments, our findings show that the most favorable accuracy was achieved through swap data augmentation. The optimal model, exhibiting the highest accuracy, emerged from training with an additional 50% of data, where 30% of words in each row were subject to swapping. This configuration demonstrated the finest balance between data enrichment and model performance enhancement.</p><p>To provide a visual representation of the model's performance, Figure <ref type="figure" target="#fig_1">1</ref> presents the confusion matrix for this optimal model, showcasing the details of how well the model navigated and classified instances with the applied swap data augmentation. This synthesis of experimentation outcomes reinforces not only the efficacy of swap data augmentation but also the significance of precise configurations in achieving the model's peak performance. </p></div>
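The macro F1 metric used throughout these tables can be written out explicitly (a minimal stdlib sketch; it should behave like scikit-learn's `f1_score` with `average="macro"`, though the paper does not state which implementation was used):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the
    unweighted mean so each class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Unlike plain accuracy, this metric cannot be inflated by predicting only the majority class, which is why it is the headline score for an imbalanced binary task such as Hate-Offensive vs. Normal.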
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Future Work</head><p>Recognizing the pressing need to combat hate speech within the constraints of limited resources, the contribution of our paper lies in the strategic application of data augmentation techniques to linguistic datasets from social media. We assert that applying data augmentation techniques to the dataset helps to increase the dataset size and improves the overall model performance. We experimented with determining the ideal percentage of augmented data to seamlessly integrate with the original dataset. This exploration aimed not only to enhance model training efficiency but also</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy * Corresponding author.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Confusion Matrix for best model with swap data augmentation</figDesc><graphic coords="8,151.80,349.45,291.69,163.17" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Random Swap with Varying Overall Augmented Data</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>% of original data augmented Precision Recall mF1-score Accuracy</head><label></label><figDesc></figDesc><table><row><cell>0</cell><cell>0.873</cell><cell>0.876</cell><cell>0.863</cell><cell>0.875</cell></row><row><cell>20</cell><cell>0.910</cell><cell>0.907</cell><cell>0.902</cell><cell>0.903</cell></row><row><cell>30</cell><cell>0.913</cell><cell>0.917</cell><cell>0.909</cell><cell>0.909</cell></row><row><cell>50</cell><cell>0.914</cell><cell>0.925</cell><cell>0.913</cell><cell>0.913</cell></row><row><cell>60</cell><cell>0.927</cell><cell>0.891</cell><cell>0.904</cell><cell>0.905</cell></row><row><cell>80</cell><cell>0.922</cell><cell>0.893</cell><cell>0.902</cell><cell>0.903</cell></row><row><cell>100</cell><cell>0.917</cell><cell>0.883</cell><cell>0.898</cell><cell>0.898</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Random Swap with Varying Word Swap Rate</figDesc><table><row><cell cols="5">Words swapped per instance (%) Precision Recall mF1-score Accuracy</cell></row><row><cell>20</cell><cell>0.909</cell><cell>0.916</cell><cell>0.906</cell><cell>0.907</cell></row><row><cell>30</cell><cell>0.914</cell><cell>0.925</cell><cell>0.913</cell><cell>0.913</cell></row><row><cell>40</cell><cell>0.929</cell><cell>0.907</cell><cell>0.913</cell><cell>0.913</cell></row><row><cell>50</cell><cell>0.926</cell><cell>0.910</cell><cell>0.913</cell><cell>0.913</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Random Delete with Varying Word Deletion Rate</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Words deleted per instance (%) Precision Recall mF1-score Accuracy</head><label></label><figDesc></figDesc><table><row><cell>20</cell><cell>0.92</cell><cell>0.88</cell><cell>0.89</cell><cell>0.90</cell></row><row><cell>30</cell><cell>0.92</cell><cell>0.88</cell><cell>0.895</cell><cell>0.895</cell></row><row><cell>40</cell><cell>0.916</cell><cell>0.894</cell><cell>0.899</cell><cell>0.90</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4</head><label>4</label><figDesc>Spelling Augmentation with Varying Word Augmentation Rate</figDesc><table><row><cell cols="5">Words augmented per instance (%) Precision Recall mF1-score Accuracy</cell></row><row><cell>30</cell><cell>0.921</cell><cell>0.885</cell><cell>0.898</cell><cell>0.898</cell></row><row><cell>50</cell><cell>0.928</cell><cell>0.898</cell><cell>0.908</cell><cell>0.908</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_0">https://github.com/fariha231/impact-of-augmentation-ruhsoldplusplus</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_1">https://www.wikipedia.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_2">https://pytorch.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_3">https://pypi.org/project/nlpaug/0.0.5/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>to circumvent the pitfalls of model overfitting. Our experimentation involved the application of three distinct data augmentation techniques: random swap, random deletion, and spelling augmentation. Notably, the results underscore the prowess of swap data augmentation, exhibiting the highest accuracy at 91.3%. This achievement was realized with a 30% word augmentation rate and a 50% augmented data incorporation. We also published the new RUHSOLD++ dataset containing the augmented data. For future work we envision the exploration of additional augmentation techniques, setting the stage for a comprehensive comparison of model performances. This will improve and add more tools to the ongoing fight against hate speech on social media especially for under-resourced languages such as Roman Urdu.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A survey on automatic detection of hate speech in text</title>
		<author>
			<persName><forename type="first">P</forename><surname>Fortuna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nunes</surname></persName>
		</author>
		<idno type="DOI">10.1145/3232676</idno>
		<ptr target="https://doi.org/10.1145/3232676" />
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page">30</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Hate speech detection in roman urdu</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shahzad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Malik</surname></persName>
		</author>
		<idno type="DOI">10.1145/3414524</idno>
		<ptr target="https://doi.org/10.1145/3414524" />
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Asian Low Resour. Lang. Inf. Process</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page">19</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Hate-speech and offensive language detection in Roman Urdu</title>
		<author>
			<persName><forename type="first">H</forename><surname>Rizwan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Shakeel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Karim</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/2020.EMNLP-MAIN.197</idno>
		<ptr target="https://doi.org/10.18653/v1/2020.emnlp-main.197" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020</title>
				<editor>
			<persName><forename type="first">B</forename><surname>Webber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020</meeting>
		<imprint>
			<date type="published" when="2020">November 16-20, 2020</date>
			<biblScope unit="page" from="2512" to="2522" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Development of computational linguistic resources for automated detection of textual cyberbullying threats in Roman Urdu language</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dewani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Memon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bhatti</surname></persName>
		</author>
		<idno type="DOI">10.17993/3ctic.2021.102.101-121</idno>
	</analytic>
	<monogr>
		<title level="j">3C TIC: Cuadernos de desarrollo aplicados a las TIC</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="101" to="121" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Detecting offensive language in social media to protect adolescent online safety</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.1109/SOCIALCOM-PASSAT.2012.55</idno>
		<ptr target="https://doi.org/10.1109/SocialCom-PASSAT.2012.55" />
	</analytic>
	<monogr>
		<title level="m">2012 International Conference on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 International Conference on Social Computing, SocialCom 2012</title>
				<meeting><address><addrLine>Amsterdam, Netherlands</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2012">September 3-5, 2012</date>
			<biblScope unit="page" from="71" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Deep learning for hate speech detection in tweets</title>
		<author>
			<persName><forename type="first">P</forename><surname>Badjatiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Varma</surname></persName>
		</author>
		<idno type="DOI">10.1145/3041021.3054223</idno>
		<ptr target="https://doi.org/10.1145/3041021.3054223" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th International Conference on World Wide Web Companion</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Barrett</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Cummings</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Agichtein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Gabrilovich</surname></persName>
		</editor>
		<meeting>the 26th International Conference on World Wide Web Companion<address><addrLine>Perth, Australia</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2017">April 3-7, 2017</date>
			<biblScope unit="page" from="759" to="760" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">You too Brutus! trapping hateful users in social media: Challenges, solutions &amp; insights</title>
		<author>
			<persName><forename type="first">M</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dutt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mathew</surname></persName>
		</author>
		<idno type="DOI">10.1145/3465336.3475106</idno>
		<ptr target="https://doi.org/10.1145/3465336.3475106" />
	</analytic>
	<monogr>
		<title level="m">HT &apos;21: 32nd ACM Conference on Hypertext and Social Media, Virtual Event</title>
				<editor>
			<persName><forename type="first">O</forename><surname>Conlan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Herder</surname></persName>
		</editor>
		<meeting><address><addrLine>Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2021-08-30">August 30 - September 2, 2021</date>
			<biblScope unit="page" from="79" to="89" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">BERT: pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/N19-1423</idno>
		<ptr target="https://doi.org/10.18653/v1/n19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Burstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Doran</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</editor>
		<meeting><address><addrLine>Minneapolis, MN, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">June 2-7, 2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Proceedings of the 2019 NAACL-HLT 2019</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Detecting hate speech on twitter using a convolutiongru based deep neural network</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Robinson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Tepper</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-93417-4_48</idno>
		<ptr target="https://doi.org/10.1007/978-3-319-93417-4_48" />
	</analytic>
	<monogr>
		<title level="m">The Semantic Web -15th International Conference, ESWC 2018</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">A</forename><surname>Gangemi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Vidal</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Hollink</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Tordai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Alam</surname></persName>
		</editor>
		<meeting><address><addrLine>Heraklion, Crete, Greece</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">June 3-7, 2018</date>
			<biblScope unit="volume">10843</biblScope>
			<biblScope unit="page" from="745" to="760" />
		</imprint>
	</monogr>
	<note>Proceedings</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A review on the long short-term memory model</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">V</forename><surname>Houdt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mosquera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nápoles</surname></persName>
		</author>
		<idno type="DOI">10.1007/S10462-020-09838-1</idno>
		<ptr target="https://doi.org/10.1007/s10462-020-09838-1" />
	</analytic>
	<monogr>
		<title level="j">Artif. Intell. Rev</title>
		<imprint>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="page" from="5929" to="5955" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Detection of hate speech using BERT and hate speech word embedding with deep model</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Alatawi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Alhothali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Moria</surname></persName>
		</author>
		<idno>CoRR abs/2111.01515</idno>
		<ptr target="https://arxiv.org/abs/2111.01515" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Roman Urdu hate speech detection using transformer-based model for cyber security applications</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bilal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Musa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ali</surname></persName>
		</author>
		<idno type="DOI">10.3390/S23083909</idno>
		<ptr target="https://doi.org/10.3390/s23083909" />
	</analytic>
	<monogr>
		<title level="j">Sensors</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page">3909</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Data bootstrapping approaches to improve low resource abusive language detection for indic languages</title>
		<author>
			<persName><forename type="first">M</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<idno type="DOI">10.1145/3511095.3531277</idno>
		<ptr target="https://doi.org/10.1145/3511095.3531277" />
	</analytic>
	<monogr>
		<title level="m">HT &apos;22: 33rd ACM Conference on Hypertext and Social Media</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Bellogín</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Boratto</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Cena</surname></persName>
		</editor>
		<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2022-06-28">June 28 - July 1, 2022</date>
			<biblScope unit="page" from="32" to="42" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Khanuja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mehtani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khosla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gopalan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">K</forename><surname>Margam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T</forename><surname>Nagipogu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C B</forename><surname>Gali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">P</forename><surname>Talukdar</surname></persName>
		</author>
		<idno>CoRR abs/2103.10730</idno>
		<ptr target="https://arxiv.org/abs/2103.10730" />
		<title level="m">Muril: Multilingual representations for indian languages</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Hate speech detection in Roman Urdu</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shahzad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Malik</surname></persName>
		</author>
		<idno type="DOI">10.1145/3414524</idno>
		<ptr target="https://doi.org/10.1145/3414524" />
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Asian Low Resour. Lang. Inf. Process</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page">19</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Exploring data augmentation strategies for hate speech detection in Roman Urdu</title>
		<author>
			<persName><forename type="first">U</forename><surname>Azam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Rizwan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Karim</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.lrec-1.481" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022</title>
				<meeting>the Thirteenth Language Resources and Evaluation Conference, LREC 2022<address><addrLine>Marseille, France</addrLine></address></meeting>
		<imprint>
			<publisher>European Language Resources Association</publisher>
			<date type="published" when="2022-06-25">June 20-25, 2022</date>
			<biblScope unit="page" from="4523" to="4531" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Masked language model scoring</title>
		<author>
			<persName><forename type="first">J</forename><surname>Salazar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Q</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kirchhoff</surname></persName>
		</author>
		<idno type="DOI">10.18653/V1/2020.ACL-MAIN.240</idno>
		<ptr target="https://doi.org/10.18653/v1/2020.acl-main.240" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020</meeting>
		<imprint>
			<date type="published" when="2020">July 5-10, 2020. 2020</date>
			<biblScope unit="page" from="2699" to="2712" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
