<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Fine-Tuning BERT-Based Model for Detecting Social Media Manipulation in Low-Resource Settings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name><given-names>V.</given-names> <surname>Hromenko</surname></string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name><given-names>Viktor</given-names> <surname>Shevchenko</surname></string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In this study, the multi-label classification of manipulative techniques in Ukrainian and Russian social media texts is investigated using transformer-based language models. A dataset comprising approximately 3,700 training posts, annotated according to a taxonomy of 10 distinct manipulation techniques, characterizes a low-resource, class-imbalanced setting. A pre-trained Ukrainian RoBERTa model (“youscan/ukr-roberta-base”) was fine-tuned, and a range of performance enhancement strategies was evaluated, including advanced tokenization; countermeasures for class imbalance such as loss weighting and alternative loss functions; data augmentation via back-translation; layer-wise learning rate decay for stable fine-tuning; and post-training threshold optimization for prediction calibration. Experimental results indicate that the optimal model achieves a macro-averaged F1 score of approximately 0.40 on the validation set, a marked improvement over the baseline of approximately 0.24 presented by the dataset publishers. Detailed per-technique analysis reveals that while frequently occurring techniques (e.g., Loaded Language, F1 ≈ 0.70) are reliably detected, performance declines for rarer techniques (e.g., Bandwagon and Straw Man, F1 &lt; 0.25). Although data augmentation and rebalancing strategies modestly enhance recall for under-represented techniques, they also contribute to an increase in false positives. Common error patterns, such as confusion between related techniques, are discussed along with the limitations imposed by the small dataset. These findings offer valuable insights into effective practices for multi-label classification in low-resource settings and present the first results on the automated detection of manipulation techniques in Ukrainian texts, with significant implications for disinformation monitoring.</p>
      </abstract>
      <kwd-group>
        <kwd>Social Media Analysis</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Transformer Fine-Tuning</kwd>
        <kwd>Multi-Label Classification</kwd>
        <kwd>Ukrainian</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The proliferation of propaganda and manipulative content on social media has spurred research
into automatic detection of specific manipulation techniques used to mislead or influence readers.
Given a text (e.g. a Telegram post), the task is to identify which rhetorical or stylistic manipulation
techniques (if any) are present – e.g. Loaded Language, Whataboutism, Straw Man, etc. This is
inherently a multi-label classification problem: a single post may employ multiple such techniques.
Prior work on propaganda detection has mostly focused on English news articles [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Notably, the
SemEval-2020 Task 11 defined an inventory of propaganda techniques and provided an English
dataset for identifying them. Top-performing systems there leveraged pre-trained Transformer
language models and ensemble methods. However, for Ukrainian – which has become a hotspot for
information warfare – there were few, if any, existing datasets or models for this task. The UNLP
2025 Shared Task [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] addressed this gap by releasing a dataset of Ukrainian social media posts
annotated with 10 manipulation techniques, such as Appeal to Fear, FUD (Fear, Uncertainty,
Doubt), Glittering Generalities, etc. The language setting is particularly challenging: some posts are
in Ukrainian and others in Russian (both written in Cyrillic), requiring models to handle
multilingual inputs.
        </p>
        <p>
          In this paper, an approach to the multi-label classification of manipulation techniques in a
Ukrainian/Russian dataset is presented. This small dataset (only 3,800 training examples) poses
the significant challenges of a low-data regime and severe class imbalance: some techniques appear
in hundreds of posts while others appear in only a few dozen. These issues can cause standard
fine-tuning of large language models to overfit or to predict only the majority classes. Therefore,
several strategies were explored to address these challenges:
• A pre-trained Ukrainian RoBERTa transformer (125M parameters) was fine-tuned as the
base model, taking advantage of its prior language knowledge. Experiments with a
multilingual model were also conducted to handle Russian text.
• Imbalance mitigation techniques were employed, including algorithm-level methods such as
class-weighted loss and other loss functions that focus learning on rare positive labels
by down-weighting easy negatives.
• Data augmentation was utilized to expand the effective training set. In particular,
paraphrases of minority-class examples were generated via back-translation, in which
posts were translated to another language and then back to Ukrainian/Russian [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
• To fine-tune the transformer on limited data without overfitting, layer-wise learning
rate decay (LLRD) was applied [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], training higher layers more aggressively than lower
ones (thereby preserving general linguistic features in early layers).
• Per-label threshold optimization was also explored: decision thresholds on the sigmoid
outputs were tuned on a validation set via grid search in an attempt to maximize macro-F1.
No significant improvements over a fixed threshold of 0.5 were observed, so threshold
optimization was not included in the final system.
        </p>
        <p>An extensive set of experiments was conducted to quantify the impact of these techniques.
Results are reported in terms of macro F1 (the official metric), as well as micro F1, precision, and
recall. The best configuration achieved a macro F1 of approximately 0.40, which, although modest
in absolute terms, is notable given the small size of the training data and the difficulty of the task
(a baseline with no special handling scored below 0.30 macro F1). An analysis of the techniques
revealed which were most and least accurately detected, and examples of typical errors were
provided.</p>
        <p>Finally, we discuss the limitations of our approach and promising directions for future
improvements, such as leveraging unlabeled data or multi-task learning. In summary, our
contributions are:
1. an effective fine-tuning approach for multi-label propaganda technique classification in a
low-resource setting (Ukrainian/Russian);
2. an empirical study of imbalance mitigation, data augmentation, and thresholding
techniques for the task of multi-label classification of manipulation techniques;
3. reported results on the UNLP 2025 Shared Task data, providing a benchmark of macro-F1 ≈
0.40.</p>
        <p>We hope that our findings will inform future work in low-resource multi-label text
classification and aid the development of tools to automatically monitor manipulation in
information warfare contexts.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Propaganda and Manipulation Detection.</title>
        <p>
          Propaganda and manipulation detection in text have become increasingly important research
areas within Natural Language Processing (NLP), particularly in the context of political discourse,
fake news, and social media. Early approaches in this field primarily relied on hand-crafted
linguistic features and traditional machine learning classifiers such as Support Vector Machines
(SVM), Naive Bayes, and Decision Trees. These systems often utilize syntactic and stylistic features
like part-of-speech tags, sentiment polarity, and rhetorical structure to identify persuasive or
manipulative cues [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          However, with the rise of deep learning and transformer-based language models, the paradigm
has shifted significantly. Pre-trained models such as BERT [19] and RoBERTa have demonstrated
superior performance in domain-specific classification tasks, outperforming traditional methods
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. For instance, in the SemEval 2023 Task 5 on clickbait spoiler classification, fine-tuned RoBERTa
models significantly outperformed classifiers trained on hand-crafted features, demonstrating the
capability of transformers in manipulation-related text classification [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          Ensemble-based approaches have also been explored to further enhance performance by
combining multiple transformer models. For example, combining RoBERTa, Transformer-XL, and
XGBoost has been shown to outperform individual models in sentiment and manipulation-related
tasks such as tweet classification [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Similarly, BoostingBERT integrates multi-class boosting
techniques with BERT and RoBERTa to address difficult classification instances in NLP tasks,
achieving state-of-the-art performance in multiple benchmarks [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          Other studies have addressed the application of domain-specific transformer models, such as
BERTweet and Bio_ClinicalBERT, in tasks involving informal or health-related language
manipulation. These domain-tuned models have shown improved results over general-purpose
models, highlighting the importance of language context and corpus similarity in manipulation
detection [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          Furthermore, text classification tasks targeting misinformation and deception—closely tied to
propaganda—have benefited from fusion-based transformer architectures. For instance, fusion
models combining BERT, RoBERTa, and XLNet outperformed both traditional and earlier deep
learning models in detecting prescription medication abuse on Twitter, a context where language is
often manipulated to conceal intent [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>In sum, while early systems for propaganda and manipulation detection in NLP leveraged
manual feature engineering, recent advancements have firmly established transformer-based and
ensemble methods as the new standard, significantly improving performance in detecting subtle
and complex forms of linguistic manipulation across diverse domains.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Transformer Fine-Tuning in Low-Resource Languages.</title>
        <p>Fine-tuning pre-trained transformer models for classification tasks in the Ukrainian language is
a growing area within natural language processing (NLP). This trend is driven by the need to adapt
advanced methods to low-resource languages. Recent studies have focused on customizing
multilingual and general-purpose transformer models specifically for Ukrainian language tasks
through supervised fine-tuning.</p>
        <p>
          A significant contribution includes fine-tuning the Gemma and Mistral large language models
(LLMs) using Ukrainian datasets. This approach has shown considerable improvements in various
classification and instruction-following tasks. Additionally, the development of the Ukrainian
Knowledge and Instruction Dataset (UKID) has significantly expanded available resources for
training and evaluating models for the Ukrainian language [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          Transformer-based methods have also been effective in grammatical error classification tasks. A
two-stage fine-tuning approach, using synthetic data first and subsequently gold-standard data,
was successfully applied to multilingual models such as mT5 and smaller seq2seq transformers,
resulting in a strong performance on grammatical classification tasks [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
          For stance and argument classification, researchers have utilized adapter-based fine-tuning on
multilingual transformers. Coupled with few-shot learning, this method effectively handled
classification tasks related to political discussions about Ukraine, demonstrating its effectiveness
even with limited labelled data [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>
          Question answering (QA) has similarly been approached as a classification problem for
Ukrainian. A multilingual BERT model was fine-tuned on a translated dataset similar to SQuAD,
effectively demonstrating the potential of transfer learning by accurately identifying answers
within Ukrainian Wikipedia articles, even in the absence of native QA datasets [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          Further foundational research has improved technical text preprocessing methods for the
Ukrainian language. Techniques like Cyrillic normalization, abbreviation handling, and compound
word segmentation have been developed to significantly enhance the quality of input data for
transformer models in domain-specific tasks [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>Overall, these research efforts illustrate that fine-tuning transformer models for Ukrainian
classification tasks is becoming increasingly practical. Strategies like synthetic data generation,
adapter tuning, and targeted linguistic preprocessing continue to contribute to improved model
performance.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data overview</title>
      <p>
        The experiments were conducted on a dataset of Ukrainian text samples annotated for
propaganda techniques, formulated as a multi-label classification task. Each text may employ one
or more of 10 distinct propaganda techniques (e.g., loaded language, bandwagon, whataboutism,
appeal to fear, straw man, etc.) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], following a taxonomy similar to that used in propaganda
detection tasks in news articles [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The dataset consists of 3,822 samples in total (after filtering),
with an average of approximately 1.3 techniques labelled per sample. The class distribution is
highly imbalanced, which represents a common challenge in multi-label datasets [17]. For instance,
the most frequent label, loaded_language, appears in 1,973 samples (over 50% of all instances),
whereas the rarest technique, straw_man, is found in only 138 samples (~3.6%). Such imbalance
may bias models towards consistently predicting the majority class, resulting in neglect of minority
classes. To mitigate this, stratified data splitting and customized loss functions are employed, as
described below.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Model Development</title>
      <sec id="sec-4-1">
        <title>4.1. Preparation</title>
        <p>The dataset was split into training (80%) and validation (20%) subsets using a
multilabel-stratified shuffle split, ensuring that each propaganda technique was represented proportionally
across both subsets. This stratification prevented the exclusion of rare classes from the validation
set. Minimal text preprocessing was performed, given that the pre-trained model is capable of
handling raw text; only excessive whitespace and line breaks were removed. Emojis and other
symbols present in the texts were intentionally retained, as these could convey sentiment or
emphasis pertinent to specific propaganda techniques. Although most texts were relatively
short (typically under 256 tokens), a maximum sequence length of 512 tokens was set to
accommodate longer examples at the full capacity of the pre-trained model. Labels for each sample
were binarized into a 10-dimensional vector for model training.</p>
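        <p>For illustration, the split described above can be reproduced roughly as in the following
sketch, which assumes the iterative-stratification package; the variable names and toy data are
illustrative, not the original pipeline code.</p>
        <preformat># Minimal sketch of the multilabel-stratified 80/20 split (assumed package:
# iterative-stratification, i.e. "pip install iterative-stratification").
import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit

rng = np.random.default_rng(42)
texts = np.array([f"post {i}" for i in range(100)])   # placeholder posts
labels = rng.integers(0, 2, size=(100, 10))           # binarized 10-label matrix

msss = MultilabelStratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(msss.split(texts, labels))  # indices preserve label ratios
train_texts, val_texts = texts[train_idx], texts[val_idx]
train_labels, val_labels = labels[train_idx], labels[val_idx]</preformat>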
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Pre-trained language model</title>
        <p>As the base model, a pre-trained RoBERTa-base transformer, specifically the publicly
available youscan/ukr-roberta-base, was chosen. It was originally trained on the fill-mask task on a
dataset consisting of 85 million lines of Ukrainian text from Wikipedia, social networks, and the
deduplicated Ukrainian OSCAR dataset [18]. The model architecture aligns with the RoBERTa base
model [20], featuring 12 transformer layers, 768 hidden units per layer, 12 attention heads, and
approximately 125 million parameters [18].</p>
        <p>Initially, other pre-trained models were explored to compare performance, such as the
multilingual XLM-RoBERTa [21, 22].</p>
        <p>The "unknown token ratio" was quantified for several models, resulting in values around 1–2%
for models such as XLM-RoBERTa, whereas the selected monolingual model had a ratio of 0.0%.
This result indicated complete vocabulary coverage by the youscan/ukr-roberta-base model, likely because
its pre-training data closely matched the domain, including informal social media language.
Consequently, and to maintain a relatively compact model size (125 million parameters), the
monolingual RoBERTa-base model was chosen for all subsequent experiments.</p>
        <p>Transfer learning was leveraged by adding a classification head to the pre-trained model, which
consists of a feed-forward layer producing one logit per propaganda technique, with a sigmoid
activation applied to each logit to obtain independent probabilities for each label. This
configuration frames the task as 10 parallel binary classification problems, a standard approach in
multi-label text classification.</p>
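        <p>A minimal sketch of this configuration with the HuggingFace transformers library is shown
below; setting problem_type="multi_label_classification" makes the library attach a sigmoid-based
BCE objective, i.e. 10 parallel binary classifiers, on top of the encoder.</p>
        <preformat># Sketch: pre-trained encoder plus a 10-logit multi-label classification head.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "youscan/ukr-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=10,                                 # one logit per technique
    problem_type="multi_label_classification",     # independent sigmoid per label
)</preformat>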
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Training details</title>
        <p>
          The HuggingFace Trainer API was utilized to manage the training loops [27], with
modifications introduced to incorporate various loss functions, namely weighted Binary
Cross-Entropy (BCE) Loss, Focal Loss [28], and Unbounded F1 Loss [29], as detailed in the experiments
section. A custom Trainer subclass was developed to implement weighted binary cross-entropy; it
redefines the loss computation by employing a binary cross-entropy with logits loss function,
adjusted with a pre-computed weight tensor. Similarly, Focal Loss and Unbounded F1 Loss
were implemented as separate modules and integrated into the training process [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. During
training, macro-F1 was adopted as the primary evaluation metric, reflecting the emphasis of the
UNLP Shared Task competition on treating all techniques with equal importance, in contrast to
micro-F1, which tends to be influenced by the majority class. Micro-F1 and per-class metrics were
also recorded for further analytical purposes, although they were not used in the model selection
process.
        </p>
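        <p>The custom Trainer subclass can be sketched as follows; this is an illustrative
reconstruction under the Trainer API of [27], not the verbatim competition code, and the
pos_weight tensor is assumed to be pre-computed as described in Section 4.4.1.</p>
        <preformat># Sketch of a Trainer subclass implementing class-weighted BCE with logits.
import torch
from transformers import Trainer

class WeightedBCETrainer(Trainer):
    def __init__(self, *args, pos_weight=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.pos_weight = pos_weight          # shape (10,), one weight per label

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fn = torch.nn.BCEWithLogitsLoss(
            pos_weight=self.pos_weight.to(outputs.logits.device)
        )
        loss = loss_fn(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss</preformat>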
        <p>All layers of the RoBERTa model were fine-tuned on the training set. Training was conducted
for up to 5 epochs, as additional epochs worsened validation performance. Evaluation on the
validation set was performed at the end of each epoch, and the best model, defined as the model
with the highest validation macro-F1, was saved. All experiments were carried out on a single
NVIDIA RTX 2060 GPU using mixed precision (fp16) and training batch sizes of 8 to 12 to accelerate
training and utilize all available memory (6 GB).</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Experiments</title>
      </sec>
      <sec id="sec-4-5">
        <title>4.4.1. Addressing class imbalance</title>
        <p>To mitigate the strong class imbalance, cost-sensitive learning and alternative loss functions were
explored. Initially, class-weighted Binary Cross-Entropy was implemented, in which each
label’s positive examples were assigned a weight inversely proportional to the label’s frequency.
Specifically, for each technique label a weight was computed as the ratio of the number of negative
samples to the number of positive samples, as shown in Equation 1.</p>
        <p>w_i = N_i^neg / N_i^pos (1)
where:
• w_i is the positive-class weight for technique label i,
• N_i^neg is the number of training samples in which label i is absent,
• N_i^pos is the number of training samples in which label i is present.</p>
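        <p>As a sketch, the weight tensor of Equation 1 can be computed from the binarized training
labels as follows (variable names are illustrative and follow the earlier split sketch):</p>
        <preformat># Per-label positive weights: w_i = N_i^neg / N_i^pos (Equation 1).
import torch

label_matrix = torch.tensor(train_labels, dtype=torch.float32)  # (n_samples, 10)
n_pos = label_matrix.sum(dim=0)                # positives per technique
n_neg = label_matrix.shape[0] - n_pos          # negatives per technique
pos_weight = n_neg / n_pos.clamp(min=1.0)      # guard against empty classes</preformat>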
        <p>Lastly, the Unbounded F1 Loss was implemented, according to Equation 4, as a custom loss
function to directly optimize macro-F1 in a differentiable manner:
L_F1 = 1 - (1/C) · Σ_c ( 2 · Σ_i y_ic · ŷ_ic ) / ( Σ_i y_ic + Σ_i ŷ_ic + ε ) (4)
where:
• y_ic is the true label of sample i for class c,
• ŷ_ic is the predicted probability of sample i for class c,
• C is the number of classes and ε is a small constant for numerical stability.</p>
        <p>While the true F1 score is non-differentiable due to its reliance on discrete predictions, this
approach uses a smooth approximation by treating the sigmoid-activated outputs as continuous
probabilities. For each class, it computes a rough surrogate of true positives, false positives, and
false negatives, enabling the calculation of a differentiable proxy for the F1 score as the harmonic
mean. This method is particularly useful in multi-label classification, where aligning the training
objective more closely with the evaluation metric can improve performance on underrepresented
classes. In prior literature, such approaches are often referred to as F1 surrogate losses or
soft/unbounded F1 objectives. [29].</p>
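        <p>A compact sketch of such a soft macro-F1 loss, in the spirit of Equation 4 and [29], is given
below; the exact smoothing constants used in the experiments may differ.</p>
        <preformat># Differentiable macro soft-F1 ("unbounded F1") loss sketch.
import torch

def soft_f1_loss(logits, targets, eps=1e-8):
    probs = torch.sigmoid(logits)               # continuous predictions in (0, 1)
    tp = (probs * targets).sum(dim=0)           # soft true positives per class
    fp = (probs * (1 - targets)).sum(dim=0)     # soft false positives per class
    fn = ((1 - probs) * targets).sum(dim=0)     # soft false negatives per class
    f1 = 2 * tp / (2 * tp + fp + fn + eps)      # per-class soft F1
    return 1 - f1.mean()                        # minimize 1 - macro soft-F1</preformat>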
        <p>Finally, Table 2 summarizes the loss function comparison results. All runs in this comparison
used the same base model (youscan/ukr-roberta-base) and the same hyperparameters (5 epochs, a
learning rate of 2e-5, and a max token length of 256), varying only the loss function.</p>
        <p>As shown in Table 2, a macro-F1 score of only 0.25 was achieved by the baseline model trained
with standard binary cross-entropy, reflecting very poor performance on under-represented
techniques. In this run, the majority class was nearly always predicted, and an F1 score close to 0
was obtained on the rare classes. When class weights were incorporated, macro-F1 improved
dramatically to 0.38, a substantial absolute gain of over 0.13. The weighted loss enabled the
recovery of some of the minority classes, as predictions for these classes were increased, thereby
improving recall. In contrast, the focal loss approach underperformed, yielding a macro-F1 of
approximately 0.27, only slightly above the baseline. Additional experiments were conducted with
the focusing parameter γ set to 0.5 and 1.5, and no improvements were observed. It is suspected
that further tuning of the focal loss hyperparameter γ might be necessary to achieve better results,
given the sensitivity of focal loss to this parameter [28]. The Unbounded F1 loss achieved a
macro-F1 of 0.34, outperforming the plain baseline but not reaching the effectiveness of the simpler
weighted binary cross-entropy. Although the custom loss explicitly attempted to optimize
macro-F1, it may have been more difficult to optimize or may have required more epochs. Studies
on differentiable F1 losses have reported mixed results, sometimes necessitating careful calibration
to outperform binary cross-entropy [29]. In this case, weighted binary cross-entropy proved to be
the most reliable and effective approach for addressing class imbalance, following common practice
for imbalanced data. Based on these results, weighted binary cross-entropy was adopted as the
primary loss function for the remaining experiments.</p>
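        <p>For reference, the binary focal loss [28] used in this comparison can be sketched as follows;
alpha and gamma are the standard focal loss hyperparameters, and the implementation details here
are illustrative rather than the exact experimental code.</p>
        <preformat># Binary focal loss sketch for multi-label logits (targets are float 0/1).
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                       # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()</preformat>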
      </sec>
      <sec id="sec-4-6">
        <title>4.4.2. Hyperparameters tuning</title>
        <p>First, the effect of sequence length was examined. The RoBERTa tokenizer is capable of
truncating inputs that exceed a set maximum length, potentially resulting in the loss of important
information. In initial experiments, a maximum length of 256 tokens was employed, which covered
most of the data. It was found that approximately 16.46% of the samples exceeded 256 tokens, with
the longest cases reaching around 1468 tokens. Consequently, the maximum sequence length was
increased to 512 tokens (the maximum available length for the chosen model) and the model was
re-trained with the weighted loss. A slight improvement in validation performance was observed,
with macro-F1 increasing from approximately 0.38 to 0.396 and micro-F1 rising from roughly 0.47
to 0.474. These modest gains suggest that, for a few samples, retaining the full text rather than a
truncated 256-token segment aided in the correct identification of additional manipulation
techniques. A maximum length of 512 tokens was therefore adopted for subsequent experiments, as
the increase in computational cost was manageable.</p>
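        <p>Token-length statistics of this kind can be gathered with the model's own tokenizer, as in
this sketch (assuming the tokenizer and training texts defined in the earlier sketches):</p>
        <preformat># Measuring how many posts exceed a given token budget.
lengths = [len(tokenizer(t, truncation=False)["input_ids"]) for t in train_texts]
share_over_256 = sum(l > 256 for l in lengths) / len(lengths)
print(f"max length: {max(lengths)}, share over 256 tokens: {share_over_256:.2%}")</preformat>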
        <p>After the best base loss function was identified, training hyperparameters were further tuned to
boost performance through targeted hyperparameter and architectural adjustments. Discriminative
fine-tuning was applied by assigning distinct learning rates to different layers of the model; a lower
learning rate of 1×10^−5 was allocated to the pre-trained Transformer layers, while a higher
learning rate of 1×10^−3 was assigned to the newly added classifier layer. This strategy, inspired
by fine-tuning practices such as those employed in ULMFiT and BERT adaptations [31], enabled
the classifier to rapidly adapt to class-specific nuances while preventing excessive updates to the
sensitive lower layers. In parallel, the weight decay was increased from 0.01 to 0.1 to impose
stronger regularization on the model’s weights, thereby aiming to reduce overfitting. With the
weighted BCE and a maximum sequence length of 512 tokens in place, training was extended to 8
epochs, during which validation loss and macro-F1 scores were continuously monitored and early
stopping was implemented if performance deteriorated. Under these settings, training yielded the
best validation macro-F1 of approximately 0.41 after 6 epochs, after which performance plateaued.
For comparison, training the same model with a uniform learning rate of 2×10^−5 for all layers
over 7 epochs resulted in a macro-F1 of around 0.40, indicating that the layer-wise learning rate
provided a measurable improvement.</p>
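        <p>The discriminative learning rates can be realized with two optimizer parameter groups, as in
the sketch below; the attribute names follow the RoBERTa sequence classification model of the
transformers library, and the exact optimizer setup is an assumption.</p>
        <preformat># Sketch: separate learning rates for the encoder and the classifier head.
from torch.optim import AdamW

optimizer = AdamW(
    [
        {"params": model.roberta.parameters(), "lr": 1e-5},     # pre-trained layers
        {"params": model.classifier.parameters(), "lr": 1e-3},  # new classifier head
    ],
    weight_decay=0.1,
)</preformat>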
      </sec>
      <sec id="sec-4-7">
        <title>4.4.3. Final Model</title>
        <p>
          A final model was configured with weighted BCE, a maximum sequence length of 512 tokens,
discriminative learning rates, and a weight decay of 0.1, and a macro-F1 of approximately 0.41 was
achieved on the validation set, with the corresponding micro-F1 reaching around 0.48. The gap
between micro and macro F1 is attributed to class imbalance, as micro-F1 is influenced more by the
numerous easy negatives and the single frequent class, whereas macro-F1 is reduced by the poor
performance on rarer classes [32]. This final model was also evaluated on the held-out test set,
which exhibited a similar class distribution, and a macro-F1 of approximately 0.40 was obtained on
the test data. Precision-recall breakdowns and further analysis are provided in the next section.
Overall, it is demonstrated by these results that, with appropriate handling of imbalance and
careful tuning of hyperparameters, a relatively compact monolingual transformer is capable of
achieving around 0.4 macro-F1 on this challenging multi-label task. For context, it should be noted
that this performance is competitive with systems from related manipulation detection
competitions; for instance, in a recent English-language propaganda detection task, macro-F1
scores in the range of 0.5–0.6 were achieved by the best systems through the use of significantly
more training data [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-8">
        <title>4.4.4. Additional experiments</title>
        <p>Back-translation augmentation was applied by translating training examples from rare classes
into English using the Helsinki-NLP Opus MT translation models [33] and then back to
Ukrainian, thereby creating paraphrased variants of the original text, which can potentially
improve model results [34]. The five most under-represented classes (appeal_to_fear, bandwagon,
straw_man, whataboutism, and fud) were augmented by roughly doubling their positive samples
through back-translation, resulting in an augmented training set of slightly larger size. However,
when the model was fine-tuned on this augmented data using the same weighted loss setup, no
improvement was observed in the validation macro-F1; in one trial, a slight decrease to
approximately 0.36 was noted. It is believed that the additional synthetic examples did not enhance
the model’s ability to discriminate those techniques. A possible explanation is that while the
back-translated texts exhibited a range of vocabulary, they did not introduce new propaganda signals
and may have increased language variation among the under-represented classes, thereby making
it harder for the model to learn clear patterns. Additionally, inaccuracies in back-translation may
have led to a loss of meaning or the omission of important content, which can be critical for
detecting manipulation techniques. Examples of such mistranslations are provided in Table 3.</p>
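        <p>A sketch of the back-translation step is shown below; the specific OPUS-MT checkpoint
names are assumptions, and the Russian posts would analogously use the corresponding
Russian-English model pair.</p>
        <preformat># Back-translation sketch with OPUS-MT [33] via the transformers pipeline API.
from transformers import pipeline

to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-uk-en")
to_uk = pipeline("translation", model="Helsinki-NLP/opus-mt-en-uk")

def back_translate(text: str) -> str:
    english = to_en(text, max_length=512)[0]["translation_text"]
    return to_uk(english, max_length=512)[0]["translation_text"]</preformat>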
        <sec id="sec-4-8-1">
          <title>Original part</title>
        </sec>
        <sec id="sec-4-8-2">
          <title>Translated part</title>
        </sec>
        <sec id="sec-4-8-3">
          <title>Back-translated part</title>
        <p>Finally, although previous research found back-translation to be useful in certain
text classification scenarios [34], the experiments conducted with social media news in Ukrainian
and Russian using the Helsinki-NLP Opus MT translation models were not as effective.</p>
          <p>Threshold tuning was also experimented with, since the default decision threshold of 0.5 on the
sigmoid outputs may not be optimal for each class. A grid search was performed to determine
per-label thresholds that maximized macro-F1 on the validation set. Although this process showed a
slight increase in validation F1, the application of these optimized thresholds to the test set resulted
in a decrease in macro-F1 by approximately 0.04. In other words, the threshold tuning with the grid
search approach was found to have overfit the validation idiosyncrasies. Consequently, a uniform
threshold of 0.5 was reinstated for final evaluation, as it proved to be more robust.</p>
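        <p>The grid search can be sketched as follows; val_probs denotes the validation sigmoid
outputs and val_labels the gold label matrix (both illustrative names):</p>
        <preformat># Per-label decision threshold search maximizing per-class F1 on validation data.
import numpy as np
from sklearn.metrics import f1_score

grid = np.arange(0.05, 0.95, 0.05)
thresholds = np.full(10, 0.5)
for c in range(10):
    scores = [f1_score(val_labels[:, c], (val_probs[:, c] >= t).astype(int),
                       zero_division=0) for t in grid]
    thresholds[c] = grid[int(np.argmax(scores))]</preformat>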
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results evaluation</title>
      <p>To better understand the model’s behaviour, its performance on each propaganda technique
label was examined. In Table 4, the per-technique precision, recall, and F1 scores on the validation
set for the best model are provided.</p>
      <p>A clear correlation between a label's frequency and the model's F1 score is observed from these
results. Best performance is recorded in frequent classes such as Loaded Language, which receives
the highest support. For loaded_language, an F1 of approximately 0.72 is achieved, with a precision
of about 0.79 and a recall of approximately 0.66. This indicates that both high sensitivity and
specificity in detecting loaded language are attained, likely due to the abundance of examples from
which the associated patterns—such as strong emotional or exaggerated wording—could be
learned. A decent F1, approximately 0.61, is also observed for another relatively frequent class,
glittering_generalities. In contrast, much lower F1 scores, roughly 0.20–0.35, are obtained for rare
techniques such as bandwagon, whataboutism, and straw_man. For instance, the bandwagon,
which accounts for only about 3–4% of samples, is recorded with an F1 of about 0.2, a precision of
0.14, and a recall of 0.35. Some bandwagon instances are identified, as evidenced by a recall of
nearly 35%, but this is achieved at the cost of a large number of false positives, resulting in a
precision of only 14%. A similar trend is observed for whataboutism (F1 ~0.24) and appeal_to_fear
(F1 ~0.29), where a moderate recall (in the range of 0.4–0.5) is accompanied by very low precision,
indicating that many segments are flagged as these techniques, albeit often incorrectly. This
behaviour is attributed directly to the Weighted BCE Loss strategy, in which high weight is
assigned to minority classes during training. Consequently, a recall-oriented approach is adopted
for these classes, with a preference for predicting a rare technique—even at the cost of triggering
some false alarms—in order to avoid missing true instances. From a macro-F1 perspective, this
trade-off is acceptable because a slight increase in recall can boost the F1 score as long as precision
is not drastically reduced; however, it does imply that further post-processing or human vetting
would be required in practical applications where false positives are of concern.</p>
      <p>Relatively strong performance on a few medium-frequency classes was achieved by the model.
For example, F1 scores of 0.47 were recorded for cherry_picking and fud (fear, uncertainty, doubt),
which, while not high, were superior to those of the rare classes. A few hundred training instances
were provided for these techniques, which appeared to be sufficient for the model to capture
distinguishing features. In the case of cherry_picking (selective truth), which is often characterized
by numerical or factual claims, these patterns were captured, as demonstrated by a precision of 0.38
and a recall of 0.63. This high recall suggests that sensitivity to any pattern resembling a factual
claim or data point has been developed, although some non-cherry-picking content was also
mislabeled. Conversely, slightly lower F1 scores of approximately 0.33 for cliché and 0.4 for
euphoria were observed, despite similar representation, possibly because these classes are more
abstract or subjective, making consistent learning more difficult.</p>
      <p>Overall, a macro-averaged precision of approximately 0.33 was obtained across classes, while a
macro-averaged recall of about 0.57 (resulting in a macro-F1 of roughly 0.40) was recorded. This
imbalance between precision and recall confirms that, under the influence of a weighted BCE loss,
an over-prediction of minority labels is favored to capture as many true instances as possible. In
contrast, different results are shown by the micro-averaged scores: a micro-precision of about 0.4
and a micro-recall of approximately 0.62 were observed, reflecting performance across all label
decisions collectively. The higher micro-precision compared to the precision of many individual
rare classes indicates that, when all negative examples are considered, the model is correct in most
cases by not predicting a rare label where it is not present. A micro-recall of 0.62 was recorded,
reflecting the overall proportion of true labels that were correctly predicted. This value was largely
influenced by the very high recall achieved on the fud, glittering_generalities and loaded_language
classes, which contributed a significant number of true labels.</p>
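      <p>The macro/micro contrast discussed above corresponds to the following standard
computation, shown here as a sketch with illustrative variable names:</p>
      <preformat># Macro vs. micro precision/recall/F1 over the validation predictions.
from sklearn.metrics import precision_recall_fscore_support

preds = (val_probs >= 0.5).astype(int)
macro = precision_recall_fscore_support(val_labels, preds, average="macro",
                                        zero_division=0)
micro = precision_recall_fscore_support(val_labels, preds, average="micro",
                                        zero_division=0)
print("macro P/R/F1:", macro[:3])
print("micro P/R/F1:", micro[:3])</preformat>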
      <p>In summary, these metrics indicate that high effectiveness is achieved for the majority class and
reasonable performance is obtained for several mid-frequency classes, while the rarest techniques
continue to pose challenges, with some instances being detected (non-zero recall) but many false
positives being produced, thereby resulting in low precision and F1.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In conclusion, it was demonstrated that fine-tuning a pre-trained transformer model, when
combined with strategies for addressing class imbalance and careful hyperparameter tuning, can
provide a strong baseline for multi-label propaganda technique classification in Ukrainian/Russian
languages. A macro-F1 score of approximately 0.41 was attained on the evaluation set and 0.4 on
the test set, representing a significant improvement over baseline methods. It was shown that transfer
learning from a large unlabeled corpus can provide a solid foundation even in a low-resource
setting, while the implementation of a weighted BCE loss function was found to be essential for
mitigating the effects of extreme class imbalance.</p>
      <p>Discriminative fine-tuning, in which different learning rates were applied to distinct layers, was
also found to help preserve the pre-trained language representations while allowing rapid
adaptation of the classifier. Conversely, data augmentation via back-translation did not produce the
expected gains, suggesting that further research is needed to generate synthetic training examples
that capture the nuanced patterns of propaganda language. Future work is recommended to
explore ensemble methods, cross-lingual transfer, and techniques aimed at improving precision for
minority classes without sacrificing recall. It is hoped that these findings will be used to refine
existing methods and inspire the development of new approaches for detecting propaganda and
misinformation, particularly in low-resource environments.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly for grammar
and spelling checking and for paraphrasing and rewording. After using these tools, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s
content.</p>
    </sec>
    <sec id="sec-8">
      <title>References (continued)</title>
      <p>[17] A. Tarekegn, M. Giacobini, and K. Michalak. A review of methods for imbalanced
multi-label classification. Pattern Recognit. 118 (2021) 107965. doi:10.1016/j.patcog.2021.107965.</p>
      <p>[18] Hugging Face. youscan/ukr-roberta-base. Available at:
https://huggingface.co/youscan/ukr-roberta-base (accessed 2025-04-03).</p>
      <p>[19] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the
North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, Minneapolis,
Minnesota, 2019. Association for Computational Linguistics.</p>
      <p>[20] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, … V. Stoyanov. RoBERTa: A robustly
optimized BERT pretraining approach. ArXiv preprint arXiv:1907.11692, 2019.</p>
      <p>[21] Hugging Face. ukr-models/xlm-roberta-base-uk. Available at:
https://huggingface.co/ukr-models/xlm-roberta-base-uk (accessed 2025-04-03).</p>
      <p>[22] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, … V.
Stoyanov. Unsupervised cross-lingual representation learning at scale. ArXiv preprint
arXiv:1911.02116, 2019.</p>
      <p>[23] Hugging Face. ukr-models/uk-summarizer. Available at:
https://huggingface.co/ukr-models/uk-summarizer (accessed 2025-04-03).</p>
      <p>[24] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, … C. Raffel. mT5: A
massively multilingual pre-trained text-to-text transformer. ArXiv preprint
arXiv:2010.11934, 2020.</p>
      <p>[25] Hugging Face. KoichiYasuoka/roberta-base-ukrainian. Available at:
https://huggingface.co/KoichiYasuoka/roberta-base-ukrainian (accessed 2025-04-03).</p>
      <p>[26] UberText dataset. Available at: https://lang.org.ua/uk/corpora/ (accessed 2025-04-03).</p>
      <p>[27] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao,
S. Gugger, M. Drame, Q. Lhoest, and A. Rush. HuggingFace’s Transformers: State-of-the-art
Natural Language Processing. ArXiv, abs/1910.03771, 2019.</p>
      <p>[28] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal Loss for Dense Object Detection.
IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2020) 318–327.
doi:10.1109/TPAMI.2018.2858826.</p>
      <p>[29] G. Bénédict, V. Koops, D. Odijk, and M. de Rijke. sigmoidF1: A Smooth F1 Score
Surrogate Loss for Multilabel Classification. ArXiv, abs/2108.10566, 2021.</p>
      <p>[30] M. Rezaei-Dastjerdehei, A. Mijani, and E. Fatemizadeh. Addressing Imbalance in
Multi-Label Classification Using Weighted Cross Entropy Loss Function. In: 2020 27th National
and 5th International Iranian Conference on Biomedical Engineering (ICBME), pp. 333–338,
2020. doi:10.1109/ICBME51989.2020.9319440.</p>
      <p>[31] J. Howard and S. Ruder. Universal Language Model Fine-tuning for Text Classification. In:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers), pp. 328–339, 2018. Association for Computational Linguistics.
doi:10.18653/v1/P18-1031.</p>
      <p>[32] M. C. Hinojosa Lee, J. Braet, and J. Springael. Performance Metrics for Multilabel Emotion
Classification: Comparing Micro, Macro, and Weighted F1-Scores. Applied Sciences 14(21)
(2024) 9863. doi:10.3390/app14219863.</p>
      <p>[33] J. Tiedemann, M. Aulamo, D. Bakshandaeva, et al. Democratizing neural machine
translation with OPUS-MT. Lang Resources &amp; Evaluation 58 (2024) 713–755.
doi:10.1007/s10579-023-09704-w.</p>
      <p>[34] T. Bourgeade, S. Casola, A. M. Wizan, and C. Bosco. Data Augmentation through
Back-Translation for Stereotypes and Irony Detection. In: Proceedings of the 10th Italian
Conference on Computational Linguistics (CLiC-it 2024), pp. 90–97, December 2024.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          , G. Da San Martino, H. Wachsmuth,
          <string-name>
            <given-names>R.</given-names>
            <surname>Petrov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          .
          <source>SemEval2020 Task</source>
          <volume>11</volume>
          :
          <article-title>Detection of Propaganda Techniques in News Articles</article-title>
          .
          <source>In: Proceedings of the SemEval-2020 Workshop</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>UNLP. UNLP</given-names>
            <surname>Shared</surname>
          </string-name>
          <article-title>Task</article-title>
          . Available at: https://unlp.org.ua/shared-task/ (accessed 2025-
          <volume>05</volume>
          - 03).
          <article-title>Content licensed under CC BY-NC-SA 4.0</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Qin</surname>
          </string-name>
          .
          <article-title>Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities</article-title>
          . ArXiv, abs/2501.18845,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          .
          <article-title>Universal Language Model Fine-tuning for Text Classification</article-title>
          .
          <source>In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pp.
          <fpage>328</fpage>
          -
          <lpage>339</lpage>
          , Melbourne, Australia,
          <year>2018</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sabiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khtira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Asri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Rhanoui</surname>
          </string-name>
          .
          <article-title>Analyzing BERT's Performance Compared to Traditional Text Classification Models</article-title>
          .
          <year>2023</year>
          , pp.
          <fpage>572</fpage>
          -
          <lpage>582</lpage>
          . doi:
          <volume>10</volume>
          .5220/0011983100003467.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Imran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hodnefjeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kastrati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fatima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Daudpota</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wani</surname>
          </string-name>
          .
          <article-title>Classifying European Court of Human Rights Cases Using Transformer-Based Techniques</article-title>
          .
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>55664</fpage>
          -
          <lpage>55676</lpage>
          . doi:
          <volume>10</volume>
          .1109/ACCESS.
          <year>2023</year>
          .
          <volume>3279034</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Keller</surname>
          </string-name>
          , N. Rehbach,
          <string-name>
            <surname>and I. Zafar.</surname>
          </string-name>
          <article-title>nancy-hicks-gribble at SemEval-2023 Task 5: Classifying and generating clickbait spoilers with RoBERTa</article-title>
          .
          <source>In: Proceedings of the SemEval-2023 Workshop</source>
          , pp.
          <fpage>1712</fpage>
          -
          <lpage>1717</lpage>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2023</year>
          .semeval-
          <volume>1</volume>
          .
          <fpage>238</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Tumuluru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kankanala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shoaib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Madhu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Advancing Twitter Sentiment Analysis: An Ensemble Approach with Transformer-XL, RoBERTa, and XGBoost</article-title>
          . In: 2023
          <source>International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS)</source>
          , pp.
          <fpage>944</fpage>
          -
          <lpage>950</lpage>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .1109/ICSSAS57918.
          <year>2023</year>
          .
          <volume>10331828</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>She</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Zhang.</surname>
          </string-name>
          <article-title>BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks</article-title>
          . ArXiv, abs/
          <year>2009</year>
          .05959,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khurana</surname>
          </string-name>
          , G. Mastorakos,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          , H. Liu, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Classification of Patient Portal Messages with BERT-based Language Models</article-title>
          .
          <source>In: 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI)</source>
          , pp.
          <fpage>176</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .1109/ICHI57859.
          <year>2023</year>
          .
          <volume>00033</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Al-Garadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. O'Connor</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Graciela</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Perrone</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarker</surname>
          </string-name>
          .
          <article-title>Text classification models for the automatic detection of nonmedical prescription medication use from social media</article-title>
          .
          <source>BMC Medical Informatics and Decision Making</source>
          <volume>21</volume>
          (
          <year>2021</year>
          ).
          <source>doi:10.1186/s12911-021-01394-0.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kiulian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polishko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khandoga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Chubych</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ravishankar</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Shirawalmath</surname>
          </string-name>
          . From Bytes to Borsch:
          <article-title>Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation</article-title>
          . ArXiv, abs/2404.09138,
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .48550/arXiv.2404.09138.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rozovskaya</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>A Low-Resource Approach to the Grammatical Error Correction of Ukrainian</article-title>
          .
          <source>In: Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)</source>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2023</year>
          .unlp-
          <volume>1</volume>
          .
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yanchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruckdeschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Von Nordheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Von Königslöw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Wiedemann</surname>
          </string-name>
          .
          <article-title>Few-shot learning for automated content analysis: Efficient coding of arguments and claims in the debate on arms deliveries to Ukraine</article-title>
          . ArXiv, abs/2312.16975,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .48550/arXiv.2312.16975.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiutiunnyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Dyomkin</surname>
          </string-name>
          .
          <article-title>Context-Based Question-Answering System for the Ukrainian Language</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sergii</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Oleksandr</surname>
          </string-name>
          .
          <article-title>Data preprocessing and tokenization techniques for technical Ukrainian texts</article-title>
          .
          <source>Applied Aspects of Information Technology</source>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .15276/aait.06.
          <year>2023</year>
          .
          <volume>22</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>