=Paper= {{Paper |id=Vol-3396/paper4 |storemode=property |title=Detecting of Anti-Ukrainian Trolling Tweets |pdfUrl=https://ceur-ws.org/Vol-3396/paper4.pdf |volume=Vol-3396 |authors=Kostiantyn Vyrodov,Anastasiya Chupryna,Ruslan Kotelnykov |dblpUrl=https://dblp.org/rec/conf/colins/VyrodovCK23 }} ==Detecting of Anti-Ukrainian Trolling Tweets== https://ceur-ws.org/Vol-3396/paper4.pdf
Detecting of Anti-Ukrainian Trolling Tweets
Kostiantyn Vyrodov, Anastasiya Chupryna and Ruslan Kotelnykov
Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, 61166, Ukraine


                               Abstract
                               The research aims to analyze the effectiveness of the modern machine-learning models usually
                               used for data classification to detect anti-Ukrainian trolling tweets on Twitter. This research
                               was conducted based on 6000 manually gathered tweets. The gathered dataset is divided into
                               training and validation subsets of 75% and 25%, respectively. Also, it consists of 3000 pro-
                               Ukrainian tweets and 3000 anti-Ukrainian tweets. Specific conditions of experiments, models,
                               performance metrics, platform, type of learning, and classification efficiency indicators are
                               determined. SVM, Decision Tree, Multinomial Naive Bayes, and Logistic Regression models
                               are trained using supervised machine learning on the Google Colab platform. The
                               evaluation is done by analyzing widely used classification metrics, such as accuracy, precision,
                               recall, and F1 score. Finally, the results of experiments are given, along with conclusions and
                               practical recommendations on using machine learning models.

                               Keywords
                               Machine Learning, SVM, Decision Tree, Multinomial Naive Bayes, Logistic Regression,
                               Twitter, Bot, Troll, NLP

1. Introduction

    On 24 February 2022, Russia invaded Ukraine in a major escalation of the Russo-Ukrainian War,
which began in 2014. However, Russian aggression is not limited to the battleground but includes
cyberattacks and PSYOPS (Psychological Operations) in social media.
    PSYOPS are operations to convey selected information and indicators to audiences to influence their
emotions, motives, objective reasoning, and ultimately the behavior of governments, organizations,
groups, and individuals. Today, social media platforms are perfect for performing PSYOPS via troll
accounts spreading misleading information. A troll is a person who posts or makes inflammatory,
insincere, digressive, extraneous, or off-topic messages online with the intent of provoking others into
displaying emotional responses or manipulating others' perceptions.
    Twitter is a popular social network that the Russian government widely uses to spread
disinformation about the war in Ukraine, spoil the Ukrainian reputation, and convince Ukrainian allies
to stop their support. Therefore, detecting and eliminating troll accounts and their fake trolling content
will positively affect the security of Ukrainians and complicate the execution of PSYOPS for the
aggressor.
    Machine learning is one of the approaches that can be used to identify trolling content on Twitter.
This research aims to gather an up-to-date dataset related to the Russo-Ukrainian war on the Twitter
platform and to set up experiments in order to detect trolling content using the widely used machine
learning models, evaluate the effectiveness of each model within specific conditions, and formulate
recommendations on the practical application of machine learning techniques and methods to solve this
type of problem.



COLINS-2023: 7th International Conference on Computational Linguistics and Intelligent Systems, April 20–21, 2023, Kharkiv, Ukraine
EMAIL: kostya.vyrodov@gmail.com (K. Vyrodov); anastasiya.chupryna@nure.ua (A. Chupryna); ruslan.kotelnykov@nure.ua (R.
Kotelnykov)
ORCID: 0009-0006-1746-2334 (K. Vyrodov); 0000-0003-0394-9900 (A. Chupryna); 0000-0003-2413-1809 (R. Kotelnykov)
                            © 2023 Copyright for this paper by its authors.
                            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                            CEUR Workshop Proceedings (CEUR-WS.org)
2. Related works

    Detecting trolling bots is not easy because anyone can post trolling content online. Currently, more
and more researchers are devoted to developing models and technologies for protecting people from
cyberbullying (trolling) [1-3]. In this respect, paper [1] presents the results of the analysis of
cyberbullying in social networks, paper [2] describes a transfer learning model for training neural
networks to recognize the facts of cyberbullying in social networks, and paper [3] proposes an effective
model for detecting emotions in messages and comments from social networks.
    The paper [4] proposed an integrated model to classify cyber harassment in social networks. The
paper [5] describes behavior-based machine-learning approaches for identifying government-sponsored
Twitter trolls. The paper [6] proposes and presents a model for detecting trolls based on user sentiment
analysis, including the results of experiments proving this statement. The paper [7] describes the
detection of cyber trolls using a model for extracting word embeddings (including hashtags) from tweets
to identify groups of interest. The works [8, 9] provide an up-to-date review of models and algorithms
for detecting farms and networks of trolls, Twitter bots, and their posts when interfering with networks
at the state level. The paper [10] shares models and algorithms for detecting facts of collusion between
retweets. The paper presents the results of the analysis, detection, and characteristics of such trolls and
messages. The works [11, 12] provide an interesting model for establishing parallels and transferring
technologies from electronic warfare to detecting and combating fake news, trolls, and troll farms. The
paper [13] considers the topical issue of the online trolling ecosystem. Since trolling is integral to the
functioning of modern social networks, models are proposed for detecting trolling facts along with
interesting assessments, analyses, and recommendations for practical application.
    The analysis of the current state of this problem shows that the vast majority of research is devoted
to analyzing information on Twitter based on the use of neural networks and ML. Papers [14, 15]
propose emotion recognition results on Twitter using the Unison application model. In addition, the
papers provide the results of comparable studies and learning outcomes. The paper [16] uses a multi-
aspect neural network Attentional Graph to determine the user's location in a social network. The
analysis of modern publications shows an excellent prospect for using neural networks, not only for the
analysis of textual information but also for the effect of (graphic) accumulation [17–19], mainly people's
faces and emotions. The paper [20] presents the results of a study on the imitation and recognition of
sarcasm on Twitter. The work [21] presents the results (models and algorithms) for detecting and
extracting social events from Twitter based on the BiLSTM-CRF model. The paper [22] presents the
results of the effects of political polarization of opinions (posts) in social networks based on the use of
neural networks. The paper [23] presents the results of detecting rumors in social networks using
transformer-type models. The paper [24] proposed a new algorithm that was called the "multilevel tweet
analyzer" (MLTA). This algorithm allows text to be graphically represented in social networks using
multi-layer networks (MLN) in order to obtain better results of coding relationships between
independent sets of tweets.
    The development of modern representation models is no less important for the combination and
presence of cyberbullying facts estimation in e-learning and some other systems [25, 26].
    A study analyzing government-sponsored trolls related to the Russian troll farm found that usually
trolling bots create a small portion of destructive content, such as posts or comments, and heavily spread
them by retweeting and copy-pasting the same information within a specific period [27]. Existing
Twitter bot detection methods can be grouped into feature-based, text-based, and graph-based methods
[28].
    The idea of feature-based methods is to discover features from user information and utilize machine
learning classification algorithms to detect bots. Researchers extract properties from users' metadata,
their follow relationships, and tweets, including various time patterns. The paper [29] presents results
where researchers achieved 87% accuracy in detecting Twitter bots using different machine-learning
methods on a dataset containing metadata about Twitter profiles. However, bot owners are increasingly
aware of discovered features allowing others to identify bot accounts, so new bots try to imitate other
behavior to evade detection. Subsequently, engineers implementing this approach for bot detection have
to keep track of bot evolution to keep their models up to date.
    Graph-based methods model Twitter as a graph using concepts from network science. This approach
adopts graph neural networks, heterogeneous graph neural networks, and node representation learning
to detect Twitter bots. For example, a group of researchers from Xi’an Jiaotong University proposed
TwiBot-22, a graph-based Twitter bot detection benchmark that presents a comprehensive dataset,
providing diversified entities and relations on the Twitter network. They re-implemented 35 Twitter bot
detection baselines, evaluated them on nine datasets, and achieved about 80% accuracy [30].
    Text-based methods utilize techniques in NLP to detect trolling bots based on tweets. Under the
hood, the methods use word embeddings, recurrent neural networks, and pre-trained language models.
Since trolling content is primarily textual and usually represented as a comment or a post containing
hostile language, employing a linguistic and sentiment analysis is a good approach for detecting trolling
content.
    The paper [31] shares the results of applying domain-adaptation techniques for sentiment analysis
of textual content in online forums. The researchers achieved around 70% in detecting trolls. In the
paper [32], researchers evaluated the sentiments of posts and other metadata from trolling posts and
were able to detect Twitter trolls more than 76% of the time. C.J. Hutto and Eric Gilbert presented
VADER [33], a simple rule-based model for general sentiment analysis. Utilization of the VADER in
combination with sentiment, aggression, lexical, and syntactic textual features to determine whether a
tweet is meant to troll or not achieved 88% accuracy when tested with the Kaggle Twitter cyber-trolls
dataset [34, 35]. Todor Mihaylov and Preslav Nakov developed two classifiers for detecting "sponsored
trolls" trying to manipulate the public's opinion and another for detecting "individual trolls" trying to
provoke negative emotions. They combined sentiment analysis with metadata of trolling posts
(information about the publication time) and achieved 82% accuracy [36].

3. Methods and materials

  This section describes the input data, the methods used, the experimental conditions, and the metrics
used to determine which model demonstrates better results.

3.1.    Data description

    In this study, existing Twitter datasets with already identified trolling users and trolling tweets (e.g.,
the IRA troll dataset or the Dataset of Russian trolling tweets for detection of cyber-trolls [37]) were
not used because they are not directly related to the context of the Russo-Ukrainian war.
    The dataset used for the research consists of raw, newly collected tweets and labels specifying
whether a tweet is anti-Ukrainian or not. The tweets were gathered via the Twitter API and filtered by one
of the following keywords: “zov”, “nazis”, “azov”, “russia is a terrorist state”, or “putin war crimes”,
including different hashtags such as “#RussiaInvadedUkraine” or “#ZOV”. These keywords were selected
to find tweets where a user deliberately tried to create an association with one of the sides in this war.
Each tweet was manually labeled as pro-Ukrainian or anti-Ukrainian by the researcher. All
tweets were gathered via a JS script and saved into a comma-separated values (CSV) file that could
be easily imported into an ML model.
    The dataset contains 6000 items: 3000 anti-Ukrainian tweets and 3000 pro-Ukrainian
tweets. Table 1 shows two samples from the dataset.

Table 1
Data set samples
         TweetId                                        Text                                     Label
 1593089288404873217            Zelensky is a war criminal and NATO is his enabler.          AntiUkrainian
 1620546714724880386           Ministry of Defense 🇺🇦showed the work of the Polish           ProUkrainian
                               🇵🇱"Crabs" at the front. #Ukraine #Poland #StopRussia

   The “TweetId” column represents an id of the tweet and can be used in future research to get extra
data about the tweet. “TweetId” is represented as a number containing 19 digits.
   The “Text” column is a raw tweet containing a maximum of 280 characters, and the “Label” column
points to the category of the concrete sample. The “Text” can contain arbitrary characters and words,
including emojis, links, or hashtags.
   The dataset was split into two parts to analyze content and build the frequency distribution charts
based on used words. The first part contained anti-Ukrainian tweets and the second contained pro-
Ukrainian tweets. Each tweet was split into words and filtered from stop words and non-alphabetical
symbols. Figure 1 displays the distribution of the words in anti-Ukrainian tweets, and Figure 2 displays
the distribution in pro-Ukrainian tweets.
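
The word counting described above can be sketched in a few lines of Python. This is only an illustration and not the authors' script: the function name `word_frequencies`, the tiny stop-word set, and the two sample tweets are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption); a full list such as the
# one shipped with nltk would normally be used instead.
STOP_WORDS = {"a", "an", "and", "is", "his", "the", "of", "to", "in"}


def word_frequencies(tweets, top_n=10):
    """Return the top_n most frequent words across tweets of one class."""
    counts = Counter()
    for tweet in tweets:
        # Keep alphabetical runs only, which drops digits, emojis and punctuation.
        words = re.findall(r"[a-z]+", tweet.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


sample_tweets = [
    "Zelensky is a war criminal and NATO is his enabler.",
    "NATO and the war",
]
top_words = word_frequencies(sample_tweets, top_n=2)
print(top_words)
```

Running the same function once on the anti-Ukrainian part and once on the pro-Ukrainian part would give the data needed for two separate frequency charts.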




Figure 1: Words distribution in anti-Ukrainian tweets

   It is possible to see in Figure 1 that the most popular words in anti-Ukrainian tweets are “Ukraine”,
“Ukrainian”, “russian”, “Russia”, “war”, “nazi”, “NATO”, “zelensky”. In addition, these tweets contain
words specific to this category, such as “zelenskywarcrime” or “ukrainenazis”.




Figure 2: Words distribution in pro-Ukrainian tweets

   Figure 2 shows that the most popular words in pro-Ukrainian tweets are “Ukraine”, “russian”,
“ukrainian”, “russia”, “war”, “people”, “support”. These tweets also contain words specific to this
category, such as “russiaisaterroriststate” or “standwithukraine”.

3.2.    Machine learning models validation and metrics/efficiency indicators

    Machine learning models are refined based on feedback from performance metrics that help
to understand whether a model meets requirements.
    Various metrics, such as recall or precision, are used to evaluate the
performance of a model. This research uses accuracy, precision, recall, and the F1 score, all of which are
derived from the confusion matrix.
    A confusion matrix (Figure 3) is a tabular structure that helps visualize the performance of
classifiers. Each column in the matrix represents classified instances based on predictions, and each
row of the matrix represents classified instances based on the actual class labels.




Figure 3: Confusion matrix

    True Positive (TP) indicates the number of correct hits or predictions for our positive class. False
Negative (FN) indicates the number of instances we missed for that class by predicting it falsely as the
negative class.
    False Positive (FP) is the number of instances we predicted wrongly as the positive class when it
was not.
    True Negative (TN) is the number of instances we correctly predicted as the negative class.
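
As an illustration, the four cells of the confusion matrix can be counted directly from the actual and predicted labels. The sketch below assumes that labels are encoded as 1 (positive class) and 0 (negative class); the function name `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
```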
    Accuracy is defined as the proportion of correct predictions made by the model, which
can be expressed by formula (1), where the correct predictions are in the numerator and
all the outcomes are in the denominator.
\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}. \tag{1}
\]
    Precision is defined as the proportion of predictions for the positive class that are correct.
This is also known as positive predictive value and can be
expressed by formula (2), where the correct predictions for the positive class are in the numerator
and all the predictions for the positive class, including the false positives, are in the denominator.
\[
\mathrm{Precision} = \frac{TP}{TP + FP}. \tag{2}
\]
    Recall is defined as the proportion of instances of the positive class that were correctly predicted.
This is also known as hit rate, coverage, or sensitivity and can be expressed by formula (3), where the
correct predictions for the positive class are in the numerator and both the correct and the missed
instances of the positive class are in the denominator, giving us the hit rate.
\[
\mathrm{Recall} = \frac{TP}{TP + FN}. \tag{3}
\]
    F1 score is another accuracy measure that is computed by taking the harmonic mean of the precision
and recall and can be represented by the formula (4).
\[
F1\ \mathrm{Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{4}
\]

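
To make formulas (1)-(4) concrete, the following sketch evaluates them for a hypothetical confusion matrix. The counts (80, 10, 20, 90) are invented for the example and are not taken from the experiments; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Evaluate formulas (1)-(4) from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 80 true positives, 10 false positives,
# 20 false negatives, 90 true negatives.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 170/200 = 0.85, precision is 80/90, recall is 80/100 = 0.80, and the F1 score is 160/190.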
3.3.    Main methods and techniques

    This research relies on NLP techniques since the primary piece of information in the dataset is
raw text. Natural Language Processing (NLP) is a field at the intersection of computer science, linguistics,
and artificial intelligence whose goal is to make a program capable of “understanding” the content of
documents, including the contextual nuances of the language within them.
    The first step is the normalization of data. Data normalization is a process consisting of steps that
should be followed to wrangle, clean, and standardize textual data into a form that machine learning
models could consume.
    Text normalization steps:
    1. Tokenization. It is the process of splitting or segmenting text from sentences into their constituent
words.
    2. Removing special symbols such as punctuation or emojis.
    3. Expanding contractions such as “won’t” or “can’t”.
    4. Case conversion. Transforming all tokens to lowercase or uppercase.
    5. Removing stop words, words that have little or no significance. They are removed to retain words
having maximum significance and context.
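    6. Stemming. It is the process of reducing a word to its stem, the base form to which affixes such
as suffixes and prefixes are attached.
    7. Lemmatization. It is the process of grouping the inflected forms of a word so they can be
analyzed as a single item.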
    The next step after the normalization of the text is its vectorization. Text vectorization is the process
of converting text into a numerical representation. This is necessary because machine learning models cannot
process raw text and require a numeric representation of the data. This research will use two
popular vectorization methods: bag of words and normalized TF-IDF.
    The bag of words model is one of the most straightforward yet powerful techniques to extract
features from text documents. The essence of this model is to convert each text document into a vector
whose components are the frequencies, within that document, of all the distinct words in the
vocabulary of the whole collection.
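
A minimal pure-Python sketch of the bag of words idea is given below. It assumes whitespace tokenization and uses two made-up documents; the function name `bag_of_words` is hypothetical. In the experiments themselves the vectorization is taken from sklearn.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn documents into count vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # One component per vocabulary word; words absent from the document get 0.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


docs = ["stop the war", "war war crimes"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Every document is mapped to a vector of the same length, which is what allows a classifier to treat the components as features.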
    TF-IDF stands for Term Frequency-Inverse Document Frequency, a combination of two metrics:
term frequency and inverse document frequency. This technique was initially developed as a metric for
ranking functions for showing search engine results based on user queries and has come to be a part of
information retrieval and text feature extraction now.
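
The sketch below shows one common way to compute normalized TF-IDF vectors. The text does not state which TF-IDF variant is meant, so the choice here is an assumption: a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by scaling every document vector to unit Euclidean length. The function name `tfidf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return a sorted vocabulary and L2-normalized TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(token for doc in tokenized for token in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    vectors = []
    for doc in tokenized:
        term_freq = Counter(doc)
        row = [term_freq[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(value * value for value in row)) or 1.0
        vectors.append([value / norm for value in row])
    return vocabulary, vectors


docs = ["stop the war", "war war crimes"]
vocabulary, vectors = tfidf(docs)
for row in vectors:
    print([round(value, 3) for value in row])
```

In the first document, "war" receives a lower weight than "stop" because it occurs in both documents and is therefore less distinctive.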
    The cleaned and vectorized data is forwarded as input into the model for training. This research will
focus on decision trees, SVM, multinomial naive bayes, and logistic regression models to find out which
of them, within specific conditions, gives the best results. These models were selected since they are
well-established and reliable. In addition, they can learn a decision boundary in a multi-dimensional
feature space that separates trolling tweets from non-trolling ones.

4. Experiment

     In order to identify the model best suited to detecting trolling content, we will
run multiple experiments using various machine-learning models that will be tested under different
conditions.
     For the experiments, the Google Colab platform will be used. The programming
language is Python since it is well-supported in Colab. In addition, the language has
many libraries for analyzing data and training models. All machine learning models, such as logistic
regression or multinomial naive bayes and vectorization packages, will be taken from the sklearn
Python package. The pandas library is required for experiments as it provides data reading and
manipulation functions. The pyplot library will be used for data visualization, and nltk will be used for
text preprocessing.
     The experiment consists of three parts and is visualized in Figure 4.

Figure 4: Visualization of the experiment.

   The first step in the experiment is data preparation. The entire dataset containing 6000 samples and
represented as a CSV file will be read using the “pandas” library. After this, column “TweetId” will be
dropped since it does not give any value to models and exists only as a reference to the original tweet
for extra information. As a result, the dataset containing data will consist of two columns: “Text” and
“IsTrolling”. The column “Text” is represented as an arbitrary text that contains links, emojis, and stop
words such as articles “the” or “a”. The column “IsTrolling” is represented as a number one or zero.
   After this, it is necessary to perform text normalization because it can positively affect
the results of experiments.
   The normalization of text consists of the following steps:
   ● Word tokenization, a process of splitting a sentence into separate words, which
   simplifies the next steps.
   ● Cleaning all website links using regular expressions. It is required because links do not carry
   meaningful information and can confuse a machine learning model.
   ● Cleaning everything except alphabetical and numerical characters.
   ● Lemmatization, a process of grouping the inflected forms of a word so they can be analyzed as
   a single item.
   ● Stemming, a process of reducing inflected words to their word stem.
   ● Removing stop words.
       The normalization is done using regular expressions, which strip needless data from the text, and
   the “nltk” library, which already implements functions for lemmatization, stemming,
   and removing stop words.
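
The normalization steps listed above can be sketched as follows. This is a simplified stand-in and not the authors' code: it uses only the standard library, a tiny hand-written stop-word set, and a crude suffix stripper (`naive_stem`) in place of the nltk stemmer. Lemmatization is omitted because it needs a dictionary, which nltk would provide in practice. All names in the snippet are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")
# Tiny illustrative stop-word set (an assumption, not the nltk list).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "at", "his"}


def naive_stem(token):
    """Crude suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text, clean_chars=True, remove_stop_words=True, stem=True):
    """Normalize one tweet; the flags switch individual steps on or off."""
    text = URL_RE.sub(" ", text)  # links are always removed
    if clean_chars:
        text = NON_ALNUM_RE.sub(" ", text)  # keep letters, digits and whitespace
    tokens = text.lower().split()  # whitespace tokenization
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


tweet = "Ministry of Defense showed the work of the Polish Crabs at the front. #Ukraine https://t.co/abc"
print(normalize(tweet))
print(normalize(tweet, clean_chars=False, remove_stop_words=False, stem=False))
```

The boolean flags make it easy to produce several levels of normalization from the same function by switching steps off one at a time.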
       The second step is text vectorization. It is necessary because machine learning models cannot
   directly work with text and need data to be represented as numbers, so vectorization is the process
   of converting text into numerical data. There are various vectorization algorithms, but this
   research will use bag of words and normalized TF-IDF since they are the most popular and are
   available out of the box in Colab.
       The final step is the training of models and obtaining results. Vectorized data will be split in the
   ratio of 75% and 25% to have training and validation sets. When the text is normalized, vectorized,
   and divided into training and validation chunks, ML models will be trained. For the first experiment,
   the support vector machine model will be used. The second experiment will use the decision tree
   model. The multinomial naive bayes will be used for the third one, and the last experiment will use
   logistic regression.
       Every model will have separate experiments for bag of words and normalized TF-IDF
   vectorization algorithms. In addition, every model will be tested with different levels of text
   normalization. There will be experiments with:
      1. Fully normalized text.
      2. Normalization without stemming.
      3. Normalization without stemming and lemmatization.
      4. Normalization without stemming, lemmatization, and removing stop words.
      5. Normalization without stemming, lemmatization, removing stop words, and cleaning
          non-alphanumeric characters.
       After every run of the experiment, the following metrics will be collected: accuracy, recall,
   precision, and F1. They will be saved in a separate table for analysis. As a result, it should be possible to
   determine which model performs best and should be used for detecting trolling tweets.
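
The experiment loop described in this section can be sketched with sklearn as shown below. Several details are assumptions rather than facts from the text: the toy texts stand in for the real dataset, all models use default hyperparameters (apart from fixed random seeds and a higher iteration limit for logistic regression), and the metrics use weighted averaging over both classes.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the normalized tweets (1 = anti-Ukrainian).
anti_words = ["zelenskywarcrime", "ukrainenazis", "nazi", "nato"]
pro_words = ["standwithukraine", "support", "people", "stoprussia"]
anti = [f"{a} {b} war ukraine" for a in anti_words for b in anti_words if a != b][:8]
pro = [f"{a} {b} war ukraine" for a in pro_words for b in pro_words if a != b][:8]
texts = anti + pro
labels = [1] * len(anti) + [0] * len(pro)

vectorizers = {"Bag of words": CountVectorizer, "Normalized TF-IDF": TfidfVectorizer}
models = {
    "SVM": lambda: SVC(),
    "Decision tree": lambda: DecisionTreeClassifier(random_state=0),
    "Multinomial naive bayes": lambda: MultinomialNB(),
    "Logistic regression": lambda: LogisticRegression(max_iter=1000),
}


def run_experiments(texts, labels):
    """Train every model on every vectorization and collect the four metrics."""
    results = {}
    for vec_name, make_vectorizer in vectorizers.items():
        # As described in the text, the data is vectorized first and split afterwards.
        features = make_vectorizer().fit_transform(texts)
        x_train, x_val, y_train, y_val = train_test_split(
            features, labels, test_size=0.25, random_state=42, stratify=labels
        )
        for model_name, make_model in models.items():
            predicted = make_model().fit(x_train, y_train).predict(x_val)
            results[(model_name, vec_name)] = {
                "accuracy": accuracy_score(y_val, predicted),
                "precision": precision_score(y_val, predicted, average="weighted", zero_division=0),
                "recall": recall_score(y_val, predicted, average="weighted", zero_division=0),
                "f1": f1_score(y_val, predicted, average="weighted", zero_division=0),
            }
    return results


results = run_experiments(texts, labels)
for (model_name, vec_name), metrics in results.items():
    print(model_name, vec_name, {k: round(v, 4) for k, v in metrics.items()})
```

Weighted-average recall always equals accuracy, so the two columns coincide under this averaging choice. Repeating the loop for each level of text normalization would cover all combinations of model, vectorization, and normalization level.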

5. Results

   This section presents the results of experiments conducted under different conditions using different
machine learning models, vectorization algorithms, and levels of text normalization. Each
subsection presents results for a specific model with different algorithms for text vectorization
and different levels of text normalization.

5.1.    Support vector machine

   Table 2 presents results with complete normalization of text.

  Table 2
  Results for fully normalized text.
                         Accuracy (%)             Recall (%)           Precision (%)           F1 (%)
  Bag of words               86.53                  86.53                 86.55                86.53
Normalized TF-IDF            86.93                  86.93                 86.96                86.92

   Table 3 presents results for text without stemming.

  Table 3
  Results without stemming.
                      Accuracy (%)                Recall (%)           Precision (%)           F1 (%)
  Bag of words            85.93                     85.93                 86.07                85.93
Normalized TF-IDF         87.46                     87.46                 87.54                87.46
   Table 4 presents results for text without stemming and lemmatization.

  Table 4
  Results without stemming and lemmatization.
                      Accuracy (%)              Recall (%)           Precision (%)         F1 (%)
  Bag of words            86.06                   86.06                 86.07              86.06
Normalized TF-IDF         88.00                   88.00                 88.00              88.00

   Table 5 presents results for text without stemming, lemmatization, and removing stop words.

  Table 5
  Results without stemming, lemmatization, and removing stop words.
                      Accuracy (%)         Recall (%)        Precision (%)                 F1 (%)
  Bag of words            84.46              84.46              84.51                      84.46
Normalized TF-IDF         87.80              87.80              88.01                      87.78

   Table 6 presents results for text without stemming, lemmatization, removing stop words, and without
cleaning everything except alphabetical and numerical characters.

  Table 6
  Results without stemming, lemmatization, removing stop words, and cleaning non-alphanumeric characters.
                      Accuracy (%)         Recall (%)         Precision (%)          F1 (%)
  Bag of words            84.20              84.20                84.25              84.20
Normalized TF-IDF         88.46              88.46                88.56              88.46

5.2.   Decision tree

   Table 7 presents results with complete normalization of text.

  Table 7
  Results for fully normalized text.
                         Accuracy (%)         Precision (%)           Recall (%)           F1 (%)
  Bag of words               77.13               77.14                  77.13              77.12
Normalized TF-IDF            79.13               79.13                  79.13              79.13

   Table 8 presents results for text without stemming.

  Table 8
  Results without stemming.
                      Accuracy (%)            Precision (%)           Recall (%)           F1 (%)
  Bag of words            80.20                  80.40                  80.20              80.10
Normalized TF-IDF         77.13                  77.14                  77.13              77.12

   Table 9 presents results for text without stemming and lemmatization.

  Table 9
  Results without stemming and lemmatization.
                      Accuracy (%)            Precision (%)           Recall (%)           F1 (%)
  Bag of words            81.06                  81.08                  81.06              81.06
Normalized TF-IDF            80.00                80.00                 80.00               79.99

   Table 10 presents results for text without stemming, lemmatization, and removing stop words.

  Table 10
  Results without stemming, lemmatization, and removing stop words.
                      Accuracy (%)        Precision (%)         Recall (%)              F1 (%)
  Bag of words            75.66              75.68                75.66                 75.66
Normalized TF-IDF         79.80              80.06                79.80                 79.76

   Table 11 presents results for text without stemming, lemmatization, removing stop words, and without
cleaning everything except alphabetical and numerical characters.

  Table 11
  Results without stemming, lemmatization, removing stop words, and cleaning non-alphanumeric characters.
                        Accuracy (%)       Recall (%)         Precision (%)          F1 (%)
  Bag of words             75.46             75.46                75.46              75.46
Normalized TF-IDF          77.53             77.53                77.53              77.51

5.3.   Multinomial naive bayes

   Table 12 presents results with complete normalization of text.

  Table 12
  Results for fully normalized text.
                         Accuracy (%)         Precision (%)           Recall (%)        F1 (%)
  Bag of words               89.20               89.25                  89.20           89.19
Normalized TF-IDF            87.13               87.47                  87.13           87.10

   Table 13 presents results for text without stemming.

  Table 13
  Results without stemming.
                      Accuracy (%)            Precision (%)           Recall (%)        F1 (%)
  Bag of words            88.00                  87.99                  88.00           87.99
Normalized TF-IDF         88.80                  88.87                  88.80           88.79

   Table 14 presents results for text without stemming and lemmatization.

  Table 14
  Results without stemming and lemmatization.
                      Accuracy (%)            Precision (%)           Recall (%)        F1 (%)
  Bag of words            89.40                  89.40                  89.40           89.39
Normalized TF-IDF         87.80                  87.80                  87.80           87.80

   Table 15 presents results for text without stemming, lemmatization, and removal of stop words.

   Table 15
   Results without stemming, lemmatization, and removal of stop words.
                          Accuracy (%)          Precision (%)         Recall (%)        F1 (%)
  Bag of words               88.06                 88.09                88.06           88.06
Normalized TF-IDF            87.66                 87.66                87.66           87.66

   Table 16 presents results for text without stemming, lemmatization, removal of stop words, and
removal of non-alphanumeric characters.

  Table 16
  Results without stemming, lemmatization, removal of stop words, and removal of non-alphanumeric
  characters.
                      Accuracy (%)       Precision (%)          Recall (%)           F1 (%)
  Bag of words            87.73              87.73                87.73              87.73
Normalized TF-IDF         88.86              88.87                88.86              88.86

5.4.    Logistic regression

   Table 17 presents results with full normalization of text.

  Table 17
  Results for fully normalized text.
                         Accuracy (%)           Precision (%)         Recall (%)        F1 (%)
  Bag of words               85.80                 85.80                85.80           85.79
Normalized TF-IDF            85.46                 85.47                85.47           85.46

   Table 18 presents results for text without stemming.

  Table 18
  Results without stemming.
                      Accuracy (%)              Precision (%)         Recall (%)        F1 (%)
  Bag of words            86.00                    86.00                86.00           85.99
Normalized TF-IDF         85.13                    85.18                85.13           85.13

   Table 19 presents results for text without stemming and lemmatization.

  Table 19
  Results without stemming and lemmatization.
                      Accuracy (%)              Precision (%)         Recall (%)        F1 (%)
  Bag of words            87.33                    87.34                87.33           87.33
Normalized TF-IDF         86.60                    86.60                86.60           86.60

   Table 20 presents results for text without stemming, lemmatization, and removal of stop words.

  Table 20
  Results without stemming, lemmatization, and removal of stop words.
                      Accuracy (%)        Precision (%)         Recall (%)              F1 (%)
  Bag of words            88.53              88.53                88.53                 88.53
Normalized TF-IDF         84.06              84.08                84.06                 84.06

   Table 21 presents results for text without stemming, lemmatization, removal of stop words, and
removal of non-alphanumeric characters.

  Table 21
  Results without stemming, lemmatization, removal of stop words, and removal of non-alphanumeric
  characters.
                        Accuracy (%)     Precision (%)          Recall (%)           F1 (%)
  Bag of words             87.26             87.37                87.26              87.26
Normalized TF-IDF          84.60             84.60                84.60              84.60
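
   In the tables above, recall matches accuracy up to rounding. This is what support-weighted
averaging of per-class metrics produces (macro averaging on a perfectly balanced validation subset
would give the same effect). The sketch below assumes weighted averaging, which this section does not
state explicitly, and uses made-up labels purely for illustration.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (all in [0, 1])."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = count / n  # share of samples that belong to this class
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1

# Hypothetical labels: 1 = anti-Ukrainian, 0 = pro-Ukrainian.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
```

   With weighted averaging, the per-class recalls are weighted by class share, so their sum is the
fraction of correctly classified samples, which is the accuracy.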

6. Discussion

    The results above show that the multinomial naive Bayes model with bag of words vectorization
performs best: it reaches 89.40% accuracy, precision, and recall (F1 of 89.39%) on text without
stemming and lemmatization (Table 14). The decision tree performs worst, with 75.46% in all metrics
using the bag of words vectorizer on text that was not normalized (Table 11). The difference between
the two models is 13.94 percentage points. The worst result for the multinomial naive Bayes was 87.13%
with normalized TF-IDF (Table 12), while the best outcome for the decision tree was 81.06% with the
bag of words algorithm.
    Logistic regression achieved 88.53% accuracy, the second-best result among the tested models. It
reached this result with the bag of words vectorization algorithm on text without stemming,
lemmatization, and removal of stop words (Table 20). The difference between the best and the worst
result for this model is 4.47 percentage points.
    The SVM model took third place with 88.46%, only 0.07 percentage points lower than logistic
regression. This result was achieved using the normalized TF-IDF algorithm on raw text that was not
normalized. The difference between this model's best and worst results is 4 percentage points.
Figure 3 visualizes the discussed results and shows the difference between the best and the worst
result for every model.


   [Bar chart of the best and worst result (%) for each model; vertical axis from 70 to 90;
   models: SVM, Decision tree, Multinomial naive Bayes, Logistic regression]

   Figure 3: Visualization of obtained results.
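
   The best-to-worst gaps quoted in this section can be recomputed directly from the tables. The
sketch below does so for the two models whose accuracy values appear in Tables 12-21; the variable
and function names are illustrative only.

```python
# Accuracy (%) from Tables 12-16 (multinomial naive Bayes) and Tables 17-21
# (logistic regression), stored as (bag of words, normalized TF-IDF) per table.
accuracy = {
    "Multinomial naive Bayes": [(89.20, 87.13), (88.00, 88.80), (89.40, 87.80),
                                (88.06, 87.66), (87.73, 88.86)],
    "Logistic regression": [(85.80, 85.46), (86.00, 85.13), (87.33, 86.60),
                            (88.53, 84.06), (87.26, 84.60)],
}

def spread(rows):
    """Return (best, worst, gap in percentage points) over all experiments."""
    values = [value for row in rows for value in row]
    return max(values), min(values), round(max(values) - min(values), 2)

summary = {model: spread(rows) for model, rows in accuracy.items()}
for model, (best, worst, gap) in summary.items():
    print(f"{model}: best {best:.2f}, worst {worst:.2f}, gap {gap:.2f} points")
```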

    Notably, text normalization improves results only in some cases. Although the metrics show no
consistent relationship with the level of text normalization, the practice may still improve results
overall. For instance, the SVM model demonstrates good results with the bag of words vectorizer and
full text normalization. However, its best result among all experiments came from text that was not
normalized, combined with the normalized TF-IDF vectorizer. Proper text normalization can improve the
model's performance by 4.26%.
   The choice of vectorizer shows no consistent effect on performance either, since different models
with various levels of text normalization demonstrated different results. However, normalized TF-IDF
gave better results in 3 out of 4 models on text without normalization. The most significant
difference between vectorization algorithms is 4.47 percentage points, so it is worth trying
different vectorization approaches to improve metrics.
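
   To make the two vectorizers concrete, the sketch below builds both representations for a toy
corpus. It assumes that "normalized TF-IDF" means TF-IDF vectors scaled to unit L2 length with a
smoothed IDF term, ln((1 + n) / (1 + df)) + 1, which is the default behaviour of scikit-learn's
TfidfVectorizer. This section does not spell out the formula, and the sample texts are invented.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def normalized_tfidf(count_vectors):
    """Weight counts by smoothed IDF, then scale each vector to unit L2 norm."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    result = []
    for vec in count_vectors:
        weighted = [count * weight for count, weight in zip(vec, idf)]
        norm = math.sqrt(sum(x * x for x in weighted))
        result.append([x / norm if norm else 0.0 for x in weighted])
    return result

# Invented sample texts, used only to show the shape of the output.
docs = ["glory to the heroes", "the news is fake", "fake fake news"]
vocab, bow = bag_of_words(docs)
tfidf = normalized_tfidf(bow)
```

   Bag of words keeps raw counts, so longer tweets produce larger vectors, while the normalized
TF-IDF vectors all have length one and down-weight terms that occur in many tweets.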
   Based on the analyzed results, the multinomial naive Bayes model is recommended for detecting
trolling content since it demonstrated the best result among all models. In addition, the normalized
TF-IDF vectorizer is preferred because it is more likely to give a better result. Fully normalized
text is not recommended; it is better to try different normalization levels and keep the one that
improves the metrics.
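
   For readers unfamiliar with the recommended model, the following is a minimal from-scratch sketch
of multinomial naive Bayes over bag of words counts with Laplace smoothing. It is not the code used
in the experiments (which presumably relied on a library implementation); the class name, the
smoothing value alpha=1.0, and the four training texts are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial naive Bayes over whitespace tokens with Laplace smoothing."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> term counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.doc_counts.items():
            score = math.log(n_label / n_docs)  # log prior of the class
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for tok in text.lower().split():
                if tok in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training data with hypothetical labels "pro" and "anti".
texts = ["stand with ukraine", "support ukraine today",
         "ukraine news is fake", "fake staged propaganda"]
labels = ["pro", "pro", "anti", "anti"]
model = MultinomialNB().fit(texts, labels)
prediction_a = model.predict("fake news")
prediction_b = model.predict("support ukraine")
```

   The model scores each class by adding the log prior to the log likelihood of every known token,
which is why raw term counts from a bag of words vectorizer fit it naturally.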

7. Conclusion

    During this research, 6000 tweets were gathered, and every tweet was labeled as anti-Ukrainian or
pro-Ukrainian. Four popular machine learning models were selected, and experiments were conducted to
identify which ML model is the most suitable for identifying trolling content in tweets. The data
samples were split in a 75% to 25% ratio for training and validating the models. Google Colab was
used as the experiment environment. This platform supports Python, a programming language that is
popular in ML and data science and has an enormous number of libraries for machine learning.
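
    As an illustration of the 75%/25% split, the sketch below divides 6000 placeholder samples into
4500 training and 1500 validation items. It assumes a stratified split that keeps both classes
balanced in each subset; the paper does not say here whether the split was stratified, and the
function name and seed are hypothetical.

```python
import random

def stratified_split(samples, labels, train_fraction=0.75, seed=42):
    """Shuffle each class separately and split it by train_fraction."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    train, valid = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        cut = int(len(items) * train_fraction)
        train += [(sample, label) for sample in items[:cut]]
        valid += [(sample, label) for sample in items[cut:]]
    # Mix the classes so that neither subset is ordered by label.
    rng.shuffle(train)
    rng.shuffle(valid)
    return train, valid

# Placeholder tweets: 3000 per class, mirroring the dataset size in the paper.
samples = [f"tweet {i}" for i in range(6000)]
labels = ["anti"] * 3000 + ["pro"] * 3000
train, valid = stratified_split(samples, labels)
```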
    Every model was tested under different conditions: with different text vectorization algorithms
and with different levels of text normalization. Unexpectedly, text normalization improves the
performance metrics of models only in some cases. For instance, the SVM model achieved its best
result among all experiments conducted for it on text that was not normalized.
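
    The idea of normalization levels can be sketched as a function with switchable steps. The
example below covers only character cleaning and stop-word removal; stemming and lemmatization would
need a language-processing library and are left out. The stop-word list, the always-on lowercasing,
and the sample text are assumptions for illustration, not the settings used in the experiments.

```python
import re

# Tiny illustrative stop-word list; a real experiment would use a full list.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

def normalize(text, clean=True, remove_stop_words=True):
    """Apply the selected normalization steps and return the token list."""
    text = text.lower()
    if clean:
        # Keep only Latin letters, digits and whitespace (other scripts are dropped).
        text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    if remove_stop_words:
        tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    return tokens

tweet = "The news is FAKE!!! #staged"
levels = {
    "cleaned, stop words removed": normalize(tweet),
    "cleaned only": normalize(tweet, remove_stop_words=False),
    "raw tokens": normalize(tweet, clean=False, remove_stop_words=False),
}
```

    Each level yields a different vocabulary for the vectorizer, which is one reason the same model
can score differently across the tables.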
    The multinomial naive Bayes showed the best results for the selected tweets with the bag of words
vectorization algorithm on text normalized without stemming and lemmatization (Table 14). At the same
time, the worst results were obtained from the decision tree in combination with the bag of words
vectorization algorithm and without text normalization.
    The results of the current research could possibly be used in big data methods for e-learning
systems that aim to optimize the learning process for teachers and students, as described in [38].
The primary idea of such systems is organizing information stored in libraries together with
unstructured data from emerging outlets such as social media.

8. References

[1] A. Ochoa et al., "Analysis of Cyber-bullying in a virtual social networking," 2011 11th
    International Conference on Hybrid Intelligent Systems (HIS), Melacca, Malaysia, 2011, pp. 229-
    234, doi: 10.1109/HIS.2011.6122110.
[2] M. Behzadi, I. G. Harris and A. Derakhshan, "Rapid Cyber-bullying detection method using
    Compact BERT Models," 2021 IEEE 15th International Conference on Semantic Computing
    (ICSC), Laguna Hills, CA, USA, 2021, pp. 199-202, doi: 10.1109/ICSC50631.2021.00042.
[3] L. Canales, C. Strapparava, E. Boldrini and P. Martínez-Barco, "Intensional Learning to Efficiently
    Build Up Automatically Annotated Emotion Corpora," in IEEE Transactions on Affective
    Computing, vol. 11, no. 2, pp. 335-347, 1 April-June 2020, doi: 10.1109/TAFFC.2017.2764470.
[4] K. B. Raj, J. K. Seth, K. Gulati, S. Choubey, I. Patni and Bhawna, "Automated Cyberstalking
    Classification using Social Media," 2022 International Conference on Innovative Computing,
    Intelligent Communication and Smart Electrical Systems (ICSES), Chennai, India, 2022, pp. 1-6,
    doi: 10.1109/ICSES55317.2022.9914337.
[5] S. Alhazbi, "Behavior-Based Machine Learning Approaches to Identify State-Sponsored Trolls on
    Twitter," in IEEE Access, vol. 8, pp. 195132-195141, 2020, doi: 10.1109/ACCESS.2020.3033666.
[6] L. Gao, Y. Wu, X. Xiong and J. Tang, "Discriminating Topical Influencers Based on the User
    Relative Emotion," in IEEE Access, vol. 7, pp. 100120-100130, 2019, doi:
    10.1109/ACCESS.2019.2929548.
[7] L. Recalde, J. Mendieta, L. Boratto, L. Terán, C. Vaca and G. Baquerizo, "Who You Should Not
     Follow: Extracting Word Embeddings from Tweets to Identify Groups of Interest and Hijackers in
     Demonstrations," in IEEE Transactions on Emerging Topics in Computing, vol. 7, no. 2, pp. 206-
     217, 1 April-June 2019, doi: 10.1109/TETC.2017.2669404.
[8] Luca Follis and Adam Fish, "3 When to Hack," in Hacker States, MIT Press, 2020, pp. 73-111.
[9] Kate Eichhorn, "5 Journalism and Politics After Content," in Content, MIT Press,
     2022, pp. 103-127.
[10] H. S. Dutta and T. Chakraborty, "Blackmarket-Driven Collusion Among Retweeters–Analysis,
     Detection, and Characterization," in IEEE Transactions on Information Forensics and Security,
     vol. 15, pp. 1935-1944, 2020, doi: 10.1109/TIFS.2019.2953331.
[11] Ross Anderson, "Electronic and Information Warfare," in Security Engineering: A Guide to
     Building Dependable Distributed Systems, Wiley, 2020, pp. 777-814, doi:
     10.1002/9781119644682.ch23.
[12] H. Berghel, "Trolling Pathologies," in Computer, vol. 51, no. 3, pp. 66-69, March 2018, doi:
     10.1109/MC.2018.1731067.
[13] H. Berghel and D. Berleant, "The Online Trolling Ecosystem," in Computer, vol. 51, no. 8, pp. 44-
     51, August 2018, doi: 10.1109/MC.2018.3191256.
[14] N. Colnerič and J. Demšar, "Emotion Recognition on Twitter: Comparative Study and Training a
     Unison Model," in IEEE Transactions on Affective Computing, vol. 11, no. 3, pp. 433-446, 1 July-
     Sept. 2020, doi: 10.1109/TAFFC.2018.2807817.
[15] N. Colnerič and J. Demšar, "Emotion Recognition on Twitter: Comparative Study and Training a
     Unison Model," in IEEE Transactions on Affective Computing, vol. 11, no. 3, pp. 433-446, 1 July-
     Sept. 2020, doi: 10.1109/TAFFC.2018.2807817.
[16] T. Zhong, T. Wang, J. Wang, J. Wu and F. Zhou, "Multiple-Aspect Attentional Graph Neural
     Networks for Online Social Network User Localization," in IEEE Access, vol. 8, pp. 95223-95234,
     2020, doi: 10.1109/ACCESS.2020.2993876.
[17] K. Smelyakov, M. Shupyliuk, V. Martovytskyi, D. Tovchyrechko and O. Ponomarenko,
     "Efficiency of image convolution," 2019 IEEE 8th International Conference on Advanced
     Optoelectronics and Lasers (CAOL), 2019, pp. 578-583, doi:
     10.1109/CAOL46282.2019.9019450.
[18] K. Smelyakov, A. Chupryna, O. Bohomolov and I. Ruban, "The Neural Network Technologies
     Effectiveness for Face Detection," 2020 IEEE Third International Conference on Data Stream
     Mining & Processing (DSMP), 2020, pp. 201-205, doi: 10.1109/DSMP47368.2020.9204049.
[19] K. Smelyakov, A. Chupryna, O. Bohomolov and N. Hunko, "The Neural Network Models
     Effectiveness for Face Detection and Face Recognition," 2021 IEEE Open Conference of
     Electrical, Electronic and Information Sciences (eStream), 2021, pp. 1-7, doi:
     10.1109/eStream53087.2021.9431476.
[20] F. Yao, X. Sun, H. Yu, W. Zhang, W. Liang and K. Fu, "Mimicking the Brain’s Cognition of
     Sarcasm From Multidisciplines for Twitter Sarcasm Detection," in IEEE Transactions on Neural
     Networks and Learning Systems, vol. 34, no. 1, pp. 228-242, Jan. 2023, doi:
     10.1109/TNNLS.2021.3093416.
[21] M. Xu, X. Zhang and L. Guo, "Jointly Detecting and Extracting Social Events From Twitter Using
     Gated BiLSTM-CRF," in IEEE Access, vol. 7, pp. 148462-148471, 2019, doi:
     10.1109/ACCESS.2019.2947027.
[22] L. Belcastro, R. Cantini, F. Marozzo, D. Talia and P. Trunfio, "Learning Political Polarization on
     Social Media Using Neural Networks," in IEEE Access, vol. 8, pp. 47177-47187, 2020, doi:
     10.1109/ACCESS.2020.2978950.
[23] Z. Luo, Q. Li and J. Zheng, "Deep Feature Fusion for Rumor Detection on Twitter," in IEEE
     Access, vol. 9, pp. 126065-126074, 2021, doi: 10.1109/ACCESS.2021.3111790.
[24] A. Nguyen, A. Longa, M. Luca, J. Kaul and G. Lopez, "Emotion Analysis Using Multilayered
     Networks for Graphical Representation of Tweets," in IEEE Access, vol. 10, pp. 99467-99478,
     2022, doi: 10.1109/ACCESS.2022.3207161.
[25] I. Shubin, I. Kyrychenko, P. Goncharov and S. Snisar, "Formal representation of knowledge for
     infocommunication computerized training systems," 2017 4th International Scientific-Practical
     Conference Problems of Infocommunications. Science and Technology (PIC S&T), Kharkov,
     Ukraine, 2017, pp. 287-291, doi: 10.1109/INFOCOMMST.2017.8246399.
[26] G. Krivoulya, V. Tokariev, I. Ilina, O. Lebediev and V. Shcherbak, "Algorithm of Iterations of
     Distribution of Subtasks Between «S-Bot» in One «Swarm-Bot» System," in Proceedings of the 6th
     International Conference on Computational Linguistics and Intelligent Systems (COLINS 2022),
     CEUR Workshop Proceedings, 12-13 May 2022, Gliwice, Poland, 2022, pp. 1531-1541.
[27] Clare Llewellyn, Laura Cram, Adrian Favero, and Robin L. Hill. Russian Troll Hunting in a Brexit
     Twitter Archive. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries
     (JCDL ’18). Association for Computing Machinery, New York, NY, USA, 361–362. 2018. URL:
     https://doi.org/10.1145/3197026.3203876/.
[28] A. Ramalingaiah, S. Hussaini, S. Chaudhari. Twitter bot detection using supervised machine
     learning. ICMAI 2021. Journal of Physics: Conference Series. 1950 (2021) 012006. 2021. URL:
     https://iopscience.iop.org/article/10.1088/1742-6596/1950/1/012006/pdf/.
[29] Bilal Ghanem, Davide Buscaldi, and Paolo Rosso. TexTrolls: Identifying Russian Trolls on Twitter
     from a Textual Perspective. 2019. URL: https://arxiv.org/pdf/1910.01340.pdf/.
[30] Shangbin Feng, Zhaoxuan Tan, Herun Wan, Ningnan Wang, Zilong Chen, Binchi Zhang, Qinghua
     Zheng, Wenqian Zhang, Zhenyu Lei, Shujie Yang, Xinshun Feng, Qingyue Zhang, Hongrui Wang,
     Yuhan Liu, Yuyang Bai, Heng Wang, Zijian Cai, Yanbo Wang, Lijing Zheng, Zihan Ma, Jundong
     Li, Minnan Luo. 2023. TwiBot-22: Towards Graph-Based Twitter Bot Detection. 36th Conference
     on Neural Information Processing Systems, 12 Feb 2023. URL:
     https://arxiv.org/pdf/2206.04564.pdf/.
[31] C. W. Seah, H. L. Chieu, K. M. A. Chai, L. Teow, and L. W. Yeong. Troll detection by domain-
     adapting sentiment analysis. In 2015 18th International Conference on Information Fusion
     (Fusion). 2015. pp. 792–799.
[32] Paolo Fornacciari, Monica Mordonini, Agostino Poggi, Laura Sani, and Michele Tomaiuolo. A
     holistic system for troll detection on Twitter. Computers in Human Behavior 89 (2018), 258 – 268.
     2018. URL: https://doi.org/10.1016/j.chb.2018.08.008/.
[33] C.J. Hutto and Eric Gilbert. VADER: A Parsimonious Rule-based Model for Sentiment Analysis
     of Social Media Text. In Proceedings of the Eighth International AAAI Conference on Weblogs
     and Social Media. 2015. URL: https://ojs.aaai.org/index.php/ICWSM/article/view/14550/14399/.
[34] Jose Lorenzo C. Capistrano, Jessie James P. Suarez, and Prospero C. Naval. SALSA: Detection of
     Cybertrolls Using Sentiment, Aggression, Lexical and Syntactic Analysis of Tweets. In
     Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics
     (WIMS2019). Association for Computing Machinery, New York, NY, USA, Article 10, 6
     pages. 2019. URL: https://doi.org/10.1145/3326467.3326471/.
[35] DataTurks. Tweets Dataset for Detection of Cyber-Trolls, 2020. URL:
     https://www.kaggle.com/dataturks/dataset-for-detection-of-cybertrolls/.
[36] Todor Mihaylov and Preslav Nakov. Hunting for Troll Comments in News Community Forums,
     2019. URL: https://arxiv.org/pdf/1911.08113.pdf/.
[37] FiveThirtyEight. Tweets Dataset for Russian-Troll, 2017. URL:
     https://www.kaggle.com/datasets/fivethirtyeight/russian-troll-tweets/.
[38] N. Sharonova, I. Kyrychenko and G. Tereshchenko, "Application of big data methods in E-learning
     systems," 2021 5th International Conference on Computational Linguistics and Intelligent Systems
     (COLINS-2021), CEUR-WS, 2021, ISSN 1613-0073, vol. 2870, pp. 1302-1311.