<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hybrid Model for Detecting Fraudulent Domain Names</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Serhii Buchyk</string-name>
          <email>buchyk@knu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Shabanova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr Buchyk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktoriia Shmatko</string-name>
          <email>vika.shmatko.24@knu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Kotov</string-name>
          <email>maksym_kotov@ukr.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>60 Volodymyrska str., 01033 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>110</fpage>
      <lpage>120</lpage>
      <abstract>
        <p>The relevance of the study is related to the risks arising from the growing number of botnets that use domain generation algorithms (DGA) to avoid detection. The use of DGA makes it difficult to identify malicious servers, creating significant challenges for cyber defense. Traditional machine learning methods require manually created domain features that become ineffective when attack patterns change. This paper proposes a hybrid machine learning model for DGA domain detection that combines high adaptability and accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>botnet</kwd>
        <kwd>domain generation algorithm</kwd>
        <kwd>DGA domains</kwd>
        <kwd>cybersecurity</kwd>
        <kwd>machine learning</kwd>
        <kwd>hybrid model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Despite the declining popularity of some remote access trojans (RATs), such as DCRAT (–59%),
NjRAT (–33%), and AsyncRAT (–29%), their overall share of botnet activity remains significant
(30.45%). RATs give attackers full control over an infected system, which makes them highly
effective for espionage and data theft. For example, Remcos, whose activity grew by 72%, is
actively used to steal credentials, including corporate ones. Attackers can send commands to
an infected computer to obtain sensitive information, such as passwords or files.</p>
      <p>Overall, the slowdown in the growth of botnets is a positive development, but it does not
reduce the need for stricter cyber defense measures worldwide.
Botnets remain difficult to detect, and their activity constantly adapts to new defense methods,
so countering them requires a comprehensive approach.</p>
    </sec>
    <sec id="sec-2">
      <title>1.1. Characteristics of cyber attacks using DGA</title>
      <p>
        Domain generation algorithms (DGAs) allow attackers to dynamically create
thousands of unique domain names through which malware can reach its command-and-control servers [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These
algorithms work based on various approaches, such as arithmetic calculations, hashing, or the use
of dictionaries. Their main goal is to avoid blocking and ensure stable communication between
infected devices and control servers even in the event of a partial infrastructure blockage [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
One of the main advantages of DGAs is the ability to create a large number of domains in a short
time, as shown in Fig. 2. Attackers can generate thousands of
addresses every day, which makes them much harder to detect and block. Moreover, the dynamic
nature of generation reduces predictability, making such domains more resistant to traditional
detection methods. Another important feature is the minimal dependence on physical
infrastructure: even after some servers are blocked, attackers can quickly switch to
new domains.
      </p>
      <p>
        DGAs are implemented using various algorithms that can be classified according to the
approach to domain creation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>One of the most common methods is arithmetic-based algorithms. In this case, domains are
generated using mathematical formulas that take into account parameters such as date, time, or
other variables. For example, the Conficker worm used this approach to create pseudo-random
domains, such as ‘fgavropgu.com’, which kept its command-and-control
infrastructure stable.</p>
      <p>
        Another approach is based on the use of hash functions. In this case, the algorithm generates
domains by calculating a hash of input data, such as strings of text or IP addresses [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This allows
the creation of more complex and less predictable domain names that are difficult to detect. The
Bamital malware used this method to create domains such as
‘47faeb4f1b75a48499ba14e9b1cd895a.org’, which made its infrastructure highly resilient.
      </p>
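      <p>The hash-based approach described above can be illustrated with a minimal sketch. It is not the algorithm of Bamital or of any other malware family: the function name hash_dga, the seed format, the use of SHA-256 truncated to 32 hexadecimal characters, and the ‘.org’ suffix are all assumptions chosen only to show why such names look random yet remain reproducible.</p>
      <preformat>
```python
import hashlib
from datetime import date


def hash_dga(seed, day, count=5, tld=".org"):
    """Return `count` domain names derived from a seed string and a date.

    Illustrative only: the seed format, the hash function, and the TLD are
    assumptions, not the algorithm of any particular malware family.
    """
    domains = []
    for index in range(count):
        # Anyone who knows the seed and the date can rebuild this input.
        material = f"{seed}:{day.isoformat()}:{index}".encode("utf-8")
        # Keep the first 32 hexadecimal characters of the digest as the label.
        label = hashlib.sha256(material).hexdigest()[:32]
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in hash_dga("example-seed", date(2024, 1, 1), count=3):
        print(name)
```
      </preformat>
      <p>Because the output depends only on the seed and the date, the same list can be recomputed by anyone who recovers both values, which is what makes precomputing and blocking such domains possible in principle.</p>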
      <p>
        A third approach is dictionary-based algorithms. Here, domains are assembled from a list
of words, which makes them look more natural and less suspicious to
detection systems. For example, the Matsnu malware generated domains such as
‘catpeakfearinterview.com’ using this approach. This technique can significantly reduce the
likelihood that cyber defense systems flag the generated domains [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>1.2. Overview of traditional methods for detecting anomalous domains</title>
      <p>DGAs are widely used to organize various types of attacks. In addition to botnets, these algorithms
are used in ransomware, which transmits encryption keys via dynamically generated domains.
They are also used in phishing campaigns, where the seemingly familiar look of domains helps to
deceive users. This makes DGAs one of the key tools in the modern arsenal of cybercriminals.</p>
      <p>Traditional methods of detecting anomalous domains are based on analyzing static characteristics of
domain names and behavioral features of network traffic. One of the most common approaches is
the use of blacklists that store known malicious domains. While this method is effective against
already identified threats, it cannot respond to new,
previously unknown DGA domains. Another approach is to use regular expressions to detect
abnormal patterns in domains, for example by checking domain name length, frequency of character
usage, or non-standard structures. However, this method has limited effectiveness against
modern DGAs, which can create domains with a fairly ordinary look, in
particular, dictionary-based ones.</p>
      <p>
        Behavioral methods are more adaptive to dynamic threats and are based on analyzing network
traffic, DNS query frequency, domain lifecycle, and domain relationships. For example, traffic
analysis systems can detect anomalous activity if a particular domain receives a significant number
of simultaneous requests from many IP addresses. However, these methods also have weaknesses,
as they depend on a large amount of historical data and may have a high false positive rate. As a
result, traditional detection methods remain effective only for a limited range of tasks and need to
be supplemented by modern approaches, including those based on artificial intelligence and
machine learning models [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7–9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>1.3. Using artificial intelligence to detect dynamic domains</title>
      <p>
        The introduction of artificial intelligence (AI) technologies in cybersecurity has significantly
improved the detection of algorithmically generated domains. AI’s ability to
automatically process large amounts of data and identify hidden patterns makes it possible to
detect new threats that static approaches miss [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        One of the key areas of AI application is the analysis of the structural characteristics of domain
names. In particular, machine learning algorithms allow us to identify such features as the
frequency distribution of characters, domain length, morphological features, and the use of special
characters. Classification models such as Random Forest or gradient boosting are used to analyze
these characteristics. In addition, natural language processing (NLP) methods can detect DGA
domains that mimic natural words [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11–13</xref>
        ]. Domain vectorization using Word2Vec or
similar tools is also used to find hidden patterns in their structure.
      </p>
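      <p>The structural characteristics listed above can be computed with a few lines of code. The sketch below is illustrative: the function name domain_features, the choice to analyze only the leftmost label, and the specific feature set (length, Shannon entropy of the character distribution, digit ratio, vowel ratio) are assumptions, not the feature set of any cited work.</p>
      <preformat>
```python
import math
from collections import Counter


def domain_features(domain):
    """Compute simple lexical features of the leftmost label of a domain."""
    label = domain.lower().split(".")[0]
    total = len(label)
    if total == 0:
        return {"length": 0, "entropy": 0.0, "digit_ratio": 0.0, "vowel_ratio": 0.0}
    counts = Counter(label)
    # Shannon entropy (in bits) of the character frequency distribution.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    digits = sum(ch.isdigit() for ch in label)
    vowels = sum(ch in "aeiou" for ch in label)
    return {
        "length": total,
        "entropy": entropy,
        "digit_ratio": digits / total,
        "vowel_ratio": vowels / total,
    }


if __name__ == "__main__":
    for name in ("catpeakfearinterview.com", "47faeb4f1b75a48499ba14e9b1cd895a.org"):
        print(name, domain_features(name))
```
      </preformat>
      <p>Feature vectors of this kind are the typical input for classifiers such as Random Forest or gradient boosting. Dictionary-based names tend to resemble ordinary words on these measures, which is one reason hand-crafted features alone can be insufficient.</p>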
      <p>Another important approach is modeling network traffic behavioral patterns. This method is
based on the analysis of parameters such as the frequency of DNS queries, the frequency of domain
accesses, and the distribution of queries by IP address. Recurrent neural networks (RNNs) and Long
Short-Term Memory (LSTM) models allow for analyzing time series of traffic and identifying
anomalies that may be related to DGA activity. Additionally, clustering algorithms such as k-means
group domains by behavioral characteristics, which makes it possible to identify new potentially
malicious groups.</p>
      <p>
        Considerable attention is also paid to deep analysis based on neural networks. Deep learning
models, such as autoencoders, are used to reduce the dimensionality of data and search for
anomalies in the structure of domains [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Generative adversarial networks (GANs) make it possible to
create synthetic data for model training, in particular, to simulate the behavior of DGA
domains. In addition, transformers, which have become popular for text processing, are
used to analyze complex dependencies in the structure of domains.
      </p>
      <p>
        Such methods have already found practical application in modern
cyber defense systems. For example, AI-based tools are integrated into security information and
event management (SIEM) systems, such as Splunk or IBM QRadar, where they analyze DNS
traffic in real time and identify abnormal patterns. Open-source solutions, such as
PyDGADetector, use the TensorFlow and PyTorch libraries to create customized DGA detection
models, making this approach affordable [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>2. Concept and implementation of the hybrid model</title>
        <p>
          The hybrid model for DGA domain detection combines convolutional neural networks (CNNs),
bidirectional LSTMs (BiLSTMs), and a self-attention mechanism to efficiently analyze local and
global domain name features, as illustrated in Fig. 3. This architecture takes context into
account and highlights the key features of the data, which supports high accuracy and
performance. This paper aims to develop a hybrid machine learning model for
detecting algorithmically generated domains intended to compromise information systems or
mislead users. The object of the study is the process of detecting malicious domain names
created by domain generation algorithms (DGAs), and the subject is machine learning architectures
that can analyze domain names and classify them as legitimate or malicious [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>The scientific novelty of the work lies in the further development of methods for detecting
algorithmically generated fraudulent domains through the integration of modern machine learning
architectures that provide rapid recognition of potential cyber threats. This approach
raises the level of cybersecurity through the model’s ability to adapt to dynamic changes in the
behavior of botnets, which remain the main threat in modern cyberspace.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>2.1. Architectural solutions for the integration of CNN, BiLSTM, and self-attention mechanism</title>
      <p>
        The hybrid model proposed for detecting DGA-generated domains uses a combination of
three key components: BiLSTM [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Self-Attention Mechanism, and Convolutional Neural
Networks (CNN) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. This architecture provides a multi-level analysis of input data, taking into
account local, global, and the most significant features specific to the domains created by the
algorithms [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>
        BiLSTM is the basis of the model. It takes into account the context of each character in
a sequence both from left to right and from right to left, which lets the model
capture complex patterns in domain names that simple pattern matching misses. Bidirectional
analysis captures the relationships between characters, which
increases the accuracy and reliability of the classification. In addition, the LSTM’s input and
forget gates allow the model to retain the most relevant features and discard irrelevant
information [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        The self-attention mechanism lets the model highlight key features in domain
names. It weights the most relevant parts of the sequence more heavily than secondary elements, which
reduces noise and improves accuracy when detecting domains that include random or
artificially generated sequences. The self-attention mechanism makes the model more resilient to
challenging conditions and supports high performance in real-time data processing [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>Convolutional neural networks (CNNs) perform structural analysis of domains by identifying local
patterns that are specific to malicious domains. This component of the model extracts information
about character frequency, domain name length, and specific sequences, which gives a better
picture of the structure of the input data. Because the convolutional filters are learned during training, CNNs can adapt to
changes in domain patterns and extract important features even from new types of data.</p>
    </sec>
    <sec id="sec-6">
      <title>2.2. Data preparation features: tokenization and standardization</title>
      <p>
        Data preparation is an important step in ensuring model accuracy. Domain names
undergo tokenization, that is, they are broken down into individual characters or bigrams, which
preserves the structure and relationships between the elements of the domain name needed for
model analysis. After tokenization, the data is standardized: domain names are brought to the same
length by adding zero values (padding) or trimming excess characters [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
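      <p>Character-level tokenization with padding and trimming can be sketched as follows. The alphabet, the reserved identifiers (0 for padding, 1 for unknown characters), the maximum length of 64, and the choice to pad at the end of the sequence are assumptions made for illustration; the paper does not state these values.</p>
      <preformat>
```python
PAD_ID = 0      # assumed identifier for padding positions
UNKNOWN_ID = 1  # assumed identifier for characters outside the alphabet
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_ID = {ch: index + 2 for index, ch in enumerate(ALPHABET)}


def encode_domain(domain, max_len=64):
    """Map a domain name to a fixed-length list of integer character ids."""
    ids = [CHAR_TO_ID.get(ch, UNKNOWN_ID) for ch in domain.lower()]
    ids = ids[:max_len]                            # trim names that are too long
    return ids + [PAD_ID] * (max_len - len(ids))   # pad short names with zeros


if __name__ == "__main__":
    print(encode_domain("ab.c", max_len=6))
```
      </preformat>
      <p>Every encoded name has the same length, so a batch of names forms a rectangular integer matrix that can be passed to an embedding layer.</p>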
      <p>
        Filtering is also performed: characters that are not relevant to the classification, such as rare or
special characters, are removed. This reduces the dimensionality of the problem and
the load on the model while preserving the key characteristics of the domains [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. This
approach allows the model to receive clean and standardized data, which is the basis for successful
training and accurate prediction.
      </p>
    </sec>
    <sec id="sec-7">
      <title>2.3. Model implementation in TensorFlow and Keras: algorithms and optimization</title>
      <p>The model was implemented using TensorFlow and Keras, which provide high flexibility and
support for complex architectures. The architecture is built using a modular approach that
makes it possible to change configurations, test different parameters, and identify the optimal ones. The
model uses convolutional layers with different filter sizes that extract local features and a BiLSTM
that takes into account contextual dependencies. A self-attention mechanism is integrated to
focus on key sequence features.</p>
      <p>
        The model is trained using the Adam optimizer, which provides a fast and stable reduction of the
loss function. Validation is performed after each epoch, and early stopping is used to
prevent overfitting. Training runs on a GPU to improve performance and reduce
training time. With this setup, the model demonstrates high performance and accuracy,
which allows it to be used effectively for real-time detection of DGA domains [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <sec id="sec-7-1">
        <title>3. Experimental research</title>
        <p>The effectiveness of the proposed hybrid model for domain name classification is determined by
comprehensive testing based on key metrics. Evaluation of accuracy, precision, recall, and
F1 score reveals the advantages of the chosen architecture in comparison with traditional
approaches. This approach provides not only a quantitative characterization of the model’s
performance but also emphasizes its practical applicability in real-world cyber defense.</p>
        <p>The study is aimed at comparing the performance of the hybrid model with current popular
approaches to DGA domain detection, including Random Forest, SVM, and other machine learning
models. An important aspect is also the analysis of the computational performance and scalability
of the model, particularly in the context of its integration into real-time systems. This allows us to
determine the model’s potential for widespread implementation in automated threat monitoring
systems.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>3.1. Evaluation of the model’s effectiveness by key metrics</title>
      <p>The key metrics used to evaluate the model’s performance were accuracy, precision, recall,
and F1 score. These metrics are needed to understand the strengths and weaknesses of
the model in distinguishing between the positive (fraudulent) and negative (legitimate) classes.</p>
      <p>
        They are calculated based on four possible outcomes: true positive (TP), false positive (FP), true
negative (TN), and false negative (FN). A TP is a case in which the model correctly predicts the
positive class, that is, a fraudulent
domain is correctly identified as fraudulent. An FP is a case in which the model
incorrectly predicts the positive class, that is, a legitimate
domain is mistakenly identified as malicious. A TN is a case in which the model correctly predicts the
negative class, that is, a legitimate domain is correctly identified as legitimate.
An FN is a case in which the model incorrectly predicts the negative class, that is,
it fails to identify a fraudulent resource and classifies it
as legitimate [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>
        Accuracy is a fundamental metric that measures the overall correctness of an ML model. It
quantifies the ratio of correctly predicted cases (both true positive and true negative) to the total
number of instances in the dataset, as defined in formula (1). High accuracy indicates the ability of
the model to make correct predictions about positive (fraudulent) and negative (legitimate)
cases [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
      <p><disp-formula><label>(1)</label><tex-math>\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}</tex-math></disp-formula></p>
      <p>Precision measures the correctness of positive predictions made by the model. It is the
ratio of true positive predictions to the total number of cases predicted as positive (TP and FP), as
given in formula (2). Precision is particularly important because it measures the model’s
ability to avoid misclassifying legitimate domains as malicious. High precision means a
low frequency of false positives.</p>
      <p><disp-formula><label>(2)</label><tex-math>\mathrm{Precision} = \frac{TP}{TP + FP}</tex-math></disp-formula></p>
      <p>Recall, also known as sensitivity or true positive rate, assesses the ability of a model to
identify all positive cases in a dataset. It is the ratio of true positive predictions to the total
number of actual positive cases (true positives and false negatives), as given in formula (3). A
high recall is crucial, as it indicates the model’s ability to detect a significant
proportion of actual online threats while minimizing false negatives.</p>
      <p><disp-formula><label>(3)</label><tex-math>\mathrm{Recall} = \frac{TP}{TP + FN}</tex-math></disp-formula></p>
      <p>The F1 score is the harmonic mean of precision and recall, which provides a
balanced assessment of the model’s performance, as given in formula (4).</p>
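      <p><disp-formula><label>(4)</label><tex-math>F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}</tex-math></disp-formula></p>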
      <p>This metric is preferred in situations where both high precision and high recall are required.
The F1 score is especially useful when a balance must be struck between identifying fraudulent
websites and minimizing false positives.</p>
      <p>
        Thus, the use of the described metrics contributes to an objective assessment of the model’s
performance and its ability to accurately identify classes of input data [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
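      <p>The four metrics follow directly from the confusion-matrix counts. The sketch below assumes a hypothetical helper named classification_metrics; returning zero when a denominator is zero is a convention chosen here, and the counts in the usage example are invented for illustration and are not the paper’s results.</p>
      <preformat>
```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the experiments in this paper.
    print(classification_metrics(tp=90, fp=10, tn=80, fn=20))
```
      </preformat>
      <p>With these hypothetical counts, precision (0.90) is higher than recall (about 0.82): the classifier rarely flags legitimate domains but misses some fraudulent ones, and the F1 score (about 0.86) falls between the two.</p>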
      <p>The accuracy of the model was 99.25%, which indicates a high ability to correctly
classify both legitimate and fraudulent domains. The precision reached 99.26%, indicating a
very small number of false positive classifications, which is important for preserving
access to legitimate resources. The model’s recall was 99.25%, demonstrating its
ability to detect real threats while missing few fraudulent domains. The F1 score, which
takes into account both precision and recall, was 99.25%, highlighting the model’s balance
and reliability across the different types of domains shown in Fig. 4.</p>
      <p>
        The learning dynamics confirm the effectiveness of the chosen architecture: over ten
epochs, accuracy and the F1 score increased steadily and approached one. This
indicates the model’s ability to generalize to unseen data. The graphs of
training and validation metrics show that the model successfully adapts to the data, minimizing
classification errors even when the test data sets are varied [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
      </p>
    </sec>
    <sec id="sec-9">
      <title>3.2. Comparison of the hybrid model with popular approaches</title>
      <p>The study compared the hybrid model with traditional algorithms such as Random Forest, Support
Vector Machines (SVM), and simple metric-based neural networks, as shown in Fig. 5. The hybrid
model outperformed these approaches across all key metrics, demonstrating a significant
advantage in detecting complex patterns in domain names. For example, compared to Random
Forest, the model’s accuracy increased by more than 15%, and its precision and recall
exceeded the SVM’s results by 10%.</p>
      <p>The main advantage of the hybrid model is its ability to integrate local and global data
features through a combination of CNN and BiLSTM, together with a self-attention mechanism. This
allows it to better handle irregular and non-standard domain names, which are typical for DGAs. At
the same time, the model is more resource-intensive because it needs GPUs for optimal
training, which may be a limitation in low-power environments.</p>
    </sec>
    <sec id="sec-10">
      <title>3.3. Analysis of computational performance and scalability in real conditions</title>
      <p>The hybrid model training process was optimized to ensure efficient use of computing resources.
Training took 3 minutes and 46 seconds (Fig. 6),
even on a large amount of data. Achieving high results in such a
limited time is critical for rapid deployment in real-world systems.</p>
      <p>Predictions on the test dataset took an average of 696 milliseconds per
prediction, which indicates that the model can operate in real time and
process large datasets quickly, making it suitable for enterprise networks and critical
infrastructures.</p>
      <sec id="sec-10-1">
        <title>Conclusions</title>
        <p>The model code should consist of three modules: the first is aimed at preparing and processing
data, the second is responsible for creating and training the model, and the third is focused on
evaluating the results.</p>
        <p>Central aspects of the model include:</p>
        <list list-type="order">
          <list-item><p>Data preparation: the model correctly processes domain names, converting them into
tokens at the character level.</p></list-item>
          <list-item><p>Structure: a hybrid framework with CNN for feature extraction from text, BiLSTM for
context analysis in both directions, and a self-attention mechanism for emphasizing
significant characters to improve the model’s accuracy.</p></list-item>
          <list-item><p>Optimization: the use of the Adam optimizer together with early stopping and saving
the best model helps to avoid overfitting and provides balanced learning.</p></list-item>
          <list-item><p>Evaluation and visualization of results: a set of metrics (accuracy, precision,
recall, and F1 score) together with learning curves and confusion
matrices provides a comprehensive report on the model’s performance.</p></list-item>
        </list>
        <p>The model demonstrates the potential for highly accurate domain classification, which is key in
the fight against targeted cyber threats. Its flexible structure also allows it to be easily adapted to
modern security requirements and data types.</p>
        <p>The developed solution can be implemented in real-world cybersecurity applications, providing
a foundation for further research and expansion in this area. The integration of the developed
model is a priority and potentially very useful task given the constant growth of cyber threats.
Given that many cyberattacks begin with the disguise of fraudulent domains as legitimate ones, the
ability to quickly and accurately identify such resources will significantly reduce the risk of threats
being realized.</p>
        <p>
          Implementation of this model in cybersecurity systems such as firewalls, intrusion detection
systems, and endpoint security utilities, including the systems presented by the authors in [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], should
provide more thorough and responsive protection. In addition, browser extensions that embed
the model will provide an additional layer of link verification.
        </p>
        <p>Declaration on Generative AI</p>
        <p>While preparing this work, the authors used Grammarly Pro to correct text
grammar and Strike Plagiarism to search for possible plagiarism. After using these tools, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>Spamhaus botnet threat update Q4 2024</article-title>
          . URL: https://info.spamhaus.com/hubfs/Botnet%20Reports/Jul-Dec%202024%20Botnet%20Threat%20Update.pdf?hsCtaTracking=2d000669-926f-444b-b656-98782e9af734%7C9e5ecf5c-f3a0-4532-b871-bc3213691253
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names> <surname>Ruts</surname>
          </string-name>
          ,
          <article-title>Improved DGA-based botnet detection through context-related feature selection based on packet flow information</article-title>
          ,
          <source>Master's Thesis</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
             
            <surname>Kapan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names> <surname>Gunal</surname>
          </string-name>
          ,
          <article-title>Improved phishing attack detection with machine learning: A comprehensive evaluation of classifiers and features</article-title>
          ,
          <source>Appl. Sci</source>
          .
          <volume>13</volume>
          (
          <issue>24</issue>
          ) (
          <year>2023</year>
          ). doi:
          <pub-id pub-id-type="doi">10.3390/app132413269</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
             
            <surname>Buchyk</surname>
          </string-name>
          , et al.,
          <article-title>DGA domain detection in Splunk with a hybrid machine learning model</article-title>
          ,
          <source>in: 2024 IEEE 17th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering</source>
          ,
          <year>2024</year>
          ,
          <fpage>261</fpage>
          -
          <lpage>264</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/TCSET64720.2024.10755590</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
             
            <surname>Sriram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
             P. 
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names> <surname>Alazab</surname>
          </string-name>
          ,
          <article-title>Malicious URL detection using deep learning</article-title>
          ,
          <source>Authorea Preprints</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
             
            <surname>Thammareddi</surname>
          </string-name>
          , et al.,
          <article-title>Analysis on cybersecurity threats in modern banking and machine learning techniques for fraud detection</article-title>
          ,
          <source>An Int. Multidisciplinary Online J</source>
          . (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
             
            <surname>Adamantis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
             
            <surname>Sokolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Skladannyi</surname>
          </string-name>
          ,
          <article-title>Evaluation of state-of-the-art machine learning smart contract vulnerability detection method</article-title>
          ,
          <source>Advances in Computer Science for Engineering and Education VII</source>
          , vol.
          <volume>242</volume>
          (
          <year>2025</year>
          )
          <fpage>53</fpage>
          -
          <lpage>65</lpage>
          . doi: 10.1007/978-3-031-84228-3_5
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
             
            <surname>Buhas</surname>
          </string-name>
          , et al.,
          <article-title>Using machine learning techniques to increase the effectiveness of cybersecurity</article-title>
          ,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>3188</volume>
          , no.
          <issue>2</issue>
          (
          <year>2021</year>
          )
          <fpage>273</fpage>
          -
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
             
            <surname>Zhebka</surname>
          </string-name>
          , et al.,
          <article-title>Methodology for predicting failures in a smart home based on machine learning methods</article-title>
          ,
          <source>in: Workshop on Cybersecurity Providing in Information and Telecommunication Systems, CPITS</source>
          , vol.
          <volume>3654</volume>
          (
          <year>2024</year>
          )
          <fpage>322</fpage>
          -
          <lpage>332</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Andresini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Appice</surname>
          </string-name>
          ,
          <article-title>AI meets cybersecurity</article-title>
          .
          <source>J Intell. Inf. Syst</source>
          . (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
             
            <surname>Romanovskyi</surname>
          </string-name>
          , et al.,
          <article-title>Accuracy improvement of spoken language identification system for close-related languages</article-title>
          ,
          <source>Advances in Computer Science for Engineering and Education VII</source>
          , vol.
          <volume>242</volume>
          (
          <year>2025</year>
          )
          <fpage>35</fpage>
          -
          <lpage>52</lpage>
          . doi: 10.1007/978-3-031-84228-3_4
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>O.</given-names>
             
            <surname>Iosifova</surname>
          </string-name>
          , et al.,
          <article-title>Analysis of automatic speech recognition methods</article-title>
          ,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>2923</volume>
          (
          <year>2021</year>
          )
          <fpage>252</fpage>
          -
          <lpage>257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>I.</given-names>
            <surname>Iosifov</surname>
          </string-name>
          , et al.,
          <article-title>Natural language technology to ensure the safety of speech information</article-title>
          ,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>3187</volume>
          , no.
          <issue>1</issue>
          (
          <year>2022</year>
          )
          <fpage>216</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname> Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
             
            <surname>Gangwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          ,
          <article-title>Integration of machine learning with cybersecurity: applications and challenges</article-title>
          ,
          <source>in: Artificial Intelligence in Cyber Security: Theories and Applications. Intelligent Systems Reference Library</source>
          , vol.
          <volume>240</volume>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A. S.</given-names>
             
            <surname>Saabith</surname>
          </string-name>
          , et al.,
          <article-title>A survey of machine learning techniques for anomaly detection in cybersecurity</article-title>
          ,
          <source>Int. J. Res. Eng. Sci</source>
          . (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <article-title>5 Types of LSTM recurrent neural networks</article-title>
          ,
          <year>2023</year>
          . URL: https://www.exxactcorp.com/blog/Deep-Learning/5-types-of-lstm-recurrent-neural-networks-and-what-to-do-with-them
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
             
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning</article-title>
          ,
          <source>IEEE Transact. Medical Imaging</source>
          <volume>35</volume>
          (
          <issue>5</issue>
          ) (
          <year>2016</year>
          )
          <fpage>1285</fpage>
          -
          <lpage>1298</lpage>
          . doi: 10.1109/TMI.2016.2528162
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z.</given-names>
             
            <surname>Alshingiti</surname>
          </string-name>
          , et al.,
          <article-title>A deep learning-based phishing detection system using CNN, LSTM, and LSTM-CNN</article-title>
          ,
          <source>Electronics</source>
          ,
          <year>2023</year>
          .
          <article-title>4 Types of Classification Tasks in Machine Learning</article-title>
          ,
          <year>2020</year>
          . URL: https://machinelearningmastery.com/types-of-classification-in-machine-learning
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Al Odaini</surname>
          </string-name>
          , et al.,
          <article-title>Cybersecurity in public space: Leveraging CNN and LSTM for proactive multivariate time series classification</article-title>
          .
          <source>in: IEEE International Conference on Big Data</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H.</given-names>
            <surname> Lin</surname>
          </string-name>
          , et al.,
          <article-title>A new method for heart rate prediction based on LSTM-BiLSTM-Att</article-title>
          ,
          <source>Measurement</source>
          <volume>207</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <article-title>Unicode security mechanisms for UTS #39</article-title>
          ,
          <year>2023</year>
          . URL: https://www.unicode.org/Public/security/latest/confusables.txt
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
             
            <surname>Plohmann</surname>
          </string-name>
          , et al.,
          <article-title>A comprehensive measurement study of domain generating malware</article-title>
          ,
          <source>The 25th USENIX Security Symposium</source>
          , USENIX Association,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.</given-names>
             
            <surname>Zhou</surname>
          </string-name>
          , et al.,
          <article-title>Machine learning on big data: Opportunities and challenges</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>237</volume>
          (
          <year>2017</year>
          )
          <fpage>350</fpage>
          -
          <lpage>361</lpage>
          . doi: 10.1016/j.neucom.2017.01.026
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>F. D.</given-names>
            <surname>Keles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Wijewardena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <article-title>On the computational complexity of self-attention</article-title>
          ,
          <source>in: 34th International Conference on Algorithmic Learning Theory</source>
          , vol.
          <volume>201</volume>
          ,
          <year>2023</year>
          ,
          <fpage>597</fpage>
          -
          <lpage>619</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
             
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Aziz</surname>
          </string-name>
          ,
          <article-title>Data preprocessing and feature selection for machine learning intrusion detection systems</article-title>
          ,
          <source>ICIC Express Lett</source>
          , vol.
          <volume>13</volume>
          (
          <issue>2</issue>
          ),
          <year>2019</year>
          ,
          <fpage>93</fpage>
          -
          <lpage>101</lpage>
          . doi: 10.24507/icicel.13.02.93
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M. N.</given-names>
             
            <surname>Alam</surname>
          </string-name>
          , et al.,
          <article-title>Phishing attacks detection using machine learning approach</article-title>
          ,
          <source>in: 3rd Int. Conf. Smart Syst. Inventive Technol. (ICSSIT)</source>
          ,
          <year>2020</year>
          . doi: 10.1109/ICSSIT48917.2020.9214225
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>R.</given-names>
             
            <surname>Gupta</surname>
          </string-name>
          , et al.,
          <article-title>Machine learning models for secure data analytics: A taxonomy and threat model</article-title>
          ,
          <source>Comput. Commun</source>
          .
          <volume>153</volume>
          (
          <year>2020</year>
          )
          <fpage>406</fpage>
          -
          <lpage>440</lpage>
          . doi: 10.1016/j.comcom.2020.02.008
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>G.</given-names>
            <surname> Logeswari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Anitha</surname>
          </string-name>
          ,
          <article-title>An intrusion detection system for SDN using machine learning</article-title>
          ,
          <source>Intell. Autom. Soft Comput</source>
          .
          <volume>35</volume>
          (
          <issue>1</issue>
          ) (
          <year>2023</year>
          )
          <fpage>867</fpage>
          -
          <lpage>880</lpage>
          . doi: 10.32604/iasc.2023.026769
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>S.</given-names>
             
            <surname>Toliupa</surname>
          </string-name>
          , et al.,
          <article-title>Building an intrusion detection system in critically important information networks with application of data mining methods</article-title>
          ,
          <source>in: 16th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering</source>
          ,
          <year>2022</year>
          ,
          <fpage>128</fpage>
          -
          <lpage>133</lpage>
          . doi: 10.1109/TCSET55632.2022.9767029
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>