<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modern strategies for data leak detection and prevention in corporate networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anatoliy Sachenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petro Vizhevskyi</string-name>
          <email>vizhevskyipv@khmnu.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Savenko</string-name>
          <email>savenko_oleg_st@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktor Ostroverkhov</string-name>
          <email>v.ostroverkhov@wunu.edu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bogdan Maslyyak</string-name>
          <email>as@wunu.edu.ua</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Casimir Pulaski Radom University</institution>
          ,
          <addr-line>26-600 Radom</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Khmelnytskyi National University</institution>
          ,
          <addr-line>Khmelnytskyi, 29016</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>West Ukrainian National University</institution>
          ,
          <addr-line>Ternopil, 46009</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>As companies handle ever-growing stores of sensitive information, from proprietary research to customer data, the threat of unauthorized disclosure escalates. Traditional Data Loss Prevention (DLP) measures, relying on static content matching and signature-based detection, have proven inadequate in detecting transformed or obfuscated sensitive information, particularly in environments that embrace remote work, Bring Your Own Device (BYOD) policies, and third-party integrations. This paper surveys the limitations of such conventional DLP systems and examines novel detection methodologies, including graph-based semantic analysis, probabilistic bigraph models, and context-aware anomaly detection, each addressing distinct facets of modern data leakage scenarios. Furthermore, the paper reviews prevention strategies that involve multi-layered defenses, robust encryption, secure file systems, and dynamic deception techniques to broaden the scope of adversarial deterrence. A primary contribution of this study is a genetic-algorithm-driven method for detecting data leaks. Experiments on real data-leak datasets show that the method matches or surpasses the performance of standard baselines, including Naive Bayes and SVM, while maintaining low computational overhead. Future research should explore a dynamic ensemble in which the genetic algorithm assigns weights to multiple detection modules, thereby reducing false positives and keeping pace with evolving threat landscapes and corporate data practices. The paper concludes by underscoring the necessity of a multilayered, continuously evolving DLP architecture, arguing that only through integrated and adaptive solutions can enterprises effectively safeguard their critical assets in an increasingly interconnected digital landscape.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Loss Prevention</kwd>
        <kwd>Anomaly Detection</kwd>
        <kwd>Secure File Systems</kwd>
        <kwd>Cloud Security</kwd>
        <kwd>Dynamic Deception</kwd>
        <kwd>Genetic Algorithms</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In today’s interconnected digital landscape, protecting sensitive corporate data has become
increasingly critical. Organizations now manage enormous volumes of both structured and
unstructured data, ranging from emails and internal reports to intellectual property and customer
records. This surge in data generation has not only heightened operational efficiencies but has also
expanded the potential avenues for unauthorized data disclosure. Data leakage, whether through
inadvertent mistakes by insiders or deliberate malicious actions, poses severe risks, including
significant financial losses, reputational damage, and non-compliance with stringent regulatory
frameworks.</p>
      <p>ORCID: 0000-0002-0907-3682 (A. Sachenko); 0009-0009-4851-0839 (P. Vizhevskyi); 0000-0002-4104-745X (O. Savenko); 0000-0002-3818-0604 (V. Ostroverkhov); 0000-0002-9671-7617 (B. Maslyyak)</p>
      <p>
        Recent research highlights that traditional security mechanisms, which are predominantly
designed to defend against external cyber threats, are often insufficient when it comes to
monitoring internal data flows. Many conventional Data Loss Prevention (DLP) systems rely on
static content matching or predetermined patterns, which can falter when sensitive data undergoes
transformations such as editing, reformatting, or partial redaction. For example, one study
demonstrated that by representing documents as weighted graphs, it is possible to capture
contextual sensitivity and detect modified data that would otherwise bypass standard detection
methods [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In parallel, probabilistic models using bigraph representations have been introduced to
statistically assess how sensitive data is distributed among various entities within an organization.
These models underscore the importance of statistical analysis in tracking subtle changes in data
flow that traditional DLP techniques might miss [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A comprehensive review of existing DLP
methodologies further points out that approaches such as watermarking and content
fingerprinting, while useful in certain scenarios, often struggle with the complexity introduced by
insider threats and the dynamic nature of modern data formats [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Moreover, as organizations embrace modern workplace practices like BYOD (Bring Your Own
Device) and remote work, the security perimeter becomes increasingly porous. Advanced DLP
architectures are now required to monitor a heterogeneous mix of devices and endpoints without
disrupting everyday operations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Complementing these technical challenges, studies employing
anomaly detection in relational databases have illustrated that monitoring the behavioral patterns
of applications can offer an effective second layer of defense, further reinforcing the need for
integrated, multi-dimensional approaches to data leak prevention [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Another emerging perspective is the concept of contextual integrity, which shifts the focus from
merely detecting static content to evaluating the appropriateness of information flows between
entities. This approach considers the relationships between senders, recipients, and the underlying
data attributes, offering a more nuanced method to differentiate between legitimate and suspicious
data exchanges [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In environments where language complexity and document transformations
present additional hurdles, techniques based on morphological analysis have also been explored,
particularly for languages with intricate grammatical structures [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Beyond software-centric solutions, research into physical and network-level vulnerabilities,
including electromagnetic leakage from hardware, emphasizes that comprehensive data protection
requires both digital and physical security measures [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Meanwhile, broader vulnerability
assessments and comparative analyses of DLP systems reveal that an effective data protection
strategy must combine technical innovations, such as big data analytics and machine learning [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
with cost-effective, low-intrusive solutions tailored to the operational realities of modern
enterprises [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Recent advancements have also seen the integration of dynamic deception techniques, where
the system deliberately alters or obfuscates data to expand the perceived attack surface, thus
increasing the difficulty for attackers to extract genuine information. Such strategies complement
conventional DLP mechanisms and provide an additional layer of resilience against both external
and insider threats [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Finally, secure file system architectures designed specifically to address insider threats have
been proposed, aiming to offer transparent protection without impeding user productivity. These
systems leverage virtual file system techniques and are evaluated based on their ability to encrypt
and monitor data flows without introducing significant overhead [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Additionally, tagging
mechanisms that transform unstructured data into managed content repositories have emerged as
a promising method to control information dissemination within an organization [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>This paper aims to comprehensively survey the strengths and weaknesses of existing data leak
detection and prevention strategies, and then to propose a novel method for data leak detection based
on a genetic algorithm, one that can be integrated into an adaptive framework unifying the most
promising techniques within an ensemble approach guided by genetic algorithms. By dynamically
weighting and combining modules, including morphological analysis, context-aware anomaly
detection, timestamp-based classification, and moving target defenses, organizations can more
effectively handle the complex mix of modern threats without continuous manual tuning.</p>
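      <p>As a minimal sketch of the ensemble idea just described (not the implementation evaluated later in this paper), the following toy genetic algorithm evolves a weight vector over hypothetical detection-module scores; the population size, mutation step, and fitness function are illustrative assumptions:</p>

```python
import random

def fitness(weights, samples, threshold=0.5):
    """Fraction of samples the weighted ensemble classifies correctly.
    Each sample is (module_scores, is_leak)."""
    correct = 0
    for scores, is_leak in samples:
        total = sum(w * s for w, s in zip(weights, scores))
        predicted_leak = total / (sum(weights) + 1e-9) >= threshold
        correct += predicted_leak == is_leak
    return correct / len(samples)

def evolve_weights(samples, n_modules, pop_size=30, generations=40, seed=1):
    """Toy GA: truncation selection, one-point crossover, point mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_modules)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, samples), reverse=True)
        survivors = pop[:pop_size // 2]            # keep the fitter half
        children = []
        while pop_size > len(survivors) + len(children):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_modules)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_modules)           # mutate one weight
            child[i] = min(1.0, max(0.0, child[i] + rng.uniform(-0.2, 0.2)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda w: fitness(w, samples))
```

      <p>In practice the fitness function would be computed on labelled leak/non-leak traffic, and the module scores would come from real detectors such as the ones surveyed in Section 4.</p>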
    </sec>
    <sec id="sec-2">
      <title>2. Understanding Data Leaks</title>
      <p>
        Data leakage involves the unintended or unauthorized dissemination of confidential or sensitive
information outside the boundaries of an organization. This phenomenon may result from both
inadvertent mistakes by employees and deliberate actions by insiders or external adversaries. The
concept encompasses a range of incidents, including simple errors like accidental file sharing and
sophisticated cyberattacks that exploit system vulnerabilities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Sensitive information within organizations can be categorized into different states, each
presenting unique risks. Data at Rest is information stored on servers, databases, or external
storage devices. Breaches in this category often occur when unauthorized individuals gain physical
or remote access to these storage systems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Data in Motion, which is actively transmitted
across networks via emails, file transfers, or cloud synchronization, is vulnerable to interception.
Effective protection in this state typically relies on secure transmission protocols and encryption
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Data in Use is actively processed or accessed by applications or users. Leakage at this stage is
frequently associated with insider threats or the exploitation of application-level vulnerabilities,
which may not be adequately addressed by traditional perimeter-based security measures [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Various factors contribute to data leakage. Individuals within an organization, whether through
negligence or malice, represent a significant threat. Studies show that a considerable percentage of
breaches can be traced back to insiders who inadvertently expose sensitive information [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Many
traditional DLP systems focus on static data patterns and keyword matching. These methods may
fail when data is modified, for example, by reformatting or partial redaction, before it is exfiltrated.
Advanced detection techniques are needed to accommodate these transformations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The
increasing use of cloud services, mobile devices, and remote work arrangements expands the
potential leakage points. This diversity creates challenges in monitoring data consistently across
various platforms and endpoints [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The consequences of data leakage can lead to substantial direct costs, including regulatory fines,
litigation expenses, and remediation costs, along with indirect losses resulting from operational
disruptions and reduced market confidence [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Exposure of sensitive information can severely
damage an organization’s reputation, leading to a loss of customer trust and competitive edge. The
detection and mitigation process can strain organizational resources, particularly when security
systems generate excessive false positives that interfere with normal business operations [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Threat Landscape and Attack Vectors</title>
      <p>
        Modern corporate networks face a dynamic and diverse threat landscape in which both internal
and external actors exploit various vulnerabilities to cause data leakage. Insider threats remain one
of the most challenging aspects of data security. These threats emerge when employees or trusted
individuals, either through carelessness or malicious intent, expose or intentionally leak
confidential data. Several studies emphasize that insiders are often responsible for a significant
portion of data breaches due to their extensive access rights and familiarity with internal systems.
For instance, research on data leakage detection has shown that traditional methods often struggle
to accurately monitor insider behavior, especially when the leaked data is intentionally altered to
evade standard controls [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Additionally, survey results on machine learning-based DLP
approaches indicate that insider actions, whether accidental or deliberate, require more adaptive
detection techniques to effectively distinguish between normal operational patterns and suspicious
behavior [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        External adversaries continuously evolve their tactics to breach corporate defenses. Attackers
exploit vulnerabilities in network protocols, unpatched software, and misconfigured systems to
gain unauthorized access [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In particular, ransomware and phishing campaigns have emerged as
prevalent forms of external attacks. Advanced systems that monitor application behavior and data
flow anomalies have been shown to detect such threats with higher accuracy, yet the rapid
evolution of malware strains often presents new challenges that traditional signature-based
methods cannot address [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Another emerging vector in the data leakage external threat landscape is the exploitation of
botnet networks. Botnets, which consist of numerous compromised devices coordinated through a
centralized command-and-control infrastructure, are increasingly being used not only for
distributed denial-of-service attacks but also for exfiltrating sensitive data. Botnets are organized in
multiple tiers, with a command-and-control center directing intermediate control nodes and basic
bot elements. This hierarchical structure enables attackers to remotely control a vast number of
endpoints, aggregating small amounts of leaked data in a stealthy, distributed manner that can
evade traditional data leakage prevention systems [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]. The dynamic and decentralized nature
of botnets makes it especially challenging for conventional security measures, which are typically
designed to detect static or predictable data flows, to identify and mitigate such threats. As botnets
continue to evolve, integrating specialized detection mechanisms that focus on identifying botnet
behavior and its associated data exfiltration patterns becomes critical for robust corporate data
security [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ].
      </p>
      <p>
        In today’s interconnected IT environment, organizations increasingly rely on third-party
services, cloud platforms, and external vendors. This reliance creates additional vectors for data
leakage, as vulnerabilities in supply chains or partner networks can serve as conduits for sensitive
information to be exfiltrated. Research into BYOD policies and cloud-based DLP systems highlights
that gaps in third-party security controls can lead to unmonitored data flows, making it imperative
for enterprises to incorporate comprehensive risk assessments and stringent access controls across
all external interfaces [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Furthermore, cost-effective strategies for cloud data protection are
critical, particularly for small and medium-sized enterprises, as they face unique challenges in
balancing security needs with limited resources [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        A significant challenge in detecting data leaks arises from attackers deliberately transforming or
obfuscating data to evade traditional DLP systems. Techniques such as content modification,
insertion of benign text, or even partial redaction are used to mask sensitive information. Emerging
detection models, including adaptive graph-based methods and contextual integrity frameworks,
address these challenges by focusing on the underlying semantics and relationships within the data
rather than relying solely on fixed patterns [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Such methods are especially effective in
environments where data undergoes frequent transformations during routine operations, ensuring
that even altered data is subject to robust monitoring.
      </p>
      <p>
        Beyond purely digital threats, physical and side-channel attacks also contribute to the data
leakage landscape. These attacks exploit non-traditional vectors such as electromagnetic emissions
or hardware vulnerabilities to capture information without directly breaching network security.
Investigations into the security of computer systems have demonstrated that electromagnetic
leakage from displays and peripheral devices can inadvertently expose sensitive information,
underscoring the importance of considering physical security measures alongside digital defenses
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>
        The diversity of attack vectors, including insider mishaps, external cyberattacks, supply chain
breaches, and physical side-channel exploits, underscores the complexity of the modern data
leakage threat landscape. A successful defense strategy requires a multi-layered approach that
integrates behavioral analysis, adaptive detection techniques, and comprehensive monitoring of
both digital and physical environments. By understanding the interplay of these factors,
organizations can design DLP solutions that are both resilient and responsive to the evolving
nature of cyber threats [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Survey on Data Leak Detection</title>
      <p>Detecting unauthorized disclosure of sensitive information in corporate environments requires a
multifaceted approach. Modern detection techniques have evolved to address not only static data
content but also transformed and obfuscated data, user behavior anomalies, and contextual
irregularities.</p>
      <p>
        One line of research involves representing documents as weighted graphs to capture both the
significance of key terms and their contextual relationships. In these approaches, documents are
converted into graphs where nodes represent sensitive keywords and edges capture their
contextual dependencies. By applying an adaptive weighted graph walk model, systems can
effectively identify cases where data has been altered, for example through partial modifications or
inserted noise, to evade traditional detection methods [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In parallel, probabilistic models that
leverage bigraph representations have been developed to statistically assess the likelihood of data
leakage events by mapping the distribution of sensitive data among entities [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Both techniques
focus on overcoming the limitations of fixed-pattern matching by integrating contextual and
statistical analysis into the detection process.
      </p>
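      <p>A simplified sketch of the graph-based idea, assuming co-occurrence counts within a sliding window as edge weights (the cited adaptive weighted graph walk model [1] is more elaborate); the keywords and window size below are illustrative:</p>

```python
from collections import Counter
from itertools import combinations

def build_graph(tokens, keywords, window=3):
    """Weighted graph: nodes are sensitive keywords, edge weight counts
    co-occurrences of two keywords inside a sliding window."""
    edges = Counter()
    for i in range(len(tokens)):
        span = [t for t in tokens[i:i + window] if t in keywords]
        for a, b in combinations(sorted(set(span)), 2):
            edges[(a, b)] += 1
    return edges

def graph_similarity(g1, g2):
    """Cosine-style overlap between two edge-weight maps, so a partially
    edited document still scores high against the original."""
    shared = set(g1).intersection(g2)
    dot = sum(g1[e] * g2[e] for e in shared)
    norm = (sum(w * w for w in g1.values()) ** 0.5) * (sum(w * w for w in g2.values()) ** 0.5)
    return dot / norm if norm else 0.0
```

      <p>Because similarity is computed over keyword relationships rather than exact strings, inserted benign text or small edits degrade the score only gradually.</p>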
      <p>
        Another detection strategy centers on identifying deviations from established behavioral norms.
Systems employing anomaly detection techniques monitor sequences of operations, including
database queries or file access patterns, and compare them against profiles of normal application
behavior. For example, a detection system based on Hidden Markov Models (HMM) creates profiles
from normal program traces, and deviations from these profiles may indicate data leakage attempts
via application misuse [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This approach is especially useful for detecting subtle insider threats
where an authorized user may perform atypical actions that could result in data leakage.
      </p>
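      <p>For illustration, the HMM-based profiling can be approximated by a first-order Markov model over operation sequences: traces whose average transition likelihood is low relative to the trained profile are flagged. This is a deliberate simplification of the cited detector [5]:</p>

```python
from collections import defaultdict
import math

class BehaviorProfile:
    """First-order Markov profile of operation sequences; a simplified
    stand-in for the HMM profiles described in [5]."""
    def __init__(self, smoothing=1e-3):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.smoothing = smoothing
        self.ops = set()

    def train(self, traces):
        for trace in traces:
            self.ops.update(trace)
            for prev, cur in zip(trace, trace[1:]):
                self.counts[prev][cur] += 1

    def log_likelihood(self, trace):
        """Average log-probability per transition; low values flag
        sequences unlike anything seen during training."""
        total, n = 0.0, 0
        for prev, cur in zip(trace, trace[1:]):
            row = self.counts[prev]
            denom = sum(row.values()) + self.smoothing * len(self.ops)
            total += math.log((row[cur] + self.smoothing) / denom)
            n += 1
        return total / n if n else 0.0
```
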
      <p>
        Detection techniques grounded in the concept of contextual integrity focus on evaluating
whether information flows adhere to the expected norms within a given environment. Instead of
simply scanning for sensitive keywords, these methods extract semantic flows by employing
advanced natural language processing to verify that data exchange patterns comply with
organizational policies and privacy regulations. By comparing observed communication sequences
against a set of declaratively defined privacy rules, these systems can flag potentially
noncompliant data transfers that may signal a leakage [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
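      <p>A minimal illustration of declaratively defined flow norms, with hypothetical roles and data types; a real system would extract these flows from traffic with NLP rather than receive them pre-structured:</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    sender_role: str
    recipient_role: str
    data_type: str

# Declarative norms: which (sender, recipient, data type) flows are allowed.
# Roles and data types here are illustrative placeholders.
ALLOWED = {
    Flow("hr", "payroll_provider", "salary_record"),
    Flow("engineer", "engineer", "design_doc"),
    Flow("support", "customer", "ticket_reply"),
}

def is_compliant(flow, allowed=ALLOWED):
    """A flow is compliant only if an explicit norm permits it,
    mirroring the default-deny reading of contextual integrity."""
    return flow in allowed
```
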
      <p>
        Advanced machine learning techniques have been employed to enhance detection accuracy,
particularly when data is unstructured or when it undergoes transformation. Methods based on
morphological analysis decompose text into its constituent parts (e.g., roots, stems, suffixes) to
better capture the semantic content even when superficial changes are made. Combined with
classification algorithms, these techniques help differentiate between benign modifications and
genuine leakage of sensitive information [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Furthermore, surveys of machine learning approaches
in DLP indicate that integrating both supervised and unsupervised learning models can
significantly improve detection precision while reducing false positives [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
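      <p>Even a crude suffix-stripping stemmer illustrates why morphological normalization helps: superficially changed word forms map back to a shared stem before matching. The suffix list and scoring rule below are toy assumptions, not the cited method [7]:</p>

```python
def stem(word, suffixes=("ing", "ed", "es", "s")):
    """Crude suffix stripping as a stand-in for full morphological analysis."""
    w = word.lower()
    for suf in suffixes:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w

def leak_score(text, sensitive_stems):
    """Share of tokens whose stem matches a known sensitive stem, so
    inflected forms ('merged', 'merging') still hit the stem 'merg'."""
    tokens = [stem(t) for t in text.split()]
    hits = sum(t in sensitive_stems for t in tokens)
    return hits / len(tokens) if tokens else 0.0
```
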
      <p>
        Some detection systems incorporate temporal information as an additional layer of analysis. For
instance, time stamp-based methods involve clustering documents and assigning temporal labels
during a learning phase. During detection, if the document’s time stamp falls within a critical
period (e.g., before a scheduled public release), the system assigns a higher risk score, potentially
flagging it as confidential [
          <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In a complementary approach, content tagging methods organize
data into controlled repositories. By tagging data with predefined labels, organizations can more
easily monitor and restrict the flow of sensitive information across internal networks, thereby
reducing the risk of inadvertent leakage [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
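      <p>The timestamp-based idea can be sketched as a score adjustment applied while a document's timestamp falls inside its critical window; the boost value and window semantics are illustrative assumptions:</p>

```python
from datetime import datetime

def sensitivity_score(base_score, doc_time, embargo_start, release_time, boost=0.4):
    """Raise the risk score while the document sits inside its critical
    window (after embargo start, before scheduled public release)."""
    in_window = doc_time >= embargo_start and release_time > doc_time
    if in_window:
        return min(1.0, base_score + boost)
    return base_score
```

      <p>After the release time passes, the boost disappears automatically, matching the observation that such protection expires once data is no longer sensitive.</p>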
      <p>
        Methods that integrate data transformation with moving target defense strategies dynamically
alter the appearance of data, making it more difficult for adversaries to identify and exfiltrate
genuine information. In these systems, deceptive data is generated based on both historical user
behavior and current operational context, thereby increasing the attack cost for adversaries while
preserving data usability for legitimate purposes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Static content matching and fixed-pattern detection often falter when sensitive information is
disguised through reformatting, morphological changes, or partial redaction. Even more adaptive
models that leverage anomaly detection or context-aware analysis still struggle with the
heterogeneous mix of data flows created by modern workforce practices and diverse endpoint
devices. In large-scale corporate environments, high false-positive rates can overwhelm security
teams, while purely signature-based systems prove ill-equipped against novel threats or insider
misuse. Table 1 summarizes the advantages and shortcomings of selected existing detection
methods.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Advantages and shortcomings of selected data leak detection methods</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Advantages</th>
              <th>Shortcomings</th>
              <th>Typical data</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Probabilistic leaker identification</td>
              <td>Identifies likely leaker without watermarking; simple to audit</td>
              <td>Accuracy drops with broad sharing; no collusion detection</td>
              <td>Structured files or DB rows</td>
            </tr>
            <tr>
              <td>AD-PROM HMM Anomaly Detector [<xref ref-type="bibr" rid="ref5">5</xref>]</td>
              <td>Very low false positives; light runtime impact</td>
              <td>Needs wide training coverage; mimicry may evade</td>
              <td>Application and DB behaviour</td>
            </tr>
            <tr>
              <td>Contextual Integrity [<xref ref-type="bibr" rid="ref6">6</xref>]</td>
              <td>Flags semantic policy breaches; supports rich GDPR-style norms</td>
              <td>Rule maintenance effort; NLP errors raise false alerts</td>
              <td>Email and text messages</td>
            </tr>
            <tr>
              <td>Timestamp-Based Sensitivity Scoring [<xref ref-type="bibr" rid="ref15">15</xref>]</td>
              <td>Protection expires automatically when data is no longer sensitive; accurate on fully confidential files</td>
              <td>Misses partial snippets; requires correct expiry</td>
              <td>Time-sensitive documents</td>
            </tr>
            <tr>
              <td>Content-Tag Repository Control [<xref ref-type="bibr" rid="ref13">13</xref>]</td>
              <td>Central hub simplifies auditing and uniform policy enforcement; works across multiple channels routed through the repository</td>
              <td>Users can bypass the repository; mis-tagging undermines protection</td>
              <td>Any file stored or sent through the CMS</td>
            </tr>
            <tr>
              <td>SVM Text Classifier [<xref ref-type="bibr" rid="ref14">14</xref>]</td>
              <td>High precision with moderate training data; fast inference once trained</td>
              <td>Requires labelled corpus; vulnerable to newly obfuscated terms</td>
              <td>Emails, documents, chats</td>
            </tr>
            <tr>
              <td>Deep Autoencoder Anomaly Detection [<xref ref-type="bibr" rid="ref14">14</xref>]</td>
              <td>Detects previously unseen leak patterns; works without labelled data</td>
              <td>Computationally intensive; benign anomalies may trigger false flags</td>
              <td>Network traffic, system logs, mixed telemetry</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-5">
      <title>5. Data Leak Prevention Strategies</title>
      <p>
        Preventing data leakage requires a proactive, multi-layered approach that combines robust policies,
technical safeguards, and adaptive monitoring. A strong foundation for data protection begins with
comprehensive policies that define what constitutes sensitive data and set clear rules for its
handling. Organizations should implement governance frameworks that enforce regulatory
compliance, including adherence to GDPR and other data protection laws, and ensure that
employees are well-trained in data security practices. These frameworks are essential for
establishing accountability and promoting a security-aware culture throughout the enterprise [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        To mitigate the consequences of a possible data leak, deploying strong encryption for data at rest,
in transit, and in use is critical. Advanced encryption techniques, along with rigorous access control
policies, restrict unauthorized users from accessing or extracting sensitive information. Several
studies highlight the importance of integrating these measures into corporate IT environments to
both secure data and provide traceability in case of a breach [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Developing secure file systems
that incorporate on-the-fly encryption and controlled access can significantly mitigate internal
leakage risks. By creating virtual file systems that mirror actual file operations and enforce
encryption/decryption during read and write operations, organizations can transparently protect
data without hampering user productivity [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Some approaches classify data based on critical
time windows by assigning temporal labels during a learning phase and enforcing access
restrictions when documents fall within these sensitive periods. This strategy helps ensure that
information remains confidential until it is meant to be released [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
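      <p>The read/write hooks of such a virtual file system can be sketched as follows. The keystream construction here is a deliberately simple placeholder for illustration only; a real deployment would use an authenticated cipher such as AES-GCM:</p>

```python
import hashlib

def _keystream(key, n):
    """Derive n pseudo-random bytes from the key (toy construction,
    NOT cryptographically vetted; use an authenticated cipher in practice)."""
    out = b""
    counter = 0
    while n > len(out):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def vfs_write(store, key, path, plaintext):
    """'Write' hook of a virtual file system: encrypt before persisting."""
    ks = _keystream(key, len(plaintext))
    store[path] = bytes(a ^ b for a, b in zip(plaintext, ks))

def vfs_read(store, key, path):
    """'Read' hook: decrypt on the fly, transparent to the caller."""
    ct = store[path]
    ks = _keystream(key, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))
```

      <p>Because encryption and decryption happen inside the file system hooks, legitimate applications see plaintext while anything copied from the underlying store remains unreadable.</p>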
      <p>
        DLP solution can be grouped by deployment scheme as endpoint, network-wide or mixed [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Network deployed tools continuously monitor data flows across the organization’s network,
identifying and blocking unauthorized transmission of sensitive information. They can inspect
content in real time and enforce policies that prevent leakage over unsecured channels [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. At the
device level, endpoint solutions that monitor user activity are critical for detecting anomalous
actions that may signal insider threats. By comparing current user behavior against established
baselines, these systems help detect and prevent data leakage before it occurs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Mixed solutions combine attributes of both endpoint and network DLP tools.
      </p>
      <p>
        Emerging prevention techniques leverage context and deception to add a proactive layer of
defense. Rather than relying solely on static rules, context-aware strategies assess whether
information flows adhere to predefined privacy norms. By analyzing the roles of data senders,
recipients, and the nature of the data exchanged, these systems can dynamically enforce policies
that reflect real-world expectations, reducing the risk of unintentional leakage [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To further
complicate efforts by adversaries, some systems dynamically generate deceptive data. This method
alters the appearance of sensitive data to create a larger, misleading attack surface. Such techniques
not only increase the difficulty for attackers to isolate genuine information but also trigger alarms
when deceptive elements are manipulated, providing early warnings of potential breaches [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        The widespread adoption of cloud services and BYOD policies requires specialized strategies. As
data migrates to cloud environments, integrated DLP systems that monitor both cloud storage and
transmission channels become vital. Hybrid strategies that combine on-premise controls with
cloud-based monitoring enable organizations to maintain visibility over data regardless of its
location, while ensuring compliance with evolving regulatory demands [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In environments
where employees use personal devices for work, DLP strategies must extend to managing these
endpoints. Tailored solutions include enforcing secure access policies, monitoring data flows on
mobile devices, and segregating personal from corporate data to reduce the risk of accidental leaks
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Another approach involves the use of content tagging and the formation of controlled content
repositories. By assigning metadata or labels to sensitive data as soon as it is created or modified,
organizations can track the movement of critical information across systems. This tagging allows
for the automated application of security policies and facilitates quick identification of data that
should not leave a secure repository [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Restricting data movement to specific, monitored portals
or repositories minimizes the exposure of sensitive information. These systems act as gatekeepers,
ensuring that data is only transferred through secure channels and only to authorized destinations.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Data Leak Detection Using Genetic Algorithm</title>
      <p>DLP systems typically classify data in two ways: by formal attributes (metadata such as
“confidential” labels, document type, author, etc.) and by analyzing the actual content (file text,
presence of specific patterns, keywords). The best results are achieved by combining both
approaches, so the proposed method considers both file metadata and content to determine the
information’s sensitivity level.</p>
      <p>
        We propose a classification method built on a genetic algorithm [
        <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
        ]. During the tuning (training) phase, the system receives as input a set of data examples D = {d1, …, dn} labeled as confidential or non-confidential, yi ∈ {0, 1}. Each document in the input data array can be represented as a feature-presence vector:</p>
      <p>xj = (xj1, xj2, …, xjm), (1)</p>
      <p>
        where the element xji ∈ {0, 1} indicates the presence or absence of features taken from a predefined dictionary T = {t1, …, tm}; each tj may correspond to a keyword, a metadata field, or a match to a specific pattern. The GA module gradually evolves a set of rules or a model capable of classifying new data, and the resulting classifier is integrated into the distributed DLP system as a local agent [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. It inspects the content of files, messages, or other objects, together with their attributes, in order to determine whether they contain confidential information.
      </p>
      <p>A candidate classifier C denotes a set of k IF-THEN rules. Each rule is characterised by two subsequences: a positive template pq that requires certain features to be present and a negative template nq that requires certain features to be absent. A chromosome can therefore be written as:</p>
      <p>C = [r1 | r2 | … | rk], (2)</p>
      <p>rq = 1 if (pq · x ≥ 1) ∧ (nq · x = 0), otherwise rq = 0, (3)</p>
      <p>and a document is classified as confidential if at least one rule is triggered:</p>
      <p>ŷ(x) = max q=1,…,k rq. (4)</p>
      <p>Each chromosome in the genetic algorithm thus encodes a complete candidate solution to the classification
problem.</p>
      <sec id="sec-6-1">
        <title>A document is classified as confidential if at least one rule is triggered:</title>
        <p>The chromosome is therefore represented as a bit sequence with total length L=2⋅k⋅m where
k is the number of rules and m is the size of the feature dictionary.</p>
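The chromosome encoding and rule evaluation described above can be sketched in Python. This is an illustrative sketch, not the authors' implementation; the function names and the toy dictionary are assumptions.

```python
# Illustrative sketch: decoding a chromosome of length 2*k*m into k IF-THEN
# rules and classifying a feature-presence vector x. A rule fires when its
# positive template matches (p · x >= 1) and its negative template does not
# (n · x == 0); the document is confidential if any rule fires.

def decode(chromosome, k, m):
    """Split a flat bit list into k (positive, negative) template pairs."""
    rules = []
    for q in range(k):
        start = q * 2 * m
        pos = chromosome[start:start + m]          # p_q: required features
        neg = chromosome[start + m:start + 2 * m]  # n_q: forbidden features
        rules.append((pos, neg))
    return rules

def classify(chromosome, x, k, m):
    """Return 1 (confidential) if at least one encoded rule is triggered."""
    for pos, neg in decode(chromosome, k, m):
        p_hit = sum(p * xi for p, xi in zip(pos, x))
        n_hit = sum(n * xi for n, xi in zip(neg, x))
        if p_hit >= 1 and n_hit == 0:
            return 1
    return 0

# Toy dictionary T = ("confidential", "salary", "public"), m = 3, k = 1.
# The single rule requires "confidential" and forbids "public".
chrom = [1, 0, 0,   # p_1
         0, 0, 1]   # n_1
print(classify(chrom, [1, 1, 0], k=1, m=3))  # -> 1 (confidential)
print(classify(chrom, [1, 0, 1], k=1, m=3))  # -> 0 (contains "public")
```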
        <p>During evolution each chromosome is evaluated on the training set. The evaluation measures
how well the encoded rules identify confidential data (true detections) and how well they avoid
confusing ordinary data with confidential data (minimising false alarms). A fitness function that
reflects overall classification accuracy is used to score every candidate model. For this purpose, the
counts of true positives TP, false positives FP, true negatives TN, and false negatives FN are
calculated:</p>
        <p>TP = Σ[ŷ(d) = 1 ∧ y = 1], FP = Σ[ŷ(d) = 1 ∧ y = 0], FN = Σ[ŷ(d) = 0 ∧ y = 1], TN = Σ[ŷ(d) = 0 ∧ y = 0]. (5)</p>
        <p>The fitness function to be maximised is defined as a combination of the precision P and recall R metrics:</p>
        <p>P = TP / (TP + FP + ε), R = TP / (TP + FN + ε), (6)</p>
        <p>F = 2PR / (P + R + ε), Fit(C) = α · F − β · L / Lmax, (7)</p>
        <p>
          where ε ≪ 1 prevents division by zero, and α = 0.9 and β = 0.1 are penalty coefficients that
control the influence of rule length. At every iteration of the genetic algorithm the fittest
individuals are selected for reproduction using tournament selection [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. After selection, each pair of parents is combined by single-point crossover. Let s ∈ {1, …, L − 1} be a randomly chosen cut position; the offspring is child = (parent1[1…s], parent2[s+1…L]).
        </p>
        <p>To maintain population diversity every bit in the chromosome is inverted with probability
pmut = 1/L. The evolutionary process stops when either the predefined maximum number of
generations is reached or the improvement over the last ten generations falls below tolerance
δ ≪ 1:</p>
        <p>|Fit_best(g) − Fit_best(g−10)| ≤ δ. (8)</p>
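The fitness computation described above can be sketched as follows; this is a minimal illustration, with function and argument names that are assumptions rather than the authors' code.

```python
# Sketch of the precision/recall-based fitness with a rule-length penalty.
# alpha = 0.9 and beta = 0.1 follow the values given in the text; eps is a
# small constant preventing division by zero.

def fitness(predict, dataset, rule_bits, max_bits,
            alpha=0.9, beta=0.1, eps=1e-9):
    tp = fp = fn = tn = 0
    for x, y in dataset:                       # (feature vector, true label)
        y_hat = predict(x)
        if   y_hat == 1 and y == 1: tp += 1
        elif y_hat == 1 and y == 0: fp += 1
        elif y_hat == 0 and y == 1: fn += 1
        else:                       tn += 1
    precision = tp / (tp + fp + eps)           # P
    recall    = tp / (tp + fn + eps)           # R
    f1 = 2 * precision * recall / (precision + recall + eps)  # F
    # Penalise long rule sets relative to the maximum chromosome length.
    return alpha * f1 - beta * rule_bits / max_bits

# A perfect classifier on a toy set scores close to alpha minus the penalty:
data = [([1, 0], 1), ([0, 1], 0)]
score = fitness(lambda x: x[0], data, rule_bits=2, max_bits=10)
```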
        <p>
          The proposed method derives its classification rules automatically from real data, whereas
conventional DLP systems usually depend on hand-crafted templates and static policies. This
data-driven process reduces reliance on domain experts and enables the system to adapt quickly when
new formats or code words for sensitive information appear. The method produces an explicit set
of IF–THEN rules that security specialists can read and verify, avoiding the opacity typical of
black-box models such as many neural networks [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. Because the logic is transparent, analysts can
explain why a document was marked confidential, which fosters trust and simplifies forensic
investigations; when requirements change, the rules can be edited directly instead of retraining an
entire model.
        </p>
        <p>The proposed method can be utilized in endpoint-based agent data leak detection through the three
main phases shown in Figure 1.</p>
        <p>For the Offline Training Phase, given historical leak data (emails, documents, media files) as input,
the following steps are executed:</p>
      </sec>
      <sec id="sec-6-2">
        <title>1. Feature extraction (text patterns, metadata).</title>
        <p>2. Rule set population initialization.
3. Fitness evaluation.
4. Selection of a new population based on fitness.
5. Crossover.
6. Mutation.</p>
        <p>Steps 3-6 are repeated until equation (8) is satisfied. The result is an evolved set of IF-THEN rules for
classifying data as confidential or not, which is then used by the Online Detection Phase.</p>
        <p>For the Online Detection Phase the initial input is the evolved set of rules; execution consists of the following steps:
1. A data (document, email, etc.) interaction is identified.
2. Features are extracted for the identified data, similar to the Offline Training Phase.
3. The IF-THEN rules are applied.
4. The appropriate action is performed (Allow, Block, Log, etc.).</p>
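These online detection steps can be sketched as a hypothetical endpoint agent. The action names (Allow, Block) follow the text; the dictionary, rule contents, and helper names are assumptions for illustration only.

```python
# Hedged sketch of an endpoint agent applying evolved IF-THEN rules.
# Each rule is a (required, forbidden) pair of feature sets: it fires when
# all required features can be found and no forbidden feature is present.

DICTIONARY = ["confidential", "salary", "public"]   # toy feature dictionary T
RULES = [({"confidential"}, {"public"})]            # one illustrative rule

def extract_features(text):
    """Step 2: map raw text to the set of dictionary features it contains."""
    tokens = text.lower().split()
    return {t for t in DICTIONARY if t in tokens}

def decide(text):
    """Steps 3-4: apply the rules and choose an action."""
    feats = extract_features(text)
    for required, forbidden in RULES:
        if required & feats and not (forbidden & feats):
            return "Block"      # rule fired: treat the data as confidential
    return "Allow"

print(decide("confidential salary review"))  # -> Block
print(decide("public press release"))        # -> Allow
```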
        <p>The Continuous Learning phase generates new rules that reflect the most recent data landscape, so
the classifier keeps pace with emerging document structures. By evaluating both textual features
and metadata, the system gains a broader inspection context and lowers the likelihood of missing a
leak.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Experiments</title>
      <p>We validate our method across three real-world datasets listed in Table 2 and compare its
performance against established machine learning algorithms. Each dataset underwent extensive
preprocessing to ensure consistent feature extraction and fair comparison across methods.</p>
      <p>The preprocessing pipeline involved dataset-specific stages to ensure high-quality feature
extraction for both genetic algorithm and baseline methods. For the Enron Email Corpus, email
body text and metadata were extracted from raw files with headers parsed to separate content from
routing information. Signatures, quoted replies, and forwarding chains were removed, followed by
text normalization including lowercasing, contraction expansion, and special character removal
while preserving meaningful punctuation. A domain-specific stop word list retained
security-relevant terms like "confidential" and "restricted." Email addresses were tokenized as
EMAIL_TOKEN with internal/external domain preservation, while attachments were processed
separately with filenames and extensions as additional features.</p>
      <p>The AI4Privacy PII-300k dataset required custom regular expressions to detect and tokenize PII
patterns including SSN (XXX-XX-XXXX), credit cards, phone numbers, and addresses. Each PII
type was replaced with corresponding tokens (SSN_TOKEN, CREDITCARD_TOKEN) while
preserving presence information. Special cases like partial redactions ("SSN: XXX-XX-1234") and
age-revealing date formats were handled, with PII density features calculating the ratio of PII
tokens to total tokens per document.</p>
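The PII tokenization just described can be sketched with regular expressions. The exact patterns and token names used for the AI4Privacy preprocessing are not given in full, so the ones below are assumptions chosen to match the examples in the text.

```python
import re

# Illustrative PII tokenization: replace detected PII spans with type tokens
# and compute a simple PII-density feature (ratio of PII tokens to all
# tokens). Patterns here are assumed stand-ins, not the paper's exact set.

PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "SSN_TOKEN"),        # XXX-XX-XXXX
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "CREDITCARD_TOKEN"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "PHONE_TOKEN"),
]

def tokenize_pii(text):
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    tokens = text.split()
    density = sum("_TOKEN" in t for t in tokens) / max(len(tokens), 1)
    return text, density

masked, density = tokenize_pii("SSN 123-45-6789, call 555-123-4567")
print(masked)   # -> SSN SSN_TOKEN, call PHONE_TOKEN
```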
      <p>For the Government FOIA dataset, classification markings (e.g., "CONFIDENTIAL//NOFORN")
were extracted as separate features before text removal. Redacted sections ([REDACTED] or 'X'
blocks) were replaced with REDACTION_TOKEN while counting redaction frequency and length.
Paragraph structure was preserved due to section-specific classification levels, with page numbers,
form numbers, and reference codes becoming metadata features.</p>
      <p>TF-IDF vectorization used dataset-specific parameters: Enron (1,000 features, 0.001-0.95
document frequency), AI4Privacy (500 features), and FOIA (750 features). Metadata-derived
features included sender-recipient patterns and time indicators for emails, PII co-occurrence
statistics for privacy data, and classification/redaction patterns for government documents.</p>
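For clarity, the TF-IDF vectorization with a feature cap and document-frequency bounds can be sketched in pure Python. A real pipeline would use a library vectorizer; the idf variant and helper names below are assumptions.

```python
import math
from collections import Counter

# Minimal TF-IDF sketch illustrating the dataset-specific parameters:
# a max_features cap and min_df/max_df document-frequency bounds
# (e.g. 1,000 features with 0.001-0.95 bounds for the Enron corpus).

def tfidf(docs, max_features=1000, min_df=0.001, max_df=0.95):
    n = len(docs)
    # document frequency: number of docs containing each term
    df = Counter(t for d in docs for t in set(d.split()))
    vocab = sorted(t for t, c in df.items()
                   if min_df <= c / n <= max_df)[:max_features]
    idf = {t: math.log(n / df[t]) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d.split())
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors

docs = ["confidential salary data", "public press release", "confidential memo"]
vocab, vecs = tfidf(docs)
```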
      <p>Data splitting employed stratified sampling to maintain class distribution across 80-20 train-test
splits with fixed random seeds for reproducibility. A validation set (10% of training data) was
created for baseline hyperparameter tuning, while the genetic algorithm used the full training set
with fixed parameters. For temporal datasets (Enron and FOIA), we verified that random splitting
avoided temporal leakage and ensured documents from the same conversation threads or
document families remained within the same split to prevent information leakage.</p>
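The stratified 80-20 split with a fixed seed can be sketched as below. Grouping by conversation thread or document family, as the text requires, would replace the per-item grouping here; this version is a simplified illustration.

```python
import random
from collections import defaultdict

# Sketch of a seeded stratified split: shuffle within each label group so
# both splits keep the original class distribution.

def stratified_split(items, test_frac=0.2, seed=42):
    """items: list of (features, label) pairs; returns (train, test)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for it in items:
        by_label[it[1]].append(it)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        cut = int(len(group) * test_frac)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

data = [(i, i % 2) for i in range(100)]   # toy set with a 50/50 label mix
train, test = stratified_split(data)
```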
      <p>The genetic algorithm used a population of 100 individuals with maximum 200 generations and
early stopping when fitness improvement over 20 consecutive generations fell below 0.001. The
fitness function combined F₁-score with a rule length penalty (λ=0.01) to balance accuracy and
interpretability. Tournament selection (size 3), single-point crossover (probability 0.7), and bit-flip
mutation (probability 0.05 per gene) maintained diversity while preserving good solutions.</p>
      <p>Five baseline methods were compared using standard parameters: Multinomial Naive Bayes
with Laplace smoothing (α = 1.0), SVM with RBF kernel (C = 1.0, γ = 1/n_features), Decision Tree and
Random Forest (100 estimators) both with maximum depth 10, and Logistic Regression with L2
regularization (C = 1.0). All methods used identical preprocessed features and train-test splits.</p>
      <p>Evaluation used macro-averaged F1-score as the primary metric to handle class imbalance, with
precision and recall providing additional insight into false positive and false negative rates critical
for security applications. For interpretable models, rule complexity was measured as the number of
unique features in the final rule set. Training times were recorded on identical hardware (Apple M3
Max, 36 GB RAM) to assess computational requirements.</p>
      <p>Table 3 presents the F1-scores achieved by each method across all datasets, demonstrating the
genetic algorithm's consistent high performance across diverse document types and classification
challenges. The genetic algorithm achieved the highest average F1-score (0.877) across all datasets,
demonstrating robust performance in diverse document classification scenarios. While SVM
slightly outperformed GA on the AI4Privacy dataset (0.921 vs 0.913), this dataset's relatively simple
PII patterns favored SVM's ability to find optimal separating hyperplanes. GA showed superior
performance on datasets with more complex decision boundaries, particularly the Enron corpus
where subtle contextual patterns determine document sensitivity.</p>
      <p>The genetic algorithm's evolved rules demonstrated clear domain-specific patterns that align
with human understanding of document sensitivity. For the Enron Email dataset, the algorithm
identified a core set of required features including "confidential", "internal", "restricted", "employee",
and "salary" combined with forbidden features such as "public", "press", "announcement", and
"release". This rule effectively captures the intuitive notion that documents discussing internal
employee matters with confidentiality markers, but lacking public dissemination indicators, likely
contain sensitive information.</p>
      <p>On the AI4Privacy dataset, evolved rules centered on PII patterns, requiring presence of tokens
like "ssn", "credit_card", "address", "phone", and "email" while forbidding synthetic data indicators
such as "example", "test", "sample", and "demo". This demonstrates the algorithm's ability to
distinguish real PII from training examples or documentation, a critical capability for practical
deployment.</p>
      <p>Government FOIA document rules revealed hierarchical classification patterns, with required
features including official classification markings ("classified", "secret", "redacted", "official_use")
and forbidden features representing public release indicators ("unclassified", "public_release",
"approved"). The algorithm successfully learned the bureaucratic language patterns that distinguish
classified from publicly releasable government documents.</p>
      <p>The evolutionary process showed consistent convergence within 100 generations across all
datasets. Enron converged at generation 87 ( F1-score 0.872), AI4Privacy reached convergence
fastest at generation 62 (F1-score 0.913) due to simpler pattern structure, and Government FOIA
required 95 generations (F1-score 0.847) reflecting diverse terminology across agencies. Early
stopping prevented unnecessary computation and overfitting as fitness improvements plateaued
upon discovering optimal feature combinations, with consistent convergence behavior suggesting
robust algorithm design adapting naturally to different problem complexities.</p>
      <p>The genetic algorithm demonstrated notable robustness to class imbalance, a critical property
for security applications where sensitive documents typically comprise a minority class. At the
most balanced ratio of 1:1.3 in the AI4Privacy dataset, SVM slightly outperformed GA with an
F₁-score of 0.921 versus 0.913. However, as imbalance increased to 1:3.2 and 1:4.5 in the Government
FOIA and Enron datasets respectively, GA showed consistent advantages of approximately 0.016 in
F₁-score over both SVM and Naive Bayes. This robustness stems from the fitness function's use of
F₁-score, which inherently balances precision and recall, combined with the evolutionary search's
ability to discover feature combinations that reliably identify minority class instances even when
positive examples are scarce.</p>
      <p>The interpretability comparison reveals fundamental differences between methods in terms of
human comprehension and practical deployment. The GA approach produces rules using between
15 and 35 features across different datasets, compared to decision trees requiring 45 to 89 features
to achieve lower accuracy. Black-box methods like SVM, Neural Networks, and Naive Bayes utilize
the entire feature space of over 1,000 features, making interpretation practically impossible without
additional explanation techniques. GA rules can be directly expressed in natural language that
security analysts understand, such as "Document is sensitive if it contains 'confidential' and
'internal' but not 'public' or 'press release'." This interpretability enables security teams to validate
rules against organizational policies, adjust them based on domain knowledge, and explain
decisions to stakeholders or during audits.</p>
      <p>The experimental results validate our hypothesis that evolutionary search can effectively
explore the vast space of feature combinations to discover accurate yet interpretable classification
rules. The genetic algorithm achieved the highest average F₁-score across diverse datasets while
producing rules an order of magnitude simpler than decision trees. This demonstrates that the
global search capability of evolutionary algorithms can identify compact feature sets that capture
essential patterns for sensitive document detection. The method's consistent performance across
datasets with varying characteristics indicates robust generalization. Unlike black-box methods
that may learn dataset-specific quirks, the evolved IF-THEN rules capture fundamental patterns
that transfer well across domains.</p>
      <p>The interpretability of evolved rules extends beyond mere feature counting. The rules express
logical relationships that align with human intuition about document sensitivity, combining
positive indicators with negative evidence in a natural way. This bi-directional reasoning mirrors
how human analysts approach document classification, checking both for presence of sensitive
markers and absence of public dissemination indicators. The compact rule sets can be directly
implemented in existing security infrastructure without specialized machine learning frameworks,
using simple pattern matching engines. Security analysts can inspect and understand the rules,
building trust in automated decisions and enabling manual overrides when organizational policies
change.</p>
      <p>The method's high precision reduces false positive rates that plague many automated security
systems, preventing alert fatigue among security teams. Meanwhile, competitive recall ensures
most sensitive documents are caught, with the interpretable rules helping analysts understand any
misclassifications and refine detection patterns. The evolutionary approach also supports
incremental improvement as new types of sensitive documents emerge. Rather than retraining
from scratch, the existing rule population can seed a new evolutionary run, allowing rapid
adaptation to evolving threats while preserving proven detection patterns.</p>
      <p>While we evaluated on diverse real-world datasets, highly specialized document types may
exhibit different characteristics. Technical documents with extensive code snippets or
mathematical formulas might require adapted preprocessing. However, the genetic algorithm's
flexibility to incorporate domain-specific features suggests it would adapt well to such scenarios.
The selection of F₁-score as the primary metric appropriately balances the competing demands of
precision and recall in security applications where both false positives and false negatives carry
significant costs.</p>
      <p>Our comprehensive experimental evaluation demonstrates that genetic algorithm-based
document classification successfully achieves its dual objectives of high accuracy and
interpretability. Across three diverse datasets representing different sensitive document detection
scenarios, the evolutionary approach discovered compact IF-THEN rules that achieved superior
average performance while using significantly fewer features than decision trees. The method
showed particular strength on challenging real-world datasets like Enron emails, where complex
contextual patterns determine sensitivity. Its robustness to class imbalance and ability to produce
human-understandable rules make it especially suitable for security applications where both
performance and explainability are critical. These results validate evolutionary search as an
effective approach for exploring the combinatorial space of features in document classification,
finding globally optimal solutions that balance multiple objectives. The success of this method
opens possibilities for applying evolutionary techniques to other security tasks requiring
interpretable models, such as intrusion detection, fraud identification, and regulatory compliance
monitoring.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Future Directions</title>
      <p>
        Existing data leak detection and prevention solutions exhibit several critical shortcomings
discussed in section 4. Attempts to integrate encryption, tagging, or behavioral analysis sometimes
lack holistic coordination, leading to visibility gaps that sophisticated adversaries readily exploit [
        <xref ref-type="bibr" rid="ref14 ref8">8,
14</xref>
        ]. Furthermore, although dynamic deception and multi-layered defense can reduce false
negatives, their efficacy depends heavily on precise calibration for each organization’s operational
context, which can be labor-intensive to maintain.
      </p>
      <p>A way to overcome these limitations is to combine the most promising features of existing detection
techniques:</p>
      <p>
        Comprehensive Policy Frameworks and Employee Training: Organizations should
establish clear data handling policies and conduct regular training sessions to ensure that
all employees understand the importance of data security. This practice forms the backbone
of any effective Data Loss Prevention (DLP) strategy by aligning technical measures with
organizational culture [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Robust Encryption and Access Controls: Implementing end-to-end encryption for data
at rest, in transit, and in use is critical. Coupled with strict access control mechanisms, these
technical safeguards help restrict unauthorized access and ensure that sensitive data
remains protected even if other layers of defense are breached [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Context-Aware Monitoring and Anomaly Detection: Modern DLP systems benefit
from integrating behavioral analytics and context-aware detection techniques. By
continuously comparing user actions against established baselines, these systems can
quickly identify deviations that may indicate insider threats or other anomalies. This
layered approach not only reduces false positives but also provides a more nuanced
understanding of data movement within the network [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Time Stamp and Content Tagging Strategies: Employing methods that incorporate
temporal metadata and content tagging can significantly enhance data classification. By
marking data with time-sensitive attributes and specific labels, organizations can automate
policy enforcement and restrict data exposure during critical periods. This is especially
useful in environments where the confidentiality status of data changes over time [
        <xref ref-type="bibr" rid="ref13 ref15">13, 15</xref>
        ].
      </p>
      <p>
        Layered Defense with Network and Endpoint Integration: Best practices recommend
deploying a blend of network-wide monitoring alongside endpoint-specific controls. This
ensures that even if data is accessed or modified at a local level, broader network policies
and real-time monitoring can detect and mitigate unauthorized data flows [32, 33].
Employing local and network-wide hardware-based security modules can significantly
speed up analysis and decrease operational costs [31].
      </p>
      <p>
        Dynamic Deception and Moving Target Defense: Approaches such as generating
deceptive data or dynamically altering the data attack surface add a proactive dimension to
DLP strategies. These methods complicate an attacker’s efforts by increasing uncertainty
and raising the cost of data exfiltration, thereby acting as an additional safeguard against
both internal and external threats [
        <xref ref-type="bibr" rid="ref11 ref17">11, 17</xref>
        ].
      </p>
      <p>These techniques can be united as modules into a single adaptive framework guided by a genetic algorithm. Each
detection module offers partial “scores” or indications of potential leakage. These outputs then
become the genetic algorithm’s raw material. The system starts with multiple candidate
configurations, each specifying how to weight or combine modules’ outputs under different
network conditions, data types, and time constraints. Based on feedback from both real-time and
historical data leak events, including false positives and missed detections, an evolutionary process
evaluates the fitness of each configuration, eliminating underperforming combinations and
promoting or mutating more successful ones.</p>
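The weighting of module outputs that such a configuration would encode can be sketched as follows. The module names, the weighted-vote fusion rule, and the threshold are assumptions illustrating the idea, not a specified design.

```python
# Hedged sketch of the envisioned ensemble: each detection module emits a
# leak score in [0, 1]; a candidate configuration is a weight vector plus a
# decision threshold, which the genetic algorithm would evolve against
# feedback from real and historical leak events.

def fused_decision(scores, weights, threshold=0.5):
    """Weighted vote over module outputs; flag a leak above the threshold."""
    total = sum(w * s for w, s in zip(weights, scores))
    norm = total / max(sum(weights), 1e-9)   # normalise to [0, 1]
    return norm >= threshold

# Scores from hypothetical (policy, anomaly, tagging, deception) modules:
scores = [0.2, 0.9, 0.8, 0.1]
config = [0.1, 0.5, 0.3, 0.1]   # one candidate weighting to be evolved
print(fused_decision(scores, config))  # -> True
```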
      <p>
          Over time, this iterative process homes in on configurations that maximize true positives and
minimize false alarms, continually rebalancing priorities among modules. Such an ensemble not
only becomes more robust against varied threats but also circumvents the need for constant
manual tuning, a known pain point in large-scale DLP deployments [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-9">
      <title>9. Conclusion</title>
      <p>Effective protection against data leakage demands a layered, adaptive strategy that integrates
complementary detection and prevention techniques. Experiments on three real-world datasets
confirm that the proposed genetic-algorithm classifier delivers the highest average F₁-score while
producing concise, human-readable rules. Because these rules explicitly combine textual cues and
metadata, analysts can readily verify decisions and refine policies without retraining opaque
models. The method’s robustness to class imbalance and its modest computational overhead make
it practical for large-scale corporate environments in which sensitive documents form only a small
fraction of overall traffic.</p>
      <p>Beyond a single classifier, future work should aim at developing an ensemble architecture
in which the genetic algorithm continually adjusts the weight of diverse modules such as anomaly
detection, contextual integrity checks, time-aware sensitivity scoring and dynamic deception. By
treating each module’s output as an input feature and evolving optimal weightings, the ensemble
reduces false positives and adapts as new workflows, devices and cloud services emerge. This
evolutionary coordination also lessens the manual effort that traditionally accompanies rule
maintenance in complex distributed systems.</p>
      <p>Future research should also focus on extending the framework to additional data states,
including encrypted streams and multimedia content, and on tightening resistance to adversarial
transformations. Investigating hardware-level telemetry for corroborating evidence and integrating
privacy-preserving learning techniques will further enhance resilience. As corporate networks
grow more heterogeneous and regulations more stringent, the genetic-algorithm-guided ensemble
offers a promising foundation for DLP solutions that must remain accurate, transparent and agile.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <sec id="sec-10-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
        <p>2020 Modern Machine Learning Technologies and Data Science Workshop 2020, pp. 159-171. ISSN 1613-0073.</p>
        <p>[29] The Enron Email Dataset. URL: https://www.kaggle.com/datasets/wcukierski/enron-email-dataset.</p>
        <p>[30] AI4Privacy PII-300k. URL: https://huggingface.co/datasets/ai4privacy/pii-masking-300k.</p>
        <p>[31] Government FOIA. URL: https://www.foia.gov/foia-dataset-download.html.</p>
        <p>[32] A. Sachenko, V. Kochan, V. Turchenko, Instrumentation for gathering data [DAQ systems], IEEE Instrumentation &amp; Measurement Magazine, Vol. 6, 2003, pp. 34-40. doi:10.1109/MIM.2003.1238339.</p>
        <p>[33] V. Hamolia, V. Melnyk, P. Zhezhnych, A. Shilinh, Intrusion Detection in Computer Networks Using Latent Space Representation and Machine Learning, International Journal of Computing, Vol. 19(3), 2020, pp. 442-448. doi:10.47839/ijc.19.3.1893.</p>
        <p>[34] O. Kehret, A. Walz, A. Sikora, Integration of Hardware Security Modules into a Deeply Embedded TLS Stack, International Journal of Computing, Vol. 15(1), 2016, pp. 22-30. doi:10.47839/ijc.15.1.827.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>A Novel Mechanism for Fast Detection of Transformed Data Leakage</article-title>
          , IEEE Access, Vol.
          <volume>6</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>35926</fpage>
          -
          <lpage>35936</lpage>
          . doi:10.1109/ACCESS.2018.2851228.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>A Probability based Model for Data Leakage Detection using Bigraph</article-title>
          ,
          <source>in: Proceedings of the 2017 7th International Conference on Communication and Network Security (ICCNS '17)</source>
          . Association for Computing Machinery, New York, NY, USA,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi:10.1145/3163058.3163060.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kush</surname>
          </string-name>
          ,
          <article-title>A Review on Data Leakage Detection for Secure Communication</article-title>
          ,
          <source>International Journal of Engineering and Applied Technology (IJEAT)</source>
          , Vol.
          <volume>7</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Calias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Caoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Padilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tum-en</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Bacilio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Guaki</surname>
          </string-name>
          ,
          <article-title>The Impact of BYOD (Bring Your Own Device) On Network Security: A Literature Review</article-title>
          , in:
          <source>Southeast Asian Journal of Science and Technology</source>
          , Vol.
          <volume>9</volume>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fadolalkarim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bertino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sallam</surname>
          </string-name>
          ,
          <article-title>An Anomaly Detection System for the Protection of Relational Database Systems against Data Leakage by Application Programs</article-title>
          ,
          <source>in: Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE)</source>
          , Dallas, TX, USA,
          <year>2020</year>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>276</lpage>
          . doi:10.1109/ICDE48307.2020.00030.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shvartzshnaider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pavlinovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Balashankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nissenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>VACCINE: Using Contextual Integrity For Data Leakage Detection</article-title>
          ,
          <source>in: Proceedings of the The World Wide Web Conference (WWW '19)</source>
          . Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          . doi:10.1145/3308558.3313655.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Manadhata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <source>Text Classification for Data Loss Prevention</source>
          ,
          <year>2011</year>
          . doi:10.1007/978-3-642-22263-4_2.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Syarova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Toleva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kirkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Traykov</surname>
          </string-name>
          ,
          <article-title>Data Leakage Prevention and Detection in Digital Configurations: A Survey</article-title>
          ,
          <source>in: Proceedings of the 15th International Scientific and Practical Conference</source>
          , Vol.
          <volume>2</volume>
          ,
          <year>2024</year>
          . doi:10.17770/etr2024vol2.8045.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.K.</given-names>
            <surname>Periasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cindy Catherine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Elamathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subhiksha</surname>
          </string-name>
          ,
          <article-title>Data Leakage Vulnerability Assessment</article-title>
          ,
          <source>in: Proceedings of the 2023 Intelligent Computing and Control for Engineering and Business Systems</source>
          ,
          <year>2023</year>
          . doi:10.1109/ICCEBS58601.2023.10448949.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <article-title>Enterprise data breach: causes, challenges, prevention, and future directions</article-title>
          ,
          <source>in: WIREs Data Mining Knowl Discov</source>
          ,
          <year>2017</year>
          . doi:10.1002/widm.1211.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>What You See Is The Tip Of The Iceberg: A Novel Technique For Data Leakage Prevention</article-title>
          ,
          <source>in: Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design</source>
          ,
          <year>2024</year>
          . doi:10.1109/CSCWD61410.2024.10580487.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I. Herrera</given-names>
            <surname>Montano</surname>
          </string-name>
          , I. de la Torre Díez,
          <string-name>
            <given-names>J. J. García</given-names>
            <surname>Aranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Ramos</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Molina</given-names>
            <surname>Cardín</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Guerrero López</surname>
          </string-name>
          ,
          <article-title>Secure File Systems for the Development of a Data Leak Protection (DLP) Tool Against Internal Threats</article-title>
          ,
          <source>in: Proceedings of the 17th Iberian Conference on Information Systems and Technologies</source>
          ,
          <year>2022</year>
          . doi:10.23919/CISTI54924.2022.9820170.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Matthee</surname>
          </string-name>
          ,
          <article-title>Tagging Data to Prevent Data Leakage (Forming Content Repositories)</article-title>
          ,
          <year>2016</year>
          . doi:99.9999/woot07-S422.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <article-title>Survey on Data Leakage Prevention through Machine Learning Algorithms</article-title>
          ,
          <source>in: Proceedings of the 2022 International Mobile and Embedded Technology Conference</source>
          ,
          <year>2022</year>
          . doi:10.1109/MECON53876.2022.9752047.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Peneti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Rani</surname>
          </string-name>
          ,
          <article-title>Data Leakage Prevention System with Time Stamp</article-title>
          ,
          <source>in: Proceedings of the International Conference on Information Communication and Embedded Systems</source>
          ,
          <year>2016</year>
          . doi:10.1109/ICICES.2016.7518934.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>O.</given-names>
            <surname>Savenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lysenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Markowsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vasylkiv</surname>
          </string-name>
          ,
          <article-title>Botnet Detection Approach Based on the Distributed Systems</article-title>
          , in:
          <source>International Journal of Computing</source>
          , Vol.
          <volume>19</volume>
          (
          <issue>2</issue>
          ),
          <year>2020</year>
          , pp.
          <fpage>190</fpage>
          -
          <lpage>198</lpage>
          . doi:10.47839/ijc.19.2.1761.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lysenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Savenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bobrovnikova</surname>
          </string-name>
          ,
          <article-title>DDoS Botnet Detection Technique Based on the Use of the Semi-Supervised Fuzzy c-Means Clustering</article-title>
          ,
          <source>CEUR-WS</source>
          , Vol.
          <volume>2104</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>688</fpage>
          -
          <lpage>695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lysenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
             
            <surname>Savenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
             
            <surname>Bobrovnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
             
            <surname>Kryshchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Savenko</surname>
          </string-name>
          .
          <article-title>Information technology for botnets detection based on their behaviour in the corporate area network</article-title>
          ,
          <source>Communications in Computer and Information Science</source>
          , Vol.
          <volume>718</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>166</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pomorova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Savenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lysenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kryshchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bobrovnikova</surname>
          </string-name>
          .
          <article-title>A Technique for the Botnet Detection Based on DNS-Traffic Analysis</article-title>
          .
          <source>Communications in Computer and Information Science</source>
          . Vol.
          <volume>522</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>127</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.</given-names>
            <surname>Valleru</surname>
          </string-name>
          ,
          <article-title>Cost-Effective Cloud Data Loss Prevention Strategies for Small and Medium-Sized Enterprises</article-title>
          ,
          <source>International Research Journal of Engineering and Technology</source>
          , Vol.
          <volume>11</volume>
          ,
          Iss.
          <issue>05</issue>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>waRLOCK: Countering Ransomware and Data Leak</article-title>
          ,
          <source>in: Proceedings of the 2024 IEEE International Conference on Contemporary Computing and Communications</source>
          ,
          <year>2024</year>
          . doi:10.1109/INC460750.2024.10649292.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kashtalian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lysenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Savenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nicheporuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sochor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Avsiyevych</surname>
          </string-name>
          ,
          <article-title>Multicomputer malware detection systems with metamorphic functionality</article-title>
          ,
          <source>Radioelectronic and Computer Systems</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>152</fpage>
          -
          <lpage>175</lpage>
          . doi:10.32620/reks.2024.1.13.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alojo</surname>
          </string-name>
          ,
          <article-title>Innovative Approaches in Data Management and Cybersecurity: Insights from Recent Studies</article-title>
          ,
          <source>World Journal of Advanced Research and Reviews</source>
          , Vol.
          <volume>23</volume>
          ,
          Iss.
          <issue>03</issue>
          ,
          <year>2024</year>
          , pp.
          <fpage>2410</fpage>
          -
          <lpage>2425</lpage>
          . doi:10.30574/wjarr.2024.23.3.2897.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M. M. R.</given-names>
            <surname>Mazumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Phillips</surname>
          </string-name>
          ,
          <article-title>Partitioning Known Environments for Multirobot Task Allocation Using Genetic Algorithms</article-title>
          ,
          <source>International Journal of Computing</source>
          , Vol.
          <volume>19</volume>
          (
          <issue>3</issue>
          ),
          <year>2020</year>
          , pp.
          <fpage>480</fpage>
          -
          <lpage>490</lpage>
          . doi:10.47839/ijc.19.3.1897.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bykovyy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kochan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Markowsky</surname>
          </string-name>
          ,
          <article-title>Genetic Algorithm Implementation for Perimeter Security Systems CAD</article-title>
          ,
          <source>in: Proceedings of the 2007 4th IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications</source>
          , Dortmund, Germany,
          <year>2007</year>
          , pp.
          <fpage>634</fpage>
          -
          <lpage>638</lpage>
          . doi:10.1109/IDAACS.2007.4488498.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S.</given-names>
            <surname>Obadan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A Multi-Agent Approach to POMDPs Using Off-Policy Reinforcement Learning and Genetic Algorithms</article-title>
          .
          <source>International Journal of Computing</source>
          , Vol.
          <volume>19</volume>
          (
          <issue>3</issue>
          ),
          <year>2020</year>
          , pp.
          <fpage>377</fpage>
          -
          <lpage>386</lpage>
          . doi:10.47839/ijc.19.3.1887.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bykovyy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pigovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kochan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Markowsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aksoy</surname>
          </string-name>
          ,
          <article-title>Genetic algorithm implementation for distributed security systems optimization</article-title>
          ,
          <source>in: Proceedings of the 2008 IEEE International Conference on Computational Intelligence for Measurement Systems and Applications</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>120</fpage>
          -
          <lpage>124</lpage>
          . doi:10.1109/CIMSA.2008.4595845.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lynnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matseliukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Burov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Demkiv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaverbnyj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shylinska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Yevseyeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bihun</surname>
          </string-name>
          ,
          <article-title>DDoS Attacks Analysis Based on Machine Learning in Challenges of Global Changes</article-title>
          ,
          <source>in: CEUR Workshop Proceedings (CEUR-WS.org), MoMLeT+DS 2020: Modern Machine Learning Technologies and Data Science Workshop</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>171</lpage>
          . ISSN 1613-0073.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] The Enron Email Dataset. URL: https://www.kaggle.com/datasets/wcukierski/enron-email-dataset.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] AI4Privacy PII-300k. URL: https://huggingface.co/datasets/ai4privacy/pii-masking-300k.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] Government FOIA. URL: https://www.foia.gov/foia-dataset-download.html.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>[32] A. Sachenko, V. Kochan, V. Turchenko, Instrumentation for gathering data [DAQ systems], IEEE Instrumentation &amp; Measurement Magazine, Vol. 6, 2003, pp. 34-40. doi:10.1109/MIM.2003.1238339.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>[33] V. Hamolia, V. Melnyk, P. Zhezhnych, A. Shilinh, Intrusion Detection in Computer Networks Using Latent Space Representation and Machine Learning, International Journal of Computing, Vol. 19(3), 2020, pp. 442-448. doi:10.47839/ijc.19.3.1893.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>[34] O. Kehret, A. Walz, A. Sikora, Integration of Hardware Security Modules into a Deeply Embedded TLS Stack, International Journal of Computing, Vol. 15(1), 2016, pp. 22-30. doi:10.47839/ijc.15.1.827.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>