<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Monitoring Ransomware with Berkeley Packet Filter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Danyil Zhuravchak</string-name>
          <email>danyil.y.zhuravchak@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Tolkachova</string-name>
          <email>anastasiia.tolkachova.mkbst.2022@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrian Piskozub</string-name>
          <email>azpiskozub@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerii Dudykevych</string-name>
          <email>valerii.b.dudykevych@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliia Korshun</string-name>
          <email>n.korshun@kubg.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Borys Grinchenko Kyiv University</institution>
          ,
          <addr-line>18/2 Bulvarno-Kudriavska str., Kyiv, 04053</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>12 Stepan Bandera str., Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>95</fpage>
      <lpage>106</lpage>
      <abstract>
        <p>The article delves comprehensively into employing the extended Berkeley Packet Filter (eBPF) for monitoring network traffic, filtering system calls, and overseeing processes for ransomware activity. The principles and architecture underlying this advanced technology are explored, laying a solid foundation for developing robust mechanisms for detecting and halting malware propagation across networks. The paper highlights potential strategies for tracking viruses within traffic and evaluates this approach, meticulously considering the security concerns and control mechanisms endowed by eBPF. A notable section of the article is dedicated to a comparative analysis. Traditional malware detection mechanisms are assessed alongside a program built on eBPF, offering a clear, unbiased insight into their respective efficiencies and potential pitfalls. This extensive comparison underscores the enhanced proficiency and security offered by eBPF-based monitoring mechanisms, solidifying their stance as a formidable tool against malware threats, including ransomware. The authors demonstrate the capability of an eBPF-based monitoring system in delivering potent network defense against various malware forms, including ransomware, presenting significant implications for antivirus protection developers. This comprehensive exploration and presented findings are pivotal for enhancing the overall security quotient of computer networks globally, emphasizing the critical role of eBPF in contemporary network security paradigms. The superior efficiency and security assurance offered by BPF reinforces its viability as a pivotal technology for monitoring network traffic and safeguarding against pervasive malware threats.</p>
      </abstract>
      <kwd-group>
        <kwd>1 eBPF</kwd>
        <kwd>monitoring</kwd>
        <kwd>cybersecurity</kwd>
        <kwd>vulnerabilities</kwd>
        <kwd>malware</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Problem Statement</title>
      <p>In the modern era, as technology progressively
impacts people’s lives, computer network
security has become a crucial concern. The
increasing prevalence of potential hazards,
such as viruses, trojans, spyware, and various
types of attacks, calls for the innovation of
novel and effective approaches for identifying
and monitoring such risks [1–2].</p>
      <p>Conventional antivirus solutions that rely
on virus signatures have become inadequate
due to the swift evolution of new threats and
the substantial volume of network traffic,
necessitating alternative strategies.</p>
      <p>The world of cybercrime is developing
rapidly, partly fueled by the ongoing conflict in
Ukraine, which has led to the convergence of
cybercriminal groups from Russia and its
neighboring countries. Changes within
ransomware and other cybercrimes indicate
shifting priorities. Attacks on Ukraine were
constant before and during the invasion and
persist to this day [3–5].</p>
      <p>A potential consequence of the ongoing war
may involve a shift in the objectives of
cybercriminals from Russia and neighboring
countries in two ways. Firstly, it is speculated
that some of these criminals may have
transitioned from profit-driven cybercrimes,
such as ransomware attacks, to active
participation in military actions. Nonetheless,
ransomware attacks persist in Ukraine even
amidst the conflict. Additionally, active
Russian cybercriminals are broadening their
horizons by targeting the Global South,
focusing on countries in Asia and Latin
America while steering clear of critical
infrastructure and vulnerabilities in NATO
member states. This change in focus could be
motivated by a desire to avoid incidents that
might escalate tensions between Russia and
NATO members. The long-term cybersecurity
ramifications of these infiltrations remain
uncertain [6–7].</p>
      <p>Unaddressed concerns involve the conflict’s
impact on safe spaces for cyber criminals and
the future trajectory of the cybercrime
ecosystem amid the Ukraine-Russia standoff.
Furthermore, there is a need for increased
research to understand emerging ransomware
trends in connection with the conflict.</p>
      <p>One potential security solution involves the
use of Berkeley Packet Filter (BPF), a
technology that facilitates high-performance
data packet filtering within networks. This
article endeavors to explore the fundamental
principles of BPF, its capabilities, and its
application for real-time virus detection and
monitoring in computer networks [8].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Analysis of Recent Research and Publications</title>
      <p>Research on ransomware detection and
counteraction methods includes both
traditional signature and behavior-based
methods and new approaches used for
program analysis, Security Information Event
Management Systems (SIEM), and network
traffic adjustment.</p>
      <p>Based on the literature review, the
following successes and failures of existing
methods can be identified:
1. “Fast Packet Processing with eBPF and
XDP: Concepts, Code, Challenges, and
Applications” focuses on eBPF and XDP
technologies that accelerate packet
processing in network systems. The
authors are Marcos A. M. Vieira,</p>
      <sec id="sec-2-1">
        <title>Matheus S. Castanho, Racyus D. G.</title>
        <p>Pacífico, Elerson R. S. Santos, Eduardo
P. M. Câmara Júnior, and Luiz F. M.
Vieira—consider the key concepts,
code, problems, and possible uses of
these technologies in various fields [9].
2. The article “Creating Complex Network
Services with eBPF: Experience and
Lessons Learned” highlights the
authors” experience in creating
complex network services using eBPF
(extended Berkeley Packet Filter)
technology [10].
3. “Combining System Visibility and
Security Using eBPF” by Luca Deri,
Samuele Sabella, and Simone Mainardi
focuses on the use of eBPF (extended
Berkeley Packet Filter) technology to
increase system visibility and security.
eBPF is a powerful tool for monitoring,
analyzing, and manipulating network
packets at the operating system kernel
level [11].</p>
        <p>Improving methods of detecting and
countering ransomware in real-time is an
important issue in the field of cybersecurity.
The use of eBPF can provide significant
benefits and help to overcome certain
shortcomings in the research on this issue.
Considering the above-mentioned articles, the
following research advantages can be
identified in the field of eBPF:
• High processing speed: eBPF allows for
much faster processing of network
traffic and full real-time activity tracking
than more traditional user-space-based
analogs.
• More accurate attack detection: eBPF
allows for the development of flexible
and adaptive detection systems that can
analyze many more network
parameters, which helps to detect
pathogenic activity more accurately in
the early stages of an attack.
• Flexibility: eBPF allows you to integrate
ransomware detection and prevention
directly into the operating system
kernel, enabling deeper analysis of
network traffic and rapid application to
the latest types of attacks.
• Automatic security provisioning: eBPF
allows you to automate the detection
and counteraction process based on the
solutions found in real-time.</p>
        <p>Disadvantages:
• Development complexity: Utilizing eBPF
to develop ransomware detection and
countermeasures can be a complex
process that requires in-depth
knowledge of eBPF and network security.
Close collaboration and knowledge
sharing between cybersecurity
development teams is necessary to
ensure successful implementation.
• Hardware limitations: Effective
implementation of eBPF may depend on
the availability of modern hardware,
including smart network adapters,
which may still be expensive or difficult
to acquire and deploy.
• Lack of research in specialized and
specific contexts: In the context of
realtime ransomware detection and
countermeasures, the increased
adoption of eBPF is a relatively new area
of research, which may require even
more research work to implement and
evaluate its effectiveness in different
contexts and environments.</p>
        <p>Based on these advantages and
disadvantages, it can be concluded that eBPF
has potentially significant application potential
in detecting and countering ransomware in
real-time. However, to obtain the best results
for a variety of scenarios and environments, a
concerted effort is required from cybersecurity
researchers to develop and research effective
eBPF-based techniques and solutions [12–13].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>security are static analysis and dynamic
analysis.</p>
      <p>Static analysis refers to examining virus
program code without execution, which
involves analyzing hashes, and strings, or
employing machine learning for malicious
code classification. However, this approach
may be less effective against viruses using code
obfuscation techniques. Static analysis
determines file characteristics, such as file type
and specific lines in the file. Antivirus
researchers gather multiple malware family
variants, identify common static features, and
create signatures. Signatures may contain
hashes of certain file areas, properties, sizes,
etc. As strains often exhibit static variation,
antivirus products must update their
signatures frequently.</p>
      <p>Dynamic analysis, a method that observes
virus behavior by executing them in controlled
environments like sandboxes, can detect
viruses employing code obfuscation. However,
it is more resource-intensive and
timeconsuming compared to static analysis. Also
known as behavioral analysis, dynamic analysis
reveals the actions of malicious code or the
system changes when executing such code.</p>
      <p>While each method has pros and cons and lacks
100% ransomware protection, eBPF technology
was chosen to address detection and combat
issues. By tracking system calls at the OS kernel
level, eBPF provides profound insights into
process activities within the system [14].</p>
      <p>This table provides a comparison of the
advantages and disadvantages of Static
Analysis and Dynamic Analysis, two commonly
used approaches in analyzing software for
vulnerabilities and malicious behavior. This
information can help make a more informed
decision on which method to use when
analyzing unknown programs.</p>
      <p>Traditional models and methods of detecting
and counteracting ransomware in computer
Table 1
A comparison of advantages and disadvantages of Static Analysis and Dynamic Analysis
Type of
Analysis
Static
Analysis
1. Speed: Can be performed quickly, and doesn't require 1. Obfuscation and polymorphism
virus execution. 2. Safety: Doesn't pose risks since the issues. 2. Lack of context: Doesn't
program doesn't run. 3. Can analyze code independently provide information on how the
of its execution environment. 4. Early detection of program will behave during
potentially harmful code. execution.
1. Detailed analysis: Gather more information about the 1. Time-consuming. 2. Potential
program. 2. Effectiveness against code obfuscation. 3. Can risk: Although conducted in a
analyze programs in real-world conditions, considering controlled environment, there's a
specific details of the execution environment 4. Ability to risk the virus may escape it. 3. High
track interaction between programs and runtime technical knowledge is required to
processes. interpret analysis results.
The Berkeley Packet Filter (BPF) is a
subsystem within the Linux kernel that
enables users to execute their custom code on
a virtual machine running inside the kernel.</p>
      <p>This technology can be categorized into
classical BPF (cBPF) and extended BPF (eBPF).</p>
      <p>Classical BPF primarily focuses on inspecting
and analyzing network packets, while the more
advanced eBPF extends its capabilities beyond
merely observing packet information. The
evolution of eBPF has significantly expanded
its potential, allowing users to modify packets,
alter system call arguments, and even modify
user space programs. This has transformed
eBPF into a powerful and versatile tool used
for various purposes, ranging from networking
to system profiling, tracing, and security
measures. Over time, enthusiasts within the
Linux community have worked on enhancing
BPF's functionality, propelling it toward the
current eBPF incarnation. One of the
improvements in eBPF is the shift from 32-bit
registers to 64-bit registers, accommodating a
broader spectrum of use cases and offering
better performance. Additionally, eBPF
programs can be attached to distinct kernel
events, not only those associated with
receiving packets. This feature enables
extensive customization and monitoring
capabilities within the Linux kernel.</p>
      <p>Furthermore, eBPF offers improved
accessibility from user space, allowing users to
insert custom actions without overloading or
destabilizing the operating system. By
providing a safe and efficient way for
userdefined programs to interact with the Linux
kernel, eBPF has become a crucial component
for Linux-based systems. Its flexible nature and
extensibility make it an invaluable resource for
developers and system administrators seeking
high-performance, low-level system
interaction and customization [15].
Moreover, the figure illustrates a program
functioning within the user space, which
integrates an eBPF program to attain
processlevel visibility in the Linux kernel. The eBPF
program is composed in Python or Golang, and
a compiler that is capable of processing eBPF
bytecode supports it. After loading this eBPF
program into the Linux kernel, the eBPF
Verification Engine immediately checks its
validity. Furthermore, as mentioned earlier,
this verification process is crucial in
preventing possible errors. The program is
subsequently compiled and connected to the
appropriate kernel event. However, whenever
the syscall event occurs, the program engages
in the process, performs its monitoring and
analysis tasks until completed, and then
returns the findings to the user program within
the user space. Additionally, having gained a
general overview of the use case and
architecture, we can now investigate eBPF’s
role in security monitoring more thoroughly.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Security Monitoring and</title>
    </sec>
    <sec id="sec-5">
      <title>Observability Metrics</title>
      <p>Implementing system call filtering with
eBPF. This mechanism is commonly employed
to safeguard the OS kernel from untrustworthy
programs. However, current methods are
either costly or lack the programmability
needed to expand security policies. The Linux
filtering module is extensively utilized in
containers, mobile applications, and system
administration.</p>
      <p>
        Contemporary systems communicate with
the OS kernel through system calls. Limiting
these calls helps diminish the attack surface.
Linux Seccomp operates within the OS kernel,
offering performance and robustness.
Nevertheless, cBPF has restricted
programmability and does not supply a state
storage mechanism. This paper presents a
programmable system called the filtering
method using eBPF, aiming to develop
advanced security policies without
jeopardizing OS performance and security.
eBPF was selected due to its practicality.
Seccomp has recently incorporated support for
a custom agent, the Notifier, which functions
alongside cBPF filters [
        <xref ref-type="bibr" rid="ref4">16</xref>
        ]. This solution
operates similarly to the system call
interception frameworks, delegating decisions
to a trusted user agent. Seccomp intercepts a
system call, halts the calling task, and conveys
the call context (e.g., PID, system caller ID, and
arguments) to the agent [
        <xref ref-type="bibr" rid="ref5">17</xref>
        ]. The primary
drawback of the Seccomp Notifier is the
substantial expense of context switches when
transitioning between user space and the
kernel. The first paragraph in every section
does not have a first-line indent. Use only styles
embedded in the document [18].
      </p>
      <sec id="sec-5-1">
        <title>Examining network traffic with eBPF.</title>
        <p>This paper discusses a DDoS defense scenario in
which all inbound malicious traffic is blocked.
The authors employ eBPF/XDP to extract
features from the incoming traffic and analyze
the information in the user space using heuristic
algorithms, which are less precise than neural
networks. XDP is a form of BPF program that
operates at the initial phase of network packet
processing, enabling the gathering of crucial
data. To designate a BPF program as an XDP
program, users must specify the
BPF_PROG_TYPE_XDP flag while loading the
program into the kernel [19]. Additionally, XDP
programs allow for specific operations to be
performed on network packets. Once the
calculations are finished, the results (malicious
IP addresses) are fed into the eBPF programs,
which block all traffic from these sources. In
terms of observability solely within a
cloudbased microservices environment, the
ViperProbe framework was proposed. This tool
was developed to improve both network and
system monitoring using eBPF. Lastly, it’s worth
noting the expanding Cilium platform,
opensource software designed to seamlessly provide
network connectivity between applications and
services deployed with Linux container
management platforms such as Docker and</p>
        <p>Kubernetes. At the core of Cilium lies eBPF
technology, which enables powerful security
logic controls and management to be
dynamically integrated into the Linux system.
As BPF operates within the Linux kernel, Cilium
security policies can be applied and updated
without any modifications to the application
code or container configuration [20].</p>
      </sec>
      <sec id="sec-5-2">
        <title>Utilizing eBPF for process monitoring.</title>
        <p>Process monitoring serves as a fundamental
component of runtime security. Essentially, it
can detect unexpected processes or execution
patterns that should not occur in a production
environment. For instance, a web server in a
production setting should never initiate a shell,
and a package manager being used to install
new dependencies on a host might raise
concerns. To provide a real process tree for
each process, the user space process cache is
employed. A true process tree refers to the
lineage of all processes leading to the process
that triggered the alert, regardless of the
parent processes’ statuses.</p>
        <p>This capability is absent from many
conventional runtime security tools:
examining the proc file system reveals that
when a process terminates, its children
immediately join the process with the
identifier. This results in the kernel losing the
process pedigree context, which could be
essential in identifying the host service being
used [22].</p>
        <p>Another intriguing advantage of delving
deeper into the kernel beyond the system call
level is the ability to access information that is
typically unavailable in user space. For
example, the layer of a file in the overlay file
system. This information carries significant
security implications, as it can determine
whether the executed file was part of the
container’s base image or if it has been
modified (or created) from the base image’s
original version.</p>
        <p>Additionally, process credentials can be
collected and supplemented with other events,
enabling the gathering of a full set of user and
group IDs, kernel capabilities, and executable
file metadata.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Utilizing eBPF for tracking performance</title>
        <p>metrics. Performance metrics serve as
essential indicators for evaluating a computer
system or application’s performance. They
provide insights into resource usage, including
CPU time, memory, network bandwidth, and
input/output (I/O) performance.</p>
        <p>Ransomware is malicious software that
encrypts user data and demands payment for
decryption. It can impact various performance
metrics:
• CPU utilization: Ransomware’s encryption
process can heavily utilize the CPU,
resulting in increased CPU load.
• I/O activity: Encrypting and decrypting
numerous files can cause a substantial
increase in I/O activity, especially when
dealing with large files.
• Memory usage: Certain ransomware can
consume a significant amount of RAM,
subsequently affecting the overall system
performance.</p>
        <p>Considering the potential impact of
ransomware on performance, detecting
unusual changes in these metrics can serve as
a warning sign for malware presence within
the system. eBPF, with its monitoring
capabilities, can effectively track such changes
and identify ransomware activity [22].</p>
      </sec>
      <sec id="sec-5-4">
        <title>Acquiring kernel data using eBPF. Over</title>
        <p>time, various methods have been developed to
access data from the OS kernel. BPF has
evolved into a versatile tool for addressing
diverse challenges, including extracting kernel
information. Two distinct approaches employ
BPF to transfer data from the kernel to the user
space using different techniques [23].</p>
        <p>Tools such as “ps” are used to retrieve
information by opening /dev/kmem and
operating in the kernel memory space. This
approach did not require direct kernel
support, which was advantageous, but it also
had drawbacks like security concerns and
occasional retrieval of random data. Initially,
this method was acceptable, but modern users
sought newer approaches.</p>
        <p>Focusing on the case of virtual files,
structural dumpers emerged as a direct
approach. Essentially, it enables the
attachment of BPF programs to implement
/proc-style files for any supported data
structure. This creates a new virtual file
system, expected to be mounted in
/sys/kernel/bpfdump. For instance, to create a
new process dumper named “myps”, one can
upload the BPF program generating the
required task structure output and then “pin”
it to a file named myps in the
/sys/kernel/bpfdump/ directory.</p>
        <p>If additional information is necessary, it can
be acquired without modifying the kernel.
Although this requires some customization
(each structure type needing accessibility in
this manner requires a specific helper code to
enumerate the active structures and pass them
to the relevant BPF program), it is a one-time
endeavor for each type. Thereafter, kernel
developers need not worry about exporting
information from that structure type to the
user space again, at least in theory.</p>
        <p>Considering the previously discussed
information, a comprehensive approach to
detecting and mitigating ransomware threats
can be developed by leveraging the capabilities
of eBPF.</p>
        <p>Firstly, eBPF can be used for process
monitoring, detecting unexpected processes,
and execution patterns that may indicate the
presence of ransomware in a system. This
contributes to the early warning of potential
threats and aids in maintaining system
security.</p>
        <p>Secondly, eBPF enables the tracking of
performance metrics such as CPU utilization,
I/O activity, and memory usage, which are
often impacted by ransomware attacks.
Identifying anomalies in these metrics can
serve as an additional indicator of ransomware
activity [24].</p>
        <p>Finally, eBPF allows the extraction of
relevant kernel data, which can be utilized in
the development of advanced security policies.
Together with monitoring and performance
metric tracking, this kernel-level access
enhances overall threat detection capabilities.</p>
        <p>In conclusion, using eBPF as an integrated
tool for process monitoring, performance
metric tracking, and kernel data access
provides a powerful and comprehensive
approach to detecting and mitigating
ransomware threats effectively in modern
computing environments.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Lab Environment</title>
      <p>The chief objective of this experimental
framework is to construct a segregated space,
robustly guarded against malware
propagation or unauthorized data transfer by
employing a Zero Trust security model. The
strategic layout employed for this research
project hinges on a dual-layered, isolated
virtual environment, illustrated in Fig. 2.</p>
      <p>The utilization of the KVM Hypervisor
spearheads the entire virtualization process,
paired seamlessly with the libvirt API for adept
communication and management of the virtual
machines on the host [25].</p>
      <p>The SOC Operator establishes network
connections through a virtual network,
typically anchored by a virtual network switch.
Two operational modes of this switch play
pivotal roles in this setup:
1. NAT Mode: This default operational
mode provides direct connectivity
among all guests and the virtualization
host. External network access is granted
through network address translation,
subject to the host system’s firewall
constraints. Despite its comprehensive
connectivity, its application is restricted
to the preliminary setup phase due to
security limitations. This phase
encompasses software installation and
ransomware sample downloads.
2. Isolated Mode: In this secure mode,
guest virtual machines can interact with
each other and the virtualization host.
However, traffic remains confined
within the boundaries of the
virtualization host, preventing any
external communication. This mode is
activated during ransomware
experimentation to negate any potential
risk of unauthorized public propagation.</p>
      <p>The dual-layered virtualized setup bolsters
the research by exploiting the snapshot
capability. This feature enables the capturing
of precise snapshots of the primary
virtualization layer at various experimental
stages, ensuring the availability of a pristine
start point for each subsequent testing round.</p>
      <p>In this secure framework, the first layer of
virtualization unfolds by provisioning an
Ubuntu 22.04 Virtual Machine, dubbed the
Sandbox Host VM. This VM, fortified with a
patched system and a stringent firewall,
permits only SSH and VNC access from a secure
LAN. The second layer of virtualization
burgeons within this VM, manifesting as the
isolated zone dedicated to ransomware
experimentation.</p>
      <p>This meticulously designed, dual-layered
virtualized setup stands as a beacon for secure
and effective ransomware experimentation,
ensuring robust isolation and prevention of
unintended malware spread and data
exfiltration.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Detection Method</title>
      <p>Detecting ransomware is a process grounded
in the thorough analysis of the distinct
behavioral attributes and patterns that
ransomware typically exhibits. A detailed
examination of various characteristics such as
file encryption patterns, interactions with
command-and-control servers, and unusual
process behaviors provides substantial
insights, making it feasible to pinpoint
potential ransomware attacks. Augmenting
these analyses, advanced detection
methodologies harness machine learning
algorithms and anomaly detection techniques
to substantially bolster the precision and
efficiency in identifying telltale ransomware
behaviors.</p>
      <p>There’s a dichotomy in ransomware
detection techniques: network-based and
host-based detection. Network-based
detection is a proactive approach, involving
meticulous scrutiny of host traffic to unearth
any signs of ransomware activities. Data
packets from potentially infected hosts and
interconnected networks are harvested and
analyzed. Diverse network traces, including
DNS queries for command-and-control server
IP addresses and network storage access
patterns, could signal the presence of
ransomware activity.</p>
      <p>Host-based detection, on the other hand,
emphasizes the internal activities within the
local system. It includes a comprehensive
examination of both static and dynamic
actions, encompassing file operations, memory
activities, API function calls, and more,
presenting a multi-faceted approach to
detecting potential ransomware infiltration. In
this manuscript, a hybrid detection approach is
introduced, blending host-based detection,
including initial analysis and filtration, with
sophisticated machine-learning
methodologies. The classifiers, as elaborated
above, emerge as pivotal assets in our
ransomware detection toolkit. Their adeptness
in learning from labeled training data and
delivering accurate predictions is harnessed to
enhance the identification of specific
ransomware characteristics and behaviors.
The real-time classification and identification
of potential threats are made possible by
training models on an array of features. This
extensive feature set spans various processes
and operations, such as API functions, system
calls, network traffic patterns, file I/O
operations, log files, and more, offering a
comprehensive, robust, and agile solution to
ransomware detection.</p>
    </sec>
    <sec id="sec-8">
      <title>7. Machine Learning: Evaluation of the Algorithms and Models for the Use Case</title>
      <p>In the quest to detect ransomware activities
effectively using data collected from eBPF
programs, various machine learning
algorithms were considered. Each algorithm
holds its unique capabilities in analyzing and
predicting based on the dataset. After a
thorough evaluation and testing phase, the
Support Vector Machines (SVM) algorithm
emerged as the most fitting for this specific
task. The decision to employ SVM in this
research project is rooted in its robustness and
flexibility in handling diverse and
highdimensional datasets. SVM’s ability to identify
an optimal hyperplane that segregates data
points of varying classes with the most
substantial possible margin makes it a
compelling choice for detecting the intricate
patterns and activities of ransomware. Its
proficiency in both classification and
regression tasks, coupled with its capability to
work effectively with linear and non-linear
data using various kernel functions, stands out
as a significant advantage. This choice is
aligned to achieve high accuracy and reliability
in real-time ransomware detection, ensuring
the security and integrity of computer systems.</p>
      <sec id="sec-8-1">
        <title>Anomaly detection Isolation Forrest Efficient for the high-dimensional datasets. Specially designed for anomaly detection.</title>
      </sec>
      <sec id="sec-8-2">
        <title>Clustering Algorithms Support Vector Machines (SVM)</title>
      </sec>
      <sec id="sec-8-3">
        <title>Decisions Trees</title>
      </sec>
      <sec id="sec-8-4">
        <title>One-Class SVM</title>
      </sec>
      <sec id="sec-8-5">
        <title>Local Outlier Factor (LOF)</title>
      </sec>
      <sec id="sec-8-6">
        <title>K-Means Clustering</title>
      </sec>
      <sec id="sec-8-7">
        <title>DBSCAN</title>
        <sec id="sec-8-7-1">
          <title>Description</title>
          <p>Handles large data sets with higher dimensionality. Can
model non-linear decision boundaries.</p>
          <p>Effective in high-dimensional spaces. Suitable for binary
classification tasks.</p>
          <p>Easy to understand and visualize. Can handle both
numerical and categorical data.</p>
          <p>Suitable for detecting outliers in high-dimensional
datasets.</p>
          <p>Measures the local density deviation of a data point
concerning its neighbors.</p>
          <p>Partitions the dataset into K clusters. Can be used to
identify unusual patterns.</p>
          <p>Does not require the number of clusters to be specified.
Can find arbitrarily shaped clusters.</p>
        </sec>
      </sec>
      <sec id="sec-8-8">
        <title>Time Series Analysis Recurrent Neural</title>
        <p>Networks (RNN)</p>
      </sec>
      <sec id="sec-8-9">
        <title>Autoencoders</title>
      </sec>
      <sec id="sec-8-10">
        <title>Long ShortTerm Memory (LSTM) Suitable for sequential data, such as system call sequences.</title>
        <p>Can be used for anomaly detection by reconstructing input
data.</p>
      </sec>
      <sec id="sec-8-11">
        <title>Effective for time-series data.</title>
        <p>Supervised Machine Learning stands as a
cornerstone method for deciphering
inputoutput relationship data across diverse fields. It
operates on a foundation where the system is
trained on a dataset composed of paired
inputoutput examples. These datasets, characterized
by their labeled outputs, guide the learning
algorithm to understand and internalize the
intricate mapping between the input and the
respective outputs. This understanding is
pivotal for the accurate prediction of output
values for new, unseen inputs.</p>
        <p>When the output is categorized by discrete
values, denoting different classes, supervised
learning maneuvers towards classification
tasks. In contrast, the presence of continuous
output values steers the learning towards
regression tasks. The internal representation
of the input-output relationships within the
learning model is signified by specific
parameters. These parameters, crucial for the
model’s performance, are calculated during the
learning phase, especially when there’s no
direct access to them.</p>
        <p>The landscape of supervised learning is rich
with diverse algorithms, each with its unique
strengths. Among them, k-Nearest Neighbors
(kNN) and Support Vector Machines (SVM)
hold significant places. The kNN algorithm
operates on the principle of proximity. It
classifies new data points based on their
closeness to labeled examples in the feature
space, assigning them the dominant class label
among the k closest neighbors. It is
nonparametric, considering the distances between
a new data point and all available labeled
training samples for classification.</p>
        <p>SVM, on the other hand, is renowned for its
robustness in both classification and
regression tasks. The algorithm works by
identifying an optimal hyperplane, aiming to
segregate data points of varying classes with
the most substantial possible margin. It
achieves this by transforming the data into a
higher-dimensional feature space and
meticulously constructing a decision boundary
that amplifies the margin between distinct
classes. Its flexibility in handling both linearly
and non-linearly separable data is further
enhanced using various kernel functions.</p>
        <p>In the context of the present research
project, the focus has been mainly on testing
the performance of SVM, utilizing both Linear
and Radial Basis Function (RBF) kernels. The
experimentation and exploration in this
project aim to shed light on the various facets
of SVM’s capabilities with these kernels,
providing a comprehensive insight into its
functioning and effectiveness.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>8. Machine Learning Pipeline</title>
      <p>In the contemporary digital landscape, the
proliferation of ransomware poses a significant
threat to the security and integrity of computer
systems worldwide. Addressing this challenge
necessitates innovative and robust solutions
capable of real-time detection and mitigation of
ransomware activities. This document
delineates a strategic machine-learning pipeline
designed to harness the data from eBPF
modules for effective ransomware detection.</p>
      <p>The pipeline is meticulously crafted to
ensure each phase contributes to enhancing the
accuracy and reliability of ransomware
detection, thereby bolstering the security
framework. The pipeline unfolds through five
pivotal stages: capturing events from the eBPF
module, normalizing the collected data,
constructing a data model using the Support
Vector Machines (SVM) algorithm, testing the
model’s performance, and ultimately, executing
real-time prediction and detection of potential
ransomware activities. Each stage plays a
crucial role in refining the data and the model,
ensuring the delivery of a highly efficient and
reliable ransomware detection system. The
subsequent sections provide a detailed insight
into each phase of the pipeline, elucidating the
processes, methodologies, and underlying
rationale that drive the seamless functioning of
this comprehensive machine learning pipeline.
The machine learning pipeline begins with the
collection of events from the eBPF module. The
eBPF programs monitor various system
activities and behaviors, capturing relevant
data that may indicate potential ransomware
activity. This data includes system call
patterns, file access patterns, and other
process metadata. The rich and detailed data
collected at this stage forms the foundation for
the subsequent steps in the pipeline, ensuring
a comprehensive analysis and accurate
detection.</p>
      <p>After collecting the events, the next step is
data normalization. This step is crucial for
preparing the data for the machine learning
model. It involves transforming the raw data
into a consistent format and scale, making it
more suitable for analysis. Normalization helps
eliminate any bias or anomalies caused by
different scales and formats, ensuring that
each feature contributes equally to the model’s
performance. This step enhances the efficiency
and accuracy of the machine learning model,
paving the way for more reliable predictions
and detections.</p>
      <p>With the normalized data in place, the next
step is to feed this data into the
machinelearning model. In this project, the Support
Vector Machines (SVM) algorithm is used for
building the data model. SVM is chosen for its
robustness and effectiveness in handling
highdimensional datasets. It works by identifying
an optimal hyperplane that segregates the data
points, enabling accurate classification and
prediction of ransomware activities. The
model is trained on a labeled dataset, allowing
it to learn and understand the patterns and
behaviors indicative of ransomware.</p>
      <p>After training the model, it is essential to
test its performance to ensure its reliability
and accuracy. Model testing involves
evaluating the model on a separate testing
dataset that it has not seen before. This step
helps in assessing the model’s ability to
generalize its learning to new, unseen data.
Various metrics such as accuracy, precision,
recall, and F1-score are used to measure the
model’s performance. The insights gained from
this step are used for further refining and
optimizing the model, enhancing its prediction
and detection capabilities.</p>
      <p>The final step in the pipeline is prediction
and detection. With the tested and optimized
model, real-time eBPF data is analyzed to make
predictions and detect potential ransomware
activities. The model analyzes the incoming
data, identifies patterns and behaviors, and
makes predictions about possible ransomware
activity. If ransomware activity is detected,
alerts are generated, and necessary actions are
taken to mitigate the threat. This step is crucial
for providing real-time protection against
ransomware, ensuring the security and
integrity of the systems.
9. Results
The implementation of the machine learning
pipeline for ransomware detection using eBPF
data and the SVM algorithm has yielded
promising results. This section presents a
detailed overview of the outcomes,
demonstrating the effectiveness and efficiency
of the proposed pipeline.</p>
      <sec id="sec-9-1">
        <title>Data Collection and Normalization</title>
        <p>During the initial phase, the eBPF module
successfully captured a comprehensive dataset
encompassing various system activities and
behaviors. Post normalization, the dataset,
comprising over 100,000 events, was
transformed into a consistent and standardized
format, ready for further processing.</p>
      </sec>
      <sec id="sec-9-2">
        <title>Model Training and Testing</title>
        <p>The SVM model was trained on a dataset of
80,000 events and tested on a separate set of
20,000 events. The model exhibited robust
performance, achieving an accuracy of 95.2%
on the testing dataset. The other performance
metrics were also commendable, with a
precision of 94.8%, a recall of 95.5%, and an
F1-score of 95.1%.</p>
      </sec>
      <sec id="sec-9-3">
        <title>Prediction and Detection</title>
        <p>In the real-time prediction and detection
phase, the model successfully identified and
alerted for ransomware activities in various
instances. Out of 50,000 real-time events
analyzed, the model accurately detected 472
ransomware activities, with only 3 false
positives, underscoring the model's reliability
and effectiveness.</p>
      </sec>
      <sec id="sec-9-4">
        <title>Comparative Analysis</title>
        <p>For a comparative perspective, the same
dataset was also tested using the k-Nearest
Neighbors (kNN) algorithm. The SVM model
outperformed the kNN model, which achieved
an accuracy of 90.3%, a precision of 89.7%, a
recall of 90.8%, and an F1-score of 90.2%.</p>
      </sec>
      <sec id="sec-9-5">
        <title>Conclusion of Results</title>
        <p>The results affirm the robustness and
reliability of the proposed machine learning
pipeline for ransomware detection using eBPF
data and SVM algorithm. The high accuracy,
along with excellent precision, recall, and
F1score, underscores the model’s capability to
effectively detect ransomware activities in real
time, contributing significantly to enhancing
system security and integrity.
10.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Summary</title>
      <p>eBPF (Extended Berkeley Packet Filter) is
becoming increasingly popular as a security
instrument, particularly in cloud settings.
Previously, network monitoring and threat
detection relied on audits, system logs, and
disk analysis [26]. These methods were
resource-intensive, not always effective, and
disk analysis was inefficient. Signature analysis
is unable to detect ransomware, which is
nearly invisible. The primary advantages of
eBPF include:
• Flexibility and scalability: eBPF permits
the use of code within the OS kernel
without modifying the kernel itself,
making it simpler to adjust the system to
network traffic without needing to
restart.
• Performance: By executing code on the
kernel, eBPF enables high-speed data
processing for real-time monitoring and
threat detection.
• Customizability: eBPF allows for the
monitoring of specific parameters,
aiding in the identification of complex
threats such as ransomware.
• Integration: eBPF can be seamlessly
integrated with other security tools to
broaden analysis capabilities.
• Risk reduction: eBPF diminishes risks
associated with traditional methods by
offering a controlled environment for
code execution.</p>
      <p>Considering these advantages, eBPF serves
as the perfect solution for contemporary
environments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>P. O'Kane</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Sezer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Carlin</surname>
          </string-name>
          , Evolution of Ransomware,
          <source>Iet Networks</source>
          <volume>7</volume>
          (
          <issue>5</issue>
          ) (
          <year>2018</year>
          )
          <fpage>321</fpage>
          -
          <lpage>327</lpage>
          . doi:
          <volume>10</volume>
          .1049/iet-net.
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>Analysis of the use of software decoys as a means of information security</article-title>
          ,
          <source>Cybersecur. Educ. Sci. Technol</source>
          .
          <volume>2</volume>
          (
          <issue>10</issue>
          ) (
          <year>2020</year>
          )
          <fpage>88</fpage>
          -
          <lpage>97</lpage>
          . doi:
          <volume>10</volume>
          .28925/
          <fpage>2663</fpage>
          -
          <lpage>4023</lpage>
          .
          <year>2020</year>
          .
          <volume>10</volume>
          .8897.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Buriachok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sokolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Skladannyi</surname>
          </string-name>
          ,
          <article-title>Security Rating Metrics for Distributed Wireless Systems</article-title>
          ,
          <source>in: Workshop of the 8th International Conference on “Mathematics. Information Technologies. Education:” Modern Machine Learning Technologies and Data Science</source>
          , vol.
          <volume>2386</volume>
          (
          <year>2019</year>
          )
          <fpage>222</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jia</surname>
          </string-name>
          , et al.,
          <article-title>Programmable System Call Security with eBPF</article-title>
          ,
          <source>IBM Research</source>
          , Yorktown
          <string-name>
            <surname>Heights</surname>
          </string-name>
          (
          <year>2023</year>
          ). doi:
          <volume>10</volume>
          .48550/arxiv.2302.10366.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kerrisk</surname>
          </string-name>
          ,
          <article-title>Using Seccomp to Limit the Kernel Attack Surface</article-title>
          ,
          <source>in Linux Plumbers Conference (LPC'15)</source>
          (
          <year>2015</year>
          ). URL: https://man7.org/conf/lpc2015/limitin
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>