<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Cyber Resilience against APTs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Gaudenzi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Nodari</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rodolfo Valentim</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Giordano</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Idilio Drago</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Russo</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Cerutti</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Turin</institution>
          ,
          <addr-line>Turin, TO</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Control and Computer Engineering</institution>
          ,
          <addr-line>Politecnico di Torino, Turin, TO</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Information Engineering, University of Brescia</institution>
          ,
          <addr-line>Brescia, BS</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Imperial College London</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper examines the Uber data breach of September 2022, where the Lapsus$ group exploited multi-factor authentication (MFA) fatigue to compromise contractor credentials. The attackers gained access to internal systems, demonstrating the sophistication and persistence of modern Advanced Persistent Threats (APTs). Using the ACRE framework, which focuses on later stages of the cyber kill chain, we highlight how effective Cyber Threat Intelligence (CTI) can systematically detect and analyse such attacks. The ACRE framework provides tools to collect, process, and analyse threat data, enabling organisations to identify APT activity and mitigate risks proactively. By applying ACRE to the Uber breach, this study demonstrates its capacity to uncover critical intelligence and improve defensive strategies. The case underscores the importance of intelligence-driven approaches in addressing the complexities of contemporary cyber threats and enhancing organisational resilience.</p>
      </abstract>
      <kwd-group>
<kwd>Threat intelligence</kwd>
        <kwd>Attack modelling</kwd>
        <kwd>Neuro-symbolic machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The modern threat landscape, characterised by an evolving adversarial ecosystem and an expanding
attack surface, presents a level of complexity that profoundly impacts the interconnected fabric of our
digital world. This environment fosters the proliferation of APTs [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], where sophisticated attackers
relentlessly seek to infiltrate networks, exfiltrate sensitive information, or establish a foothold for
launching subsequent attacks. Today’s cyber threats are increasingly powered by artificial intelligence,
persistent in nature, and omnipresent across all domains of the digital ecosystem.
      </p>
      <p>Advanced Persistent Threats represent a significant challenge in contemporary cybersecurity. These
threats are characterized by their sophisticated orchestration, stealthy execution, extended persistence,
and targeting of valuable assets across diverse sectors. APTs typically involve prolonged and targeted
cyberattacks, often orchestrated by well-funded and skilled adversaries, including nation-states and
organized cybercriminal groups. The complexity and persistence of APTs necessitate advanced detection
and defense mechanisms to protect critical infrastructure and sensitive information.</p>
      <p>
        A clear example of the intricacy of APT operations is provided by the Lockheed-Martin cyber kill
chain model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which outlines the seven key phases typically employed by attackers. The sequence begins
with reconnaissance, where adversaries gather intelligence to identify suitable targets. This is followed
by weaponisation, where a tailored payload is created and subsequently delivered to the target system.
Once delivered, the attacker exploits vulnerabilities to install a persistent backdoor. At this stage, the
malware establishes a command-and-control (C2) channel, enabling the attacker to pursue their mission
objectives. The cyber kill chain model also highlights potential defensive strategies, including detection,
denial, disruption (e.g., inline anti-virus), degradation (e.g., throttling communication), and deception
(e.g., deploying decoys such as honeypots).
      </p>
      <p>The focus of the ACRE framework lies in detecting attacks during the later stages of the kill
chain. This approach necessitates the collection and analysis of intelligence—often derived from
deception-based tools—to develop mechanisms capable of identifying threats at earlier stages.</p>
      <p>[Figure 1: ACRE node architecture, showing master and worker pods, storage, a VPN, servers, workstations, and virtual machines in a virtualised edge-computing environment.]</p>
      <p>
        CTI is the iterative process used by analysts to generate actionable insights into the vulnerabilities of
organisational assets that adversaries could exploit. Similar to traditional intelligence methodologies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
CTI involves several distinct phases. The process begins with defining intelligence requirements, such
as identifying APT campaigns. Analysts, either human or autonomous, then collect raw data—such as
network logs from firewalls—into preliminary storage repositories, often referred to as “shoeboxes.”
These raw data sets are then organised into an evidence file (Section 2).
      </p>
      <p>Subsequent stages involve data processing, where meaningful semantics are applied to the raw
data, creating a structured schema. This enriched dataset becomes the foundation for deeper analysis,
hypothesis generation, and verification (Section 3).</p>
      <p>Our preliminary experimental analysis (Section 4) builds on the Uber data breach by Lapsus$. In September 2022, Uber Technologies Inc. suffered a cybersecurity breach by the Lapsus$ group, which used MFA fatigue to compromise contractor credentials. The attackers accessed internal systems, posted messages on Slack, and altered OpenDNS. Uber reported no access to sensitive user data. By systematically following the methodological steps of CTI, organisations can develop a robust understanding of APT activity and implement proactive measures to mitigate future threats.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. ACRE Data Gathering and Processing Framework</title>
      <p>The ACRE data collection architecture consists of several distributed nodes, each equipped with specialized probes for capturing diverse data feeds from telescopes, honeypots, and CTI crawlers. As seen in Figure 1, each node operates in a virtualized environment for edge computing. This structure is designed to be deployed across multiple network providers, collecting data near possible victims of remote attacks, such as servers in datacenters or client/IoT devices with public addresses at the edge of the network. It offers a distributed viewpoint on ongoing attacks. Each node is autonomous, capable of local data processing and storage, and can run specific network sensors. The platform also supports running distributed algorithms directly on the nodes, thus providing a framework for federated learning of attack patterns.</p>
      <p>
        Currently, the infrastructure hosts three types of sensors:
• Telescope nodes: Monitor one-way traffic, capturing unsolicited traffic directed to unused IP address spaces. This data provides insights into global scanning activities and potential threats targeting unprotected networks.
• Honeypots: The honeypot modules simulate vulnerable systems to attract and record attempted attacks. The current deployment supports containers distributed by the TPot project [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The collected data includes brute-force attack attempts as well as logs of shell sessions that can be used to understand attacking behavior and identify emerging attack vectors.
• CTI Crawlers: These crawlers gather data from dark web sources and other CTI platforms, aggregating threat intelligence indicators that are later used for enrichment and threat detection purposes.
      </p>
      <p>The data processing pipeline begins at the node level, where each module gathers data on network activity, detects anomalies, and reports threat intelligence insights. After collection, the sensor logs undergo feature extraction, transforming raw inputs into a structured format suitable for machine learning analysis. In particular, embeddings that represent the traffic patterns of different attackers are learned directly on the nodes, relying on a federated learning version of the iDarkvec algorithm [6].</p>
      <p>Embeddings and general traffic features are then logged into time-series representations, together with attributes such as attackers’ IP addresses, timestamps, and traffic metadata. The parsed data is then centralized for use in various downstream tasks. In particular, these data have been used for learning attack patterns and textual explanations using Large Language Models (LLMs), as well as to trigger alerts based on anomaly detection algorithms [7].</p>
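The logging step can be sketched as follows. This is a minimal Python illustration; the record fields (timestamp, source IP, sensor type, embedding, traffic metadata) are assumed names for the purposes of the example, not ACRE's actual schema.

```python
from dataclasses import dataclass, asdict

# Sketch: grouping parsed sensor records into per-attacker time series
# before centralisation. Field names are illustrative assumptions.

@dataclass
class SensorRecord:
    timestamp: float   # epoch seconds of the observation
    src_ip: str        # attacker's source IP address
    sensor: str        # "telescope", "honeypot" or "cti-crawler"
    embedding: list    # traffic-pattern embedding learned on the node
    features: dict     # general traffic metadata (ports, bytes, ...)

def to_time_series(records):
    """Group parsed records per attacker IP, ordered by timestamp."""
    series = {}
    for rec in sorted(records, key=lambda r: r.timestamp):
        series.setdefault(rec.src_ip, []).append(asdict(rec))
    return series

records = [
    SensorRecord(2.0, "198.51.100.7", "honeypot", [0.1, 0.9], {"dst_port": 22}),
    SensorRecord(1.0, "198.51.100.7", "telescope", [0.2, 0.8], {"dst_port": 23}),
]
series = to_time_series(records)
```

Each per-IP series can then feed the downstream tasks mentioned above, such as anomaly detection or LLM-based explanation.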
      <p>In the following, we detail one of the applications that can be built upon the data collected in this
infrastructure.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Reward Machines for CTI</title>
      <p>Reward machines are formal structures that extend finite state automata (FSA) by associating rewards
or costs with transitions between states. They are widely used in reinforcement learning to provide
structured feedback that guides decision-making processes. In the context of cybersecurity, reward
machines can model the sequential nature of tasks or attacks, where each state represents a specific
step in a process, and observable actions or events trigger transitions. This structured representation is
particularly valuable in capturing the procedural nature of APTs, where attackers typically follow a
series of well-defined stages to achieve their objectives.</p>
      <p>State estimation within a reward machine framework plays a crucial role in CTI. Given that the steps
of an attacker can be modelled as an FSA, state estimation allows defenders to infer the current phase of
an ongoing attack based on observed system behaviours or detected anomalies. For example, an attacker
might progress through reconnaissance, exploitation, and exfiltration, with each step corresponding to
a state in the automaton. However, adversaries often obscure their activities, making it challenging
to directly observe transitions. State estimation provides the means to infer these hidden states from
partial or noisy observations, enabling defenders to assess the attack’s progression.</p>
      <p>The integration of reward machines into CTI enhances this process by quantifying the impact of
potential defensive actions. By associating rewards with actions such as detecting, disrupting, or
deceiving the attacker, reward machines help optimise response strategies. For instance, defenders
can evaluate the trade-offs between immediately halting an attack and allowing it to proceed to
collect more intelligence about the adversary’s objectives and techniques. This approach aligns with
the goals of CTI, which seeks not only to understand and monitor threats but also to inform proactive
and adaptive defence measures. Consequently, the combination of state estimation, reward machines,
and CTI enables a systematic, intelligence-driven defence strategy that improves situational awareness
and enhances resilience against sophisticated threats.</p>
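The trade-off described above can be made concrete with a toy expected-reward computation. All probabilities and rewards below are hypothetical numbers chosen purely for illustration, not values produced by the framework.

```python
# Illustrative expected-reward comparison between two defensive actions.
# Probabilities and rewards are hypothetical numbers for this example.

def expected_reward(outcomes):
    """outcomes: list of (probability, reward) pairs."""
    return sum(p * r for p, r in outcomes)

# Halting now stops the attack but yields little intelligence.
halt_now = expected_reward([(1.0, 5.0)])

# Observing longer yields more intelligence if the attack stays
# contained (p = 0.8) but a loss if it progresses undetected (p = 0.2).
observe = expected_reward([(0.8, 12.0), (0.2, -10.0)])

best = "observe" if observe > halt_now else "halt"
```

Under these assumed numbers, continued observation is preferred; changing the probability of undetected progression flips the decision.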
      <sec id="sec-2-1">
        <title>3.1. Basics of Reinforcement Learning</title>
        <p>We formalize Reinforcement Learning (RL) tasks as labelled Markov decision processes (MDPs) [8, 9]. An MDP is defined as the tuple ⟨S, A, p, r, τ, γ, P, ℒ⟩, where:
• S is the set of states,
• A is the set of actions,
• p : S × A → Δ(S) is the transition probability function,
• r : (S × A)⁺ × S → R is the reward function,
• τ : (S × A)* × S → {⊥, ⊤} × {⊥, ⊤} is the termination function,
• γ ∈ [0, 1) is the discount factor,
• P is a finite set of propositions representing high-level events,
• ℒ : S × A × S → 2^P is a (perfect) labeling function mapping state-action-state triplets to sets of propositions. These sets of propositions are referred to as labels.</p>
        <p>The transition function p is Markovian, whereas the reward function r and termination function τ may depend on the history (i.e., they are history-dependent).</p>
        <p>Given a state-action history h = ⟨s₀, a₀, . . . , s_t⟩ ∈ (S × A)* × S, we define a trace λ_t = ⟨ℒ(∅, ∅, s₀), . . . , ℒ(s_{t−1}, a_{t−1}, s_t)⟩ ∈ (2^P)⁺, which assigns labels to the triplets in h. The objective is to find a policy π : (2^P)⁺ × S → Δ(A) that maps traces and states to a probability distribution over actions, maximizing the expected cumulative discounted reward (or return) v_t = E[Σ_{k=t}^{T} γ^{k−t} r(h_k)], where T is the final step of the episode. To achieve this, traces must accurately represent histories, since the reward and termination functions may depend on traces rather than just the current state.</p>
        <p>The agent-environment interaction proceeds as follows. At time t, with trace λ_t ∈ (2^P)⁺, the agent observes the tuple ⟨s_t, b_t, g_t⟩, where s_t ∈ S is the current state, b_t ∈ {⊥, ⊤} indicates if the history is terminal, and g_t ∈ {⊥, ⊤} indicates if the task’s goal has been achieved. Both b_t and g_t are determined by the termination function τ. The agent also observes a label ℓ_t = ℒ(s_{t−1}, a_{t−1}, s_t). If the history is non-terminal, the agent selects an action a_t ∈ A, and the environment transitions to state s_{t+1} ∼ p(· | s_t, a_t). The agent then observes the new tuple ⟨s_{t+1}, b_{t+1}, g_{t+1}⟩ and label ℓ_{t+1}, updates the trace as λ_{t+1} = λ_t ⊕ ℓ_{t+1}, and receives reward r_{t+1}. A trace λ_t is a goal trace if ⟨b_t, g_t⟩ = ⟨⊤, ⊤⟩, a dead-end trace if ⟨b_t, g_t⟩ = ⟨⊤, ⊥⟩, and an incomplete trace if b_t = ⊥.</p>
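The trace bookkeeping of this interaction loop can be sketched as follows. The label names are borrowed from the case study later in the paper and serve only as an illustration.

```python
# Sketch of trace bookkeeping: a trace is a list of labels (sets of
# propositions), extended at each step, and classified from the
# termination flags (b, g). Label names are illustrative.

def extend(trace, label):
    """The trace-update step: append the newly observed label."""
    return trace + [label]

def classify(terminal, goal):
    """Classify a trace from the flags (b, g), here as booleans."""
    if not terminal:
        return "incomplete"
    return "goal" if goal else "dead-end"

trace = []
trace = extend(trace, {"credential_retrieval"})
trace = extend(trace, {"pam_access_attempt"})
```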
        <p>Learning policies over entire histories or traces is impractical due to their potentially unbounded length. Therefore, we use reward machines to encode traces succinctly, facilitating efficient policy learning.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Reward Machines</title>
        <p>A reward machine (RM) [10, 11] is a finite-state representation of a reward function. Formally, an RM is a tuple M = ⟨U, P, δ_u, δ_r, u₀, u_A, u_R⟩, where:
• U is a set of states,
• P is the set of propositions (alphabet),
• δ_u : U × 2^P → U is the state-transition function,
• δ_r : U × 2^P → R is the reward-transition function,
• u₀ ∈ U is the initial state,
• u_A ∈ U is the accepting state,
• u_R ∈ U is the rejecting state.</p>
        <p>Reward machines are used during agent-environment interactions. Starting from u₀, the agent transitions through RM states according to δ_u and receives rewards via δ_r. Given an RM M and a trace λ = ⟨ℓ₀, . . . , ℓ_t⟩, a traversal θ_M(λ) = ⟨u₀, u₁, . . . , u_{t+1}⟩ is the sequence of RM states where (i) the first state is the initial state u₀, and (ii) u_{i+1} = δ_u(u_i, ℓ_i) for i = 0, . . . , t. Traversals for goal and dead-end traces should end in u_A and u_R, respectively; incomplete traces end elsewhere.</p>
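A minimal Python sketch of a reward machine and its traversal follows. Representing δ_u and δ_r as dictionaries, and self-looping on undefined transitions, are implementation assumptions made here purely for illustration.

```python
# Minimal reward-machine sketch following the notation above.
# Undefined transitions self-loop (an assumption for this example).

class RewardMachine:
    def __init__(self, delta_u, delta_r, u0, u_acc, u_rej):
        self.delta_u = delta_u    # (RM state, frozenset label) to next state
        self.delta_r = delta_r    # (RM state, frozenset label) to reward
        self.u0, self.u_acc, self.u_rej = u0, u_acc, u_rej

    def traversal(self, trace):
        """Replay a trace from u0 and return the visited RM states."""
        states = [self.u0]
        for label in trace:
            key = (states[-1], frozenset(label))
            states.append(self.delta_u.get(key, states[-1]))
        return states

# Toy two-state machine: label {"a"} moves u0 to the accepting state.
rm = RewardMachine(
    delta_u={("u0", frozenset({"a"})): "u_acc"},
    delta_r={("u0", frozenset({"a"})): 1.0},
    u0="u0", u_acc="u_acc", u_rej="u_rej",
)
```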
        <p>Reward machines provide compact representations of traces by encoding different task completion stages within RM states. As a result, when rewards are defined over S × U, they become Markovian. Based on this, [11] propose an algorithm that learns an action-value function (Q-function) q : S × U × A → R, estimating the expected return after taking an action from a given state.</p>
        <p>Given a transition from state s to s′ with action a and label ℓ = ℒ(s, a, s′), the Q-function q is updated as q(s, u, a) ←_α δ_r(u, ℓ) + γ max_{a′ ∈ A} q(s′, u′, a′) (1), where u′ = δ_u(u, ℓ), and x ←_α y denotes x ← x + α(y − x). In the tabular setting, where estimates are maintained for each state-action pair, this algorithm converges to the optimal policy in the limit [11, Theorem 4.1].</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Reward Machine and CTI: the Importance of State Estimation</title>
        <p>An essential aspect of leveraging RMs in reinforcement learning is the ability to associate observed traces with the current RM state. This association is particularly critical when only partial traces are available, as in real-world scenarios where agents operate under incomplete or noisy observations. By learning a mapping between partial traces and RM states, one can predict the most likely RM state even when the full trace is unavailable, enabling the agent to act effectively in dynamic and uncertain environments.</p>
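The tabular update rule of Section 3.2 can be sketched as follows. The toy transition tables, action set, and learning constants are illustrative assumptions rather than part of the formalism.

```python
from collections import defaultdict

# Tabular sketch of Q-learning over pairs (s, u) of environment and RM
# states: q(s, u, a) is nudged toward delta_r(u, l) + gamma * max q(s', u', a').

ALPHA, GAMMA = 0.5, 0.9
ACTIONS = ["probe", "exploit"]        # toy action set
q = defaultdict(float)                # keys: (s, u, a)

def update(s, u, a, label, s_next, delta_u, delta_r):
    """One Q-learning step over the product of MDP and RM states."""
    u_next = delta_u.get((u, label), u)       # self-loop if undefined
    target = delta_r.get((u, label), 0.0) + GAMMA * max(
        q[(s_next, u_next, a_next)] for a_next in ACTIONS
    )
    q[(s, u, a)] += ALPHA * (target - q[(s, u, a)])
    return u_next

delta_u = {("u0", "cred"): "u1"}      # observing "cred" advances the RM
delta_r = {("u0", "cred"): 1.0}       # ...and pays a reward of 1
u_next = update("s0", "u0", "exploit", "cred", "s1", delta_u, delta_r)
```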
        <p>The idea stems from the observation that RM states encapsulate task progressions through a
sequence of high-level events represented by labels [10, 11]. Each RM state corresponds to a specific
configuration of task completion, and transitions between states are governed by the labels observed in
the environment. By training a model to infer the current RM state from observed labels and partial
trace information, we can enable agents to generalise across similar tasks and handle ambiguous or
missing data.</p>
        <p>Such learning approaches could involve supervised learning methods where a dataset of traces and
corresponding RM states is used to train a classifier or regressor. Alternatively, reinforcement learning
agents can integrate trace-to-state prediction into their policy optimisation process, using predictions
to guide decisions in real time. This capability enables the agent to navigate complex environments
with non-Markovian rewards while maintaining a compact representation of task progress.</p>
        <p>Learning trace-to-state associations also facilitates efficient state estimation in partially observable
environments. In particular, several complex cyber attacks can be modelled as reward machines,
where each state represents a distinct stage of the attack, and transitions between states correspond
to specific adversarial actions or system events. For instance, the ultimate reward for an attacker
might be the successful exfiltration of sensitive information, with intermediate states corresponding
to reconnaissance, exploitation, lateral movement, and persistence. This structured representation
captures the sequential and goal-driven nature of APTs, where the attacker progresses systematically
through the attack stages.</p>
        <p>By analysing the traces left behind by the attacker — such as logs, network anomalies, or other
technical intelligence (TECHINT) — defenders can infer both the occurrence of an attack and its current
stage. Each trace, composed of high-level events extracted from system logs or behavioural data,
corresponds to a label in the RM framework. For example, detecting an unusual spike in network traffic
might indicate a transition to a data exfiltration state. Similarly, identifying a previously unknown
process running on a critical server might signal lateral movement.</p>
        <p>Mapping these traces to RM states enables defenders to reconstruct the adversary’s progression,
providing actionable intelligence to predict their next moves. This approach allows cybersecurity
teams to transition from reactive measures, such as responding to detected anomalies, to proactive
strategies that anticipate and mitigate potential threats. Additionally, the RM framework offers a compact
and interpretable model of the attack process, facilitating both real-time analysis and retrospective
investigations to strengthen overall cyber resilience.</p>
        <p>[Figure 2: MITRE ATT&amp;CK Flow excerpt of the Uber breach: (1) Network Share Discovery (Tactic TA0007, Technique T1135): within the Uber environment, the user had access to a network share; (2) Credentials in Files (Tactic TA0006, Technique T1552.001): attackers discovered a PowerShell script containing hard-coded privileged accounts within the network share; (3) Valid Accounts (Tactic TA0005, Technique T1078): attackers used valid account credentials to access the PAM solution.]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. The Uber Breach by Lapsus$</title>
      <p>In September 2022, Uber Technologies Inc. experienced a cybersecurity breach attributed to the Lapsus$
hacking group. The attackers gained access by compromising the credentials of an external contractor,
employing a technique known as MFA fatigue or MFA bombing. This method involves inundating the
target with multiple multi-factor authentication (MFA) requests until one is approved, thereby granting
the attacker access.1</p>
      <p>Once inside Uber’s network, the intruders accessed several internal systems, including G-Suite and
Slack. They posted a message on a company-wide Slack channel and altered Uber’s OpenDNS to display
a graphic image on some internal sites. However, Uber reported no evidence of access to production
systems that store sensitive user information, such as personal and financial data.2</p>
      <p>Uber’s investigation, conducted in collaboration with the FBI and the U.S. Department of Justice,
concluded that the attackers were affiliated with Lapsus$, a group known for targeting technology
companies.</p>
      <sec id="sec-3-1">
        <title>4.1. The Attack Analysed using MITRE ATT&amp;CK Flow</title>
        <p>The MITRE ATT&amp;CK Flow3 is a structured framework designed to systematically visualise and model adversary behaviours and attack sequences. It allows security teams to document, analyse, and
communicate the progression of attacks using a flowchart-like representation, connecting individual tactics,
techniques, and procedures (TTPs) from the MITRE ATT&amp;CK knowledge base. By mapping these
sequences, organisations can better understand how an adversary moves through an attack lifecycle,
identify potential defence gaps, and improve detection and response strategies. The framework enhances
situational awareness, enabling more robust security postures and collaborative threat analysis.</p>
        <p>Figure 2 describes the central part of the Uber breach. Once they had gained access through the external contractor’s identity, the attackers had access to network shares. There they discovered a PowerShell script containing hard-coded privileged account credentials, which could be used to access the Privileged Access Manager (PAM) solution. That allowed them to access Uber’s entire communication network.</p>
        <p>These three steps can thus be described as a small finite-state machine that can be analysed in a
hard-coded simulator to assess the possibility of learning the state of the attack from observations.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. From MITRE ATT&amp;CK Flow to a Reward Machine</title>
        <p>To prove the viability of detecting the attack’s progression based on observable events, we define a
three-state reward machine to represent the attack sequence, shown in Figure 3.</p>
        <p>State 0 (Initial Access): The attacker has gained initial access through the compromised external contractor’s credentials.</p>
        <p>1 https://center-for-threat-informed-defense.github.io/attack-flow/ui/?src=..%2fcorpus%2fUber%20Breach.afb (accessed 27 November 2024). 2 https://www.uber.com/en-NO/newsroom/security-update/ (accessed 27 November 2024). 3 https://center-for-threat-informed-defense.github.io/attack-flow/ (accessed 27 November 2024).</p>
        <p>[Figure 3: Reward machine for the Uber breach: state 0 (Initial Access) transitions to state 1 (Credential Discovery) on the label Credential Retrieval, and state 1 transitions to state 2 (PAM Access) on the label PAM Access Attempt.]</p>
        <p>State 1 (Credential Discovery): The attacker has discovered the PowerShell script containing
hardcoded privileged account credentials.</p>
        <p>State 2 (PAM Access): The attacker successfully accessed the PAM system using the discovered
credentials. This state represents the successful completion of the modelled attack sequence.</p>
        <p>A crucial component of this model is the labelling function, which maps observable events to labels.
These labels provide evidence that a transition between states has occurred. They could be derived
from system logs analysed through LLMs, as explained in Section 2. In this case, we define two possible
labels:
Credential Retrieval: This label corresponds to successfully retrieving or accessing the hard-coded
credentials from the PowerShell script and thus triggers the transition from state 0 to state 1 in
Fig. 3. Evidence for this could be found in logs showing the execution of the PowerShell script or
access to the file containing the credentials.</p>
        <p>PAM Access Attempt: This label corresponds to an attempt to access the PAM system, triggering the
transition from state 1 to state 2 in Fig. 3. Logs showing authentication attempts against the PAM
system would trigger this label.</p>
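The resulting three-state machine and labelling function can be sketched as follows. The log patterns matched below are hypothetical stand-ins for the evidence described above, not real Uber log formats.

```python
# The three-state machine of Section 4.2 as a transition table, plus a
# toy labelling function. Matched log patterns are hypothetical.

TRANSITIONS = {
    (0, "credential_retrieval"): 1,  # Initial Access to Credential Discovery
    (1, "pam_access_attempt"): 2,    # Credential Discovery to PAM Access
}

def labelling(log_line):
    """Map a raw log line to a high-level label, or None."""
    line = log_line.lower()
    if "powershell" in line and "script" in line:
        return "credential_retrieval"
    if "pam auth" in line:
        return "pam_access_attempt"
    return None

def estimate_state(logs, state=0):
    """Replay log evidence to estimate the attack's current stage."""
    for line in logs:
        state = TRANSITIONS.get((state, labelling(line)), state)
    return state

logs = [
    "user ran PowerShell script from network share",
    "PAM auth attempt with privileged account",
]
```

Replaying the two hypothetical log lines drives the machine from state 0 through state 1 to the terminal state 2.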
        <p>Thus, the labelling function connects the reward machine’s abstract state transitions to concrete, observable events within the system logs, providing evidence that the state transitions have occurred.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Simulated Environment, Trace Collection and State Estimation</title>
        <p>In the simulated environment, the nodes represent various devices or systems within a network, and the goal of the agent, representing the attacker, is to navigate through the network, probe for vulnerabilities, and access critical credentials in a particular node in order to access the PAM solution of the system.</p>
        <p>The actions the agent can perform in this environment are probing a node and attempting to retrieve
credentials by exploiting a vulnerability. Probing involves scanning a node to gather information, such
as identifying vulnerabilities or obtaining credentials. Attempting to retrieve credentials occurs when
the agent exploits a discovered vulnerability to gain access to sensitive information.</p>
        <p>The traces are collected by recording the agent’s state, the action taken, the labeling function activated, and the resulting state of the attack at each time step during every episode of the agent’s training. After collecting the initial traces, we removed the first 20% of the data to avoid biasing the dataset with observations from the purely exploratory phase.</p>
        <p>The agent’s state corresponds to the last node it probed, with an additional flag indicating whether it
has just attempted to retrieve credentials (without indicating whether the credentials were successfully
retrieved). The other information about the environment’s state is derived from the labeling function.</p>
        <p>We implemented two neural network architectures: a Multi-Layer Perceptron (MLP) and a Long Short-Term Memory (LSTM) network. The MLP is a simple feedforward neural network that processes fixed-size input data, while the LSTM is a type of recurrent neural network designed to handle sequential data by capturing temporal dependencies. For both models, the input consists of the agent’s state and the labeling function. For each time step, we considered a rolling window of the previous n observations, where n ranges from 1 to 5. The rolling window ensures that the models have access to a broader context, while preventing the model from accessing too much of the past history, which could introduce unnecessary complexity or noise.</p>
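The preprocessing described above, trimming the exploratory phase and then windowing, can be sketched as follows. The integer observations stand in for the real (state, label) records.

```python
# Sketch of the trace preprocessing: drop the first 20% of observations
# (the purely exploratory phase), then build rolling windows of the
# previous n observations as model inputs. Integers stand in for the
# real (state, label) records.

def trim_exploratory(observations, fraction=0.2):
    """Drop the earliest `fraction` of the collected observations."""
    return observations[int(len(observations) * fraction):]

def rolling_windows(observations, n):
    """Pair each observation with the window of the n preceding ones."""
    return [
        (observations[i - n:i], observations[i])
        for i in range(n, len(observations))
    ]

obs = trim_exploratory(list(range(10)))   # keeps the last 80% of records
pairs = rolling_windows(obs, n=3)         # (window, target) training pairs
```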
        <p>The MLP model consists of three fully connected layers: the first layer transforms the input features
into a higher-dimensional space, the second layer reduces the dimensionality of the features to half the
size of the previous layer and the final layer maps the processed features to the output space. Each of
the first two layers is followed by a ReLU activation function to introduce non-linearity and enable the
network to model complex relationships. The last layer outputs predictions without any activation.</p>
        <p>The LSTM model consists of two components: an LSTM layer that processes input sequences and outputs a sequence of hidden states, capturing both short-term and long-term dependencies in the data, and a fully connected layer that maps the hidden states from the LSTM to the output space. A dropout mechanism is included within the LSTM to reduce overfitting during training. The output of the fully connected layer is passed through a softplus activation function to ensure smooth, non-negative predictions.</p>
        <p>Table 1 summarizes the results for both models, measured using accuracy, precision, recall and
F1-score metrics. The results demonstrate how each model’s performance varies with different window
sizes, reflecting the impact of this parameter on the models’ ability to process and analyze the input
data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusions</title>
      <p>This paper has demonstrated how the ACRE framework can address the challenges posed by Advanced
Persistent Threats by providing effective tools for data collection, processing, and foresight generation.
Using the Uber data breach as a case study, we showed how ACRE enhances the Cyber Threat
Intelligence process by systematically analysing threats during the later stages of the cyber kill chain. The
framework’s ability to integrate and structure raw data enables the extraction of actionable intelligence,
supporting organisations in identifying complex attack patterns and mitigating risks proactively.</p>
      <p>This research is important because it contributes to bridging the gap between traditional CTI
methodologies and modern advancements in threat analysis. By leveraging tools such as ACRE, organisations
can process large-scale threat data efficiently and anticipate and respond to adversarial strategies in a
timely manner. Integrating predictive foresight with actionable intelligence is critical in addressing
the growing sophistication of cyber threats, enabling a shift from reactive to proactive cybersecurity
postures.</p>
      <p>Future work will explore the automatic learning of reward machines to enhance the ACRE
framework’s capabilities further [12]. As part of neuro-symbolic reinforcement learning, reward machines
offer a structured means of modelling adversarial behaviours and generating dynamic threat responses. By incorporating automated learning mechanisms, ACRE can adapt to evolving APT tactics in real time,
enabling more efective modelling of complex threat environments.</p>
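      <p>To fix intuition, a reward machine is a finite-state machine whose transitions are triggered by high-level events and emit rewards. The sketch below uses hypothetical states and events chosen for illustration; it is not one of the machines learned in [12].</p>

```python
class RewardMachine:
    """Minimal reward machine: states, event-triggered transitions, rewards."""

    def __init__(self, transitions, initial):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial

    def step(self, event):
        # events with no matching transition leave the state
        # unchanged and yield zero reward
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        return reward

# toy APT-style progression: reconnaissance -> foothold -> exfiltration
rm = RewardMachine({
    ("u0", "scan"):  ("u1", 0.1),
    ("u1", "login"): ("u2", 0.5),
    ("u2", "exfil"): ("u3", 1.0),
}, initial="u0")
rewards = [rm.step(e) for e in ["scan", "login", "exfil"]]
# rewards == [0.1, 0.5, 1.0]; rm.state == "u3"
```

Learning such machines automatically, rather than specifying them by hand as above, is precisely what makes them attractive for modelling evolving adversarial behaviour.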
      <p>In addition, we aim to leverage the capabilities of LLMs to strengthen the ACRE framework further.
LLMs provide a powerful foundation for extracting meaningful semantics from unstructured and
semi-structured threat intelligence data, automating parts of the CTI process such as data enrichment, pattern
recognition, and hypothesis generation. By connecting LLMs with the learning mechanisms of reward
machines, we can create an adaptive pipeline that processes and interprets large-scale threat data and
generates symbolic representations of adversarial behaviours. This integration will enable ACRE to
refine its predictive foresight capabilities, facilitating the identification of novel attack patterns and
supporting the development of proactive defence strategies.</p>
      <p>Combining neuro-symbolic approaches with LLMs’ natural language processing abilities will position
ACRE as a scalable and adaptive solution for modern cybersecurity challenges, capable of addressing
the complexity and volume of data inherent in today’s threat landscape.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This project was partially funded by the Italian Ministry of University as part of the PRIN: PROGETTI
DI RICERCA DI RILEVANTE INTERESSE NAZIONALE – Bando 2022, Prot. 2022EP2L7H. This work
was partially supported by project SERICS (PE00000014) under the MUR National Recovery and
Resilience Plan funded by the European Union – NextGenerationEU, specifically by the project NEACD:
Neurosymbolic Enhanced Active Cyber Defence (CUP J33C22002810001). The research reported in
this paper was sponsored in part by the DEVCOM Army Research Laboratory via cooperative
agreement W911NF2220243. Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect the views of the United States
government.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[6] L. Gioacchini, L. Vassio, M. Mellia, I. Drago, Z. B. Houidi, D. Rossi, i-DarkVec: Incremental
embeddings for darknet traffic analysis, ACM Trans. Internet Technol. 23 (2023). URL:
https://doi.org/10.1145/3595378. doi:10.1145/3595378.
[7] M. Boffa, I. Drago, M. Mellia, L. Vassio, D. Giordano, R. Valentim, Z. B. Houidi, LogPrécis:
Unleashing language models for automated malicious log analysis, Computers &amp; Security 141
(2024) 103805. URL: https://www.sciencedirect.com/science/article/pii/S0167404824001068.
doi:10.1016/j.cose.2024.103805.
[8] J. Fu, U. Topcu, Probably Approximately Correct MDP Learning and Control With Temporal Logic
Constraints, in: Proceedings of the 10th Robotics: Science and Systems Conference (RSS), 2014.
[9] D. Furelos-Blanco, M. Law, A. Jonsson, K. Broda, A. Russo, Hierarchies of Reward Machines,
in: Proceedings of the 40th International Conference on Machine Learning (ICML), 2023, pp.
10494–10541.
[10] R. Toro Icarte, T. Q. Klassen, R. A. Valenzano, S. A. McIlraith, Using Reward Machines for
High-Level Task Specification and Decomposition in Reinforcement Learning, in: Proceedings of the
35th International Conference on Machine Learning (ICML), 2018, pp. 2112–2121.
[11] R. Toro Icarte, T. Q. Klassen, R. A. Valenzano, S. A. McIlraith, Reward Machines: Exploiting Reward
Function Structure in Reinforcement Learning, Journal of Artificial Intelligence Research 73 (2022)
173–208.
[12] R. Parac, L. Nodari, L. Ardon, D. Furelos-Blanco, F. Cerutti, A. Russo, Learning robust reward
machines from noisy labels, in: Proceedings of the 21st International Conference on Principles of
Knowledge Representation and Reasoning (KR 2024), 2024. Preprint available at arXiv:2408.14871.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alshamrani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Myneni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities</article-title>
          ,
          <source>IEEE Communications Surveys &amp; Tutorials</source>
          <volume>21</volume>
          (
          <year>2019</year>
          )
          <fpage>1851</fpage>
          -
          <lpage>1877</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Combating advanced persistent threats: Challenges and solutions</article-title>
          ,
          <source>IEEE Network</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Hutchins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Cloppert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <article-title>Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains</article-title>
          ,
          <source>Lockheed Martin Corporation</source>
          <volume>1</volume>
          (
          <year>2011</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pirolli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Card</surname>
          </string-name>
          ,
          <article-title>Sensemaking processes of intelligence analysts and possible leverage points as identified through cognitive task analysis</article-title>
          ,
          <source>Proceedings of the 2005 International Conference on Intelligence Analysis</source>
          (
          <year>2005</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>T-Pot, the all in one honeypot platform</article-title>
          ,
          <year>2024</year>
          . URL: https://github.com/telekom-security/tpotce.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>