Heated Alert Triage (HeAT): Network-Agnostic Extraction of Cyber Attack Campaigns

Stephen Moskal*1 and Shanchieh Jay Yang1
1 Rochester Institute of Technology

* sfm5015@rit.edu

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Proceedings of the Conference on Applied Machine Learning for Information Security, 2021.

Abstract

With the growing sophistication and volume of cyber attacks, combined with complex network structures, it is becoming extremely difficult for security analysts to corroborate evidence to identify campaigns and threats on their network; so much so that organizations employ teams of security professionals just to keep up with the vast amount of data presented to analysts each day. This work develops HeAT (Heated Alert Triage): given a critical indicator of compromise (IoC), such as a severe IDS alert, HeAT produces a HeATed Attack Campaign depicting the actions that led up to the critical event, including the reconnaissance and initial exploitation stages. We define the concept of "Alert Episode Heat" to represent the analyst's opinion of how much an event contributes to the attack campaign of the critical IoC, given their knowledge of the network context and their own security expertise. Leveraging a network-agnostic feature set and a short but targeted training process, HeAT is able to realize insightful and concise attack campaigns for IoCs not observed before, compare the attack strategies of different attackers given the same IoC, and be applied across networks with the same degree of fidelity. HeAT maintains the analyst's original assessment of the specified "HeAT" regardless of the critical event being assessed or the network topology. We demonstrate the capabilities of HeAT with case studies using cyber-competition datasets to mimic how HeAT would be deployed in practice and to assess the HeATed attack campaigns from the analyst's perspective. With the goal of aiding the analyst in quickly finding further evidence of an attack, we show that HeAT immediately reveals each stage of an attack campaign embedded deeply within millions of alerts, a task that may otherwise have required a whole team of analysts.

1 Introduction

Threats of sophisticated and highly impactful cyber attacks have become so common that many organizations have implemented Security Operations Centers (SOCs) to investigate, respond to, and hunt potential threats within networks. SOCs typically implement a tiered structure where a tier 1 analyst triages the network for critical events, which may be escalated to a tier 2 analyst who will respond to the incident. Assume the role of a tier 1 SOC analyst and suppose you observe a critical alert, "GPL EXPLOIT CodeRed v2 root.exe access", targeting a customer database. While it occurs on a critical asset, a single alert may not be enough evidence to escalate to the tier 2 analyst, and you must now look for other indicators of compromise (IoCs) to develop more evidence that the alert was indeed caused by an adversary. This is known as a "triage" and is typically a time-consuming and mostly manual process, sometimes involving multiple analysts combing through lengthy log files to find other IoCs related to the initial one. With the growth of network sizes and the general increase of foreign threats broadly targeting any type of organization, many SOC analysts are overwhelmed by the amount of log data from
Intrusion Detection Systems (IDS), which hampers their ability to quickly assess their network for threats. Given a critical alert (an IoC) and IDS alert logs, we ask whether we can leverage machine learning techniques to aid the analyst in the triage process and automatically reveal the other steps the adversary took to "arrive" at their goal. The compilation of the actions detailing each "stage" of the attack is called an "attack campaign", which describes how, when, and where the attacker learned about the network, gained initial access, and eventually achieved their goal. Developing this attack campaign from IDS alert logs can be extremely difficult, as the analyst must consider, for each alert, the network context, related attributes between the alerts, and their own expertise to determine the relationship between the critical alert and prior alerts. These considerations sometimes lead to subjectivity about the actual contribution of an alert to the attack campaign. We envision an automated triage system that reflects the analyst's opinion of the types of events they believe are part of an attack campaign, with the ability to apply that "thinking" to other triages in the future.

We propose a system, HeATed Alert Triage (HeAT), to perform automated triaging of IDS alerts. Given a critical IDS alert, HeAT creates a "HeATed Attack Campaign" (HAC) using a set of network-agnostic features and a small set of analyst-defined critical alert episode relations. In the form of aggregated alerts defined as "Alert Episodes", the HACs generated by HeAT tell the story of the attacker's progression leading to a critical event. HeAT estimates the "Alert Episode HeAT" (AEH) of each alert episode with respect to the critical alert to describe the episode's contribution to the attack campaign, given how the analyst has interpreted AEH previously. We have developed HeAT with reusability and transferability in mind; we use network-agnostic features so that HeAT can uncover attack campaigns for other critical alerts, adversaries, or networks. We envision HeAT being used by SOC analysts to display the HAC as soon as they observe the first IoC, so that they can quickly determine whether further action is needed, not only for one attack type but for many. Note that we demonstrate the methodology and capability of HeAT with one specific IDS, Suricata, in this work, while the network-agnostic features are generalizable to treat heterogeneous alerts and event logs.

Using a set of targeted case studies based on data collected through cyber-competitions, we demonstrate, in close to real "deployable" scenarios, HeAT's ability to:

1. leverage a small amount of self-labeled data to discover meaningful insights into attack campaigns for critical alerts,
2. compare attack strategies for the same critical event and quickly determine key milestones such as discovery, initial access, and other events leading up to the critical event, and
3. identify attack campaigns under different network settings, and reveal non-coincidental patterns in attack strategies across networks.

2 Related Work

Extraction and assessment of attack campaigns has been studied in depth in the form of Attack Graphs (AGs), which can provide detailed insights into how attackers may traverse a network. AGs use network topology and vulnerability assessments to define potential paths through a network that an adversary can exploit.
AG works employ techniques such as alert correlation [20, 25, 23, 22], process mining [5, 3], and Markovian approaches [7, 8] to map observables to pre-existing AGs. However, these approaches require a significant amount of expert knowledge to configure, require attacker scenario templates to be created, and assume that each vulnerability is known [2]. If we constrain our search to approaches that give AG-like insight without intimate knowledge of the network and its vulnerabilities, we find significantly fewer works within academia. Navarro et al. present HuMa [16] and OMMA [17] to extract context from logs, vulnerability databases such as CVE and CAPEC, and analyst feedback to find malware behaviors. Moskal et al. used a suffix-based Markov chain to derive sequences of aggregated alerts, called attack episodes, based on their alert characteristics, so that sequences of episodes could be compared [14]. Landauer et al. [11] extract knowledge from cyber threat intelligence (CTI) reports and apply it to raw log data to report actionable multi-stage scenarios. Lastly, Nadeem et al. [15] present SAGE, which employs an S-PDFA to extract meaningful AGs from only intrusion alerts and without prior expert knowledge. However, an issue that plagues these works is the lack of high-quality labeled attack scenario data with which to comprehensively assess, compare, and validate the identified attack strategies.

In the private sector, where data is more abundant, AI-driven products that assess and automatically triage a network are an extremely fast growing market. As of 2021, the adoption of AI/ML techniques to solve cyber security problems has exploded. To name a few, companies such as DarkTrace with their "Cyber AI Analyst" [4], IBM with QRadar Advisor with Watson [10], and Centripetal with AI-Analyst [18] all advertise capabilities that leverage AI specifically to aid analysts in the triage process. While these products are undoubtedly sophisticated given the substantial resources behind them, it is impossible to assess their true capabilities due to the proprietary nature of their methods and data. While we do not claim to compete with these products, their existence shows that this is a developing and, more notably, valuable problem to address and document.

3 HeAT: Extraction of a Heated Attack Campaign (HAC)

Given the IDS alert logs from a network and an IoC such as a critical IDS alert, our objective is to develop a sequence of alerts likely to be related to the IoC, forming the attack campaign of the adversary. We define an "Attack Campaign" as the collection of actions in time which describe each stage of an attack conducted by an adversary leading to some objective. As there will be no ground truth describing the real attack campaign, we rely on an initial triage to first establish the characteristics of an actual attack campaign. We then address other technical challenges, such as high alert volume, network-specific attack characteristics, and limited analyst data-labeling resources, to extract meaningful and concise attack campaigns quickly. The methods used in HeAT are summarized below, and the system overview is shown in Figure 1.
• Introduce the labeling approach called "Alert Episode Heat" (AEH), a numeric ranking system (0-3) representing key milestones of an attack campaign leading towards an IoC,
• propose a short and efficient labeling process to capture the analyst's reflection of meaningful relationships between alert episodes contributing to the same attack campaign,
• use an attack-stage-based Gaussian smoothing approach to alert aggregation to create alert episodes indicative of actions performed by adversaries,
• use alert episodes to derive network-agnostic features relating characteristics between episodes, enabling prediction of AEH regardless of attack type or network configuration,
• use ML/AI to learn and predict AEH values for episodes prior to a critical episode, and
• construct and visualize HeATed Attack Campaigns (HAC) with "HeATed episodes".

In the subsequent sections we define the concept of AEH, describe our process for aggregating alerts into alert episodes, define our AEH labeling process, and finally describe our application of AEH to generate the HAC for a given critical alert.

Figure 1: Process overview of HeAT to generate a HAC from a set of IDS alerts and a given critical alert.

3.1 Alert Episode Heat - Progress Towards Attack Objective

Alert Episode Heat (AEH) is a numeric ranking system (0-3) which, given a critical alert episode e_c and a prior episode e_p, ranks the contribution of e_p to the attack campaign of e_c. We use the concept of "alert episodes" to represent groups of alerts that are indicative of action(s) with a specific impact. Each alert episode may contain one or many alerts sharing similar attributes, such as attack impact, which may or may not be related to the campaign of e_c. AEH is intended to capture the attacker's progression towards e_c given the alerts of e_p. While many IDSs already embed some notion of severity within the alert (e.g., Suricata's severity attribute), these values are typically static and independent of all other alerts that have occurred. IDSs such as Suricata have no notion of correlated alerts; they simply report suspicious behavior based on signature matches of known adversarial actions, and additional information is needed to determine whether two events are correlated. Additional factors, such as the network topology, the assets contained on specific machines, and the analyst's own expertise, are considered when correlating the true severity of security events. The purpose of AEH is to create these correlations between a critical episode and the episodes prior to it. Given e_c and e_p, our objective is to define an AEH generator

    h(e_p | e_c) = f(e_p, e_c), where h ∈ R and 0 ≤ h ≤ 3.

We design the AEH values as a small set of discrete values that signify key milestones within an attack campaign. Table 1 describes the characteristics of the "HeAT levels", which are used to label and create the initial AEH training set. We use high-level attack stages such as "reconnaissance", "exploitation", and "actions on objective" [12] to represent heat levels 1, 2, and 3, respectively, reflecting their progressive impact on a network. We choose a "less-is-more" approach, as we embed specific attack stage information within our labels and human studies show that 3 to 4 options is optimal for reducing error in human surveys [1].
With the HeAT levels representing a small number of mutually exclusive attack stages, we believe the analyst can quickly determine an appropriate HeAT level and that there will be less ambiguity between levels.

Table 1: Description of the AEH levels relating to attack milestones

  AEH  Description
  0    No relation to the critical event
  1    Recon. actions that may provide info. about e_c
  2    Exploitation of assets giving access required to achieve e_c
  3    Exfiltration/DoS/access to info. directly relevant to e_c

Given our focus on episodes, we now describe our process for converting individual alert streams into aggregated "Alert Episodes", and then the features that describe our episodes.

3.2 Alert Episodes with the Action-Intent Framework (AIF)

IDSs are notorious for generating a high volume of alerts due to false positives, vague signatures, or actions that cause excessively repeated alerts. Alert aggregation groups alerts with similar attributes, such as time proximity or impact, to reduce the number of events presented to the analyst; an aggregate can also represent an action performed by the adversary. Our main limitation is that we only have the attributes defined within the Suricata IDS alerts, which are limited in scope; however, we use the alert signature description to deduce the "type" of action the adversary could be performing. Given the alert attributes, we define an "Alert Episode" to be the set of aggregated alerts for a single source IP and the same attack stage, across multiple target IPs, within a similar time proximity. While we accept that source IP is not a totally reliable attribute, we believe it is the best available means of capturing the alerts caused by one adversary.

We adopt the Gaussian smoothing approach described by Moskal et al. to aggregate alerts based on source IP, attack stage, and time [14]. Moskal et al. [14] describe a process where alerts are aggregated based on the fluctuations of alert volume within a time window for specific IP addresses and Suricata categories, to uncover common sub-sequences of attack patterns. We choose this process due to its effective application of Gaussian smoothing to aggregate alerts whose arrival times may be inconsistent, sporadic, or periodic. Moskal et al. [14] conclude by noting that the Suricata alert "category" is a weak representation of the attack stages of a well-established attack stage framework such as MITRE ATT&CK. The Action-Intent Framework (AIF), also defined by Moskal et al. [13], was created as an IDS-focused version of MITRE ATT&CK and other kill chain frameworks, and provides a mapping between Suricata signatures and Action-Intent Stages (AIS). Gaussian low-pass filtering is applied to histograms in time of alert volume for a single IP and AIS, where the filter parameter is set on a per-AIS basis according to the expected duration of the action; certain types of attacks may last longer than others, and thus different filter sizes are used. Our alert episodes are derived by evaluating each peak of the AIS-based filtered histograms: the collection of alert(s) contained between the two local minima surrounding a peak makes up an episode. An example of this process can be seen in Figure 2.
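To make the aggregation concrete, below is a minimal sketch of this smoothing-based episode extraction for a single (source IP, AIS) alert stream. It is our own illustration, not the original implementation: the function name, the scipy-based filtering, and the binning scheme are assumptions, and a real deployment would tune bin_size and the per-AIS sigma to the expected action durations.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d
    from scipy.signal import argrelextrema

    def extract_episodes(alert_times, sigma, bin_size=1.0):
        # Histogram of alert volume in time for one (source IP, AIS) stream
        bins = np.arange(alert_times.min(),
                         alert_times.max() + 2 * bin_size, bin_size)
        volume, edges = np.histogram(alert_times, bins=bins)

        # Low-pass filter; sigma is chosen per AIS to match the expected
        # duration of that action type
        smoothed = gaussian_filter1d(volume.astype(float), sigma)

        # Each local maximum of the smoothed curve is an episode peak;
        # the episode spans the local minima on either side of the peak
        peaks = argrelextrema(smoothed, np.greater)[0]
        minima = argrelextrema(smoothed, np.less_equal)[0]

        episodes = []
        for p in peaks:
            left = minima[minima < p].max(initial=0)
            right = minima[minima > p].min(initial=len(smoothed) - 1)
            lo, hi = edges[left], edges[right + 1]
            members = alert_times[(alert_times >= lo) & (alert_times < hi)]
            if len(members):
                episodes.append({"peak": edges[p], "start": members.min(),
                                 "end": members.max(),
                                 "n_alerts": len(members)})
        return episodes

Sorting the returned episodes by their peak field, across all (source IP, AIS) streams, yields the action sequence described next.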
Conducting this process over each attack stage for each source IP, combining the derived episodes, and sorting them by peak episode time gives an abbreviated view of the sequence of "actions" performed by that adversary.

Figure 2: The Gaussian smoothing approach by Moskal et al. accounts for variations in alert arrival time to create Alert Episodes, and creates sequences of episodes by sorting episodes by the time of peak smoothed volume.

This episode representation not only summarizes the alerts but also enables us to define network-agnostic features to compare episodes. Next, we define the network-agnostic features used to represent the relationship between two alert episodes and the AEH.

3.3 Network Agnostic Features between Alert Episodes

We engineer our features with two requirements in mind: 1) the features describe relations between two episodes so that the AEH value can be determined with respect to a critical episode, and 2) the features are network agnostic so that the model does not learn network-specific HeAT relations that cannot be applied to other attack types or network configurations. As the episodes contain sets of alerts with a wide variety of complex data types, such as IP addresses and alert signatures, we manually define a set of episode features to represent each of these data types. Each alert episode contains the attributes shown in Table 2, which are derived from the alerts contained within the episode.

Table 2: Definitions of the attributes contained within an alert episode.

  Name                    Symbol   Description
  Ep. Peak                e_peak   Time of peak alert volume
  Ep. Start               e_start  Time of earliest alert
  Ep. End                 e_end    Time of latest alert
  Distinct Source(s)      e_src    Set of distinct source IP(s)
  Distinct Target(s)      e_tgt    Set of distinct target IP(s)
  Distinct Sig(s)         e_sig    Set of distinct signatures
  Distinct Dest. Port(s)  e_port   Set of distinct dest. ports
  AIS                     e_ais    AIS of the episode

We define three types of features to capture different aspects of common characteristics between episodes: 1) time-, 2) IP-, and 3) action-based features, shown in Table 3. The time-based features capture the time differences between the critical alert episode and the prior episode. The IP-based features capture whether there are similarities between the IP addresses of the two episodes without encoding any details of the IP addresses themselves. Lastly, the action-based features capture similarities between attack stages, signatures, and port numbers to determine whether the two episodes have a similar network impact.

Table 3: The set of network agnostic features relating the attributes of two alert episodes.

  Type    Feature                 Description
  Time    Ep. Interval Overlap    Overlap between the start & end times of e_c and e_p
          Ep. Peak Time Diff.     e_{c,peak} - e_{p,peak}
          Ep. Start Time Diff.    e_{c,start} - e_{p,start}
          Ep. End Time Diff.      e_{c,end} - e_{p,end}
  IP      Has Matching Source     1 if e_{c,src} ∩ e_{p,src} else 0
          Has Matching Target     1 if e_{c,tgt} ∩ e_{p,tgt} else 0
          Matching Source Ratio   Ratio of matching source IPs
          Matching Target Ratio   Ratio of matching target IPs
          Crit. Source as Target  1 if e_{c,src} ∩ e_{p,tgt} else 0
          Crit. Target as Source  1 if e_{c,tgt} ∩ e_{p,src} else 0
  Action  Critical Ep. AIS        1-hot encoded e_{c,ais}
          Prior Ep. AIS           1-hot encoded e_{p,ais}
          Has Matching Sigs.      1 if e_{c,sig} ∩ e_{p,sig} else 0
          Matched Sig. Ratio      Ratio of matching signatures
          Matching Dest. Port     1 if e_{c,port} ∩ e_{p,port} else 0
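For illustration, the sketch below derives this feature vector for a pair of episodes. The Episode container mirrors Table 2 and the helper names are our own. The paper does not specify the exact normalization of the "ratio" features, so we assume a Jaccard-style ratio, and the AIS values are left categorical to be one-hot encoded downstream.

    from dataclasses import dataclass

    @dataclass
    class Episode:
        peak: float   # time of peak alert volume (e_peak)
        start: float  # time of earliest alert (e_start)
        end: float    # time of latest alert (e_end)
        src: set      # distinct source IPs (e_src)
        tgt: set      # distinct target IPs (e_tgt)
        sig: set      # distinct alert signatures (e_sig)
        port: set     # distinct destination ports (e_port)
        ais: str      # Action-Intent Stage (e_ais)

    def _ratio(a, b):
        # Jaccard-style overlap ratio; an assumption, as the paper only
        # says "ratio of matching" items
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def heat_features(ec: Episode, ep: Episode) -> dict:
        """Network-agnostic features (Table 3) relating a prior episode
        ep to the critical episode ec."""
        return {
            # Time-based
            "interval_overlap": max(0.0, min(ec.end, ep.end)
                                         - max(ec.start, ep.start)),
            "peak_time_diff": ec.peak - ep.peak,
            "start_time_diff": ec.start - ep.start,
            "end_time_diff": ec.end - ep.end,
            # IP-based
            "has_matching_source": int(bool(ec.src & ep.src)),
            "has_matching_target": int(bool(ec.tgt & ep.tgt)),
            "matching_source_ratio": _ratio(ec.src, ep.src),
            "matching_target_ratio": _ratio(ec.tgt, ep.tgt),
            "crit_source_as_target": int(bool(ec.src & ep.tgt)),
            "crit_target_as_source": int(bool(ec.tgt & ep.src)),
            # Action-based (AIS one-hot encoded downstream)
            "critical_ais": ec.ais,
            "prior_ais": ep.ais,
            "has_matching_sigs": int(bool(ec.sig & ep.sig)),
            "matched_sig_ratio": _ratio(ec.sig, ep.sig),
            "matching_dest_port": int(bool(ec.port & ep.port)),
        }

Note that only overlap flags and ratios cross the episode boundary, never raw addresses or ports, which is what makes the representation transferable across networks.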
Our hypothesis is that these network agnostic features will allow us to uncover a variety of attack campaigns without detailed knowledge of the network topology or system vulnerabilities. We further propose that these network agnostic features can be used to predict the AEH for other attack types and on other networks. In the next section we describe our methodology for creating the AEH generator and how we leverage a small amount of labeled AEH values to determine HeATed Attack Campaigns (HACs).

3.4 Alert Episode Heat Generator

Significant challenges arise when defining the AEH generator, as labeled data relating IDS alerts to the attack campaigns that generated them is few and far between. The datasets that do exist within the research community are typically outdated (irrelevant attack types), unlabeled, and/or represented in a different domain (i.e., packet captures) than IDS alerts. Instead, we have the user conduct an initial "triage" of their IDS alerts, label episodes related to a known IoC with AEH values, and then use the network-agnostic features to create a predictive model that "generates" heat given other IoCs.

3.4.1 HeATing Episodes to Develop HACs

Given an AEH-labeled dataset, we train an AEH predictive model, known as the AEH generator, h(e_p | e_c) = f(e_p, e_c) where h ∈ R and 0 ≤ h ≤ 3. We define the HAC for a given critical episode e_c as the set of all prior episodes e_p ∈ E to which the AEH generator assigns a non-zero AEH. Our requirements for selecting a machine learning model for this application are that it handles our non-linear features and that the AEH output is a continuous value. HeAT is implemented in Python, and our AEH generator leverages Fast.AI's tabular learner [6] to predict the AEH. All features in our data are standardized to zero mean and unit variance, and we report the 5-fold cross-validated mean squared error (MSE) on the training data. The process of extracting a HAC from our data is similar to the training process: a critical alert is given by the user, HeAT finds the corresponding episode containing the critical alert, and the AEH generator is then used to "HeAT" all prior episodes with respect to the critical episode. We apply HeAT to all prior episodes to give our model the opportunity to discover episodes that may have significantly contributed to the attack campaign but may not be immediately obvious. The set of HeATed episodes with a non-zero AEH is then considered to be a part of the HeATed attack campaign of the critical alert. As the episodes may contain many alerts, we foresee the generator finding small relations to the critical episode and applying a small amount of HeAT to episodes that may not contribute much to the overall campaign; a minimum AEH threshold can be applied if the user desires. However, we expect truly impactful episodes to have significantly higher AEH levels than those with just a few feature similarities.
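A minimal sketch of the resulting training-and-heating loop follows. The paper used fastai v1.0.57 [6]; for readability we write against the current fastai tabular API, and we reuse the hypothetical Episode and heat_features helpers from the previous sketch, so the names, layer sizes, and epoch count are illustrative assumptions rather than the original configuration.

    import pandas as pd
    from fastai.tabular.all import (TabularDataLoaders, tabular_learner,
                                    Categorify, Normalize, RegressionBlock)

    # df: one row per (prior, critical) episode pair from the initial
    # triage, holding the Table 3 features plus the analyst's AEH label
    cat_names = ["critical_ais", "prior_ais"]
    cont_names = [c for c in df.columns if c not in cat_names + ["aeh"]]
    dls = TabularDataLoaders.from_df(
        df, y_names="aeh", y_block=RegressionBlock(),
        cat_names=cat_names, cont_names=cont_names,
        procs=[Categorify, Normalize])  # zero mean, unit variance

    learn = tabular_learner(dls, layers=[200, 100],
                            y_range=(0, 3))  # AEH is bounded in [0, 3]
    learn.fit_one_cycle(10)

    def heated_attack_campaign(ec, prior_episodes, min_heat=0.0):
        """HeAT every prior episode and keep those with non-zero
        (or above-threshold) predicted AEH, sorted into a timeline."""
        hac = []
        for ep in prior_episodes:
            row = pd.Series(heat_features(ec, ep))
            aeh = float(learn.predict(row)[2][0])  # predicted heat
            if aeh > min_heat:
                hac.append((ep, aeh))
        return sorted(hac, key=lambda pair: pair[0].peak)

The y_range argument clamps the regression output to the valid AEH interval, matching the bounded definition of h above.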
4 Datasets for Training and Testing

To demonstrate HeAT's ability to uncover insightful attack campaigns within IDS alerts, we use data that is known to contain actual examples of adversarial behavior. In this work we use publicly available data from cyber-competitions, CPTC '18 [21] and CCDC '18 [9], as it contains the impacts of multiple different adversaries in a controlled and isolated network scenario. Competition datasets like these give us the opportunity to use HeAT to discover minute differences between attacker strategies given the same critical alert. We set aside the alerts of one of the 10 teams as "Team Train" to create our initial triage data and compare the results of HeAT against the remaining teams as "Team Test". The Suricata IDS alerts from the CPTC '18 event are publicly accessible from [19], and we leverage the Suricata-to-AIS mapping provided by Moskal et al. [13] to gain a better representation of the attack stage of each alert than what Suricata alone provides. CCDC data is only available as packet captures from [24], and we convert the PCAP into Suricata alerts using Suricata's PCAP mode with the default rule set.
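As a point of reference, this offline conversion amounts to a single run of Suricata over the capture, e.g. "suricata -r wrccdc.pcap -l ./logs" (file and directory names here are illustrative), after which the generated alerts can be read from the resulting eve.json log.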
Table 4 summarizes the characteristics of each dataset, demonstrating the differences in the number of sources and targets, the large disparity in alert volume, and the reduction in events when alerts are represented as alert episodes.

Table 4: Characteristics of the alerts contained within the CPTC and CCDC data, including the total count of alert episodes.

                     Unique   Unique   Unique  Total      Total
                     Sources  Targets  Sigs    Alerts     Episodes
  CCDC               485      2903     348     8,738,994  2852
  CPTC (All Teams)   45       81       265     169,448    3200
  CPTC "Team Train"  29       49       171     53,362     529

To create the training data for the AEH generator, we ask the user to perform a short initial triage of the episodes prior to a critical IoC and to apply AEH values representing their opinion of the attacker's progress towards the IoC. For each episode prior to the IoC, we compare attributes between the two episodes, such as time difference, IP address similarities, AIS, and signatures, to derive an appropriate AEH given our experience with the network, the data characteristics, and our own security knowledge. In practice, we recognize that analysts could be faced with an excessive amount of data to label. However, we assume that our network-agnostic feature set, together with our intuitive AEH definition, can lessen the reliance on the massive amounts of labeled data typically needed for machine learning applications and create a more efficient labeling process. To test this assumption, we use the CPTC team set aside, "Team Train", and use those episodes to create our initial AEH generator training data. We selected three critical IoCs to manually triage: 1) "ETPRO ATTACK_RESPONSE MongoDB Database Enumeration Request", 2) "ET EXPLOIT Possible ETERNALBLUE MS17-010 Heap Spray", and 3) "GPL EXPLOIT CodeRed v2 root.exe access". These signatures describe data exfiltration, arbitrary code execution, and root privilege escalation actions, respectively, and can lead to significant access and impact if successful. They were selected for the purposes of our intended case studies: CodeRed was observed in both CPTC and CCDC, Mongo is exclusive to CPTC, and Eternal Blue is exclusive to Team Train. We then manually applied AEH to the episodes prior to each critical IoC as shown in Table 5, with the distribution of AEH values given in Table 6.

Table 5: Count of alerts and episodes containing each IoC for CPTC "Team Train" and the entire CCDC dataset, along with the number of prior episodes with AEH manually applied.

  IoC           # IoC Alerts   # IoC Episodes   # Prior Eps.
  Signature     Train   CCDC   Train   CCDC     AEH Applied
  Mongo         125     0      69      0        756
  Eternal Blue  4       0      4       0        231
  CodeRed       146     27     63      4        436

Table 6: Distribution of Team Train's AEH values of prior episodes given the IoCs from Table 5.

  Training AEH  Count
  0             720
  1             202
  2             154
  3             347

This training data, relating two episodes through the assigned AEH value, is then converted into our feature space and used to create the AEH generator, which can reveal future attack campaigns from a small portion of the original alert set. As we show in the next section, our HeAT process is able to extract key characteristics of the attack campaigns given this limited amount of labeled data. While we show that the capabilities with just this amount of data are impressive, we plan to conduct a future study with multiple analysts and across other scenarios to assess how the quality of the initial triage affects the resulting HAC.

5 HeATed Attack Campaign (HAC) Analysis Using AEH

Demonstrating the effectiveness of a process like HeAT is extremely challenging due to the lack of standardized training sets. Instead, we take an alternate approach and perform a set of case studies that mimics the type of analysis we would expect if HeAT were deployed in a real-world scenario. First, we use standard ML techniques to assess our performance in predicting AEH values given the training data described above. Performing 5-fold cross-validation with mean squared error as our performance metric, we report an average MSE across the 5 folds of 0.175, with a maximum error of 0.335. This is promising, as it indicates that our model captures the AEH values of our training data well enough to be applied to other IoCs.
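The fold evaluation itself is straightforward; a sketch of such a harness is given below, with any regressor standing in for the fastai learner, scikit-learn's splitter assumed, and the reported maximum interpreted as the worst fold (our reading, not stated in the text).

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold

    def cv_mse(fit_fn, X, y, k=5, seed=0):
        """k-fold cross-validated MSE of an AEH generator.
        fit_fn(X_train, y_train) must return a model with .predict()."""
        fold_mse = []
        for tr, te in KFold(n_splits=k, shuffle=True,
                            random_state=seed).split(X):
            model = fit_fn(X[tr], y[tr])
            fold_mse.append(mean_squared_error(y[te], model.predict(X[te])))
        return np.mean(fold_mse), np.max(fold_mse)  # average and worst fold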
Leveraging this model to predict the HeAT of other attack campaigns, we apply HeAT to the remaining CPTC teams, "Team Test", using Mongo and CodeRed as our IoCs. We measure the reduction in relevant events by comparing the number of HeATed episodes to the total number of "related" alerts and episodes, where a related alert (or episode) is any alert that shares an IP address (source or destination) with the critical IoC. For the two IoCs, 16 HACs were generated; Table 7 summarizes them.

Table 7: Summary of the HACs generated for Team Test given the latest-in-time Mongo and CodeRed IoCs.

           "Related"  "Related"  HeATed  HeATed    Reduction
           Alerts     Episodes   Alerts  Episodes  of Episodes
  Average  4,228      196.89     668.33  22.28     174.62 (-88.7%)
  Min      73         83         150     5
  Max      32,741     389        4,470   149

From the perspective of a SOC analyst, this 88% reduction in the events that need to be triaged may lead to a more timely response to threats, as less time is spent going through irrelevant alerts. However, this holds only if the HeATed episodes realized by HeAT are an accurate representation of the attack campaign conducted by the adversary. We now conduct a set of case studies to assess the fidelity of the HACs generated by HeAT.

5.1 Case Study: Tactics towards CodeRed

Our CPTC dataset gives us the unique opportunity to investigate differences in the attack strategies of different adversaries given the same objective, or in this case the same critical alert. We consider the scenario where a user has identified this critical alert in the past, gone through the HeAT training process for it, and then re-observes the same critical alert in the future. Although the impact of the vulnerability is the same, we expect the approach taken could be different, and thus a different mitigation strategy would be required. We apply HeAT to different adversaries to demonstrate how HeAT can be used to gain additional insight into attack strategies, and we find through this case study that HeAT can reveal attacker strategies for adversarial behavior previously unknown to HeAT.

For this case study, we used the alert "GPL EXPLOIT CodeRed v2 root.exe access", as it was prevalent across many adversaries and leads to a significant amount of access if successful. To demonstrate the diversity of the HACs that can be extracted, we begin by comparing two particularly interesting HACs whose approaches were vastly different from each other. Figure 3a shows the resulting HAC for a "calculated" adversary, where we discovered four key steps leading to the CodeRed signature out of a total of 19 HeATed episodes. The HAC for a "script kiddie", shown in Figure 3b, demonstrates a significantly different approach, consisting of 144 episodes to achieve the same objective. Due to the complexity of the script kiddie HAC, our annotations are as follows: 1) rapid POP3 & IMAP attempts, 2) CVE-2014-6271 "Shellshock", 3) CodeRed first attempt, and 4) CodeRed on other targets from the same source. The HeATed episodes of the calculated adversary and the script kiddie represent 1,458 and 29,015 individual alerts, respectively; our HACs are thus a substantial reduction of the events an analyst needs to assess.

Figure 3: HACs of two separate CPTC adversaries given the CodeRed IoC: (a) the "calculated" HAC and (b) the "script kiddie" HAC. Each circle is an episode whose size corresponds to the number of alerts within the episode; the y-axis represents the AIS and the color corresponds to the AEH value of the episode (red = high AEH).

Our rationale for distinguishing the "calculated" adversary from the "script kiddie" comes from our analysis of the HeATed episodes. The "calculated" episodes had a single target throughout the HAC and significant time between episodes, whereas the script kiddie maintained a consistent time between actions and targeted many different IPs with the same action, which is indicative of a scripted process. Without HeAT, the counts of all episodes occurring within these time-frames for the calculated and script kiddie adversaries were 198 and 498 episodes; HeAT reduces the number of episodes to be considered by 92% and 71%, respectively. By investigating the alerts within the smaller set of HeATed episodes, we were able to gain insight into the individual campaigns, as shown by our annotations.

Because our alert episodes represent groups of alerts occurring within a similar time frame and AIS, we find interfacing with the individual alerts to be much more manageable than with the raw alerts. Here we found indications that vulnerabilities within a mail server are likely to be targeted when CodeRed is seen, as both of our adversaries have early episodes targeting POP and IMAP. Looking further into the HAC, we see that the mail server may be the initial access point, and both adversaries used other actions, such as the ColdFusion vulnerability and the Shellshock attack, to further position themselves within the targeted asset.
While other evidence is needed to definitively prove these observations, we believe the resulting HACs provide substantial evidence for a SOC analyst to escalate the investigation to the next tier of analysis. Given that a network may produce a seemingly endless volume of alerts, realizing attack campaigns manually becomes overwhelming; we believe HeAT alleviates this strain so that prevention measures can be acted on more quickly.

5.2 Case Study: HeATing a Different Network

We now ask: can HeATed episodes from one network explain an attack campaign on another network? For this case study we use the CCDC dataset, which was collected over a shorter time period, contains orders of magnitude more alerts, and for which we have little to no knowledge of the real network topology or the adversaries themselves. We find this to be a realistic use case, where a user may apply HeAT trained on another network to gain insight into a new network with which they are unfamiliar. Using the exact same AEH generator, CodeRed as our critical alert, and directly applying HeAT to the CCDC dataset with no extra configuration or training, we present the HAC for this case study in Figure 4.

Figure 4: The HAC of CodeRed on the CCDC dataset using the AEH generator trained on CPTC data.

While the HACs of CPTC and CCDC do not appear visually similar, deeper inspection of the alerts contained within the episodes reveals some striking similarities. Initially, we see step 1 making the initial discovery of the critical asset, while step 2 describes a peer-to-peer connection made to an SMB network share. The corresponding signatures within this episode report that data was transferred to and from this asset. This is a substantial discovery, as it indicates that the SMB share is vulnerable, which enabled the adversary to deliver malware as shown in step 3. Steps 3-5 in Figure 4 are interestingly similar to episodes we have seen previously, including references to POP3 & IMAP brute forcing, Shellshock, and the SMTP verification of root access. Without HeAT, we believe that finding such campaign similarities would be much more difficult, as the CCDC data contains two orders of magnitude more alerts. However, our HAC does not tell the whole story, and further research would be needed to determine whether these alerts were caused by similar tools used by the adversaries or are purely coincidental.

HeAT's ability to reveal convincing attack campaigns across multiple networks is promising, and we demonstrate that in some cases, such as CodeRed, the campaigns are similar regardless of the network. These observations indicate that prior observations of attack campaigns can be generalized using HeAT's network-agnostic features and used to find similar strategies in other networks. This capability could provide immense value to analysts hunting threats, as HeAT could be applied regardless of the network context while still providing high-quality automated triages. In future work, we will investigate the validity of this observation through a comprehensive study of many more critical events across more networks. Given our observations in these case studies, however, we have confidence in HeAT's ability to
provide users a first round of insight into the attack campaigns occurring in their network from a single observed alert and a straightforward training process.

6 Limitations

Throughout our research on HeAT we have identified several limitations, some technical and others methodological. For the technical limitations: we only evaluate HeAT as a post-mortem analysis tool and do not initially intend for HeAT to be used in real-time applications. In addition, the HeAT process only considers network-based Suricata IDS alerts, and thus it cannot resolve actions that do not produce network traffic, nor does it handle encrypted traffic well. For extended detail in the HACs, we would consider adding other security monitors, such as the Zeek network security monitor or the Nessus vulnerability scanner, to further enhance HeAT.

Our methodological limitations stem from our training process: i.e., the data used, the quality of the analyst's AEH assessments, and the verification of the produced HACs. HeAT relies on the analyst to provide accurate AEH values, but one may ask: 1) What if the AEH values given are incorrect? 2) What if there are multiple analysts? 3) What if the data is a bad representation of attack campaigns? Each of these issues could have a detrimental impact on the utility of HeAT, especially in a high-stakes field such as network security. We propose to conduct human studies of different analysts under controlled scenarios to answer these questions. A future iteration of this work will conduct significantly more case studies of attacker behaviors, provide more statistical analysis of the completeness and utility of the HACs, and be applied to a real SOC operation to determine the true real-world capabilities of HeAT. In this work we have had the luxury of knowing our data well, and it is certainly abundant with adversarial behavior, something that cannot always be said about real-world data.

7 Conclusion

The case studies presented in this paper demonstrate how an analyst could use HeAT to identify the attack campaign leading to a critical IoC with minimal expert input. Our HeATed Attack Campaigns represent the attack stages in time related to a given critical IoC as a set of Alert Episodes with associated Action-Intent Stages. The "Alert Episode Heat" (AEH) captures the analyst's reflection of an episode's contribution to a specified attack campaign. We found that our implementation of HeAT with network-agnostic feature descriptions was able to extract attack campaigns from alerts novel to HeAT, show the difference between diverse attacker strategies, and be applied to multiple networks to perform a HeATed triage. We demonstrated through analysis of our cyber-competition datasets that HeAT can quickly show the difference between attacker strategies with the same objective, and do so in a concise alert episode representation without the analyst ever interfacing with the raw alerts directly. We also demonstrated HeAT's ability to maintain the important attack campaign characteristics defined by the analyst across multiple networks, capturing non-coincidental similarities in attacker strategies between our CPTC and CCDC datasets.

For the foreseeable future, triaging networks to assess attack campaigns is only going to get more difficult and time-consuming, to the point that manual triaging becomes infeasible. As seen in the related works within the private sector, we believe that automated triaging with AI/ML will be a necessary component of all SOC operations.
We believe that HeAT is a viable solution that requires minimal expertise and analyst effort, and that it can inspire more research into AI-based automated triage. We are in the process of making HeAT available as open source. Aside from the future work proposed within our case studies, we plan for HeAT to become integrated as a plug-in to Splunk or other SIEMs. We also plan to extend HeAT to account for a variety of intrusion alerts and event logs. In particular, we are considering integrating Zeek logs and phishing email detection with HeAT to broaden the insights and address more complex attack types.

References

[1] Alchemer. Number of choices in survey questions: How much is too much? https://www.alchemer.com/resources/blog/survey-questions-how-much-is-too-much/, 2011. [Online; accessed 21-June-2021].

[2] F. M. Alserhani. Alert correlation and aggregation techniques for reduction of security alerts and detection of multistage attack. International Journal of Advanced Studies in Computers, Science and Engineering, 5(2):1, 2016.

[3] Y. Chen, Z. Liu, Y. Liu, and C. Dong. Distributed attack modeling approach based on process mining and graph segmentation. Entropy, 22(9):1026, 2020.

[4] Darktrace. Darktrace Cyber AI Analyst: Autonomous investigations. https://www.darktrace.com/en/resources/wp-cyber-ai-analyst.pdf, 2021. [Online; accessed 21-June-2021].

[5] S. C. De Alvarenga, S. Barbon Jr, R. S. Miani, M. Cukier, and B. B. Zarpelão. Process mining and hierarchical clustering to help intrusion alert visualization. Computers & Security, 73:474–491, 2018.

[6] Fast.AI. Fastai python library v1.0.57. http://docs.fast.ai/, 2020. [Online; accessed 28-April-2020].

[7] O. B. Fredj. A realistic graph-based alert correlation system. Security and Communication Networks, 8(15):2477–2493, 2015.

[8] I. Ghafir, K. G. Kyriakopoulos, S. Lambotharan, F. J. Aparicio-Navarro, B. AsSadhan, H. BinSalleeh, and D. M. Diab. Hidden Markov models and alert correlations for the prediction of advanced persistent threats. IEEE Access, 7:99508–99520, 2019.

[9] F. Hassanabad. suricata-sample-data. https://github.com/FrankHassanabad/suricata-sample-data/blob/master/README.md, 2019. [Online; accessed 5-May-2020].

[10] IBM. IBM QRadar Advisor with Watson. https://www.ibm.com/products/cognitive-security-analytics, 2021. [Online; accessed 21-June-2021].

[11] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner, and A. Rauber. A framework for cyber threat intelligence extraction from raw log data. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), pages 3200–3209. IEEE, 2019.

[12] Lockheed Martin. Cyber kill chain. http://cyber.lockheedmartin.com/solutions/cyber-kill-chain, 2016. [Online; accessed 11-April-2016].

[13] S. Moskal and S. J. Yang. Framework to describe intentions of a cyber attack action. arXiv preprint arXiv:2002.07838, 2020.

[14] S. Moskal, S. J. Yang, and M. E. Kuhl. Extracting and evaluating similar and unique cyber attack strategies from intrusion alerts. In Proceedings of the 2018 IEEE International Conference on Intelligence and Security Informatics (ISI), pages 49–54. IEEE, 2018.

[15] A. Nadeem, S. Verwer, S. Moskal, and S. J. Yang. Alert-driven attack graph generation using S-PDFA.
Accepted to appear in the 2021 ACM KDD Workshop on Artificial Intelligence-enabled Cybersecurity Analytics (AI4Cyber), 2021.

[16] J. Navarro, V. Legrand, S. Lagraa, J. François, A. Lahmadi, G. De Santis, O. Festor, N. Lammari, F. Hamdi, A. Deruyver, et al. HuMa: A multi-layer framework for threat analysis in a heterogeneous log environment. In International Symposium on Foundations and Practice of Security, pages 144–159. Springer, 2017.

[17] J. Navarro, V. Legrand, A. Deruyver, and P. Parrend. OMMA: Open architecture for operator-guided monitoring of multi-step attacks. EURASIP Journal on Information Security, 2018(1):1–25, 2018.

[18] Centripetal Networks. AI-Analyst. centripetalnetworks.com/hubfs/Data%20Sheets/CI_AI_Analyst_Brief.pdf, 2018. [Online; accessed 21-June-2021].

[19] Rochester Institute of Technology. Index of cptc2018. http://mirror.rit.edu/cptc/2018/, 2021. [Online; accessed 21-June-2021].

[20] X. Qin and W. Lee. Discovering novel attack strategies from infosec alerts. In Proceedings of the 2004 European Symposium on Research in Computer Security, pages 439–456. Springer, 2004.

[21] Rochester Institute of Technology. Collegiate Penetration Testing Competition. http://nationalcptc.org, 2018. [Online; accessed 19-July-2018].

[22] C.-H. Wang and Y.-C. Chiou. Alert correlation system with automatic extraction of attack strategies by using dynamic feature weights. International Journal of Computer and Communication Engineering, 5(1):1, 2016.

[23] L. Wang, A. Liu, and S. Jajodia. Using attack graphs for correlating, hypothesizing, and predicting intrusion alerts. Computer Communications, 29(15):2917–2933, 2006.

[24] WRCCDC. WRCCDC public archive. https://archive.wrccdc.org/, 2021. [Online; accessed 21-June-2021].

[25] B. Zhu and A. A. Ghorbani. Alert correlation for extracting attack strategies. IJ Network Security, 3(3):244–258, 2006.