<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Diagnosing Advanced Persistent Threats: A Position Paper</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rui Abreu</string-name>
          <email>rui@parc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danny Bobrow</string-name>
          <email>bobrow@parc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hoda Eldardiry</string-name>
          <email>hoda.eldardiry@parc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Feldman</string-name>
          <email>afeldman@parc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dave Archer and David Burke Galois, Inc.</institution>
          <addr-line>421 SW 6th Avenue, Suite 300 Portland, OR 97204</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>John Hanley and Tomonori Honda and Johan de Kleer and Alexandre Perez Palo Alto Research Center 3333</institution>
          <addr-line>Coyote Hill Rd Palo Alto, CA 94304</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>193</fpage>
      <lpage>200</lpage>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>When a computer system is hacked,
analyzing the root-cause (for example entry-point
of penetration) is a diagnostic process. An
audit trail, as defined in the National
Information Assurance Glossary, is a
securityrelevant chronological (set of) record(s),
and/or destination and source of records that
provide evidence of the sequence of
activities that have affected, at any time, a specific
operation, procedure, or event. After
detecting an intrusion, system administrators
manually analyze audit trails to both isolate the
root-cause and perform damage impact
assessment of the attack. Due to the sheer
volume of information and low-level activities
in the audit trails, this task is rather
cumbersome and time intensive. In this
position paper, we discuss our ideas to automate
the analysis of audit trails using machine
learning and model-based reasoning
techniques. Our approach classifies audit trails
into the high-level activities they represent,
and then reasons about those activities and
their threat potential in real-time and
forensically. We argue that, by using the outcome
of this reasoning to explain complex
evidence of malicious behavior, we are
equipping system administrators with the proper
tools to promptly react to, stop, and mitigate
attacks.
Today, enterprise system and network behaviors are
typically “opaque”: stakeholders lack the ability to
assert causal linkages in running code, except in very
simple cases. At best, event logs and audit trails can
offer some partial information on temporally and
spatially localized events as seen from the viewpoint of
individual applications. Thus current techniques give
operators little system-wide situational awareness, nor
any viewpoint informed by a long-term perspective.
Adversaries have taken advantage of this opacity by
adopting a strategy of persistent, low-observability
operation from inside the system, hiding effectively
through the use of long causal chains of system and
application code. We call such adversaries advanced
persistent threats, or APTs.</p>
      <p>To address current limitations, this position
paper discusses a technique that aims to track
causality across the enterprise and over extended periods of
time, identify subtle causal chains that represent
malicious behavior, localize the code at the roots of such
behavior, trace the effects of other malicious actions
descended from those roots, and make
recommendations on how to mitigate those effects. By doing so,
the proposed approach aims to enable stakeholders to
understand and manage the activities going on in their
networks. The technique exploits both current and
novel forms of local causality to construct higher-level
observations, long-term causality in system
information flow. We propose to use a machine learning
approach to classify segments of low-level events by the
activities they represent, and reasons over these
activities, prioritizing candidate activities for
investigation. The diagnostic engine investigates these
candidates looking for patterns that may represent the
presence of APTs. Using pre-defined security policies and
related mitigations, the approach explains discovered
APTs and recommends appropriate mitigations to
operators. We plan to leverage models of APT and
normal business logic behavior to diagnose such threats.
Note that the technique is not constrained by
availability of human analysts, but can benefit by
human-onthe-loop assistance.</p>
      <p>The approach discussed in the paper will offer
unprecedented capability for observation of long-term,
subtle system-wide activity by automatically
constructing such global, long-term causality
observations. The ability to automatically classify causal
chains of events in terms of abstractions such as
activities, will provide operators with a unique
capability to orient to long-term, system-wide evidence of
possible threats. The diagnostic engine will provide
a unique capability to identify whether groups of such
activities likely represent active threats, making it
easier for operators to decide whether long-term threats
are active, and where they originate, even before those
threats are identified by other means. Thus, the
approach will pave the way for the first automated,
longhorizon, continuously operating system-wide support
for an effective defender Observe, Orient, Decide, and
Act (OODA) loop.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Running Example</title>
      <p>The methods proposed in this article are illustrated on
a realistic running example. The attackers in this
example use sophisticated and recently discovered
exploits to gain access to the victim’s resources. The
attack is remote and does not require social engineering
or opening a malicious email attachment. The
methods that we propose, however, are not limited to this
class of attacks.
router
hacker</p>
      <p>Internet
victim’s local network
web server
front-end
data storage
back end</p>
      <p>system
administrator</p>
      <p>The network topology used for our running example
is shown in figure 1. The attack is executed over
several days. It starts by (1) compromising the web server
front-end, followed by (2) a reconnaissance phase and
(3) compromising the data storage back end and
ultimately extracting and modifying sensitive information
belonging to the victim.</p>
      <p>Both the front-end and the back end in this example
run unpatched UBUNTU 13.1 LINUX OS on an
INTEL R SANDY BRIDGETM architecture.</p>
      <p>What follows is a detailed chronology of the events:
1. The attacker uses the APACHE httpd server, a
cgi-bin script, and the SHELLSHOCK
vulnerability (GNU bash exploit registered in the
Common Vulnerabilities and Exposures database as
CVE 2014-6271 (see https://nvd.nist.
gov/) to gain remote shell access to the victim’s
front-end. It is now possible for the attacker to
execute processes on the front-end as the
nonprivileged user www-data.
2. The attacker notices that the front-end is
running an unpatched UBUNTU LINUX OS version
13.1. The attacker uses the nc Linux utility to
copy an exploit for obtaining root privileges. The
particular exploit that the attacker uses utilizes
the x32 recvmmsg() kernel vulnerability
registered in the Common Vulnerabilities and
Exposures (CVE) database as CVE 2014-0038. After
running the copied binary for a few minutes the
attacker gains root access to the front-end host.
3. The attacker installs a root-kit utility that
intercepts all input to ssh;
4. A system administrator uses the compromised
ssh to connect to the back-end revealing his
backend password to the attacker;
5. The attacker uses the compromised front-end to
bypass firewalls and uses the newly acquired
back-end administrator’s password to access the
back-end;
6. The attacker uses a file-tree traversing utility on
the back-end that collects sensitive data and
consolidates it in an archive file;
7. The attacker sends the archive file to a third-party
hijacked computer for analysis.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Auditing and Instrumentation</title>
      <p>Almost all computing systems of sufficiently
highlevel (with the exception of some embedded systems)
leave detailed logs of all system and application
activities. Many UNIX variants such as LINUX log via the
syslog daemon, while WINDOWSTM uses the event
log service. In addition to the usual logging
mechanisms, there is a multitude of projects related to
secure and detailed auditing. An audit log is more
detailed trail of any security or computation-related
activity such as file or RAM access, system calls, etc.</p>
      <p>Depending on the level of security we would like
to provide, there are several methods for collecting
input security-related information. On one extreme, it is
possible to use the existing log files. On the other
extreme there are applications for collecting detailed
information about the application execution. One such
approach [1] runs the processes of interest through a
debugger and logs every memory read and write
access.</p>
      <p>It is also possible to automatically inject logging
calls in the source files before compiling them,
allowing us to have static or dynamic logging or a
combination of the two. Log and audit information can be
signed, encrypted and sent in real-time to a remote
server to make system tampering and activity-hiding
more difficult. All these configuration decisions
impose different trade-offs in security versus
computational and RAM load [2] and depend on the
organizational context.
front_end.secure_access_log:11.239.64.213 - [22/Apr/2014
06:30:24 +0200] "GET /cgi-bin/test.cgi HTTP/1.1" 401 381
front_end.rsyslogd.log:recvmsg(3, msg_name(0) =
NULL, msg_iov(1) = ["29/Apr/2014 22:15:49 ...", 8096],
msg_controllen = 0, msg_flags = MSG_CTRUNC,
MSG_DONTWAIT) = 29
back_end:auditctl:type = SYSCALL msg =
audit(1310392408.506:36): arch = c000003e syscall = 2
success = yes exit = 3 a0 = 7fff2ce9471d a1 = 0 a2 = 61f768
a3 = 7fff2ce92a20 items = 1 ppid = 20478 pid = 21013 auid
= 1000 uid = 0 gid = 0 euid = 0 suid = 0 fsuid = 0 egid = 0
sgid = 0 fsgid = 0 ses = 1 comm = "grep" exe = "/bin/grep"
.
.
.
abstractions may be composed of lower-order
abstractions that are in turn abstractions of low-level events.
For example, a sequential set of logged events such as
‘browser forking bash’, ‘bash initiating Netcat’, and
‘Netcat listening to new port’, might be abstracted as
the activity ‘remote shell access’. The set of activities,
‘remote shell access’, and ‘escalation of privilege’ can
be abstracted as the activity ‘remote root shell access’.</p>
      <p>We approach activity annotation as a supervised
learning problem that uses classification techniques to
generate activity tags for audit trails. Table 1 shows
multiple levels of activity classifications for the above
APT example.</p>
      <p>To obtain several layers of abstraction for
reasoning over, and thus reduce overall error in
classification, we use a multi-level learning strategy that models
information at multiple levels of semantic abstraction
using multiple classifiers. Each classifier solves the
problem at one abstraction level, by mapping from a
lower-level (fine) feature space to the next higher-level
conceptual (coarse) feature space.</p>
      <p>The activity classifier rely on both a vocabulary of
activities and a library of patterns describing these
activities that will be initially defined manually. This
vocabulary and pattern set reside in a Knowledge Base.</p>
      <p>In our training approach, results from training lower
level classifiers are used as training data for higher
level classifiers. In this way, we coherently train all
classifiers by preventing higher-level classifiers from
being trained with patterns that will never be
generated by their lower-level precursors. We use an
ensemble learning approach to achieve accurate
classification. This involves stacking together both bagged and
boosted models to reduce both variance and bias
error components [3]. The classification algorithm will
be trained using an online-learning technique and
integrated within an Active Learning Framework to
improve classification of atypical behaviors.</p>
      <p>Generating Training Data for Classification To
build the initial classifier, training data is generated
using two methods. First, an actual deployed
system is used to collect normal behavior data, and a
Subject Matter Expert manually labels it. Second,
a testing platform is used to generate data in a
controlled environment, particularly platform dependent
vulnerability-related behavior. In addition, to
generate new training data of previously unknown behavior,
we use an Active Learning framework as described in
Section 5.
As the Activity Classifier annotates audit trails with
activity descriptors, the two (parallel) next steps in our
workflow are to 1) prioritize potential threats to be
referred to the Diagnostic Engine (see Section 6) for
investigation, and 2) prioritize emergent activities that
(after suitable review and labeling) are added to the
activity classifier training data. This module prioritizes
activities by threat severity and confidence level . This
prioritization process presents three key challenges.
One challenge in ranking activities according to their
threat potential is the complex (and dynamic) notion of
what constitutes a threat. Rankings based on matching
to known prior threats is necessary, but not sufficient.
An ideal ranking approach should take known threats
into account, while also proactively considering the
unknown threat potential of new kinds of activities.
Another such challenge is that risk may be assessed
at various levels of activity abstraction, requiring that
overall ranking must be computed by aggregating risk
assessments at multiple abstraction levels.</p>
      <p>We implement two ranking approaches: a
supervised ranker based on previously known threats and an
unsupervised ranker that considers unknown potential
threats.</p>
      <p>Supervised ranking using APT classification to
catch known threats. The goal of APT
classification is to provide the diagnostic engine with critical
APT related information such as APT Phase, severity
of attack, and confidence level associated with APT
tagging for threat prioritization. Since the audit trails
are annotated hierarchically into different granularity
of actions, multiple classifiers will be built to consider
each hierarchical level separately. APT classifiers are
used to identify entities that are likely to be instances
of known threats or phases of an APT attack. Two
types of classifiers are used. The first classifier is
hand-coded and the second classifier is learned from
training data.</p>
      <p>The hand-coded classifier is designed to have high
precision, using hand-coded rules, mirroring SIEM
and IDS systems. Entities tagged by this classifier are
given the highest priority for investigation. The second
classifier, which is learned from training data, will
provide higher recall at the cost of precision. Activities
are ranked according to their threat level by
aggregating a severity measure (determined by classified threat
type) and a confidence measure. We complement the
initial set of training data to calibrate our classifiers by
using an Active Learning Framework, which focuses
on improving the classification algorithm through
occasional manual labeling of the most critical activities
in the audit trails.</p>
      <p>Unsupervised ranking using normalcy
characterization to catch unknown threats. The second
component of the prioritizer is a set of unsupervised
normalcy rankers, which rank entities based on their
statistical “normalcy". Activities identified as
unusual will be fed to the Active Learning framework
to check if any of them are “unknown” APT activities.
This provides a mechanism for detecting “unknown”
threats while also providing feedback to improve the
APT classifier.
5.2</p>
      <sec id="sec-3-1">
        <title>Combining Multiple Rankings</title>
        <p>
          One of the key issues with combining the outputs of
multiple risk ranking is dealing with two-dimensional
risk (severity, confidence) scores that may be on very
different scales. A diverse set of score normalization
techniques have been proposed [4; 5; 6] to deal with
this issue, but no single technique has been found to
be superior over all the others. An alternative to
combining scores is to combine rankings [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Although
converting scores to rankings does lose information, it
remains an open question if the loss in information is
compensated for by the convenience of working with
the common scale of rankings.
        </p>
        <p>
          We will develop combination techniques for
weighted risk rankings based on probabilistic rank
aggregation methods. This approach builds on our own
work [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] that shows the robustness of the weighted
ranking approach. We also build on principled
methods for combining ranking data found in the statistics
and information retrieval literature.
        </p>
        <p>Traditionally, the goal of rank aggregation [9; 10]
is to combine a set of rankings of the same
candidates into a single consensus ranking that is “better”
than the individual rankings. We extend the
traditional approach to accommodate the specific context
of weighted risk ranking. First, unreliable rankers will
be identified and either ignored or down-weighted,
lest their rankings decrease the quality of the
overall consensus [7; 10]. Second, we will discount
excessive correlation among rankers, so that a set of
4
highly redundant rankers do not completely outweigh
the contribution of other alternative rankings. To
address these two issues, we will associate a
probabilistic latent variable Zi with the i’th entity of interest,
which indicates whether the entity is anomalous or
normal. Then, we will build a probabilistic model
that allows us to infer the posterior distribution over
the Zi based on the observed rankings produced by
each of the input weighted risk rankings. This
posterior probability of Zi being normal will then be used
as the weighted risk rank. Our model will make the
following assumptions to account for both unreliable
and correlated rankers: 1) Anomalies are ranked lower
than all normal instances and these ranks tend to be
concentrated near the lower rankings of the provided
weighted risk rankings, and 2) Normal data instances
tend to be uniformly distributed near the higher
rankings of the weighted risk rankings.</p>
        <p>
          There are various ways to build a probabilistic
model that reflects the above assumptions and
allows for the inference of the Zi variables through
Expectation-Maximization [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In addition to these
assumptions, we will explore allowing other factors to
influence the latent Zi variables, such as features of
the entities as well as feedback provided by an expert
analysts.
6
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Diagnosis</title>
      <p>
        We view the problem of detecting, isolating, and
explaining complex APT campaigns behavior from rich
activity data is a diagnostic problem. We will use
an AI-based diagnostic reasoning to guide the global
search for possible vulnerabilities that enabled the
breach. Model-based diagnosis (MBD) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is a
particularly compelling approach as it supports reasoning
over complex causal networks (for example, having
multiple conjunctions, disjunctions, and negations)
and identifies often subtle combinations of root causes
of the symptoms (the breach).
6.1
      </p>
      <sec id="sec-4-1">
        <title>An MBD approach for APT detection and isolation: Motivation</title>
        <p>
          Attack detection and isolation are two distinct
challenges. Often diagnostic approaches use separate
models for detection and isolation [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. MBD
however uses a single model, to combine these two
reasonings. The security model contains both part of
the security policy (that communicating with certain
blacklisted hosts may indicate an information leak)
and information about the possible locations and
consequence of a vulnerability (a privilege escalation may
lead to an information leak). The security model also
contains abstract security constraints such as if a
process requires authentication, a password must be read
and compared against.
        </p>
        <p>
          The diagnostic approach takes into consideration
the bootstrapping of an APT which we consider the
root-cause of the attack. What enables a successful
APT is either a combination of software component
5
vulnerabilities or the combined use of social
engineering and insufficiency of the organizational security
policies. We use MBD for computing the set of
simultaneously exploited vulnerabilities that allowed the
deployment of the APT. Computing such explanations
is possible because MBD reasons in terms of
multiplefaults [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In our running example this set would
include both the fact the the web server has been
exploited due to the Shellshock vulnerability and that a
the attacker gained privileged access on the front-end
due to the use of the X64_32 escalation vulnerability.
        </p>
        <p>
          The abstract security model is used to gather
information about types of attacks the system is vulnerable
to, and to aid deciding the set of actions required to
stop an APT campaign (policy enforcement). Various
heuristics exist to find the set of meaningful diagnosis
candidates. As an example, one might be interested
in the minimal set of actions to stop the attack [15;
16] or select those candidates that capture significant
probability mass [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. In the rest of this section, for
illustration purposes, we use minimality as the
heuristic of interest. MBD is the right tool for dealing with
computation of diagnosis candidates as it offers
several ways to address the modeling and computational
complexity [18; 19].
6.2 Detection and Isolation of Attacks from
Abstract Security Model and Sensor
        </p>
        <p>Data
The abstract security model provides an abstraction
mechanism that is originally missing in the audit trails.
More precisely what is not in the audit trails and what
is in the security model is how to connect (possibly
disconnected) activities for the purpose of global
reasoning. The abstract security model and the sensor
data collected from the audit trails are provided as
inputs to an MBD algorithms that performs the
highlevel reasoning about possible vulnerabilities and
attacks similar to what a human security officer would
do.</p>
        <p>The information in the “raw” audit trails is of too
high fidelity [2] and low abstraction to be used by a
“crude” security model. That is the reason the
diagnostic engine needs the machine learning module to
temporally and spatially group nodes in the audit trails
and to provide semantically rich variable/value sensor
data about actions, suitable for MBD. Notice that in
this process, the audit trail structure is translated to
semantic categories, i.e., the diagnostic engine receives
as observations time-series of sensed actions.</p>
        <p>
          The listing that follows next shows an abstract
security model for the running example in the LYDIA
language [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. This bears some resemblance to
PROLOG, except that LYDIA is a language for
modelbased diagnosis of logical circuits while PROLOG is
for Horn-style reasoning. The use of LYDIA is for
illustration purposes only, in reality computer
systems can be much more easily modeled as state
machines. There is a significant body of literature
dealing with diagnosis of discrete-event systems [21; 22;
23], to name just a few.
if (! buffer_overflow_vuln ) { // if healthy
        </p>
        <p>!leak_passwd; // passwords don’t leak
if (! escalation_vuln ) { // if healthy</p>
        <p>! root_shell ; // no root shell is possible
bool access_passwd;
attribute observable (access_passwd) = true ;
bool httpd_shell_vuln ; // vulnerability
bool buffer_overflow_vuln ; // vulnerability
bool escalation_vuln ; // vulnerability
// weak−fault models
if (! httpd_shell_vuln ) { // if healthy</p>
        <p>! httpd_shell ; // forbid shells via httpd
/∗∗
∗ Normal users can only communicate with a
∗ list of permitted hosts .
∗/
if (! know_root_password) {</p>
        <p>comm == true;
/∗∗
∗ Knowing the root password can be explained
∗ by a root shell ( for example there is a
∗ password sniffer ).
∗/
know_root_password =&gt;
(( httpd_shell || leak_passwd) &amp;&amp; root_shell );
system front_end fe (know_root_password);
system back_end be(know_root_password);</p>
        <p>
          LYDIA translates the model to an internal
propositional logic formula. Part of this internal
representation is shown in figure 3, which uses the standard VLSI
[
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] notation to denote AND-gates, OR-gates, and
NOT-gates. Wires are labeled with variable names.
Boolean circuits (matching propositional logic),
however, have limited expressiveness and modeling
secu6
        </p>
        <p>httpd shell
leak pw1
buffer overflow vuln1
know root password
root shell
2p
2q
2r</p>
        <p>
          &gt;
Legend:
1assumable variable
2internal variable
rity constraints in it is notoriously difficult, hence we
plan to create or use specialized modal logic similar to
the one proposed in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
        <p>
          Notice that the format of the Boolean circuit shown
in figure 3 is very close to the one used in Truth
Maintenance System (TMS) [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. The only assumable
variable in figure 3 is buffer_overflow_vuln and its
default value is false (i.e., there is no buffer overflow
vulnerability in the web server process).
        </p>
        <p>We next show how a reasoning engine can discover
a conflict through forward and backward propagation.
Looking at figure 3, it is clear that r must be true
because it is an input to an AND-gate whose output is set
to true. Therefore either p or q (or both) must be true.
This means that either buffer_overflow_vuln or
leak_pw must be false. If we say that leak_pw is
assumed to be true (measured or otherwise inferred),
then leak_pw and buffer_overflow_vuln are
together part of a conflict. It means that the reasoning
engine has to change one of them to resolve the
contradiction.</p>
        <p>Based on the observation from our running
example and a TMS constructed from the security model
shown in figure 3, the hitting set algorithm computes
two possible diagnostic hypotheses: (1) the attacker
gained a shell access through a web-server
vulnerability and the attacker performed privilege escalation or
(2) the attacker injected binary code through a buffer
overflow and the attacker performed privilege
escalation.</p>
        <p>If we use LYDIA to compute the set of diagnoses for
the running example, we get the following two
(ambiguous) diagnoses for the root-cause of the
penetration:
$ lydia example.lm example.obs
d1 = { fe.escalation_vuln,</p>
        <p>fe.httpd_shell_vuln }
d2 = { fe.buffer_overflow_vuln,
fe.escalation_vuln }</p>
        <p>MBD uses probabilities to computes a sequence of
possible diagnoses ordered by likelihood. This
probability can be used for many purposes: decide which
diagnosis is more likely to be the true fault explanation,
whether there is the need for consider further evidence
from the logs or limit the number of diagnoses that
need to be identified. Many policies exist to compute
these probabilities [27; 28].</p>
        <p>For illustration purposes we consider that the
diagnoses for the running example are ambiguous. Before
we discuss methods for dealing with this ambiguity,
we address the major research challenge of model
generation.
6.3</p>
      </sec>
      <sec id="sec-4-2">
        <title>Model Generation</title>
        <p>The abstract vulnerability model can either be
constructed manually or semi-automatically. The
challenge with modeling is that an APT campaign
generally exploits unknown vulnerabilities. Therefore, our
approach to address this issue is to construct the model
which captures expected behavior (known goods) of
the system. Starting from generic parameterized
vulnerability models and security objectives, the abstract
vulnerability model can be extended with information
related to known vulnerabilities (known bads).</p>
        <p>Generating the model can be done either
manually or semi-automatically. We will explore venues to
generate this model manually, which requires
significant knowledge about potential security
vulnerabilities, while being error prone and not detailed enough.
Amongst company specific requirements, we envisage
the abstract vulnerability model to capture the most
common attacks that target software systems, as
described in the Common Attack Pattern Enumeration
and Classification (CAPEC1). The comprehensive list
of known attacks has been designed to better
understand the perspective of an attacker exploiting the
vulnerabilities and, from this knowledge, devise
appropriate defenses.</p>
        <p>As modeling is challenging, we propose to explore
semi-automatic approaches to construct models. The
semi-automatic method is suitable to addressing the
modeling because in security, similarly to
diagnosis, there is (1) component models and (2) structure.
While it is difficult to automate the building of
component models (this may even require natural language
parsing of databases such as CAPEC), it is feasible to
capture diagnosis-oriented information from structure
(physical networking or network communication).</p>
        <p>
          Yet another approach to semi-automatically
generate the model is to learn it from executions of the
system (e.g., during regression testing, just before
deployment). This approach to system modeling is
inspired by the work in automatic software
debugging work [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], where modeling of program
behavior is done in terms of abstraction of program traces
– known as spectra [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], abstracting from modeling
specific components and data dependencies
        </p>
        <p>The outlined approaches to construct the abstract
vulnerability model entail different costs and
diagnostic accuracies. As expected, manually building the
model is the most expensive one. Note that
building the model is a time-consuming and error-prone
task. The two semi-automatic ways also entail
different costs: one exploits the available, static
information and the other requires the system to be executed
to compute a meaningful set of executions. We will
investigate the trade-offs between modeling approaches
and their diagnostic accuracy in the context of
transparent computing.
7</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>Identifying the root-cause and perform damage
impact assessment of advanced persistent threats can be
framed as a diagnostic problem. In this paper, we
discuss an approach that leverages machine learning and
model-based diagnosis techniques to reason about
potential attacks.</p>
      <p>Our approach classifies audit trails into high-level
activities, and then reasons about those activities and
their threat potential in real-time and forensically. By
using the outcome of this reasoning to explain
complex evi- dence of malicious behavior, the system
administrators is provided with the proper tools to
promptly react to, stop, and mitigate attacks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Kyu</given-names>
            <surname>Hyung Lee</surname>
          </string-name>
          , Xiangyu Zhang, and
          <string-name>
            <given-names>Dongyan</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>High accuracy attack provenance via binarybased execution partition</article-title>
          .
          <source>In Proceedings of the 20th Annual Network and Distributed System Security Symposium</source>
          , San Diego, CA,
          <year>February 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Kyu</given-names>
            <surname>Hyung Lee</surname>
          </string-name>
          , Xiangyu Zhang, and Dongyan Xu.
          <article-title>LogGC: Garbage collecting audit log</article-title>
          .
          <source>In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security</source>
          , pages
          <fpage>1005</fpage>
          -
          <lpage>1016</lpage>
          , Berlin, Germany,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Galar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fernández</surname>
          </string-name>
          , E. Barrenechea,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bustince</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          .
          <article-title>A review on ensembles for the class imbalance problem: Bagging-</article-title>
          ,
          <string-name>
            <surname>boosting-</surname>
          </string-name>
          ,
          <article-title>and hybrid-based approaches</article-title>
          .
          <source>IEEE Transactions onSystems, Man, and Cybernetics</source>
          , Part C:
          <article-title>Applications</article-title>
          and Reviews,
          <volume>42</volume>
          (
          <issue>4</issue>
          ):
          <fpage>463</fpage>
          -
          <lpage>484</lpage>
          ,
          <year>July 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Charu C Aggarwal</surname>
          </string-name>
          .
          <article-title>Outlier ensembles: Position paper</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          ,
          <volume>14</volume>
          (
          <issue>2</issue>
          ):
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Jing</given-names>
            <surname>Gao</surname>
          </string-name>
          and
          <string-name>
            <surname>Pang-Ning Tan</surname>
          </string-name>
          .
          <article-title>Converting output scores from outlier detection algorithms into probability estimates</article-title>
          .
          <source>In Proceedings of the Sixth International Conference on Data Mining</source>
          , pages
          <fpage>212</fpage>
          -
          <lpage>221</lpage>
          . IEEE,
          <year>December 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Hans-Peter Kriegel</surname>
            , Peer Kröger, Erich Schubert, and
            <given-names>Arthur</given-names>
          </string-name>
          <string-name>
            <surname>Zimek</surname>
          </string-name>
          .
          <article-title>Interpreting and unifying outlier scores</article-title>
          .
          <source>In Proceedings of the Eleventh SIAM International Conference on Data Mining</source>
          , pages
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>April 2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Erich</given-names>
            <surname>Schubert</surname>
          </string-name>
          , Remigius Wojdanowski, Arthur Zimek, and
          <string-name>
            <surname>Hans-Peter Kriegel</surname>
          </string-name>
          .
          <article-title>On evaluation of outlier rankings and outlier scores</article-title>
          .
          <source>In Proceedings of the Twelfth SIAM International Conference on Data Mining</source>
          , pages
          <fpage>1047</fpage>
          -
          <lpage>1058</lpage>
          ,
          <year>April 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Hoda</given-names>
            <surname>Eldardiry</surname>
          </string-name>
          , Kumar Sricharan, Juan Liu, John Hanley, Bob Price, Oliver Brdiczka, and
          <article-title>Eugene Bart. Multi-source fusion for anomaly detection: using across-domain and across-time peer-group consistency checks</article-title>
          .
          <source>Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA)</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Freund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Raj D.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robert E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yoram</given-names>
            <surname>Singer</surname>
          </string-name>
          .
          <article-title>An efficient boosting algorithm for combining preferences</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>4</volume>
          (Nov):
          <fpage>933</fpage>
          -
          <lpage>969</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Ke</surname>
            <given-names>Deng</given-names>
          </string-name>
          , Simeng Han,
          <string-name>
            <surname>Kate J Li</surname>
          </string-name>
          ,
          <article-title>and Jun S Liu. Bayesian aggregation of order-based rank data</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          ,
          <volume>109</volume>
          (
          <issue>507</issue>
          ):
          <fpage>1023</fpage>
          -
          <lpage>1039</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Arthur</surname>
            <given-names>P Dempster</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nan M Laird</surname>
          </string-name>
          , and Donald B Rubin.
          <article-title>Maximum likelihood from incomplete data via the EM algorithm</article-title>
          .
          <source>Journal of the royal statistical society. Series B</source>
          ,
          <volume>39</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Johan de Kleer</surname>
            , Olivier Raiman, and
            <given-names>Mark</given-names>
          </string-name>
          <string-name>
            <surname>Shirley</surname>
          </string-name>
          .
          <article-title>One step lookahead is pretty good</article-title>
          .
          <source>In Readings in Model-Based Diagnosis</source>
          , pages
          <fpage>138</fpage>
          -
          <lpage>142</lpage>
          . Morgan Kaufmann Publishers, San Francisco, CA,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Alexander</surname>
            <given-names>Feldman</given-names>
          </string-name>
          , Tolga Kurtoglu,
          <string-name>
            <given-names>Sriram</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          , Scott Poll, David Garcia, Johan de Kleer, Lukas Kuhn, and Arjan van Gemund.
          <article-title>Empirical evaluation of diagnostic algorithm performance using a generic framework</article-title>
          .
          <source>International Journal of Prognostics and Health Management</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Johan de Kleer</surname>
            and
            <given-names>Brian</given-names>
          </string-name>
          <string-name>
            <surname>Williams</surname>
          </string-name>
          .
          <article-title>Diagnosing multiple faults</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>97</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Oleg</surname>
            <given-names>Sheyner</given-names>
          </string-name>
          , Joshua Haines, Somesh Jha, Richard Lippmann, and
          <string-name>
            <surname>Jeannette</surname>
            <given-names>M Wing.</given-names>
          </string-name>
          <article-title>Automated generation and analysis of attack graphs</article-title>
          .
          <source>In Proceeding of the 2002 IEEE Symposium on Security and Privacy</source>
          , pages
          <fpage>273</fpage>
          -
          <lpage>284</lpage>
          . IEEE, May
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Seyit</given-names>
            <surname>Ahmet</surname>
          </string-name>
          Camtepe and
          <string-name>
            <given-names>Bülent</given-names>
            <surname>Yener</surname>
          </string-name>
          .
          <article-title>Modeling and detection of complex attacks</article-title>
          .
          <source>In Proceeding of the Third International Conference on Security and Privacy in Communications Networks</source>
          , pages
          <fpage>234</fpage>
          -
          <lpage>243</lpage>
          ,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Rui</given-names>
            <surname>Abreu</surname>
          </string-name>
          and
          <string-name>
            <surname>Arjan JC van Gemund</surname>
          </string-name>
          .
          <article-title>A low-cost approximate minimal hitting set algorithm and its application to model-based diagnosis</article-title>
          .
          <source>In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation</source>
          , pages
          <fpage>2</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>July 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Alexander</surname>
            <given-names>Feldman</given-names>
          </string-name>
          , Gregory Provan, and Arjan van Gemund.
          <article-title>Approximate model-based diagnosis using greedy stochastic search</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>38</volume>
          :
          <fpage>371</fpage>
          -
          <lpage>413</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Nuno</given-names>
            <surname>Cardoso</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rui</given-names>
            <surname>Abreu</surname>
          </string-name>
          .
          <article-title>A distributed approach to diagnosis candidate generation</article-title>
          .
          <source>In Progress in Artificial Intelligence</source>
          , pages
          <fpage>175</fpage>
          -
          <lpage>186</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Alexander</surname>
            <given-names>Feldman</given-names>
          </string-name>
          , Jurryt Pietersma, and Arjan van Gemund.
          <article-title>All roads lead to fault diagnosis: Model-based reasoning with LYDIA</article-title>
          .
          <source>In Proceedings of the Eighteenth BelgiumNetherlands Conference on Artificial Intelligence (BNAIC'06)</source>
          , Namur, Belgium,
          <year>October 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Meera</surname>
            <given-names>Sampath</given-names>
          </string-name>
          , Raja Sengupta, Stephane Lafortune, Kasim Sinnamohideen, and Demosthenis C Teneketzis.
          <article-title>Failure diagnosis using discreteevent models</article-title>
          .
          <source>Control Systems Technology, IEEE Transactions on, 4</source>
          (
          <issue>2</issue>
          ):
          <fpage>105</fpage>
          -
          <lpage>124</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Alban</surname>
            <given-names>Grastien</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marie-Odile Cordier</surname>
            , and
            <given-names>Christine</given-names>
          </string-name>
          <string-name>
            <surname>Largouët</surname>
          </string-name>
          .
          <article-title>Incremental diagnosis of discreteevent systems</article-title>
          . In DX,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Alban</surname>
            <given-names>Grastien</given-names>
          </string-name>
          , Patrik Haslum, and
          <string-name>
            <given-names>Sylvie</given-names>
            <surname>Thiébaux</surname>
          </string-name>
          .
          <article-title>Conflict-based diagnosis of discrete event systems: theory and practice</article-title>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Behrooz</given-names>
            <surname>Parhami</surname>
          </string-name>
          . Computer Arithmetic: Algorithms and
          <string-name>
            <given-names>Hardware</given-names>
            <surname>Designs</surname>
          </string-name>
          . Oxford University Press, Inc., New York, NY, USA, 2nd edition,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Janice</surname>
            <given-names>Glasgow</given-names>
          </string-name>
          , Glenn Macewen, and
          <string-name>
            <given-names>Prakash</given-names>
            <surname>Panangaden</surname>
          </string-name>
          .
          <article-title>A logic for reasoning about security</article-title>
          .
          <source>ACM Transactions on Computer Systems</source>
          ,
          <volume>10</volume>
          (
          <issue>3</issue>
          ):
          <fpage>226</fpage>
          -
          <lpage>264</lpage>
          ,
          <year>August 1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Kenneth</given-names>
            <surname>Forbus</surname>
          </string-name>
          and Johan de Kleer.
          <article-title>Building Problem Solvers</article-title>
          . MIT Press,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Johan de Kleer</surname>
          </string-name>
          .
          <article-title>Diagnosing multiple persistent and intermittent faults</article-title>
          .
          <source>In Proceeding of the 2009 International Joint Conference on Artificial Intelligence</source>
          , pages
          <fpage>733</fpage>
          -
          <lpage>738</lpage>
          ,
          <year>July 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Rui</surname>
            <given-names>Abreu</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Zoeteweij</surname>
          </string-name>
          , and
          <string-name>
            <surname>Arjan J. C. Van Gemund</surname>
          </string-name>
          .
          <article-title>A new bayesian approach to multiple intermittent fault diagnosis</article-title>
          .
          <source>In Proceeding of the 2009 International Joint Conference on Artificial Intelligence</source>
          , pages
          <fpage>653</fpage>
          -
          <lpage>658</lpage>
          ,
          <year>July 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Rui</surname>
            <given-names>Abreu</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Zoeteweij</surname>
          </string-name>
          , and
          <string-name>
            <surname>Arjan JC Van Gemund</surname>
          </string-name>
          .
          <article-title>Spectrum-based multiple fault localization</article-title>
          .
          <source>In Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering</source>
          , pages
          <fpage>88</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>November 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Mary</given-names>
            <surname>Jean</surname>
          </string-name>
          <string-name>
            <given-names>Harrold</given-names>
            , Gregg Rothermel, Kent Sayre,
            <surname>Rui Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Liu</given-names>
            <surname>Yi</surname>
          </string-name>
          .
          <article-title>An empirical investigation of the relationship between spectra differences and regression faults</article-title>
          .
          <source>Software Testing Verification and Reliability</source>
          ,
          <volume>10</volume>
          (
          <issue>3</issue>
          ):
          <fpage>171</fpage>
          -
          <lpage>194</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>