=Paper=
{{Paper
|id=Vol-1507/dx15paper25
|storemode=property
|title=Diagnosing Advanced Persistent Threats: A Position Paper
|pdfUrl=https://ceur-ws.org/Vol-1507/dx15paper25.pdf
|volume=Vol-1507
|dblpUrl=https://dblp.org/rec/conf/safeprocess/AbreuBEFHHKPAB15
}}
==Diagnosing Advanced Persistent Threats: A Position Paper==
Proceedings of the 26th International Workshop on Principles of Diagnosis
Rui Abreu and Danny Bobrow and Hoda Eldardiry and Alexander Feldman and
John Hanley and Tomonori Honda and Johan de Kleer and Alexandre Perez
Palo Alto Research Center
3333 Coyote Hill Rd
Palo Alto, CA 94304, USA
{rui,bobrow,hoda.eldardiry,afeldman,john.hanley,tomo.honda,dekleer,aperez}@parc.com
Dave Archer and David Burke
Galois, Inc.
421 SW 6th Avenue, Suite 300
Portland, OR 97204, USA
{dwa,davidb}@galois.com
Abstract

When a computer system is hacked, analyzing the root cause (for example, the entry point of penetration) is a diagnostic process. An audit trail, as defined in the National Information Assurance Glossary, is a security-relevant chronological set of records, and/or the destination and source of records, that provides evidence of the sequence of activities that have affected, at any time, a specific operation, procedure, or event. After detecting an intrusion, system administrators manually analyze audit trails to both isolate the root cause and assess the damage impact of the attack. Due to the sheer volume of information and low-level activities in the audit trails, this task is cumbersome and time-intensive. In this position paper, we discuss our ideas to automate the analysis of audit trails using machine learning and model-based reasoning techniques. Our approach classifies audit trails into the high-level activities they represent, and then reasons about those activities and their threat potential, both in real time and forensically. We argue that, by using the outcome of this reasoning to explain complex evidence of malicious behavior, we equip system administrators with the proper tools to promptly react to, stop, and mitigate attacks.

1 Introduction

Today, enterprise system and network behaviors are typically "opaque": stakeholders lack the ability to assert causal linkages in running code, except in very simple cases. At best, event logs and audit trails can offer some partial information on temporally and spatially localized events as seen from the viewpoint of individual applications. Thus, current techniques give operators little system-wide situational awareness, nor any viewpoint informed by a long-term perspective. Adversaries have taken advantage of this opacity by adopting a strategy of persistent, low-observability operation from inside the system, hiding effectively through the use of long causal chains of system and application code. We call such adversaries advanced persistent threats, or APTs.

To address current limitations, this position paper discusses a technique that aims to track causality across the enterprise and over extended periods of time, identify subtle causal chains that represent malicious behavior, localize the code at the roots of such behavior, trace the effects of other malicious actions descended from those roots, and make recommendations on how to mitigate those effects. By doing so, the proposed approach aims to enable stakeholders to understand and manage the activities going on in their networks. The technique exploits both current and novel forms of local causality to construct higher-level observations of long-term causality in system information flow. We propose a machine learning approach that classifies segments of low-level events by the activities they represent, reasons over these activities, and prioritizes candidate activities for investigation. The diagnostic engine investigates these candidates, looking for patterns that may represent the presence of APTs. Using pre-defined security policies and related mitigations, the approach explains discovered APTs and recommends appropriate mitigations to operators. We plan to leverage models of APT behavior and of normal business logic to diagnose such threats. Note that the technique is not constrained by the availability of human analysts, but can benefit from human-on-the-loop assistance.

The approach discussed in this paper will offer an unprecedented capability for observing long-term, subtle, system-wide activity by automatically constructing such global, long-term causality observations. The ability to automatically classify causal chains of events in terms of abstractions such as activities will provide operators with a unique capability to orient to long-term, system-wide evidence of possible threats. The diagnostic engine will provide a unique capability to identify whether groups of such activities likely represent active threats, making it easier for operators to decide whether long-term threats are active, and where they originate, even before those threats are identified by other means. Thus, the approach will pave the way for the first automated, long-horizon, continuously operating, system-wide support for an effective defender Observe, Orient, Decide, and Act (OODA) loop.

2 Running Example

The methods proposed in this article are illustrated on a realistic running example. The attackers in this example use sophisticated and recently discovered exploits to gain access to the victim's resources. The attack is remote and does not require social engineering or opening a malicious email attachment. The methods that we propose, however, are not limited to this class of attacks.

[Figure 1: Network topology for the attack — the hacker reaches, via the Internet and a router, the victim's local network, which contains the web server front-end, the data storage back end, and the administrator's machine.]

The network topology used for our running example is shown in figure 1. The attack is executed over several days. It starts by (1) compromising the web server front-end, followed by (2) a reconnaissance phase and (3) compromising the data storage back end, and ultimately extracting and modifying sensitive information belonging to the victim.

Both the front-end and the back end in this example run an unpatched Ubuntu 13.1 Linux OS on an Intel Sandy Bridge architecture.

What follows is a detailed chronology of the events:

1. The attacker uses the Apache httpd server, a cgi-bin script, and the Shellshock vulnerability (a GNU bash exploit registered in the Common Vulnerabilities and Exposures database as CVE-2014-6271; see https://nvd.nist.gov/) to gain remote shell access to the victim's front-end. It is now possible for the attacker to execute processes on the front-end as the non-privileged user www-data.

2. The attacker notices that the front-end is running an unpatched Ubuntu Linux OS version 13.1. The attacker uses the nc Linux utility to copy an exploit for obtaining root privileges. The particular exploit utilizes the x32 recvmmsg() kernel vulnerability registered in the Common Vulnerabilities and Exposures (CVE) database as CVE-2014-0038. After running the copied binary for a few minutes, the attacker gains root access to the front-end host.

3. The attacker installs a root-kit utility that intercepts all input to ssh.

4. A system administrator uses the compromised ssh to connect to the back-end, revealing his back-end password to the attacker.

5. The attacker uses the compromised front-end to bypass firewalls and uses the newly acquired back-end administrator's password to access the back-end.

6. The attacker uses a file-tree traversing utility on the back-end that collects sensitive data and consolidates it in an archive file.

7. The attacker sends the archive file to a third-party hijacked computer for analysis.

3 Auditing and Instrumentation

Almost all computing systems of sufficiently high level (with the exception of some embedded systems) leave detailed logs of all system and application activities. Many UNIX variants such as Linux log via the syslog daemon, while Windows uses the event log service. In addition to the usual logging mechanisms, there is a multitude of projects related to secure and detailed auditing. An audit log is a more detailed trail of any security- or computation-related activity such as file or RAM access, system calls, etc.

Depending on the level of security we would like to provide, there are several methods for collecting input security-related information. At one extreme, it is possible to use the existing log files. At the other extreme, there are applications for collecting detailed information about the application execution. One such approach [1] runs the processes of interest through a debugger and logs every memory read and write access.

It is also possible to automatically inject logging calls into the source files before compiling them, allowing us to have static or dynamic logging or a combination of the two. Log and audit information can be signed, encrypted, and sent in real time to a remote server to make system tampering and activity-hiding more difficult. All these configuration decisions impose different trade-offs in security versus computational and RAM load [2] and depend on the organizational context.
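Whatever collection mechanism is chosen, the raw records must eventually be parsed into structured events before any causal reasoning can happen. As a minimal illustrative sketch (not part of the proposed system), the following Python parser handles auditd-style key=value records; the sample record is hypothetical, modeled on the excerpts shown later in Figure 2:

```python
import re

def parse_audit_record(line: str) -> dict:
    """Parse a Linux auditd-style record into a dict of fields.

    Records are sequences of key=value pairs, where values are either
    bare tokens or double-quoted strings; spaces around '=' are tolerated.
    """
    fields = {}
    for key, quoted, bare in re.findall(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))', line):
        fields[key] = quoted if quoted else bare
    return fields

# Hypothetical record modeled on the auditd excerpt in Figure 2.
record = ('type=SYSCALL msg=audit(1310392408.506:36): syscall=2 '
          'success=yes exit=3 ppid=20478 pid=21013 auid=1000 uid=0 '
          'comm="grep" exe="/bin/grep"')

event = parse_audit_record(record)
assert event["comm"] == "grep"
assert event["uid"] == "0"  # effective root: one low-level escalation indicator
```

Structured events of this kind are what the preprocessing described in the next section turns into provenance graphs.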
front_end.secure_access_log: 11.239.64.213 - [22/Apr/2014 06:30:24 +0200] "GET /cgi-bin/test.cgi HTTP/1.1" 401 381

front_end.rsyslogd.log: recvmsg(3, msg_name(0) = NULL, msg_iov(1) = ["29/Apr/2014 22:15:49 ...", 8096], msg_controllen = 0, msg_flags = MSG_CTRUNC, MSG_DONTWAIT) = 29

back_end:auditctl: type = SYSCALL msg = audit(1310392408.506:36): arch = c000003e syscall = 2 success = yes exit = 3 a0 = 7fff2ce9471d a1 = 0 a2 = 61f768 a3 = 7fff2ce92a20 items = 1 ppid = 20478 pid = 21013 auid = 1000 uid = 0 gid = 0 euid = 0 suid = 0 fsuid = 0 egid = 0 sgid = 0 fsgid = 0 ses = 1 comm = "grep" exe = "/bin/grep"

Figure 2: Part of the log files related to the attack from the running example

Figure 2 shows part of the logs collected for our running example. The first entry is from when the attacker exploits the Shellshock vulnerability through a CGI script of the web server. The second entry shows a syslog strace-like message resulting from the kernel escalation. Finally, the attacker uses the grep command on the back-end server to search for sensitive information, and the call is recorded by the audit system.

It is often the case that the raw system and security log files are preprocessed and initial causal links are computed. If we trace the exec, fork, and join POSIX system calls, for example, it is possible to add a graph-like structure to the log files, computing provenance graphs. Another method for computing local causal links is to consider shared resources, e.g., two threads reading and writing the same memory address [1].

4 Activity Classification

The Activity Classifier continuously annotates audit trails with semantic tags describing the higher-order activity they represent. For example, 'remote shell access', 'remote file overwrite', and 'intra-network data query' are possible activity tags. These tags are used by the APT Diagnostics Engine to enable higher-order reasoning about related activities, and to prioritize activities for possible investigation.

4.1 Hierarchical semantic annotation of audit trails

A key challenge in abstracting low-level events into higher-order activity patterns that can be reasoned about efficiently is that such patterns can be described at multiple levels of semantic abstraction, all of which may be useful in threat analysis. Indeed, higher-order abstractions may be composed of lower-order abstractions that are in turn abstractions of low-level events. For example, a sequential set of logged events such as 'browser forking bash', 'bash initiating Netcat', and 'Netcat listening on a new port' might be abstracted as the activity 'remote shell access'. The set of activities 'remote shell access' and 'escalation of privilege' can be abstracted as the activity 'remote root shell access'.

We approach activity annotation as a supervised learning problem that uses classification techniques to generate activity tags for audit trails. Table 1 shows multiple levels of activity classifications for the above APT example.

Table 1 represents one possible classification-enriched audit trail for such an APT. There can be many relatively small variations. For example, obscuring the password file could be done using other programs. A single classifier only allows for a single level of abstraction, and a single leap from low-level events to very abstract activities (for example, from the 'bash execute perl' level to the 'extracting modified file' level) will have a higher error caused by these additional variations.

To obtain several layers of abstraction to reason over, and thus reduce the overall classification error, we use a multi-level learning strategy that models information at multiple levels of semantic abstraction using multiple classifiers. Each classifier solves the problem at one abstraction level, mapping from a lower-level (fine) feature space to the next higher-level, conceptual (coarse) feature space.

The activity classifier relies on both a vocabulary of activities and a library of patterns describing these activities, both of which will initially be defined manually. This vocabulary and pattern set reside in a Knowledge Base. In our training approach, results from training lower-level classifiers are used as training data for higher-level classifiers. In this way, we coherently train all classifiers by preventing higher-level classifiers from being trained with patterns that will never be generated by their lower-level precursors. We use an ensemble learning approach to achieve accurate classification. This involves stacking together both bagged and boosted models to reduce both the variance and the bias error components [3]. The classification algorithm will be trained using an online-learning technique and integrated within an Active Learning Framework to improve the classification of atypical behaviors.

Generating Training Data for Classification. To build the initial classifier, training data is generated using two methods. First, an actual deployed system is used to collect normal behavior data, and a Subject Matter Expert manually labels it. Second, a testing platform is used to generate data in a controlled environment, particularly platform-dependent, vulnerability-related behavior. In addition, to generate new training data of previously unknown behavior, we use an Active Learning framework as described in Section 5.
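The multi-level strategy can be sketched as follows. In this illustrative Python fragment, fixed pattern tables stand in for the learned classifiers at each abstraction level, and all event and activity names are hypothetical; the point is only the chaining, in which the output of one level is the input of the next:

```python
# Two-level activity abstraction: each level maps a window of
# lower-level labels to a higher-level label via a pattern library.
# The patterns below are hypothetical stand-ins for learned classifiers.

LEVEL_1_PATTERNS = {
    ("browser fork bash", "bash fork netcat", "netcat listen"): "remote shell access",
    ("binary exec", "uid change to 0"): "escalation of privilege",
}

LEVEL_2_PATTERNS = {
    ("remote shell access", "escalation of privilege"): "remote root shell access",
}

def classify(events, patterns):
    """Greedily tag maximal matching windows; unmatched events pass through."""
    out, i = [], 0
    while i < len(events):
        for pat, label in patterns.items():
            if tuple(events[i:i + len(pat)]) == pat:
                out.append(label)
                i += len(pat)
                break
        else:
            out.append(events[i])
            i += 1
    return out

low = ["browser fork bash", "bash fork netcat", "netcat listen",
       "binary exec", "uid change to 0"]
mid = classify(low, LEVEL_1_PATTERNS)    # level 1: events -> activities
high = classify(mid, LEVEL_2_PATTERNS)   # level 2: activities -> higher activities
assert high == ["remote root shell access"]
```

In the proposed system each table would be a trained classifier, and the lower level's outputs would double as the higher level's training data, as described above.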
Table 1: Sample classification problem for the running example

| Activity 1                  | Activity 2                      | Activity 3                    |
| Remote Shell Access         | Remote File Overwrite           | Modified File Download        |
| Shell Shock                 | Trojan Installation             | Password Exfiltration         |
|-----------------------------|---------------------------------|-------------------------------|
| Browser (Port 80) fork bash | Netcat listen to Port 8443      | Netcat listen to Port 8443    |
| bash fork Netcat            | Port 8443 receive binary file   | Port 8443 fork bash           |
| Netcat listen to port 8080  | binary file overwrites libns.so | bash execute perl             |
|                             |                                 | Perl overwrite /tmp/stolen_pw |
|                             |                                 | Port 8443 send /tmp/stolen_pw |
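To make concrete how labeled traces like those in Table 1 could drive a supervised activity tagger, the following Python sketch scores an observed event sequence against each column with Jaccard similarity. The similarity heuristic is only an illustrative stand-in for the learned classifiers, and the event strings are paraphrased from Table 1:

```python
# Hypothetical sketch: tag an observed event sequence with the closest
# labeled activity trace from Table 1, using Jaccard similarity over
# event sets. A trained classifier would replace this heuristic.

LABELED_TRACES = {
    "Remote Shell Access (Shell Shock)": {
        "browser (port 80) fork bash", "bash fork netcat",
        "netcat listen to port 8080"},
    "Remote File Overwrite (Trojan Installation)": {
        "netcat listen to port 8443", "port 8443 receive binary file",
        "binary file overwrites libns.so"},
    "Modified File Download (Password Exfiltration)": {
        "netcat listen to port 8443", "port 8443 fork bash",
        "bash execute perl", "perl overwrite /tmp/stolen_pw",
        "port 8443 send /tmp/stolen_pw"},
}

def jaccard(a, b):
    """Set overlap: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def tag(observed):
    """Return the label of the most similar labeled trace."""
    obs = set(observed)
    return max(LABELED_TRACES, key=lambda lbl: jaccard(obs, LABELED_TRACES[lbl]))

observed = ["port 8443 fork bash", "bash execute perl",
            "port 8443 send /tmp/stolen_pw"]
assert tag(observed) == "Modified File Download (Password Exfiltration)"
```

A set-based similarity ignores event order; the actual classifiers would also exploit sequence structure, as the provenance-graph preprocessing above makes possible.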
5 Prioritizer

As the Activity Classifier annotates audit trails with activity descriptors, the two (parallel) next steps in our workflow are to 1) prioritize potential threats to be referred to the Diagnostic Engine (see Section 6) for investigation, and 2) prioritize emergent activities that (after suitable review and labeling) are added to the activity classifier training data. This module prioritizes activities by threat severity and confidence level. This prioritization process presents three key challenges.

5.1 Threat-based rank-annotation of activities

One challenge in ranking activities according to their threat potential is the complex (and dynamic) notion of what constitutes a threat. Rankings based on matching to known prior threats are necessary, but not sufficient. An ideal ranking approach should take known threats into account, while also proactively considering the unknown threat potential of new kinds of activities. Another such challenge is that risk may be assessed at various levels of activity abstraction, requiring that the overall ranking be computed by aggregating risk assessments at multiple abstraction levels.

We implement two ranking approaches: a supervised ranker based on previously known threats and an unsupervised ranker that considers unknown potential threats.

Supervised ranking using APT classification to catch known threats. The goal of APT classification is to provide the diagnostic engine with critical APT-related information such as the APT phase, the severity of the attack, and the confidence level associated with the APT tagging for threat prioritization. Since the audit trails are annotated hierarchically into different granularities of actions, multiple classifiers will be built to consider each hierarchical level separately. APT classifiers are used to identify entities that are likely to be instances of known threats or phases of an APT attack. Two types of classifiers are used. The first classifier is hand-coded and the second classifier is learned from training data.

The hand-coded classifier is designed to have high precision, using hand-coded rules, mirroring SIEM and IDS systems. Entities tagged by this classifier are given the highest priority for investigation. The second classifier, which is learned from training data, will provide higher recall at the cost of precision. Activities are ranked according to their threat level by aggregating a severity measure (determined by the classified threat type) and a confidence measure. We complement the initial set of training data used to calibrate our classifiers by using an Active Learning Framework, which focuses on improving the classification algorithm through occasional manual labeling of the most critical activities in the audit trails.

Unsupervised ranking using normalcy characterization to catch unknown threats. The second component of the prioritizer is a set of unsupervised normalcy rankers, which rank entities based on their statistical "normalcy". Activities identified as unusual will be fed to the Active Learning framework to check if any of them are "unknown" APT activities. This provides a mechanism for detecting "unknown" threats while also providing feedback to improve the APT classifier.

5.2 Combining Multiple Rankings

One of the key issues with combining the outputs of multiple risk rankings is dealing with two-dimensional risk (severity, confidence) scores that may be on very different scales. A diverse set of score normalization techniques has been proposed [4; 5; 6] to deal with this issue, but no single technique has been found to be superior to all the others. An alternative to combining scores is to combine rankings [7]. Although converting scores to rankings does lose information, it remains an open question whether the loss in information is compensated for by the convenience of working with the common scale of rankings.

We will develop combination techniques for weighted risk rankings based on probabilistic rank aggregation methods. This approach builds on our own work [8], which shows the robustness of the weighted ranking approach. We also build on principled methods for combining ranking data found in the statistics and information retrieval literature.

Traditionally, the goal of rank aggregation [9; 10] is to combine a set of rankings of the same candidates into a single consensus ranking that is "better" than the individual rankings. We extend the traditional approach to accommodate the specific context of weighted risk ranking. First, unreliable rankers will be identified and either ignored or down-weighted, lest their rankings decrease the quality of the overall consensus [7; 10]. Second, we will discount excessive correlation among rankers, so that a set of
highly redundant rankers does not completely outweigh the contribution of other, alternative rankings. To address these two issues, we will associate a probabilistic latent variable Zi with the i-th entity of interest, which indicates whether the entity is anomalous or normal. Then, we will build a probabilistic model that allows us to infer the posterior distribution over the Zi based on the observed rankings produced by each of the input weighted risk rankings. This posterior probability of Zi being normal will then be used as the weighted risk rank. Our model will make the following assumptions to account for both unreliable and correlated rankers: 1) anomalies are ranked lower than all normal instances, and these ranks tend to be concentrated near the lower rankings of the provided weighted risk rankings; and 2) normal data instances tend to be uniformly distributed near the higher rankings of the weighted risk rankings.

There are various ways to build a probabilistic model that reflects the above assumptions and allows for the inference of the Zi variables through Expectation-Maximization [11]. In addition to these assumptions, we will explore allowing other factors to influence the latent Zi variables, such as features of the entities as well as feedback provided by expert analysts.

6 Diagnosis

We view the problem of detecting, isolating, and explaining complex APT campaign behavior from rich activity data as a diagnostic problem. We will use AI-based diagnostic reasoning to guide the global search for the possible vulnerabilities that enabled the breach. Model-based diagnosis (MBD) [12] is a particularly compelling approach as it supports reasoning over complex causal networks (for example, those having multiple conjunctions, disjunctions, and negations) and identifies often subtle combinations of root causes of the symptoms (the breach).

6.1 An MBD approach for APT detection and isolation: Motivation

Attack detection and isolation are two distinct challenges. Often, diagnostic approaches use separate models for detection and isolation [13]. MBD, however, uses a single model to combine these two reasonings. The security model contains both part of the security policy (for example, that communicating with certain blacklisted hosts may indicate an information leak) and information about the possible locations and consequences of a vulnerability (a privilege escalation may lead to an information leak). The security model also contains abstract security constraints, such as: if a process requires authentication, a password must be read and compared against.

The diagnostic approach takes into consideration the bootstrapping of an APT, which we consider the root cause of the attack. What enables a successful APT is either a combination of software component vulnerabilities or the combined use of social engineering and insufficient organizational security policies. We use MBD for computing the set of simultaneously exploited vulnerabilities that allowed the deployment of the APT. Computing such explanations is possible because MBD reasons in terms of multiple faults [14]. In our running example, this set would include both the fact that the web server has been exploited due to the Shellshock vulnerability and the fact that the attacker gained privileged access on the front-end due to the use of the X64_32 escalation vulnerability.

The abstract security model is used to gather information about the types of attacks the system is vulnerable to, and to aid in deciding the set of actions required to stop an APT campaign (policy enforcement). Various heuristics exist to find the set of meaningful diagnosis candidates. As an example, one might be interested in the minimal set of actions to stop the attack [15; 16], or select those candidates that capture significant probability mass [17]. In the rest of this section, for illustration purposes, we use minimality as the heuristic of interest. MBD is the right tool for dealing with the computation of diagnosis candidates as it offers several ways to address the modeling and computational complexity [18; 19].

6.2 Detection and Isolation of Attacks from Abstract Security Model and Sensor Data

The abstract security model provides an abstraction mechanism that is originally missing in the audit trails. More precisely, what is not in the audit trails, and what is in the security model, is how to connect (possibly disconnected) activities for the purpose of global reasoning. The abstract security model and the sensor data collected from the audit trails are provided as inputs to an MBD algorithm that performs the high-level reasoning about possible vulnerabilities and attacks, similar to what a human security officer would do.

The information in the "raw" audit trails is of too high fidelity [2] and too low abstraction to be used by a "crude" security model. That is the reason the diagnostic engine needs the machine learning module: to temporally and spatially group nodes in the audit trails and to provide semantically rich variable/value sensor data about actions, suitable for MBD. Notice that in this process the audit trail structure is translated into semantic categories, i.e., the diagnostic engine receives as observations time series of sensed actions.

The listing that follows shows an abstract security model for the running example in the LYDIA language [20]. It bears some resemblance to Prolog, except that LYDIA is a language for model-based diagnosis of logical circuits while Prolog is for Horn-style reasoning. The use of LYDIA is for illustration purposes only; in reality, computer systems can be much more easily modeled as state machines. There is a significant body of literature dealing with the diagnosis of discrete-event systems; see [21; 22; 23], to name just a few.
1 system front_end (bool know_root_password) know root password
2 { root shell
3 bool httpd_shell_vuln ; // vulnerability httpd shell
>
bool buffer_overflow_vuln ; // vulnerability
2
4
r
bool escalation_vuln ; // vulnerability
leak pw1
5 2
p
6 Legend:
1
assumable variable
7 bool httpd_shell ; buffer overflow vuln 1
2
q 2
internal variable
8 bool root_shell ;
9 bool leak_passwd;
10 Figure 3: Part of the abstract security model for the
11 // weak−fault models
running example
12 if (! httpd_shell_vuln ) { // if healthy
13 ! httpd_shell ; // forbid shells via httpd
14 } rity constraints in it is notoriously difficult, hence we
15 plan to create or use specialized modal logic similar to
16 if (! escalation_vuln ) { // if healthy
17 ! root_shell ; // no root shell is possible
the one proposed in [25].
18 } Notice that the format of the Boolean circuit shown
19 in figure 3 is very close to the one used in Truth Main-
20 if (! buffer_overflow_vuln ) { // if healthy tenance System (TMS) [26]. The only assumable vari-
21 !leak_passwd; // passwords don’t leak able in figure 3 is buffer_overflow_vuln and its
22 } default value is false (i.e., there is no buffer overflow
23 vulnerability in the web server process).
24 bool access_passwd; We next show how a reasoning engine can discover
25 attribute observable (access_passwd) = true; a conflict through forward and backward propagation.
26
Looking at figure 3, it is clear that r must be true be-
27 !access_passwd => !leak_passwd;
28 cause it is an input to an AND-gate whose output is set
29 /∗∗ to true. Therefore either p or q (or both) must be true.
30 ∗ Knowing the root password can be explained This means that either buffer_overflow_vuln or
31 ∗ by a root shell ( for example there is a leak_pw must be false. If we say that leak_pw is
32 ∗ password sniffer ). assumed to be true (measured or otherwise inferred),
33 ∗/ then leak_pw and buffer_overflow_vuln are to-
34 know_root_password => gether part of a conflict. It means that the reasoning
35 (( httpd_shell || leak_passwd) && root_shell ); engine has to change one of them to resolve the con-
36 } tradiction.
37
38 system back_end(bool know_root_password)
Based on the observation from our running exam-
39 { ple and a TMS constructed from the security model
40 bool comm; shown in figure 3, the hitting set algorithm computes
41 attribute observable (comm) = true; two possible diagnostic hypotheses: (1) the attacker
42 gained a shell access through a web-server vulnerabil-
43 /∗∗ ity and the attacker performed privilege escalation or
44 ∗ Normal users can only communicate with a (2) the attacker injected binary code through a buffer
45 ∗ list of permitted hosts . overflow and the attacker performed privilege escala-
46 ∗/ tion.
47 if (! know_root_password) { If we use LYDIA to compute the set of diagnoses for
48 comm == true;
49 }
the running example, we get the following two (am-
50 } biguous) diagnoses for the root-cause of the penetra-
51 tion:
52 system main() $ lydia example.lm example.obs
53 { d1 = { fe.escalation_vuln,
54 bool know_root_password;
fe.httpd_shell_vuln }
55
56 system front_end fe (know_root_password); d2 = { fe.buffer_overflow_vuln,
57 system back_end be(know_root_password); fe.escalation_vuln }
58 } MBD uses probabilities to computes a sequence of
possible diagnoses ordered by likelihood. This proba-
LYDIA translates the model to an internal proposi- bility can be used for many purposes: decide which di-
tional logic formula. Part of this internal representa- agnosis is more likely to be the true fault explanation,
tion is shown in figure 3, which uses the standard VLSI whether there is the need for consider further evidence
[24] notation to denote AND-gates, OR-gates, and from the logs or limit the number of diagnoses that
NOT-gates. Wires are labeled with variable names. need to be identified. Many policies exist to compute
Boolean circuits (matching propositional logic), how- these probabilities [27; 28].
ever, have limited expressiveness and modeling secu- For illustration purposes we consider that the diag-
6
198
Proceedings of the 26th International Workshop on Principles of Diagnosis
noses for the running example are ambiguous. Before and their diagnostic accuracy in the context of trans-
we discuss methods for dealing with this ambiguity, parent computing.
we address the major research challenge of model gen-
eration. 7 Conclusions
6.3 Model Generation

The abstract vulnerability model can be constructed either manually or semi-automatically. The challenge is that an APT campaign generally exploits unknown vulnerabilities. Our approach to this issue is therefore to construct a model that captures the expected behavior (known goods) of the system. Starting from generic parameterized vulnerability models and security objectives, the abstract vulnerability model can then be extended with information related to known vulnerabilities (known bads).

We will first explore avenues to generate this model manually. Manual generation requires significant knowledge about potential security vulnerabilities, and it is error-prone and often not detailed enough. Amongst company-specific requirements, we envisage the abstract vulnerability model capturing the most common attacks that target software systems, as described in the Common Attack Pattern Enumeration and Classification (CAPEC¹). This comprehensive list of known attacks has been designed to better understand the perspective of an attacker exploiting vulnerabilities and, from this knowledge, to devise appropriate defenses.

As modeling is challenging, we also propose to explore semi-automatic approaches to constructing models. Semi-automatic methods suit this task because in security, as in diagnosis, there are (1) component models and (2) structure. While it is difficult to automate the building of component models (this may even require natural-language parsing of databases such as CAPEC), it is feasible to capture diagnosis-oriented information from structure (physical networking or network communication).
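As a toy illustration of what structure alone can contribute (topology and host names are hypothetical), the components that can appear on a causal chain between a suspected entry point and a compromised asset are exactly those reachable from the entry point that can also reach the asset:

```python
# Toy sketch: derive an initial diagnosis candidate set from network
# structure alone. The topology and host names are hypothetical.
from collections import deque

def reachable(graph, start):
    """Breadth-first search; returns all nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def candidate_components(graph, entry, asset):
    """Hosts lying on some entry-to-asset path: reachable from the
    entry point AND able to reach the asset (via the reversed graph)."""
    forward = reachable(graph, entry)
    reverse = {}
    for node, succs in graph.items():
        for succ in succs:
            reverse.setdefault(succ, set()).add(node)
    backward = reachable(reverse, asset)
    return forward & backward

graph = {"web": {"app"}, "app": {"db", "cache"}, "vpn": {"app"}}
candidates = candidate_components(graph, "web", "db")
# Only web, app, and db can lie on the causal chain; the cache and
# the VPN host are exonerated without any component model at all.
```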
Yet another approach to semi-automatically generating the model is to learn it from executions of the system (e.g., during regression testing, just before deployment). This approach to system modeling is inspired by work in automatic software debugging [29], where program behavior is modeled in terms of abstractions of program traces, known as spectra [30], abstracting away from models of specific components and data dependencies.
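The spectrum abstraction can be sketched as follows: each execution records which components it exercised together with a pass/fail verdict, and components are ranked by the Ochiai similarity between their involvement and the failures, as in spectrum-based fault localization [29]. The spectrum matrix below is hypothetical:

```python
# Sketch of spectrum-based scoring (cf. [29; 30]). spectrum[i][j] is
# 1 if component j was involved in execution i; errors[i] is 1 if
# execution i failed. The example data is hypothetical.
from math import sqrt

def ochiai_ranking(spectrum, errors):
    """Rank components by Ochiai similarity to the error vector."""
    n_components = len(spectrum[0])
    total_fail = sum(errors)
    scores = []
    for j in range(n_components):
        fail_hit = sum(1 for i, row in enumerate(spectrum)
                       if row[j] and errors[i])
        pass_hit = sum(1 for i, row in enumerate(spectrum)
                       if row[j] and not errors[i])
        denom = sqrt(total_fail * (fail_hit + pass_hit))
        scores.append(fail_hit / denom if denom else 0.0)
    ranking = sorted(range(n_components), key=lambda j: -scores[j])
    return ranking, scores

# Three executions over three components; component 1 is exercised in
# every failing run and in no passing run, so it ranks first.
spectrum = [[1, 1, 0],
            [0, 1, 1],
            [1, 0, 1]]
errors = [1, 1, 0]  # the first two executions failed
ranking, scores = ochiai_ranking(spectrum, errors)
```

The same idea carries over to audit trails: executions become observed activity sequences, and components become the system entities involved in them.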
The outlined approaches to constructing the abstract vulnerability model entail different costs and diagnostic accuracies. As expected, manually building the model is the most expensive: it is a time-consuming and error-prone task. The two semi-automatic approaches also entail different costs: one exploits the available static information, while the other requires the system to be executed to compute a meaningful set of executions. We will investigate the trade-offs between these modeling approaches and their diagnostic accuracy in the context of transparent computing.

7 Conclusions

Identifying the root cause of advanced persistent threats and performing damage impact assessment can be framed as a diagnostic problem. In this paper, we discuss an approach that leverages machine learning and model-based diagnosis techniques to reason about potential attacks.

Our approach classifies audit trails into high-level activities, and then reasons about those activities and their threat potential both in real time and forensically. By using the outcome of this reasoning to explain complex evidence of malicious behavior, system administrators are provided with the proper tools to promptly react to, stop, and mitigate attacks.

¹ http://capec.mitre.org/

References

[1] Kyu Hyung Lee, Xiangyu Zhang, and Dongyan Xu. High accuracy attack provenance via binary-based execution partition. In Proceedings of the 20th Annual Network and Distributed System Security Symposium, San Diego, CA, February 2013.

[2] Kyu Hyung Lee, Xiangyu Zhang, and Dongyan Xu. LogGC: Garbage collecting audit log. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, pages 1005–1016, Berlin, Germany, 2013.

[3] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, and F. Herrera. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(4):463–484, July 2012.

[4] Charu C. Aggarwal. Outlier ensembles: Position paper. ACM SIGKDD Explorations Newsletter, 14(2):49–58, 2013.

[5] Jing Gao and Pang-Ning Tan. Converting output scores from outlier detection algorithms into probability estimates. In Proceedings of the Sixth International Conference on Data Mining, pages 212–221. IEEE, December 2006.

[6] Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek. Interpreting and unifying outlier scores. In Proceedings of the Eleventh SIAM International Conference on Data Mining, pages 13–24, April 2011.

[7] Erich Schubert, Remigius Wojdanowski, Arthur Zimek, and Hans-Peter Kriegel. On evaluation of outlier rankings and outlier scores. In Proceedings of the Twelfth SIAM International Conference on Data Mining, pages 1047–1058, April 2012.
[8] Hoda Eldardiry, Kumar Sricharan, Juan Liu, John Hanley, Bob Price, Oliver Brdiczka, and Eugene Bart. Multi-source fusion for anomaly detection: using across-domain and across-time peer-group consistency checks. Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), 5(2):39–58, 2014.

[9] Yoav Freund, Raj D. Iyer, Robert E. Schapire, and Yoram Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4(Nov):933–969, 2003.

[10] Ke Deng, Simeng Han, Kate J. Li, and Jun S. Liu. Bayesian aggregation of order-based rank data. Journal of the American Statistical Association, 109(507):1023–1039, 2014.

[11] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.

[12] Johan de Kleer, Olivier Raiman, and Mark Shirley. One step lookahead is pretty good. In Readings in Model-Based Diagnosis, pages 138–142. Morgan Kaufmann Publishers, San Francisco, CA, 1992.

[13] Alexander Feldman, Tolga Kurtoglu, Sriram Narasimhan, Scott Poll, David Garcia, Johan de Kleer, Lukas Kuhn, and Arjan van Gemund. Empirical evaluation of diagnostic algorithm performance using a generic framework. International Journal of Prognostics and Health Management, pages 1–28, 2010.

[14] Johan de Kleer and Brian Williams. Diagnosing multiple faults. Artificial Intelligence, 32(1):97–130, 1987.

[15] Oleg Sheyner, Joshua Haines, Somesh Jha, Richard Lippmann, and Jeannette M. Wing. Automated generation and analysis of attack graphs. In Proceedings of the 2002 IEEE Symposium on Security and Privacy, pages 273–284. IEEE, May 2002.

[16] Seyit Ahmet Camtepe and Bülent Yener. Modeling and detection of complex attacks. In Proceedings of the Third International Conference on Security and Privacy in Communications Networks, pages 234–243, September 2007.

[17] Rui Abreu and Arjan J. C. van Gemund. A low-cost approximate minimal hitting set algorithm and its application to model-based diagnosis. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation, pages 2–9, July 2009.

[18] Alexander Feldman, Gregory Provan, and Arjan van Gemund. Approximate model-based diagnosis using greedy stochastic search. Journal of Artificial Intelligence Research, 38:371–413, 2010.

[19] Nuno Cardoso and Rui Abreu. A distributed approach to diagnosis candidate generation. In Progress in Artificial Intelligence, pages 175–186. Springer, 2013.

[20] Alexander Feldman, Jurryt Pietersma, and Arjan van Gemund. All roads lead to fault diagnosis: Model-based reasoning with LYDIA. In Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence (BNAIC'06), Namur, Belgium, October 2006.

[21] Meera Sampath, Raja Sengupta, Stephane Lafortune, Kasim Sinnamohideen, and Demosthenis C. Teneketzis. Failure diagnosis using discrete-event models. IEEE Transactions on Control Systems Technology, 4(2):105–124, 1996.

[22] Alban Grastien, Marie-Odile Cordier, and Christine Largouët. Incremental diagnosis of discrete-event systems. In DX, 2005.

[23] Alban Grastien, Patrik Haslum, and Sylvie Thiébaux. Conflict-based diagnosis of discrete event systems: theory and practice. 2012.

[24] Behrooz Parhami. Computer Arithmetic: Algorithms and Hardware Designs. Oxford University Press, Inc., New York, NY, USA, 2nd edition, 2009.

[25] Janice Glasgow, Glenn Macewen, and Prakash Panangaden. A logic for reasoning about security. ACM Transactions on Computer Systems, 10(3):226–264, August 1992.

[26] Kenneth Forbus and Johan de Kleer. Building Problem Solvers. MIT Press, 1993.

[27] Johan de Kleer. Diagnosing multiple persistent and intermittent faults. In Proceedings of the 2009 International Joint Conference on Artificial Intelligence, pages 733–738, July 2009.

[28] Rui Abreu, Peter Zoeteweij, and Arjan J. C. van Gemund. A new Bayesian approach to multiple intermittent fault diagnosis. In Proceedings of the 2009 International Joint Conference on Artificial Intelligence, pages 653–658, July 2009.

[29] Rui Abreu, Peter Zoeteweij, and Arjan J. C. van Gemund. Spectrum-based multiple fault localization. In Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering, pages 88–99, November 2009.

[30] Mary Jean Harrold, Gregg Rothermel, Kent Sayre, Rui Wu, and Liu Yi. An empirical investigation of the relationship between spectra differences and regression faults. Software Testing, Verification and Reliability, 10(3):171–194, 2000.