<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adaptive Autonomous Agent Responses to Targeted Malware Attacks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Steven Noel</string-name>
          <aff>The MITRE Corporation, McLean, Virginia, USA</aff>
          <email>snoel@mitre.org</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arun Lakhotia</string-name>
          <aff>Cythereal, LLC, Lafayette, Louisiana, USA</aff>
        </contrib>
      </contrib-group>
      <abstract>
        <p>We describe the application of machine learning and data mining techniques for defensive autonomous agent responses to targeted cyberattacks. This approach clusters and classifies captured enterprise malware, and fuses the inferred classes with other relevant threat data to detect targeted attacks indicative of malware delivery attempts by advanced persistent adversaries. Defensive autonomous agents, whose behaviors are learned through specialized process modeling algorithms, are then improved through our enhanced situational knowledge of targeted attacks. The autonomous agents are guided by high-level process models, with which human operators interact for orchestrating lower-level autonomous agents. Agent responses are focused on protecting critical cyber assets, leveraging knowledge of potential paths of exploitation through the network. We employ agent-based simulation for rapid testing and refinement of process orchestration and agent behaviors.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>RESEARCH CHALLENGES</title>
      <p>
        Organizations undergo continual attacks from a range of threat actors with varying capabilities and intent. When
defenders are better able to understand their adversaries, they are better able to respond. Indeed, it has been
recognized that incident response would greatly benefit from improved capabilities for malware analysis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Among the most serious threats are advanced adversaries who target a specific organization. Such attackers
morph their malware to evade detection by anti-malware systems, and target different individuals in the
organization with different malware delivery methods. Because malware campaigns are tracked using hashes
computed over the malware's byte code, morphing also makes targeted campaigns difficult to recognize.
Security operations also suffer from largely manual processes for correlating attack indicators, making it
difficult to keep pace with adversary activities.
      </p>
      <p>There are numerous open problems in recognizing targeted malware attacks and mounting adaptive autonomous
responses against them. While advanced methods exist for computing malware semantic similarity, malware is
readily shared through the cybercrime industry, so features derived from other malware artifacts and context are
needed to distinguish among adversaries. Knowledge about defensive responses by human operators needs to
be captured by autonomous agents and continually updated as defensive processes improve; operators also need
to interrogate and orchestrate autonomous agents when needed.</p>
      <p>Ideally, attack responses should be guided by paths of potential adversary movement through the network,
and focused on protecting critical cyber assets. Overall, this problem involves rich webs of interrelated data, which
require a flexible and manageable knowledge base to be maintained and shared among defensive agents. There
are also scalability issues, since the space of malware is large and pairwise correlations among samples scale
quadratically, as do other kinds of relationships such as potential adversary paths.</p>
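      <p>As a rough illustration of the quadratic scaling noted above, the sketch below counts the unordered pairs among n malware samples, n(n-1)/2. The function name and the sample counts are illustrative assumptions, not figures from this work.</p>
      <preformat>
```python
def pairwise_comparisons(n_samples: int) -> int:
    """Number of unordered sample pairs, n * (n - 1) / 2."""
    return n_samples * (n_samples - 1) // 2


# Hypothetical repository sizes, chosen only to show the growth rate.
for n in (1_000, 10_000, 100_000):
    print(f"{n} samples: {pairwise_comparisons(n)} candidate pairs")
```
      </preformat>
      <p>A tenfold increase in samples yields roughly a hundredfold increase in candidate pairs, which is why exhaustive pairwise correlation becomes costly for large repositories.</p>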
      <p>The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to
convey or imply MITRE's concurrence with, or support for, the positions, opinions, or viewpoints expressed by the
author.</p>
    </sec>
    <sec id="sec-3">
      <title>APPROACH</title>
      <p>
        We propose an intelligent autonomous agent architecture for recognizing and responding to targeted malware attacks. Through unsupervised learning on malware features, this architecture discovers clusters of related malware indicative of targeted attacks, and then classifies those clusters according to known adversaries. For this, we can leverage our previous work in symbolic interpretation to extract generalized semantics from binaries, for fast similarity matching against large malware repositories [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These malware inferences can be fused with other threat information (delivery mechanisms, social engineering employed, known threat actor behaviors, etc.) for more accurate and fine-grained classification of adversary campaigns. This enhanced situational awareness can guide the responses of our defensive autonomous agents, e.g., applying similar responses for similar attacks.
      </p>
      <p>
        We propose to automate the learning of response behaviors through process mining [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This extracts hierarchical Markov models from cyber defender event logs, for learning patterns in defender operational processes (e.g., those of adversary hunters). We then map the discovered lower-level processes to autonomous agents, which communicate with high-level process models for organizational vetting and agent orchestration. The autonomous agents are informed by our knowledge of targeted attacks. Through Monte Carlo simulation and machine learning, the agent models are trained to adapt to different targeted attack situations.
      </p>
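      <p>To make the idea of learning a Markov model from defender event logs more concrete, the following minimal sketch estimates transition probabilities between activities. It is a flat, first-order simplification, not the hierarchical models or the process mining algorithm of [3]; the activity names and the event log are hypothetical.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def transition_probabilities(traces):
    """Estimate first-order transition probabilities from activity traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        # Count each observed step from one activity to the next.
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    model = {}
    for current, successors in counts.items():
        total = sum(successors.values())
        model[current] = {nxt: c / total for nxt, c in successors.items()}
    return model


# Hypothetical defender event log: one list of activities per incident.
event_log = [
    ["triage", "analyze", "quarantine", "close"],
    ["triage", "analyze", "escalate", "quarantine", "close"],
    ["triage", "close"],
]
model = transition_probabilities(event_log)
print(model["triage"])
```
      </preformat>
      <p>In this toy log, two of the three incidents move from triage to analysis, so the estimated probability of that transition is two thirds. A learned model of this kind could serve as a starting point for an agent's response behavior, under the assumption that the logs reflect sound operator practice.</p>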
      <p>
        Through machine learning of optimal processes, we thus adapt the autonomous agents to best respond to targeted
attacks. We define orchestration processes that capture the high-level flow of an organization’s security operations.
An agent-based simulation framework then simulates attacker and defender agents, which generate simulation
event logs for further iterations of process refinement. The agent responses are guided by an understanding of
potentially exploitable paths through the enterprise network [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], as well as historical patterns of communication
among mission-critical cyber assets [5]. For knowledge management and situational awareness within this
complex web of interrelated information, we leverage our previous work in representing such interrelationships as
a knowledge graph [6], with flexible schema-free design and ad hoc query-based analysis and visualization.
      </p>
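      <p>As one hedged illustration of how exploitable paths might guide a response, the sketch below runs a breadth-first search over a small directed graph to find a shortest path from a compromised host to any critical asset. The graph, host names, and function name are hypothetical, and this is not the attack-graph or knowledge-graph implementation of [4] or [6].</p>
      <preformat>
```python
from collections import deque


def shortest_exploit_path(graph, start, critical_assets):
    """Breadth-first search for a shortest path from start to any critical asset."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in critical_assets:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no critical asset is reachable from start


# Hypothetical network: an edge means the target can be exploited from the source.
graph = {
    "workstation": ["file_server", "print_server"],
    "file_server": ["database"],
    "print_server": [],
    "database": [],
}
path = shortest_exploit_path(graph, "workstation", {"database"})
print(path)
```
      </preformat>
      <p>A response agent could, for example, prioritize hardening or isolating the intermediate hosts on such a path; that prioritization policy is an assumption of this sketch rather than a result of this work.</p>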
      <p>[Figure: diagram image not recovered. Labels recoverable from the diagram: Detection Agents; Response Agents; Detected Malware; Malware Jail; Quarantined Malware; Security Posture; Mission Mapping; Exploitation Paths; Service Dependencies; Process Mining; Response Processes; Response Decisions; Response Orchestrator; Threat Intelligence; Malware Similarities; Threat Intelligence Services; Automated Unpacking; Unpacked Malware; Semantic Analysis; Malware Semantics; Similarity Analysis; Trusted Partners; Shared Semantics.]</p>
    </sec>
    <sec id="sec-4">
      <title>PREVIOUS WORK</title>
      <p>
        Previous work under the DARPA Cyber Genome Program has yielded a fast and robust capability for malware
analysis and attribution based on “genomic correlation” [7] [8] [9]. That capability has subsequently been extended
for mining relationships over numerous malware artifact types, including code, code semantics, dynamic
behaviors, malware metadata, distribution sites, and e-mails [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and is available commercially as Cythereal
MAGIC [10]. We leverage this mature capability for unpacking malware, extracting semantic “juice” (generalized
semantics), and performing similarity analysis over large malware corpora.
      </p>
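      <p>The similarity analysis described above operates on extracted semantics. As a simplified stand-in, the sketch below measures Jaccard similarity between sets of feature identifiers and groups samples whose similarity meets a threshold. The feature sets, file names, threshold, and single-linkage grouping are all assumptions made for illustration; this is not the matching method used by Cythereal MAGIC.</p>
      <preformat>
```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a.union(b)
    if not union:
        return 0.0
    return len(a.intersection(b)) / len(union)


def cluster(samples, threshold):
    """Group samples linked by similarity >= threshold (single linkage)."""
    parent = {name: name for name in samples}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    # Compare every pair of samples and merge the groups of similar ones.
    for x, y in combinations(samples, 2):
        if jaccard(samples[x], samples[y]) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical samples with made-up feature identifiers.
samples = {
    "a.exe": {"f1", "f2", "f3", "f4"},
    "b.exe": {"f1", "f2", "f3", "f5"},
    "c.exe": {"f7", "f8"},
}
print(cluster(samples, threshold=0.5))
```
      </preformat>
      <p>Here the first two samples share three of five distinct features and fall into one group, while the third stays alone. The loop compares every pair of samples, so a production system would need indexing or approximate matching to handle large repositories.</p>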
      <p>Some cybersecurity tool vendors have begun incorporating machine learning techniques for malware detection.
Such tools perform binary classification, in which malware-related events are classified as either “good” or “bad,”
e.g., based on surface-level (bytecode and file structure) analysis or behavioral analysis from log data. These
capabilities are inadequate for our purposes. Beyond classification (supervised learning), we require unsupervised
clustering for detecting patterns of coordinated malware attacks, using features that characterize semantic
invariants of related malware. This is exactly what Cythereal MAGIC provides, through a combination of dynamic
analysis to unpack malware and static analysis to extract features and compute malware similarities.</p>
      <p>There are many sources available (both commercial and open source) for shared threat intelligence that can
enhance the discriminatory power of detecting and classifying targeted attacks. The area of process modeling is
well established, including tools for process mining algorithms and standardized languages for expressing process
mining inputs (event logs). Agent-based simulation frameworks are also available, as well as process modeling
and simulation tools.</p>
      <p>STO-MP-IST-148</p>
    </sec>
    <sec id="sec-5">
      <title>IMPACT</title>
      <p>This line of research addresses many challenges in cyber defensive operations, in terms of enhanced situational
awareness of targeted attacks by advanced adversaries, autonomous agents for rapid adaptive attack response, and
high-level orchestration of low-level autonomous agents. In combining these aspects, the value of this synergistic
solution is greater than the sum of its parts.</p>
      <p>[5] S. Musman, "Automagical Cyber Dependency Mapping," The MITRE Corporation.</p>
      <p>[6] S. Noel, E. Harley, K. H. Tam, M. Limiero and M. Share, "CyGraph: Graph-Based Analytics and Visualization for Cybersecurity," in Cognitive Computing: Theory and Applications, Handbook of Statistics 35, Elsevier, 2016.</p>
      <p>[7] A. Lakhotia, M. Preda and R. Giacobazzi, "Fast Location of Similar Code Fragments using Semantic 'Juice'," in 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop, 2013.</p>
      <p>[8] M. Preda, R. Giacobazzi, A. Lakhotia and I. Mastroeni, "Abstract Symbolic Automata: Mixed Syntactic/Semantic Similarity Analysis of Executables," ACM SIGPLAN Notices, vol. 50, no. 1, pp. 329-341, 2015.</p>
      <p>[9] A. Pfeffer, C. Call, J. Chamberlain, L. Kellogg, J. Ouellette, T. Patten, G. Zacharias, A. Lakhotia, S. Golconda, J. Bay, R. Hall and D. Scofield, "Malware Analysis and Attribution using Genetic Information," in 7th IEEE International Conference on Malicious and Unwanted Software, 2012.</p>
      <p>[10] Cythereal, "Changing the Rules of Cyber Engagement," [Online]. Available: http://www.cythereal.com. [Accessed 5 June 2017].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pingree</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>MacDonald</surname>
          </string-name>
          ,
          <article-title>"Best Practices for Mitigating Advanced Persistent Threats,"</article-title>
          <source>Gartner Report G00224682</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Miles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lakhotia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>LeDoux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Newsom</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Notani</surname>
          </string-name>
          ,
          <article-title>"VirusBattle: State-of-the-Art Malware Analysis for Better Cyber Threat Intelligence,"</article-title>
          <source>in 7th IEEE International Symposium on Resilient Control Systems</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Szimanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ralha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wagner</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <article-title>"Improving Business Process Models with Agent-Based Simulation and Process Mining,"</article-title>
          <source>in Enterprise, Business-Process and Information Systems Modeling</source>
          , Springer,
          <year>2013</year>
          , pp.
          <fpage>124</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Noel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Harley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Tam</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gyor</surname>
          </string-name>
          ,
          <article-title>"Big-Data Architecture for Cyber Attack Graphs: Representing Security Relationships in NoSQL Graph Databases,"</article-title>
          <source>in IEEE Symposium on Technologies for Homeland Security (HST)</source>
          , Boston, Massachusetts,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>