IST-152 Workshop on Intelligent Autonomous Agents for Cyber Defence and Resilience

Enhancing Cyber Defense with Autonomous Agents Managing Dynamic Cyber Deception (Position Paper)

                   Cho-Yu Jason Chiang, Alex Poylisher and Ritu Chadha, Vencore Labs
                            <jchiang, apoylisher, rchadha>@vencorelabs.com
                                      Hasan Cam, Army Research Lab
                                         hasan.cam.civ@mail.mil

1 Introduction
Today’s cyber defenses do not prevent all malicious intrusions, which compromise enterprise network
environments by exploiting both human errors and system vulnerabilities. This state of affairs is likely to
persist for the foreseeable future. Attack techniques such as phishing e-mail, SQL injection, SMB exploits,
and cross-site scripting enable adversaries to inject malware into enterprise networks. After establishing
an initial foothold, malware typically conducts reconnaissance, followed by lateral movement to
compromise other hosts/systems on the network. Although existing cyber defense tools can detect a
wide range of potential intrusion activities, a certain share of the alarms they generate are false positives.
Since administrators of enterprise networks generally lack the resources to investigate every alarm, they
set threshold values for different types of alarms so as to investigate only those more likely to be caused
by intrusions, according to the enterprise threat models. This approach allows stealthy malware, in
particular Advanced Persistent Threats (APTs), to remain undetected if it generates no alarms, or only a
few alarms that are likely to be dismissed as false positives, over a long period of time.

We are investigating a novel use of dynamic cyber deception techniques to aid the detection of stealthy
APT activities. Our long-term goal is to develop autonomous agents that investigate every alarm that
could indicate malware activity, rather than taking action only when preset alarm thresholds are crossed.
To lay the foundation for achieving this goal, our current research spans the following areas: (i)
minimizing the number of devices/hosts that are observable/accessible from any given host, in order to
reduce both the number of different types of alarms and the total number of alarms that could be raised
by cyber security sensors; (ii) enabling automated creation of deceptive network views for each host and
allowing such views to be changed on demand; (iii) using both low-interaction and proactive honeypots to
present illusory, augmented false attack interfaces that increase the chance of detecting hidden threats;
and (iv) investigating novel approaches for developing autonomous agents that manage cyber deception
schemes on-the-fly.

The remainder of this paper is organized as follows. Section 2 discusses cyber deception and the tactics
that we have developed and plan to leverage, including the generation of deceptive network views that
minimize the number of genuine hosts accessible from a host and the insertion of honeypots as fake hosts
in those views. Section 3 discusses the approach we are currently considering for developing autonomous
agents that manage cyber deception, including the game-theoretic problem formulation, state-space search
heuristics, and our strategy for training the agents. Section 4 presents our progress to date along with
some discussion. We conclude this position paper in Section 5.
2 Cyber Deception
Cyber deception is increasingly considered as an approach to boosting cyber defense [1]. In general, cyber
deception approaches provide false information to (hidden) adversaries without significantly affecting
normal cyber activities in enterprise networks. There are multiple research areas under cyber deception,
such as camouflage, disinformation, and decoys [2]. With respect to building autonomous cyber deception
agents, our current focus is on the following tactics: Moving Target Defense (MTD) [3] (camouflage) and
honeypots [4] (decoy). We follow a deception approach based on Software Defined Networking (SDN) [6]
that can generate network views for individual hosts, such that: (i) hosts may have very different network
addresses, even though they are connected to the same physical switch; (ii) each network service (e.g.,
DNS, e-mail, HTTP) is accessed by each host at a different IP address; and (iii) hosts appearing in a view
may be true hosts or honeypots. Next, we describe the SDN-based MTD and honeypots we have been
investigating.
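As a rough illustration of properties (i)-(iii), the sketch below renders a per-host view as a mapping from virtual addresses to nodes. It is a minimal sketch under stated assumptions, not the ACyDS data model: the `NetworkView` class, the `generate_view` function, the choice of a random /24 inside 10.0.0.0/8, and the node names are all hypothetical.

```python
import ipaddress
import random
from dataclasses import dataclass, field


@dataclass
class NetworkView:
    """One host's deceptive view (hypothetical model, not ACyDS internals)."""
    host: str                          # real host this view is rendered for
    subnet: ipaddress.IPv4Network      # virtual subnet the host believes it is on
    host_addr: ipaddress.IPv4Address   # the host's own address in this view
    entries: dict = field(default_factory=dict)  # virtual IP -> node name


def generate_view(host, visible_nodes, honeypots, rng):
    """Render a view for `host` that exposes only `visible_nodes` plus `honeypots`."""
    # A random /24 inside 10.0.0.0/8, so different hosts tend to see different subnets.
    subnet = ipaddress.IPv4Network(f"10.{rng.randrange(256)}.{rng.randrange(256)}.0/24")
    members = [host, *visible_nodes, *honeypots]
    # Unique, randomly placed addresses; honeypots are mixed in with real nodes.
    addrs = rng.sample(list(subnet.hosts()), len(members))
    view = NetworkView(host=host, subnet=subnet, host_addr=addrs[0])
    for node, addr in zip(members[1:], addrs[1:]):
        view.entries[addr] = node
    return view


if __name__ == "__main__":
    rng = random.Random(7)
    services = ["dns", "mail", "web", "dhcp"]
    for h, hps in (("host1", []), ("host2", ["hp1", "hp2", "hp3"])):
        v = generate_view(h, services, hps, rng)
        dns = next(a for a, n in v.entries.items() if n == "dns")
        print(h, v.subnet, "DNS at", dns, "view size", len(v.entries))
```

Because each call draws its own subnet and its own address placement, two hosts on the same switch will almost always see the DNS server (and every other node) at different addresses, and a honeypot entry has exactly the same form as a real one.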

2.1 SDN-based Cyber Deception (Camouflage)
Moving target defense has been a heavily researched topic because it provides camouflage for enterprise
networks. In general, MTD techniques change network and device configuration settings on the fly, with
the objective of both confusing (i.e., slowing down or dissuading) adversaries and increasing the chance of
detecting further adversarial activities. In our research, we are building on top of ACyDS [10], an adaptive
cyber deception system that provides a unique virtual network view to each host in an enterprise network.
A host’s view of its network, including the subnet topology and the IP address assignments of reachable
hosts and servers, generally does not reflect the actual network configuration and differs from the view of
every other host in the network. For example, all hosts may send DNS queries to the same DNS server, but
each host sends its requests to an IP address unique to that host, i.e., the address assigned to the DNS
server is valid only in the network view that this host observes. ACyDS can change a node’s network view,
with the desired properties, on demand. It enforces dynamic network view changes to invalidate, to a
desired degree, the intelligence the adversary collected in prior reconnaissance, since the subnet topology
and IP address assignments can be changed in every view update. In a nutshell, ACyDS’s deception
approach (i) deters, delays, or directs the adversary’s reconnaissance activities, (ii) encumbers collusion if
multiple hosts have been compromised, and (iii) increases the likelihood of, and the confidence in,
detecting the presence of intruders. ACyDS leverages OpenFlow [7] switches and controllers (Open vSwitch
[8] and the RYU SDN controller [9] in the current prototype) to consistently handle the protocols most
commonly used in enterprise networks, currently ARP, DHCP, DNS, UDP, TCP, and ICMP, by dynamically
modifying IP header fields at the SDN switch according to the installed flow rules that implement the
network views of individual hosts. ACyDS also allows masking of multicast/broadcast messages, such as
ARP queries, that would otherwise allow malware to map a network passively; meanwhile, similar but fake
multicast/broadcast messages can be sent to lure malware into taking actions against honeypots.
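One way to read "invalidate, to a desired degree" is as a knob on how much of the adversary’s reconnaissance survives a view update. The hypothetical helpers below (`readdress`, `stale_fraction`) re-address a chosen fraction of the nodes in one host’s view and measure how much of an adversary’s previously collected IP-to-node map is now wrong. They are an illustrative sketch under these assumptions, not ACyDS code.

```python
import random
from typing import Dict, List

View = Dict[str, str]  # virtual IP -> node name, as seen by one host


def readdress(view: View, fraction: float, pool: List[str], rng: random.Random) -> View:
    """Give a `fraction` of the nodes in `view` fresh, previously unused addresses."""
    nodes = sorted(view.values())
    k = round(fraction * len(nodes))
    moved = set(rng.sample(nodes, k))
    new_view = {ip: n for ip, n in view.items() if n not in moved}
    free = [ip for ip in pool if ip not in view]  # old addresses are not reused
    for node, ip in zip(sorted(moved), rng.sample(free, k)):
        new_view[ip] = node
    return new_view


def stale_fraction(recon: View, view: View) -> float:
    """Share of the adversary's (IP -> node) knowledge that no longer holds."""
    if not recon:
        return 0.0
    return sum(view.get(ip) != node for ip, node in recon.items()) / len(recon)


if __name__ == "__main__":
    pool = [f"10.0.0.{i}" for i in range(1, 255)]
    old = {pool[i]: f"node{i}" for i in range(10)}
    recon = dict(old)  # adversary mapped the whole view before the update
    for frac in (0.0, 0.3, 1.0):
        new = readdress(old, frac, pool, random.Random(0))
        print(frac, stale_fraction(recon, new))
```

Because this sketch never reuses an old address, the stale fraction equals the re-addressed fraction when the adversary had mapped the whole view; re-addressing every node invalidates all of it.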
[Figure 1: a network view of Host 1 (no honeypots); a deceptive network view rendered to Host 2, containing three honeypots; and (c) the ACyDS setup that renders these views, comprising an OpenFlow switch, an OpenFlow controller with a DHCP relay, a Deception View Generator (DVG), a DVDB, a deception server, and DHCP and DNS servers, with Host 1, Host 2, the mail and web servers, and a honeypot attached to the switch. The DHCP relay notifies the DHCP server and the DVG of nodes joining the network.]

Figure 1. ACyDS enables rendering of different network views to hosts on the same physical network

We illustrate the concept of ACyDS using Figure 1. We assume that Host 1 and Host 2 are on the same
network but are presented with two entirely different network views. Host 1’s network topology differs
from Host 2’s, and the IP address of a given server (e.g., the HTTP server) is different in the two views. In
the figure, Host 2’s view includes three honeypots, while Host 1’s view has none. This capability is enabled
by the components shown in Figure 1(c). Readers interested in the functions of these components and in
the ACyDS implementation are referred to [10]. We implemented a proof-of-concept prototype of the
ACyDS software and demonstrated it in 2016.
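The per-host address translation behind Figure 1 can be approximated by the toy model below, in which a switch rewrites both the source and the destination address of every packet into the receiver’s view. This is a hedged sketch rather than the flow rules ACyDS actually installs (see [10] for those): the `ViewSwitch` class, its view tables, the addresses, and the assumption that the sender is identified by its ingress port are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    src: str              # IPv4 source address field
    dst: str              # IPv4 destination address field
    payload: bytes = b""


class ViewSwitch:
    """Toy model of per-host address rewriting (hypothetical, not ACyDS code).

    views[node] maps every virtual IP in that node's view to a real node name and
    must include the node's own virtual address. The sender is assumed to be known
    from its ingress switch port.
    """

    def __init__(self, views: Dict[str, Dict[str, str]]):
        self.fwd = views
        # Reverse tables: for each viewer, real node name -> virtual IP in its view.
        self.rev = {n: {peer: ip for ip, peer in v.items()} for n, v in views.items()}

    def forward(self, sender: str, pkt: Packet) -> Optional[Tuple[str, Packet]]:
        """Return (receiver, rewritten packet), or None if the packet is dropped."""
        if pkt.src != self.rev[sender][sender]:
            return None                   # spoofed source address
        receiver = self.fwd[sender].get(pkt.dst)
        if receiver is None or receiver == sender:
            return None                   # destination is not in the sender's view
        rx_view = self.rev.get(receiver, {})
        if sender not in rx_view:
            return None                   # sender is not part of the receiver's view
        # Rewrite both addresses so the receiver only sees its own view.
        return receiver, replace(pkt, src=rx_view[sender], dst=rx_view[receiver])


if __name__ == "__main__":
    sw = ViewSwitch({
        "host1": {"10.1.1.5": "host1", "10.1.1.80": "web"},
        "host2": {"172.16.3.9": "host2", "172.16.7.2": "web"},
        "web": {"192.168.0.80": "web", "192.168.0.11": "host1", "192.168.0.12": "host2"},
    })
    print(sw.forward("host1", Packet("10.1.1.5", "10.1.1.80")))
    print(sw.forward("host2", Packet("172.16.3.9", "172.16.7.2")))
    print(sw.forward("web", Packet("192.168.0.80", "192.168.0.11")))  # reply to Host 1
```

Host 1 and Host 2 reach the same web server at unrelated addresses, and a packet addressed to anything outside the sender’s view is simply dropped, which is consistent with minimizing the set of hosts observable from any given host.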

2.2 Honeypots (Decoy)
Honeypots are fake hosts that do not perform any production task. Their primary purpose is to attract
intruders making lateral moves from already compromised devices, thereby giving network defenders
more information for determining whether certain hosts have been compromised (low-interaction
honeypots). A secondary purpose is to keep the adversary engaged in fruitless activity (high-interaction
honeypots). There has been a wide array of research in this area, mostly on low-interaction honeypots.
Open source (e.g., honeyd [5]) and commercial products that facilitate honeypot deployment and
customization are also available. In our work, we plan to enhance honeyd so that it can (i) send and
receive fake traffic flows with specific purposes, such as informing others via broadcast messages that it
is running a particular service, and (ii) distinguish whether incoming packets come from other honeypots
or from a potential adversary, based on decoding certain packet header fields.
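The second honeyd enhancement relies on honeypots recognizing one another’s packets from header fields. As an assumption for illustration only (the paper does not say which fields are used or how), the sketch below has honeypots place a 16-bit keyed tag, computed with HMAC-SHA256 over the addresses and source port, in the IPv4 Identification field. The names `honeypot_tag`, `sent_by_honeypot`, and `HONEYPOT_KEY` are hypothetical.

```python
import hashlib
import hmac
import struct

# Hypothetical key shared by all honeypots and provisioned out of band.
HONEYPOT_KEY = b"example-key-not-for-production"


def honeypot_tag(key: bytes, src: str, dst: str, src_port: int) -> int:
    """16-bit tag a honeypot would write into the IPv4 Identification field."""
    msg = f"{src}|{dst}|{src_port}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return struct.unpack(">H", digest[:2])[0]  # first two bytes as big-endian uint16


def sent_by_honeypot(key: bytes, src: str, dst: str, src_port: int, ip_id: int) -> bool:
    """True if an incoming packet's IP ID carries the expected tag for its header."""
    expected = struct.pack(">H", honeypot_tag(key, src, dst, src_port))
    return hmac.compare_digest(expected, struct.pack(">H", ip_id))


if __name__ == "__main__":
    tag = honeypot_tag(HONEYPOT_KEY, "10.0.0.7", "10.0.0.9", 40000)
    print(sent_by_honeypot(HONEYPOT_KEY, "10.0.0.7", "10.0.0.9", 40000, tag))
    print(sent_by_honeypot(HONEYPOT_KEY, "10.0.0.7", "10.0.0.9", 40000, tag ^ 1))
```

A 16-bit tag means that roughly 1 in 65,536 adversary packets would match by chance, and an adversary that captures honeypot traffic could replay a tag for the same addresses and port. The IP Identification field is also used for fragment reassembly, so a real design would have to account for that.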

2.3 Combination of Camouflage and Decoy
We consider the combination of camouflage and decoy to enhance cyber defense a promising research
direction. Existing technologies allow static setup of camouflage and decoys; however, once the
adversary recognizes the scheme and its usage pattern, both tactics become futile. Providing dynamic,
variable camouflage and ever-changing decoy deployment/placement is challenging, because the changing
defenses must be managed consistently with minimal impact on normal network operations. Humans are
ill-suited to this task in any real network, but an autonomous agent that can manage various deception
techniques on-the-fly has the potential to significantly raise the level of difficulty for adversarial
reconnaissance.

3 Approach
To manage the combination of camouflage and decoy, an autonomous agent needs to assess changes in
the current system state, determine its next moves, and then adjust the configuration of the deception
tactics. We plan to develop such agents using the approach illustrated in Figure 2. Thanks to the isolated
network environment that ACyDS provides for each host, the problem can be formulated as a two-player
game between the dynamic deception management agent and a potential adversary on a host (or a set of
hosts). As shown in the figure, the agent receives sensory input from hosts, honeypots, and the SDN
controller. Since the network view is controlled by the agent, the number of nodes, and therefore the
amount of sensory input, is under its control too. Based on the sensory input it receives, the agent may
decide to keep the current network view for a given host intact, or it can produce a new view using
deception tactics such as changing the IP addresses of all nodes in the host’s network view, inserting
additional honeypots into the view, reconfiguring honeypots to change their behavior (e.g., becoming more
interactive or running a database server containing false data), and so on. In particular, new tactics are
used to further investigate potential intrusion events. For example, suppose a real host raises an alarm
about a failed connection from another host to a closed port. This could be accidental, and hence a false
alarm, or it could be an intentional probing attempt by malware on the probing host. In addition to
collecting information from the probing host to find out which process attempted the connection, the
agent may remove the probed node from the probing host’s view and replace it with a proxy or a
honeypot, or engage a proactive honeypot to communicate with the probing host; this may attract the
attention of the malware, if present on the probing host, and lead it to probe the newly found honeypots.
If such honeypot probing occurs, it is a strong indicator that malware is present on the probing host, and
the agent may plan additional moves.
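The probing example above can be written down as a few hand-coded rules, roughly what a basic operation model might look like before any learning. The event types, action tuples, evidence weights (1 for a closed-port probe, 2 for contacting a honeypot), and alert threshold below are hypothetical choices for illustration, not values from this work.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event types: "closed_port_probe", "honeypot_contact"
    source: str  # node that initiated the connection
    target: str  # node that was contacted


class ProbeResponder:
    """Hand-written rules for the probing example; a stand-in for a learned policy."""

    def __init__(self, alert_threshold: int = 3):
        self.alert_threshold = alert_threshold
        self.suspicion = defaultdict(int)        # host -> accumulated evidence score
        self.substitutions = defaultdict(dict)   # host -> {probed node: honeypot}
        self.flagged = set()
        self._honeypots = 0

    def _new_honeypot(self) -> str:
        self._honeypots += 1
        return f"honeypot-{self._honeypots}"

    def handle(self, ev: Event) -> List[Tuple[str, ...]]:
        actions: List[Tuple[str, ...]] = []
        if ev.kind == "closed_port_probe":
            # Possibly accidental: investigate and set traps rather than alert.
            actions.append(("collect_process_info", ev.source))
            decoy = self._new_honeypot()
            self.substitutions[ev.source][ev.target] = decoy
            actions.append(("replace_in_view", ev.source, ev.target, decoy))
            actions.append(("engage_proactive_honeypot", self._new_honeypot(), ev.source))
            self.suspicion[ev.source] += 1
        elif ev.kind == "honeypot_contact":
            # Honeypots serve no production task, so contact counts for more.
            self.suspicion[ev.source] += 2
        if self.suspicion[ev.source] >= self.alert_threshold and ev.source not in self.flagged:
            self.flagged.add(ev.source)
            actions.append(("flag_compromised", ev.source))
        return actions


if __name__ == "__main__":
    agent = ProbeResponder()
    print(agent.handle(Event("closed_port_probe", "hostA", "hostB")))
    print(agent.handle(Event("honeypot_contact", "hostA", "honeypot-2")))
```

With these weights, a single closed-port probe only triggers investigation and traps; the host is flagged once it also contacts one of the inserted honeypots.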

[Figure 2: the agent issues directives to actuators and receives input from sensors, including the SDN controller; a potentially compromised host is shown with its deceptive network view, which contains honeypots (one disguised as a host), servers, and hosts.]

Figure 2. Agent interacting with a potentially compromised host through a deceptive network view
As the reader can readily see, the game has a large number of possible moves and states. Given the large
state space, we plan to explore the following methodology for training the agent, allowing it to grow its
capability over many simulated games. This will also help us understand the Nash equilibria for the
different state spaces we consider. Our training approach works as follows.

First, with the help of human experts, we plan to build a semi-cognitive synthetic adversary that
specializes in reconnaissance and lateral movement. We will start with the assumption that the adversary
is a stealthy APT that makes prudent moves following a given adversary model. Under uncertainty, this
adversary may make probabilistic decisions. For the deception agent, we plan to explore deep learning
[12], starting with a set of deceptive tactics, each with multiple configurable parameters, and a basic
operation model. The model is expected to evolve through the learning process, which is aided by
simulation in an ACyDS network environment. As in much reported deep-learning research, we also plan
to review the model generated during the learning process and to explore ways of inserting expert input
to guide the learning process to a certain extent. The simulation will start with a Monte Carlo Tree Search
(MCTS) of the game tree, which has a gigantic state space. Each game ends either when the agent
successfully identifies the adversary or when the adversary successfully compromises another host. In a
sense, this game can be made somewhat similar to Chess and Go, in which two players exchange moves.
Since the game tree is too large to be fully traversed, we will, similar to the strategy adopted by Google’s
AlphaGo [13], develop evaluation functions to assess the possible outcomes of subtrees that have only
been partially traversed. The agent’s learning process will be supported by the following three metrics,
among possibly others: (i) the size of the true attack surface exposed to the adversary (to be minimized);
(ii) the likelihood that an intruding adversary identifies the true attack surface and successfully stages
attacks, i.e., makes a lateral move to compromise another host (to be minimized); and (iii) the cost of
moves. Since different moves have different costs, cost may be evaluated using multiple metrics,
including CPU cycles, memory, the number of IP addresses needed, the number of rules used in the SDN
switch, and the time taken to switch to a different deception tactic.
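As an illustration of how the three metrics might be folded into an evaluation function for partially traversed subtrees, the sketch below scores a state as a weighted sum, with metric (iii) itself a weighted sum of normalized resource costs. The normalizing constants, the weights, and the use of exposed real nodes over total real nodes as a proxy for the true attack surface are all assumptions; the paper does not specify how the metrics are combined.

```python
from dataclasses import dataclass, fields


@dataclass
class MoveCost:
    """Resource cost of one deception move (metric iii); units are illustrative."""
    cpu_cycles: float
    memory_mb: float
    ip_addresses: int
    sdn_rules: int
    switch_time_s: float


# Hypothetical normalizers (a "large" value per resource) and weights.
COST_SCALE = {"cpu_cycles": 1e9, "memory_mb": 1024.0, "ip_addresses": 254,
              "sdn_rules": 1000, "switch_time_s": 10.0}
COST_WEIGHT = {name: 0.2 for name in COST_SCALE}  # equal weights summing to 1


def move_cost(c: MoveCost) -> float:
    """Weighted sum of normalized resource usage."""
    return sum(COST_WEIGHT[f.name] * getattr(c, f.name) / COST_SCALE[f.name]
               for f in fields(c))


def evaluate(exposed_real: int, total_real: int, p_lateral: float,
             cost: float, weights=(1.0, 1.0, 0.1)) -> float:
    """Score a game state for the agent; lower is better.

    exposed_real / total_real is a proxy for metric (i), the true attack surface;
    p_lateral estimates metric (ii), e.g. from simulated adversary rollouts;
    cost is metric (iii), e.g. the summed move_cost of the moves made so far.
    """
    w_exposure, w_lateral, w_cost = weights
    return w_exposure * exposed_real / total_real + w_lateral * p_lateral + w_cost * cost


if __name__ == "__main__":
    c = MoveCost(cpu_cycles=5e8, memory_mb=256, ip_addresses=3, sdn_rules=100,
                 switch_time_s=2.0)
    print(round(move_cost(c), 4), round(evaluate(2, 10, 0.1, move_cost(c)), 4))
```

A linear combination is only the simplest choice; the relative weights encode how much resource cost the defender is willing to trade for reduced exposure and lower lateral-movement risk.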

4 Research Progress and Discussion
To develop an agent for managing cyber deception on-the-fly, we adopt a multi-phase approach. In
Phase 1 we enumerate (i) possible reconnaissance actions that could be used by the adversary, with and
without sending probes, (ii) sensor information to be collected from hosts by the agent, and (iii) deception
tactics that the agent may use, along with their configurable parameters and the parameter ranges. In
Phase 2 we plan to develop a stealthy semi-cognitive adversary, a deception-managing agent, and realistic
scenarios, first in simulation and then for the CyberVAN testbed [11] (a high-fidelity testing and evaluation
environment), that allow an adversary (synthetic or human) and a deception agent to engage in many
rounds of games. In Phase 3 we plan to tune the training environment for the agent to learn from the
games, investigate how to improve learning efficiency, and train an effective agent that makes decisions
no worse than those of a human cyber expert.
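The Phase 1 enumeration maps naturally onto a small catalogue of tactics with parameter ranges, which also bounds the agent’s action space per move. The tactics below are drawn loosely from those mentioned in Section 3 (re-addressing a view, inserting honeypots, making honeypots more interactive), while the parameter names, ranges, and the reconnaissance and sensor lists are hypothetical placeholders.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical Phase 1 enumerations (illustrative names only).
RECON_ACTIONS = {
    "active": ["ping_sweep", "arp_scan", "port_scan"],         # send probes
    "passive": ["listen_broadcasts", "inspect_local_caches"],  # no probes sent
}
SENSOR_FEEDS = ["process_connection_log", "honeypot_contacts", "sdn_flow_stats"]


@dataclass
class Tactic:
    name: str
    params: Dict[str, Tuple[int, int]]  # parameter -> inclusive integer range


TACTICS = [
    Tactic("readdress_view", {"percent_of_nodes": (10, 100)}),
    Tactic("insert_honeypots", {"count": (1, 8)}),
    Tactic("raise_honeypot_interaction", {"level": (0, 2)}),
]


def action_space_size(tactics) -> int:
    """Number of distinct (tactic, parameter setting) moves for one host's view."""
    total = 0
    for t in tactics:
        combos = 1
        for lo, hi in t.params.values():
            combos *= hi - lo + 1
        total += combos
    return total


def sample_move(tactics, rng: random.Random):
    """Draw one random move, e.g. for a rollout policy during tree search."""
    t = rng.choice(tactics)
    return t.name, {p: rng.randint(lo, hi) for p, (lo, hi) in t.params.items()}


if __name__ == "__main__":
    print(action_space_size(TACTICS))  # 91 + 8 + 3 = 102
    print(sample_move(TACTICS, random.Random(1)))
```

Even this tiny catalogue yields 102 distinct moves for a single host’s view at each turn, which illustrates why the game tree grows too quickly to traverse exhaustively.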

We are currently at the end of Phase 1, and part of the Phase 2 work is under way, including the
development of a stealthy adversary.

5 Summary
In this paper, we outline the goals of managing dynamic cyber deception, its basic mechanisms, and an
approach to creating an autonomous agent that automates the task. Because ACyDS limits the scope of the
problem, we are able to significantly reduce the state space compared to a non-ACyDS network
environment. Our plan is to continue this research under the U.S. Army Cyber Security Collaborative
Research Alliance (CRA) program.

6 Acknowledgement
The authors thank the U.S. Army Research Laboratory Cyber Security Collaborative Research Alliance
program for supporting this research.

7 References
1. K. L. Tan, “Confronting cyberterrorism with cyber deception”, Thesis, Naval Postgraduate School,
    2003.
2. David Poarch, David O'Leary, Jason Nelson, Anne Grahn, “Six ways to deceive cyber attackers”,
    http://focus.forsythe.com/articles/337/6-Ways-to-Deceive-Cyber-Attackers.
3. Sushil Jajodia, Anup K. Ghosh, Vipin Swarup, Cliff Wang, X. Sean Wang, eds., “Moving Target Defense:
    Creating Asymmetric Uncertainty for Cyber Threats”, Springer, 2011.
4. Lance Spitzner, “Honeypots: Catching the Insider Threat”, Proceedings of the Annual Computer Security
    Applications Conference (ACSAC), 2003.
5. Honeyd, http://www.honeyd.org
6. SDN, Software Defined Networking, https://www.opennetworking.org/
7. OpenFlow, https://www.opennetworking.org/sdn-resources/openflow, retrieved on 4/9/16.
8. Open vSwitch, http://openvswitch.org/
9. RYU, https://osrg.github.io/ryu/
10. Cho-Yu J. Chiang, Yitzchak Gottlieb, Shridatt J. Sugrim, Ritu Chadha, Constantin Serban, Alex
    Poylisher, Lisa Marvel and Jon Santos, “ACyDS: An Adaptive Cyber Deception System”, MILCOM 2016.
11. Ritu Chadha, Thomas Bowen, Cho-Yu J. Chiang, Yitzchak M. Gottlieb, Alex Poylisher, Angelo Sapello,
    Constantin Serban, Shridatt Sugrim, Gary Walther, Lisa Marvel, Allison Newcomb, and Jonathan
    Santos, “CyberVAN: A Cyber Security Virtual Assured Network Testbed”, MILCOM 2016.
12. Jurgen Schmidhuber, “Deep Learning in Neural Networks: An Overview”, Technical Report
    IDSIA-03-14, 2014.
13. D. Silver et al., “Mastering the game of Go with deep neural networks and tree search”, Nature 529,
    484-489, 2016.
14. Tao Ye and Shivkumar Kalyanaraman, “A recursive random search algorithm for large-scale network
    parameter configuration”, Proceedings of 2003 ACM SIGMETRICS Conference, 2003.