<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IST-152 Workshop on Intelligent Autonomous Agents for Cyber Defence and Resilience Combining Game Theory and Learning for Dynamic Network Defense Strategies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Branislav Bosansky</string-name>
          <email>bosansky@fel.cvut.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christopher Kiekintveld</string-name>
          <email>cdkiekintveld@utep.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Arti cial Intelligence Center Department of Computer Science Faculty of Electrical Engineering Czech Technical University in Prague</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science University of Texas at El Paso</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Securing computer networks is a critical ongoing challenge for both private enterprise and government agencies and defense. Computer networks provide an access to valuable systems that can be damaged (e.g., by disabling a physical defense mechanism), and have valuable information that can be exploited (e.g., in a real-world con ict) or used for pro t (e.g., selling a list of stolen credit card information). Attackers exploit the complex, living environment of automated systems and human users performing routine tasks to nd gaps in security and conceal their actions. Interactions between defenders and attackers play out in this rich and ever-changing environment, and improving security poses a fundamental question for defense: How should the defender choose security policies that improve security against sophisticated, adaptive attackers, without violating cost and usability constraints? We highlight three characteristics that make this decision problem particularly challenging. (1) Computer security is adversarial, with multiple decision-makers making decisions that interact and jointly a ect the overall outcome. (2) The interaction is highly dynamic (sequential) in nature, with attackers and defenders able to make observations and react to one another over time. (3) There are many unknown and changing elements of the environment as well as the capabilities and motivations of the other agents. The mathematical framework best suited to reasoning about adversarial decision problems is non-cooperative game theory. Game theory provides a variety of models for describing interactions, as well as solution concepts for nding good (ideally optimal) strategies, and computational sub elds that provide algorithms for computing these solutions. Game theoretic strategies are widely used not only in network security [13, 9] but also in securing critical infrastructures [12] and protecting wildlife [3]. However, many basic game models do not handle the dynamic and unknown elements typical of cybersecurity decisions. We argue the addressing cybersecurity using game theory will require developing and integrating novel techniques from both dynamic game theory and machine learning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Cybersecurity domains often involve sequential decisions. Each side performs some action,
expects a reaction from the system or from the opponent, and reacts accordingly. For example,
the attacker may adapt her strategy if one type of the attack fails, or she tries to hide her trace
if there is a suspicion of a detection. The attack can thus be much more strategic and can last
for a longer period of time, which is often in contrast with the types of games used for modeling
physical security. One reason is that gathering information is easier and less risky in a cyber
scenario than in an attack on a physical system (e.g., observing the network tra c costs much
less than conducting on-site reconnaissance, the chance of being detected by the defender is
smaller).</p>
      <p>
        Solving dynamic games (known as extensive-form games that model interactions with nite
steps or stochastic games that model interactions with in nite (or inde nite) number of steps)
is a computationally challenging task. Despite the computational challenges, there exists a
collection of algorithms that can be used to solve dynamic games [
        <xref ref-type="bibr" rid="ref1 ref14 ref15 ref4 ref7">15, 4, 1, 14, 7</xref>
        ] and recent
results show that state-of-the-art algorithms are able to nd super-human strategies in
nontrivial games [
        <xref ref-type="bibr" rid="ref10 ref11 ref2">2, 11, 10</xref>
        ].
      </p>
      <p>Game theory can provide optimal reasoning, but only if a model is fully speci ed and has
only so-called \known unknowns." For example, the defender may not have full information
about the current situation in the network, but has useful beliefs about the possible states the
network could be in, and is able to maintain accurate probabilistic beliefs over these possibilities
as new information is observed. However, it is often the case that a complete model is very
challenging to specify in network security. There may also be \unknown unknowns" that violate
fundamental assumptions, such as an attacker who uses a zero-day exploit that is not known
to the defender. In these situations classical game-theoretic models can be cumbersome and
of limited value, while methods that focus on adaptation and learning are well-suited to these
problems.</p>
      <p>
        When little is known about the environment for a decision problem, the focus is generally
on exploring the environment and learning from the experience to improve future decisions. A
wide variety of machine learning methods have been developed for learning from data, but the
sub eld of reinforcement learning is particularly relevant since it focuses on learning from direct
experience of an environment. A class of learning problems known as multi-armed bandits [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
highlight the balance between exploring for new information and exploiting the current
information to improve decisions, and many algorithms have been developed for optimizing learning
behavior in various contexts. We have recently applied learning methods from the multi-armed
bandit literature to the problem of detecting exploits in cybersecurity [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, most
learning methods do not account for the adversarial nature of the cybersecurity domain, and do no
have a strong model of the underlying dynamic structure to guide learning and decision-making.
      </p>
      <p>
        We argue that new research is needed to combine the strengths of game theory and machine
learning to address the novel challenges that arise in cybersecurity decision-making. The
underlying interactions can be e ectively modeled as a dynamic game where there is a competition
between the defender and the attacker, and most of the possible actions and results for these
actions are known. However, there are often signi cant parts of the model that are initially
unknown, or which must react to a changing environment that cannot be fully anticipated.
Therefore, the defender must be able to explore for possibilities that the current game-theoretic
model of the system does not account for and adapt the model accordingly. We have done
preliminary work on combining equilibrium methods with machine learning in the context of border
security [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], but much more work is needed to apply these methods in cybersecurity. However,
there is great potential since the combination of these theoretical frameworks can provide new
guarantees on the optimality and adaptivity of decision-making in dynamic adversarial settings,
ultimately allowing defenders to make better decisions to reduce cybersecurity risks.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bosansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiekintveld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lisy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechoucek</surname>
          </string-name>
          .
          <article-title>An Exact Double-Oracle Algorithm for Zero-Sum Extensive-Form Games with Imperfect Information</article-title>
          .
          <source>Journal of Arti cial Intelligence Research</source>
          ,
          <volume>51</volume>
          :
          <fpage>829</fpage>
          {
          <fpage>866</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bowling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Johanson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Tammelin</surname>
          </string-name>
          .
          <article-title>Heads-up limit hold'em poker is solved</article-title>
          .
          <source>Science</source>
          ,
          <volume>347</volume>
          (
          <issue>6218</issue>
          ):
          <volume>145</volume>
          {
          <fpage>149</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tambe</surname>
          </string-name>
          .
          <article-title>When Security Games Go Green: Designing Defender Strategies to Prevent Poaching and Illegal Fishing</article-title>
          . In
          <source>In Proceedings of 24th International Joint Conference on Arti cial Intelligence (IJCAI)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gilpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pena</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Sandholm</surname>
          </string-name>
          .
          <article-title>First-Order Algorithm with O(ln(1/epsilon</article-title>
          ))
          <article-title>Convergence for epsilon-Equilibrium in Two-Person Zero-Sum Games</article-title>
          .
          <source>Mathematical Programming</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gittins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Glazebrook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Weber</surname>
          </string-name>
          <article-title>. Multi-armed bandit allocation indices</article-title>
          . John Wiley &amp; Sons,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiekintveld</surname>
          </string-name>
          .
          <article-title>Bandits for cybersecurity: Adaptive intrusion detection using honeypots</article-title>
          .
          <source>In AAAI Workshop: Arti cial Intelligence for Cyber Security</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Horak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bosansky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechoucek</surname>
          </string-name>
          .
          <article-title>Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games</article-title>
          . In In Proceedings of AAAI (to appear),
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kl</surname>
          </string-name>
          ma, V. Lisy, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiekintveld</surname>
          </string-name>
          .
          <article-title>Combining online learning and equilibrium computation in security games</article-title>
          .
          <source>In International Conference on Decision and Game Theory for Security</source>
          , pages
          <volume>130</volume>
          {
          <fpage>149</fpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Laszka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Vorobeychik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Koutsoukos</surname>
          </string-name>
          .
          <article-title>Optimal thresholds for intrusion detection systems</article-title>
          .
          <source>In Proceedings of the Symposium and Bootcamp on the Science of Security</source>
          , pages
          <volume>72</volume>
          {
          <fpage>81</fpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Moravcik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lisy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Morrill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Waugh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Johanson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Bowling</surname>
          </string-name>
          . DeepStack:
          <article-title>Expert-Level Arti cial Intelligence in No-Limit Poker</article-title>
          . Science,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Maddison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          , G. van den Driessche, J. Schrittwieser,
          <string-name>
            <given-names>I.</given-names>
            <surname>Antonoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Panneershelvam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanctot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dieleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Grewe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kalchbrenner</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lillicrap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Graepel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Hassabis</surname>
          </string-name>
          .
          <article-title>Mastering the game of go with deep neural networks and tree search</article-title>
          .
          <source>Nature</source>
          ,
          <volume>529</volume>
          (
          <issue>7587</issue>
          ):
          <volume>484</volume>
          {
          <fpage>489</fpage>
          ,
          <string-name>
            <surname>Jan</surname>
          </string-name>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tambe</surname>
          </string-name>
          .
          <source>Security and Game Theory: Algorithms</source>
          , Deployed Systems, Lessons Learned. Cambridge University Press,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Vanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bosansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tambe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechoucek</surname>
          </string-name>
          .
          <article-title>Game-theoretic Resource Allocation for Malicious Packet Detection in Computer Networks</article-title>
          .
          <source>In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)</source>
          , pages
          <fpage>902</fpage>
          {
          <fpage>915</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cermak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bosansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Durkota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lisy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiekintveld</surname>
          </string-name>
          .
          <article-title>Using Correlated Strategies for Computing Stackelberg Equilibria in Extensive-Form Games</article-title>
          .
          <source>In Proceedings of AAAI Conference on Arti cial Intelligence</source>
          , pages
          <fpage>439</fpage>
          {
          <fpage>445</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zinkevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Johanson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bowling</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Piccione</surname>
          </string-name>
          .
          <article-title>Regret Minimization in Games with Incomplete Information</article-title>
          .
          <source>Advances in Neural Information Processing Systems (NIPS)</source>
          ,
          <volume>20</volume>
          :
          <fpage>1729</fpage>
          {
          <fpage>1736</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>