<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Shield Synthesis for Safe Reinforcement Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roderick Bloem</string-name>
          <aff>TU Graz, Austria</aff>
        </contrib>
      </contrib-group>
      <abstract>
        <p>You have a reinforcement learning system? Sure, it works great, but does it give you any guarantees? I thought not. We will describe methods that use reactive synthesis to construct runtime enforcement modules (shields) that ensure a system behaves correctly, even if the system has bugs. If the system does not have too many bugs, the behavior of the shielded system stays close to that of the original system. We will also show extensions to probabilistic and timed settings.</p>
      </abstract>
    </article-meta>
  </front>
  <body />
  <back>
    <ref-list />
  </back>
</article>