<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Inferring and Conveying Intentionality: Beyond Numerical Rewards to Logical Intentions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Susmit Jha</string-name>
          <email>susmit.jha@sri.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Rushby</string-name>
          <email>rushby@csl.sri.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Laboratory SRI International Menlo Park CA 94025</institution>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Shared intentionality is a critical component in developing conscious AI agents capable of collaboration, self-reflection, deliberation, and reasoning. We formulate inference of shared intentionality as an inverse reinforcement learning problem with logical reward specifications. We show how the approach can infer task descriptions from demonstrations. We also extend our approach to actively convey intentionality. We demonstrate the approach on a simple grid-world example.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>There are many theories of consciousness; most propose some biological or other
mechanism as a cause or correlate of consciousness, but do not explain what
consciousness is for, nor what it does. We take the contrary approach: we
postulate that consciousness implements or is associated with a fundamental aspect of
human behavior, and then we ask what mechanisms could deliver this capability
and what AI approximations might help explore and validate (or refute) this
speculation.</p>
      <p>
        We postulate that shared intentionality [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is the attribute of human cognition
whose realization requires consciousness. Shared intentionality is the ability of
humans to engage in teamwork with shared goals and plans. There is no doubt
that the unconscious mind is able to generate novel and complex goals and
plans; the interesting question is how these are communicated from the mind of
one individual (let's call her Alice) to those of others so that all can engage in
purposeful collaboration. The goal or plan is generated by some configuration of
chemical and electrical potentials in Alice's neurophysiology and one possibility
is that salient aspects of these are abstracted to yield a concise explanation or
description that Alice can communicate to others by demonstration, mime, or
language. The description is received by the other participants (let's call the
prototypical one Bob) who then interpret or "concretize" it to enrich their own
unconscious neurophysiological configuration so that it is now likely to generate
behaviors that advance the common goal.
      </p>
      <p>
        This account suggests a dual-process cognitive architecture [
        <xref ref-type="bibr" rid="ref1 ref2 ref4">1, 2, 4</xref>
        ] where we
identify consciousness with the upper level ("System 2") that operates on
abstracted representations of salient aspects of the lower, unconscious level
("System 1"). It can also be seen as a form of Higher-Order Thought (HOT, that is,
thoughts about thoughts) and thus related to HOT theories of consciousness [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>We posit that the conscious level is concerned with the construction and
exploitation of shared intentionality: it generates, interprets, and communicates
succinct descriptions and explanations about shared goals and plans. For
succinctness, it operates on abstracted entities (symbols or concepts) and
presumably has some ability to manipulate and reason about these. When Alice builds a
description to communicate to Bob, she must consider his state of knowledge and
point of view, and we might suppose that this "theory of mind" is represented
in her consciousness and parameterizes her communication.</p>
      <p>We noted that Alice could communicate to Bob by demonstration, mime (i.e.,
demonstration over symbols), or language. For the latter two, Alice must have the
abstracted description in her consciousness, but it is possible that demonstration
could be driven directly by her unconscious: we have surely all heard or said "I
cannot explain it, but I can show you how to do it." In fact, it could be that
Alice constructs her abstraction by mentally demonstrating the task to herself.</p>
      <p>In this paper, we focus on demonstration as a means for communication and
construction of abstract descriptions. In particular, we investigate how AI agents
could use demonstrations to construct approximations to shared intentionality
that allow them to engage in teamwork with humans or other AI agents, and to
understand the activities of their own lower-level cognitive mechanisms.</p>
      <p>The computer science topic that seems most closely related to the task of
inferring intentionality is inverse reinforcement learning (IRL). In classical IRL,
the objective is to learn the reward function underlying the (System 1)
behavior exhibited in the demonstrations. Here, we employ an extension to IRL that
infers logical specifications that can enable self-reflective analysis of learned
information, compositional reasoning, and integration of learned knowledge, which
enable the System 2 functions of a conscious AI agent.</p>
      <p>
        While modern deep learning methods [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] show great promise in building
AI agents with human-level System 1 cognitive capabilities for some tasks [
        <xref ref-type="bibr" rid="ref7 ref8">7,
8</xref>
        ], and decades of research in automated reasoning [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] can be exploited for
logical deduction in System 2, our goal is to bridge these levels by inferring
and conveying logical intentions. In this paper, we build on previous work on
logical specification mining, including our own recent work [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11–13</xref>
        ]. The key
novel contributions of this paper are:
        <list list-type="bullet">
          <list-item><p>Formulating intentionality inference as IRL with logical reward specification.</p></list-item>
          <list-item><p>Methods for actively seeking and conveying intentions.</p></list-item>
          <list-item><p>Demonstration of the proposed approach on a simple grid-world example.</p></list-item>
        </list>
      </p>
      <p>In Section 2, we formulate the problem of inferring intentionality as an inverse
reinforcement learning problem and point out the deficiencies of using numerical
rewards to represent intentions. In Section 3, we present an inverse reinforcement
learning method for logical specifications, and illustrate how it can be used to
infer intentionality. We extend our approach to convey intentionality interactively
in Section 4, and conclude in Section 5 by discussing the current limitations.</p>
    </sec>
    <sec id="sec-2">
      <title>IRL and Intentionality Inference</title>
      <p>
        In traditional Inverse Reinforcement Learning (IRL) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], there is a learner and
a demonstrator. The demonstrator operates in a stochastic environment (e.g., a
Markov Decision Process), and is assumed to attempt to (approximately)
optimize some unknown reward function over its behavior trajectories. The learner
attempts to reverse engineer this reward function from the demonstrations. This
problem of learning rewards from the demonstrations can be cast as a Bayesian
inference problem [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to predict the most probable reward function. Ideally, this
reward function encodes the intentionality of the demonstrator and enables the
observer to understand the goal behind the demonstrations.
      </p>
      <p>This classical form of IRL can be seen as communication at Level 1: that
is, of an opaque low-level representation. We enrich this communication to allow
inference of reasoning-friendly representations such as logical specifications that
are suitable for Level 2 manipulation. Once the agent has learned the goal in this
form, it can use its own higher-level skills and knowledge to achieve or contribute
to the goal, either independently or composed with other goals. Further, the
agent can also use this representation to collaborate and plan activities with
other agents, as illustrated in Figure 1.</p>
      <p>
        Logical specification mining has been studied in the traditional formal
methods community [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] including our own past work [
        <xref ref-type="bibr" rid="ref12 ref13 ref15">12, 15, 13</xref>
        ], but these methods
are not robust to noise and rely on intelligent oracles to produce behaviors that
cover the space of legal behaviors for the specification. This is not realistic for
general AI problems where demonstrations such as handing over a glass of
water, or crossing a street, are inherently noisy. In contrast, IRL algorithms [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
formulate this inference procedure using the principle of maximum entropy [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
This results in a likelihood of inferred reward over the demonstrations which
is no more committed to any particular behavior than is required for matching
the empirically observed reward expectation. Traditionally, this approach was
limited to structured scalar rewards, often assumed to be linear combinations of
feature vectors. But more recently, these have been adapted to arbitrary
function approximators such as Gaussian processes [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and neural networks [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
While powerful, these existing IRL methods provide no principled mechanism
for composing or reasoning with the resulting rewards. The inference of intention
as a numerical reward function lacks a form that is amenable to self-reflection and
collaboration, and has several limitations:
        <list list-type="bullet">
          <list-item><p>First, numerical reward functions lack logical structure, making it difficult to
reason over them, which is critical for self-reflection: a conscious AI agent
must be able to analyze its understanding of intention. This inference of
intention could be from behaviors (either real or mental rehearsals) of its
own low-level cognitive system, or from behaviors of other conscious agents.</p></list-item>
          <list-item><p>Second, combining numerical rewards to understand intention in a
compositional manner is difficult. Demonstrations for two tasks can be learned
individually using numerical rewards but these cannot be combined by the
AI agent to perform the tasks in a concurrent or coordinated manner. A
conscious AI agent cannot just infer each task's intention separately, but
needs a global view of its own inference and understanding.</p></list-item>
        </list>
      </p>
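      <p>The compositionality argument can be illustrated with a minimal sketch. It assumes, purely for illustration, that a trace is a list of tile colors (borrowed from the grid-world example of Section 3) and that a logical specification is a Boolean predicate over a trace; the helper names <monospace>avoid</monospace>, <monospace>reach</monospace>, and <monospace>conjunction</monospace> are hypothetical and are not taken from the authors' implementation. Under these assumptions, two separately learned specifications combine by plain conjunction.</p>
      <preformat>
```python
from typing import Callable, Sequence

# Illustrative encoding (an assumption): one tile color per time step.
Trace = Sequence[str]
# A specification maps a whole trace to True or False.
Spec = Callable[[Trace], bool]


def avoid(color: str) -> Spec:
    """Specification: the trace never visits a tile of the given color."""
    return lambda trace: all(tile != color for tile in trace)


def reach(color: str) -> Spec:
    """Specification: the trace visits a tile of the given color at least once."""
    return lambda trace: any(tile == color for tile in trace)


def conjunction(*specs: Spec) -> Spec:
    """Compose specifications: the result holds only if every part holds."""
    return lambda trace: all(spec(trace) for spec in specs)


# Two separately stated intentions combined into one joint intention.
avoid_lava_and_recharge = conjunction(avoid("red"), reach("yellow"))

if __name__ == "__main__":
    print(avoid_lava_and_recharge(["white", "white", "yellow"]))
    print(avoid_lava_and_recharge(["white", "red", "yellow"]))
```
      </preformat>
      <p>The composed predicate is itself a specification, so it can be composed further or checked against new traces; no comparable operation is assumed here for scalar reward functions.</p>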
    </sec>
    <sec id="sec-3">
      <title>IRL with Logical Intention Discovery</title>
      <p>
        In this section, we briefly summarize how our recent work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] on inferring logical
specifications in IRL can be used to answer the foundational Question 1 stated
below. This is the first step required to build self-aware and self-reflective AI
agents capable of inferring and conveying intentions.
      </p>
      <p>Question 1. How does Alice infer a logical specification of intention by
observing a set of demonstrative behaviors (either Alice's own behavior
generated by lower-level cognitive engines, or that of another agent)?</p>
      <p>
        We assume that the demonstrator (Alice or Bob) operates within a Markov
Decision Process and that the specification of the intent is a bounded trace
property. More precisely, we define a demonstration/trajectory, <inline-formula><tex-math>\xi</tex-math></inline-formula>, to be a sequence
of state-action pairs. Alice attempts to infer past-time linear temporal logic
(PLTL) [
        <xref ref-type="bibr" rid="ref6">6</xref>
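        ] specifications from the demonstrations. Such a PLTL property, <inline-formula><tex-math>\phi</tex-math></inline-formula>, can be
identified with a binary non-Markovian reward function: <inline-formula><tex-math>\phi(\xi) = 1</tex-math></inline-formula> if <inline-formula><tex-math>\xi \models \phi</tex-math></inline-formula>, and
0 otherwise. The candidate set of specifications corresponding to the space of
possible intentions is denoted by <inline-formula><tex-math>\Phi</tex-math></inline-formula>. Inferring intention from demonstrations in
the set <inline-formula><tex-math>X</tex-math></inline-formula> can be formulated as a maximum posterior probability inference
problem: <inline-formula><tex-math>\phi^{*} = \arg\max_{\phi \in \Phi} \Pr(\phi \mid X)</tex-math></inline-formula>. Under the assumption of a uniform prior over the
intention space, and applying the maximum entropy principle (see [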
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for technical
details), the posterior probability of a specification is given by:
      </p>
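      <p>
        <disp-formula><tex-math>\Pr(\phi \mid M, X, \bar{\phi}) \propto \mathbb{1}[\bar{\phi} \geq \hat{\phi}] \cdot \exp\left(|X| \cdot D_{KL}\left(B(\bar{\phi}) \,\|\, B(\hat{\phi})\right)\right)</tex-math></disp-formula>
where <inline-formula><tex-math>M</tex-math></inline-formula> is the stochastic dynamics model known to the agent, <inline-formula><tex-math>X</tex-math></inline-formula> is the set
of demonstrations, <inline-formula><tex-math>\bar{\phi}</tex-math></inline-formula> denotes the average number of times the specification <inline-formula><tex-math>\phi</tex-math></inline-formula>
was satisfied by the demonstrations, <inline-formula><tex-math>\hat{\phi}</tex-math></inline-formula> denotes the average number of times the
specification is satisfied by a random sequence of actions, and <inline-formula><tex-math>D_{KL}</tex-math></inline-formula> denotes the
KL divergence between the two Bernoulli distributions denoted by <inline-formula><tex-math>B</tex-math></inline-formula>. Intuitively,
the first component is an indicator function that the demonstrator is better
than random, and the second component measures the information gain over
the random actions. We can obtain the most likely logical specification from a
set of demonstrations by maximizing the posterior probability. An algorithm for
this optimization using partitioning of the logical specifications is presented in
our previous work [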
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
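      <p>As a rough illustration of how this posterior could be scored, the sketch below computes the unnormalized log posterior from three inputs: the rate at which the demonstrations satisfy a specification, the rate at which random action sequences satisfy it, and the number of demonstrations. The function names and the dictionary of candidate rates are assumptions made for this example. The search is a brute-force maximum over a handful of candidates, not the partition-based algorithm of the cited previous work.</p>
      <preformat>
```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence D_KL(B(p) || B(q)) between Bernoulli distributions, in nats."""
    eps = 1e-12  # clamp q away from 0 and 1 so the logarithms stay finite
    q = min(max(q, eps), 1.0 - eps)
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if 1.0 > p:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def log_posterior_score(demo_rate: float, random_rate: float, num_demos: int) -> float:
    """Unnormalized log posterior: log of the indicator plus |X| times the KL term."""
    if random_rate > demo_rate:
        return float("-inf")  # indicator is 0: the demonstrator is worse than random
    return num_demos * bernoulli_kl(demo_rate, random_rate)


def most_likely_spec(candidates: dict, num_demos: int) -> str:
    """Brute-force argmax over candidates given as name -> (demo_rate, random_rate)."""
    return max(
        candidates,
        key=lambda name: log_posterior_score(*candidates[name], num_demos),
    )


if __name__ == "__main__":
    # Hypothetical satisfaction rates, chosen only to exercise the functions.
    rates = {"easy_spec": (1.0, 0.9), "informative_spec": (1.0, 0.1)}
    print(most_likely_spec(rates, num_demos=20))
```
      </preformat>
      <p>A specification that random behavior almost always satisfies receives a low score even when every demonstration satisfies it, which matches the information-gain reading of the second component.</p>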
      <p>We use a simple grid-world example, illustrated in Figure 2, to
demonstrate this approach. In this
task, the agent moves in a discrete grid world and can
take actions to move in the cardinal directions (north,
south, east, west). Further, the agent can sense
abstract features of the domain represented as colors.</p>
      <p>
        The task is to reach any of the yellow (recharge) tiles
without touching a red tile (lava); we refer to this
sub-task as YR. Additionally, if a blue tile (water)
is stepped on, the agent must step on a brown tile
(drying tile) before going to a yellow tile; we refer
to this sub-task as BBY. The last constraint requires
recall of two state bits of history (and is thus not
Markovian and infeasible to learn using traditional IRL): one bit for whether
the robot is wet and another bit encoding if the robot recharged while wet.
Demonstrations correspond to simultaneously satisfying both requirements. The
space of logical specifications [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] consists of PLTL properties using atomic
propositions that indicate the nature of the square on which the robot is at
a given instant. These demonstrations are interesting because they incidentally
include noisy demonstrations for incorrect intentions, for instance, that the robot
should wet and dry itself before charging. But our algorithm, using the maximum entropy
principle, infers the following correct requirement in approximately 95
seconds, after exploring 172 candidates (approximately 18% of the concept class):
      </p>
      <p>
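        <inline-formula><tex-math>\phi_F \equiv H\,\neg\mathit{red} \wedge O\,\mathit{yellow} \wedge H\big((\mathit{yellow} \wedge O\,\mathit{blue}) \Rightarrow (\neg\mathit{blue}\ S\ \mathit{brown})\big)</tex-math></inline-formula>, where
H is "historically," O is "once," and S is "since" [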
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
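      <p>To make the three past-time operators concrete, the following sketch evaluates the grid-world requirement on a finite trace. It assumes a trace is a list of tile colors, one per time step, that each atomic proposition holds exactly when the current tile has that color, and that the formula is judged at the final step of the trace. The operator implementations follow the standard recursive semantics of "historically," "once," and "since"; the function names are illustrative and this is not the authors' monitoring code.</p>
      <preformat>
```python
from itertools import accumulate


def atom(trace, color):
    """Truth value of the proposition `color` at every step of the trace."""
    return [tile == color for tile in trace]


def neg(a):
    return [not x for x in a]


def conj(a, b):
    return [x and y for x, y in zip(a, b)]


def implies(a, b):
    return [(not x) or y for x, y in zip(a, b)]


def historically(a):
    """H a holds at step t iff a held at every step from 0 to t."""
    return list(accumulate(a, lambda acc, x: acc and x))


def once(a):
    """O a holds at step t iff a held at some step from 0 to t."""
    return list(accumulate(a, lambda acc, x: acc or x))


def since(a, b):
    """a S b holds at t iff b held at some step k up to t and a held at every step after k."""
    out, prev = [], False
    for x, y in zip(a, b):
        prev = y or (x and prev)
        out.append(prev)
    return out


def satisfies_task(trace):
    """Evaluate the lava, recharge, and dry-before-recharge requirement at the last step."""
    red, yellow, blue, brown = (atom(trace, c) for c in ("red", "yellow", "blue", "brown"))
    avoid_lava = historically(neg(red))
    recharge = once(yellow)
    dry_first = historically(implies(conj(yellow, once(blue)), since(neg(blue), brown)))
    return bool(trace) and (avoid_lava[-1] and recharge[-1] and dry_first[-1])


if __name__ == "__main__":
    print(satisfies_task(["blue", "brown", "white", "yellow"]))
    print(satisfies_task(["white", "blue", "white", "yellow"]))
```
      </preformat>
      <p>The running value kept by <monospace>since</monospace> plays the role of the history bits mentioned above: it records whether the agent has dried off since it last touched water.</p>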
    </sec>
    <sec id="sec-4">
      <title>Passive Inference to Active Transfer of Intention</title>
      <p>A conscious agent must be capable of active transfer of intention, beyond the passive
inference of intent discussed above. Such active intent transfer includes:</p>
      <p>Question 2. How does Alice infer (and then correct) a gap in the logical
specification of her intention learned by Bob?</p>
      <p>Question 3. How does Alice seek clarifying behaviors from Bob to
disambiguate her currently inferred intentions of Bob?</p>
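      <p>The key to addressing both questions lies in defining a divergence measure
over the set of candidate specifications representing possible intentions. One such
divergence measure is the log of the ratio of the posterior likelihoods of two specifications <inline-formula><tex-math>\phi</tex-math></inline-formula> and <inline-formula><tex-math>\phi'</tex-math></inline-formula>:
        <disp-formula><tex-math>D(\phi, \phi') = \log\frac{\Pr(\phi \mid M, X, \bar{\phi})}{\Pr(\phi' \mid M, X, \bar{\phi}')} = |X| \left[ D_{KL}\left(B(\bar{\phi}) \,\|\, B(\hat{\phi})\right) - D_{KL}\left(B(\bar{\phi}') \,\|\, B(\hat{\phi}')\right) \right]</tex-math></disp-formula>
The second equality follows from the posterior given in Section 3 when both specifications satisfy its indicator condition; the factor <inline-formula><tex-math>|X|</tex-math></inline-formula> is common to all pairs and only rescales the measure.</p>
      <p>We also assume that Alice and Bob have a common intent inference mechanism,
which allows them to run the algorithm over demonstrations and infer what the
other agent might be concluding so far. Extending this approach to agents
who use different background knowledge, and who will have a noisy simulation of the
other agent's intention inference mechanism, is beyond the scope of this paper.</p>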
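      <p>A minimal sketch of this divergence measure follows. It assumes each specification is summarized by a pair (rate at which the demonstrations satisfy it, rate at which random behavior satisfies it) and that both specifications pass the better-than-random indicator of the posterior in Section 3. Taking the log of the ratio of two such posteriors yields the number of demonstrations times the difference of the two KL terms; that common factor is kept explicit here, and dropping it rescales every divergence equally. The threshold and the candidate rates are hypothetical values chosen for illustration.</p>
      <preformat>
```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence D_KL(B(p) || B(q)) between Bernoulli distributions, in nats."""
    eps = 1e-12  # clamp q away from 0 and 1 so the logarithms stay finite
    q = min(max(q, eps), 1.0 - eps)
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if 1.0 > p:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def divergence(spec, other, num_demos: int) -> float:
    """Log posterior ratio of two specifications given as (demo_rate, random_rate)."""
    return num_demos * (bernoulli_kl(*spec) - bernoulli_kl(*other))


def ambiguous_alternatives(best, candidates: dict, num_demos: int, threshold: float):
    """Names of candidates whose divergence from the best specification is below the threshold."""
    return sorted(
        name
        for name, rates in candidates.items()
        if threshold > divergence(best, rates, num_demos)
    )


if __name__ == "__main__":
    # Hypothetical rates: the current best explanation and two alternatives.
    best = (1.0, 0.2)
    alternatives = {"close": (1.0, 0.25), "far": (1.0, 0.9)}
    print(ambiguous_alternatives(best, alternatives, num_demos=10, threshold=5.0))
```
      </preformat>
      <p>Alternatives returned by <monospace>ambiguous_alternatives</monospace> are the ones for which further clarifying demonstrations would be worth requesting or generating.</p>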
      <p>To demonstrate the use of this divergence measure, we consider a scenario
where the demonstrations on the grid-world are restricted to a subset <inline-formula><tex-math>X'</tex-math></inline-formula> of the
original set <inline-formula><tex-math>X</tex-math></inline-formula>, and <inline-formula><tex-math>X'</tex-math></inline-formula> does not contain any trajectories going through blue or brown
tiles. Using these demonstrations, Alice infers <inline-formula><tex-math>\phi_{YR} \equiv H\,\neg\mathit{red} \wedge O\,\mathit{yellow}</tex-math></inline-formula> as the
most likely explanation, which corresponds only to the sub-task of avoiding lava
and reaching the recharge tile. Alice can evaluate other specifications and, if
there are other candidate specifications with a low divergence measure, she can
attempt to disambiguate her inferred intent. Let us say one such specification is
<inline-formula><tex-math>\phi' \equiv H\,\neg\mathit{red} \wedge O\,\mathit{yellow} \wedge O\,\mathit{blue}</tex-math></inline-formula>. Alice can generate demonstrations consistent
with this specification by planning from temporal logic [
        <xref ref-type="bibr" rid="ref14">14</xref>
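        ]. These
demonstrations will pass through blue (water) tiles and reach a recharge tile without visiting brown
drying tiles. Bob runs the intent inference approach on these demonstrations
to realize that Alice has inferred this alternative specification, <inline-formula><tex-math>\phi'</tex-math></inline-formula>, and not the intended <inline-formula><tex-math>\phi_{YR}</tex-math></inline-formula>. He can
provide additional behaviors (e.g., the original set <inline-formula><tex-math>X</tex-math></inline-formula>) that help disambiguate
the two specifications. This is continued until Alice converges to <inline-formula><tex-math>\phi_F</tex-math></inline-formula>, with all other
candidate specifications having high divergence from <inline-formula><tex-math>\phi_F</tex-math></inline-formula>.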
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we presented a first step towards building AI agents capable of
inferring and conveying intentionality as logical specifications. The goal is to
develop AI agents that not only learn intentions of other agents from
demonstrations, or their own intentions by observing actions of lower-level cognitive
engines, but also provide and seek clarifications when inferred intentions are
ambiguous. Our proposed approach is currently limited to behaviors which are
represented as time traces, and intentions that can be expressed in temporal
logic. But several creative tasks such as proving theorems or writing a mystery
novel cannot be easily formulated in this framework. A hierarchical
representation mechanism that can exploit the inferred intentions and goals to
compositionally learn new intentions is essential to building self-aware, self-reflective AI
that can collaborate to perform creative endeavors.</p>
      <p>Acknowledgement: The authors acknowledge support from the National Science
Foundation (NSF) Cyber-Physical Systems #1740079 project, NSF Software &amp;
Hardware Foundation #1750009 project, and US ARL Cooperative Agreement
W911NF-17-2-0196 on Internet of Battle Things (IoBT).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>J.S.B.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stanovich</surname>
            ,
            <given-names>K.E.</given-names>
          </string-name>
          :
          <article-title>Dual-process theories of higher cognition: Advancing the debate</article-title>
          .
          <source>Perspectives on Psychological Science</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>223</fpage>
          –
          <lpage>241</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Frankish</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Dual-process and dual-system theories of reasoning</article-title>
          .
          <source>Philosophy Compass</source>
          <volume>5</volume>
          (
          <issue>10</issue>
          ),
          <fpage>914</fpage>
          –
          <lpage>926</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gennaro</surname>
          </string-name>
          , R.J.:
          <source>Higher-Order Theories of Consciousness: An Anthology, Advances in Consciousness Research</source>
          , vol.
          <volume>56</volume>
          . John Benjamins Publishing (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kahneman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Thinking, Fast and Slow. Farrar, Straus and Giroux (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Tomasello</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carpenter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Call</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Behne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moll</surname>
          </string-name>
          , H.:
          <article-title>Understanding and sharing intentions: The origins of cultural cognition</article-title>
          .
          <source>Behavioral and Brain Sciences</source>
          <volume>28</volume>
          (
          <issue>5</issue>
          ),
          <fpage>675</fpage>
          –
          <lpage>691</lpage>
          (
          <year>2005</year>
          ),
          <article-title>see also the commentaries on pages 691–721 and the authors' response "In Search of the Uniquely Human" on pages 721–727</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Havelund</surname>
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rosu</surname>
            <given-names>G.</given-names>
          </string-name>
          <article-title>Efficient Monitoring of Safety Properties</article-title>
          .
          <source>Journal of STTT</source>
          , Volume
          <volume>6</volume>
          (
          <issue>2</issue>
          ), pp
          <fpage>158</fpage>
          -
          <lpage>173</lpage>
          .
          <year>2004</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sermanet</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anguelov</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erhan</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rabinovich</surname>
            <given-names>A..</given-names>
          </string-name>
          <article-title>Going deeper with convolutions</article-title>
          .
          <source>CVPR</source>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>9</lpage>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Taigman</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranzato</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          and
          <string-name>
            <surname>Wolf</surname>
            <given-names>L.</given-names>
          </string-name>
          <article-title>DeepFace: Closing the gap to human-level performance in face verification</article-title>
          .
          <source>In CVPR</source>
          , pp.
          <fpage>1701</fpage>
          –
          <lpage>1708</lpage>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>LeCun</surname>
          </string-name>
          , Y.,
          <string-name>
            <surname>Bengio</surname>
            <given-names>Y.</given-names>
          </string-name>
          , and Hinton, G.
          <article-title>Deep learning</article-title>
          .
          <source>Nature 521.7553</source>
          (
          <year>2015</year>
          ):
          <fpage>436</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voronkov</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . (Eds.). (
          <year>2001</year>
          ).
          <source>Handbook of automated reasoning</source>
          (Vol.
          <volume>1</volume>
          ). Elsevier.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Vazquez-Chanlatte</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jha</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tiwari</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ho</surname>
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Seshia</surname>
            <given-names>S. A.</given-names>
          </string-name>
          .
          <article-title>Learning Task Specifications from Demonstrations</article-title>
          . In NeurIPS,
          <year>2018</year>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jha</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tiwari</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seshia</surname>
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shankar</surname>
            <given-names>N.</given-names>
          </string-name>
          , and Sahai, T.
          <article-title>TeLEx: Passive STL Learning Using Only Positive Examples</article-title>
          ,
          <source>RV</source>
          ,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jha</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahai</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raman</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Pinto</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Francis</surname>
          </string-name>
          .
          <article-title>Explaining AI Decisions Using Efficient Methods for Learning Sparse Boolean Formulae</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          ,
          <year>2018</year>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raman</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sadigh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Seshia</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Safe autonomy under perception uncertainty using chance-constrained temporal logic</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          ,
          <volume>60</volume>
          (
          <issue>1</issue>
          ),
          <fpage>43</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jha</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahai</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raman</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinto</surname>
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Francis</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>On Learning Sparse Boolean Formulae For Explaining AI Decisions</article-title>
          ,
          <source>NASA Formal Methods</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ng</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Russell</surname>
            <given-names>S.</given-names>
          </string-name>
          , et al.
          <article-title>Algorithms for inverse reinforcement learning</article-title>
          .
          <source>In ICML</source>
          , pages
          <fpage>663</fpage>
          -
          <lpage>670</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Ramachandran</surname>
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Amir</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>Bayesian inverse reinforcement learning</article-title>
          .
          <source>IJCAI'07.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Shoham</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yahav</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fink</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pistoia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Static specification mining using automata-based abstractions</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          ,
          <volume>34</volume>
          (
          <issue>5</issue>
          ),
          <fpage>651</fpage>
          -
          <lpage>666</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Ziebart</surname>
            <given-names>B. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maas</surname>
            <given-names>A. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagnell</surname>
            <given-names>J. A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dey</surname>
            <given-names>A. K.</given-names>
          </string-name>
          <article-title>Maximum entropy inverse reinforcement learning</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>8</volume>
          , pages
          <fpage>1433</fpage>
          -
          <lpage>1438</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Jaynes</surname>
            <given-names>E. T.</given-names>
          </string-name>
          <article-title>Information theory and statistical mechanics</article-title>
          .
          <source>Physical Review</source>
          ,
          <volume>106</volume>
          (
          <issue>4</issue>
          ):
          <fpage>620</fpage>
          ,
          <year>1957</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Levine</surname>
            <given-names>S.</given-names>
          </string-name>
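          ,
          <string-name>
            <surname>Popovic</surname>
            <given-names>Z.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Koltun</surname>
            <given-names>V.</given-names>
          </string-name>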
          <article-title>Nonlinear inverse reinforcement learning with gaussian processes</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>19</fpage>
          -
          <lpage>27</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Finn</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levine</surname>
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Abbeel</surname>
            <given-names>P.</given-names>
          </string-name>
          <article-title>Guided cost learning: Deep inverse optimal control via policy optimization</article-title>
          .
          <source>In ICML</source>
          , pages
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Manna</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pnueli</surname>
            <given-names>A.</given-names>
          </string-name>
          <article-title>The temporal logic of reactive and concurrent systems: Specification</article-title>
          .
          <source>Springer Science and Business Media</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>