<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>M. Baert);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Learning Logic Constraints from Demonstration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mattijs Baert</string-name>
          <email>mattijs.baert@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sam Leroux</string-name>
          <email>sam.leroux@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pieter Simoens</string-name>
          <email>pieter.simoens@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Constraint Inference, Learning from Demonstrations, Rule Induction</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IDLab, Department of Information Technology at Ghent University - imec</institution>
          ,
          <addr-line>Technologiepark 126, Ghent, B-9052</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Autonomous agents operating in real-world settings are often required to efficiently accomplish a task while adhering to certain environmental constraints. For instance, a self-driving car must transport its passengers to their intended destination as fast as possible while complying with traffic regulations. Inverse Constrained Reinforcement Learning (ICRL) is a technique that enables the learning of a policy from demonstrations of expert agents. When these expert agents adhere to the environmental constraints, ICRL allows compliant policies to be learned without the need to define constraints beforehand. However, this approach provides no insight into the constraints themselves, although such insight is desirable for safety-critical applications such as autonomous driving. In such settings, it is important to verify what is learned from the given demonstrations. In this work, we propose a novel approach for learning logic rules that represent the environmental constraints given demonstrations of agents that comply with them, thus providing an interpretable representation of the environmental constraints.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Social norms play a crucial role in shaping individual behavior in modern society, promoting
safety and efficiency in human interactions. Artificial agents seeking to integrate into the real
world must also adhere to these norms in order to achieve success [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These norms can be
viewed as constraints on an agent’s behavior, and in the framework of reinforcement learning
(RL), a constraint-abiding agent can be trained by solving a min-max problem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], maximizing
the reward function (reflecting the goal) while minimizing the cost function (capturing constraint
violations). However, in complex environments where constraints are implicit or unknown,
it may be necessary to use Inverse Constrained Reinforcement Learning (ICRL) methods to
learn these constraints from expert demonstrations. Current ICRL methods iterate over the
complete state-action space to determine the most likely constraints [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or parameterize the cost
function using a neural network [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. The first group of methods offers explainability as the
constraints are represented by a set of states (or state-action pairs) [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] or logic rules extracted
from this set [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The second group of techniques offers scalability to complex problems but
sacrifices interpretability, which is crucial for safety-critical applications.
In this work, we present a novel approach for obtaining an interpretable description of
constraints in environments with a high-dimensional state space. First, we generate states that
are likely to be constrained from trajectories of agents that adhere to the constraints (i.e.
expert agents). Next, we utilize a rule induction method to learn a set of rules that capture
the constraints in the environment from a set of positive examples (states visited by expert
agents) and negative examples (states generated in the previous step). The final outcome is
a set of logic rules in disjunctive normal form that represent the environmental constraints.
An additional advantage of our method is that it is robust against constraint violations in the
expert demonstrations. This is important when learning from human demonstrations because
it is possible that some human demonstrations are non-compliant with the constraints.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <p>The aim of this study is to learn logical rules that describe the constraints in a specific
environment from trajectories of constraint-abiding agents (i.e., experts) within that environment.
Anomalous behavior is rare in real-world scenarios, so the set of expert trajectories contains only a
limited number of constraint violations. We therefore regard the expert trajectories as a
dataset containing only positive examples. However, many current classification and rule
induction techniques require both positive and negative examples for training. To address this,
we propose a method for generating negative examples (possibly constrained states) given a
model of the unconstrained environment (see Sec. 2.1). We acknowledge that the
expert dataset may also contain negative examples, which we will address later in the paper. Once
we obtain a dataset comprising both positive and negative labels, rules are learned that
differentiate between the positive and negative examples, i.e., an example can be classified as
positive or negative by evaluating it on the learned rules (see Sec. 2.2).</p>
      <sec id="sec-2-1">
        <title>2.1. Generating Negative Examples</title>
        <p>Figure 1 depicts the procedure for generating negative examples; the steps are numbered
and referred to in the text. In the first step (1), we model the distribution p_E of the states visited
by the expert agents. Since calculating p_E is intractable for all but very simple environments,
a Variational AutoEncoder (VAE) is trained on the states visited during the expert trajectories
D_E by optimizing the evidence lower bound objective. The VAE consists of an encoder network
modelling the posterior distribution q(z|s) of a latent variable z given the observed state s.
The decoder network maps z back to the state space using the likelihood distribution p(s|z).</p>
        <p>[Figure 2: Architecture of the rule network, consisting of a literal layer, a conjunction layer, and a disjunction layer.]</p>
        <p>
The reconstruction error can then be used as a measure of how likely the input originates
from a distribution similar to p_E. We assume that a Markov Decision Process (MDP) ℳ of
the unconstrained environment is available; we refer to this as the nominal MDP. Given ℳ,
the optimal nominal policy π_N is obtained using reinforcement learning (RL) (2). Next, a set
of nominal trajectories D_N is sampled from the nominal policy π_N (3). States occurring in
trajectories sampled from π_N are high-value states since π_N is optimal. When such a state results
in a high reconstruction error when passed through the trained VAE, it is
unlikely to be visited by the expert. We reason that some constraint must
prevent the expert from visiting this high-value state. Following this rationale, we identify
possible constraints as high-value states that are unlikely to be visited by the expert and thus
result in a high reconstruction error. Finally, a labeled dataset is built by calculating the
normalized reconstruction error for all states visited during trajectories sampled from both the
nominal policy π_N and the expert trajectories D_E (4). Until now, our assumption was that the
expert trajectories do not include any constraint violations. However, when gathering human
trajectories, it is plausible that some constrained states are present in the obtained trajectories.
When the number of constraint violations is small, their impact on the distribution
learned by the VAE is insignificant. Consequently, constrained states will still cause a
substantial reconstruction error.</p>
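        <p>The labeling step (4) can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the min-max normalization, the threshold of 0.5, and the names label_states and toy_reconstruct are assumptions introduced here, and the trained VAE is replaced by a hand-written stand-in so that the example runs without a deep learning library. A label of 1 marks a possibly constrained state (a negative example) and 0 marks a valid state.</p>
        <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between a state and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label_states(
    expert_states: List[State],
    nominal_states: List[State],
    reconstruct: Callable[[State], State],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[State, int]]:
    """Label each state 1 (possibly constrained) or 0 (valid).

    Errors are min-max normalized over the union of both state sets; a state
    whose normalized error exceeds `threshold` becomes a negative example.
    """
    states = expert_states + nominal_states
    errors = [squared_error(s, reconstruct(s)) for s in states]
    lo, hi = min(errors), max(errors)
    span = (hi - lo) or 1.0  # avoid division by zero when all errors are equal
    return [(s, int((e - lo) / span > threshold)) for s, e in zip(states, errors)]


def toy_reconstruct(state: State) -> State:
    """Hypothetical stand-in for the trained VAE.

    It projects every state onto the line y = 2, as if the experts had only
    ever been observed close to that line.
    """
    return (state[0], 2.0)


expert = [(1.0, 2.0), (5.0, 2.0), (9.0, 2.1)]
nominal = [(1.0, 2.0), (5.0, 5.0), (9.0, 9.0)]
labeled = label_states(expert, nominal, toy_reconstruct)
constrained = [s for s, label in labeled if label == 1]
```
]]></preformat>
        <p>In this toy setting only the nominal state far away from the expert states is flagged. With a real VAE, the reconstruct argument would wrap an encoder and decoder pass, and the threshold would need to be tuned per environment.</p>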
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Rule Induction</title>
        <p>
          In this study, we utilize a neural-symbolic architecture based on relational rule networks,
as proposed by Kusters et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. This fully differentiable neural network can, after convergence,
be interpreted as a logical formula in disjunctive normal form. An overview of the architecture
is presented in Figure 2. The input of the network is a k-dimensional real-valued vector
representation of the state s, denoted as φ(s). The first layer, known as the literal layer, learns
literals as hyperplanes dividing the feature space. The output of this layer is an L-dimensional
vector, denoted as l(s) ∈ [0, 1]^L, where each dimension corresponds to the evaluation of one of
the L literals. The conjunction layer produces a C-dimensional vector, denoted as c(s) ∈ [0, 1]^C,
where each dimension is the result of a weighted conjunction of l(s). Finally, the last layer
takes a weighted disjunction of all values of c(s), resulting in a value d(s) ∈ [0, 1] indicating whether
s is constrained. This architecture allows us to learn using gradient descent while
retaining the ability to interpret the rules in a human-understandable format. We refer to
Appendix A for implementation details and hyperparameter values.
[Figure 3: Trajectories of (a) the expert and (b) the nominal policy. Figure 4: (a) Ground truth constraints and (b) constraints learned by the rule network.]
        </p>
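        <p>To make the three layers concrete, the sketch below evaluates a small network of this shape in plain Python. It is an illustration under stated assumptions and not the formulation of Kusters et al.: the sigmoid literal with a fixed sharpness, the product-based soft conjunction and disjunction, and all weights and thresholds are chosen here for readability.</p>
        <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def literal_layer(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    sharpness: float = 10.0,  # assumed temperature; larger means crisper literals
) -> List[float]:
    """Each literal is a soft half-space test: sigmoid(sharpness * (w . x + b))."""
    return [
        sigmoid(sharpness * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for w, b in zip(weights, biases)
    ]


def soft_and(values: Sequence[float], membership: Sequence[float]) -> float:
    """Weighted conjunction: an input with membership 0 is ignored."""
    out = 1.0
    for v, m in zip(values, membership):
        out *= 1.0 - m * (1.0 - v)
    return out


def soft_or(values: Sequence[float], membership: Sequence[float]) -> float:
    """Weighted disjunction: an input with membership 0 is ignored."""
    out = 1.0
    for v, m in zip(values, membership):
        out *= 1.0 - m * v
    return 1.0 - out


def rule_network(x, lit_w, lit_b, conj_m, disj_m) -> float:
    """Literal layer, then conjunction layer, then a single disjunction."""
    literals = literal_layer(x, lit_w, lit_b)
    conjunctions = [soft_and(literals, m) for m in conj_m]
    return soft_or(conjunctions, disj_m)


# Hypothetical weights encoding the rule (x0 < 5) AND (x1 > 3).
lit_w = [(-1.0, 0.0), (0.0, 1.0)]
lit_b = [5.0, -3.0]
conj_m = [(1.0, 1.0)]  # the only conjunction uses both literals
disj_m = [1.0]         # the disjunction uses the only conjunction

inside = rule_network((2.0, 8.0), lit_w, lit_b, conj_m, disj_m)
outside = rule_network((8.0, 8.0), lit_w, lit_b, conj_m, disj_m)
```
]]></preformat>
        <p>Here inside is close to 1 and outside is close to 0. Once the memberships have converged to values near 0 or 1, the formula can be read off directly: each conjunction lists the literals with membership 1, and the disjunction lists the conjunctions with membership 1.</p>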
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminary Results</title>
      <p>
        We perform a preliminary experiment on a simple navigation task in a continuous environment.
The agent’s goal is to navigate from the bottom left corner to the top right corner in as few
steps as possible, but some part of the environment is inaccessible, e.g. reflecting a newly laid
lane. The environment is depicted in figure 4a with the ground truth constraints (in yellow).
The nominal policy reflects the desire line that an agent takes which does not adhere to the
constraints. The nominal policy is learned using Proximal Policy Optimization (PPO) [
        <xref ref-type="bibr" rid="ref10 ref20">10</xref>
        ]. To
obtain expert trajectories, we learn the true expert policy using Reward Constrained Policy
Optimization (RCPO) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] given the ground truth constraints. The expert and the nominal
trajectories are depicted in Figures 3a and 3b, respectively. The state’s vector representation φ(s)
is a two-dimensional vector containing the x- and y-coordinates. Because the environment is so
simple, we can iterate over the complete state space and visualize the classification
boundary of the learned classifier (see Figure 4b). The following rule is extracted from the network,
defining constrained states:
This corresponds to an intersection over union (IoU) of 0.86 with the ground truth constraints.
We conclude that the learned rule is a good estimate of the ground truth constraints presented
in Figure 4a. Section B in the appendix provides additional results on the robustness against
constraint violations by the expert agents.
      </p>
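      <p>An IoU between a learned rule and a ground truth region can be approximated by iterating over a grid of states, as sketched below. The 20 by 20 environment size, the grid resolution, and both example regions are hypothetical values chosen for illustration; they are not the regions or the score reported above.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from typing import Callable

Region = Callable[[float, float], bool]


def grid_iou(pred: Region, truth: Region, size: float = 20.0, steps: int = 200) -> float:
    """Approximate the IoU of two regions by testing the centre of each grid cell."""
    cell = size / steps
    intersection = 0
    union = 0
    for i in range(steps):
        for j in range(steps):
            x, y = (i + 0.5) * cell, (j + 0.5) * cell
            p, t = pred(x, y), truth(x, y)
            intersection += int(p and t)
            union += int(p or t)
    return intersection / union if union else 0.0


def truth_region(x: float, y: float) -> bool:
    """Hypothetical ground truth region."""
    return x < 10 and y > 5


def learned_region(x: float, y: float) -> bool:
    """Hypothetical learned rule."""
    return x < 11 and y > 6


iou = grid_iou(learned_region, truth_region)
```
]]></preformat>
      <p>For these two example regions the grid estimate is 14000 / 16400, roughly 0.85. A finer grid tightens the approximation at quadratic cost, which is affordable only because the state space here is two-dimensional.</p>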
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>
        In this work we outlined a novel method for learning behavioral constraints from expert
demonstrations represented as a logical formula. This is the first method which is able to
learn constraints in environments with a continuous state space while representing the learned
constraints in an interpretable fashion. We presented preliminary results on a simple navigation
task. In future work, we will validate our method on more complex environments with intricate
constraints. This includes real-world traffic scenarios where demonstrations are obtained from
human agents [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ]. These datasets interface with CommonRoadRL [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] which can
provide the nominal MDP (i.e., a model of the unconstrained environment). We could extend this
method to learning constraints in more expressive logics by using neural-symbolic classifiers that
can learn first-order logic [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] or signal temporal logic formulae [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Another interesting
direction is how these logic constraints can be transferred to an autonomous agent so that
constraints are guaranteed never to be violated. One possibility is to integrate the learned logic
formulae into the policy network [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Another interesting use case is anomaly detection by
validating observations on the learned rules.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was partially funded by the Flemish Government (Flanders AI Research Program).</p>
    </sec>
    <sec id="sec-6">
      <title>A. Implementation Details</title>
      <p>
        In this section, we elaborate on the implementation of the experiment presented in Section 3.
The VAE is configured with three linear layers, each with a ReLU activation function. We train
the VAE for 500 epochs using the Adam optimizer [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] with a learning rate of 0.01 and a batch size
of 64. For the relational rule net, we configure L = 10 (number of literals) and C = 25
(number of conjunctions). We train this network for 1500 epochs, using the Adam optimizer with a
learning rate of 0.001 and a batch size of 64. Other parameters are set to the values mentioned
in the original paper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. A weighted random sampler is used to select training samples so that
the network is trained on a balanced dataset, because the number of valid states (low
reconstruction error) is almost always larger than the number of constrained states encountered
in the obtained trajectories.
      </p>
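      <p>The effect of a weighted random sampler can be illustrated with the standard library alone. The sketch below assumes inverse class frequency weights and sampling with replacement; the helper names balanced_weights and sample_batch are hypothetical and do not refer to the sampler used in the experiments.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def balanced_weights(labels: Sequence[int]) -> List[float]:
    """Weight each sample by the inverse frequency of its class."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_batch(
    examples: Sequence[Tuple[float, float]],
    labels: Sequence[int],
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[Tuple[float, float], int]]:
    """Draw a batch with replacement so that both classes are equally likely."""
    weights = balanced_weights(labels)
    indices = rng.choices(range(len(examples)), weights=weights, k=batch_size)
    return [(examples[i], labels[i]) for i in indices]


# Toy dataset: 90 valid states (label 0) and 10 constrained states (label 1).
labels = [0] * 90 + [1] * 10
examples = [(float(i), float(i)) for i in range(100)]

rng = random.Random(0)
batch = sample_batch(examples, labels, batch_size=10000, rng=rng)
fraction_constrained = sum(label for _, label in batch) / len(batch)
```
]]></preformat>
      <p>Although only 10% of the toy dataset is labeled as constrained, about half of the sampled batch is, which is the balance the rule network is trained on.</p>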
    </sec>
    <sec id="sec-7">
      <title>B. Additional Results</title>
      <p>We provide additional results on the robustness of our method against constraint violations
by the expert. Figure 5 illustrates the learned constraints in cases where expert trajectories
include a portion of trajectories from agents that ignore the constraints. The following rules
were extracted from the network.</p>
      <p>When 10% of the expert trajectories originate from agents ignoring the constraints: x &lt; 11 ∧ y &gt; 6.2. (2)</p>
      <p>When 20% of the expert trajectories originate from agents ignoring the constraints: x &lt; 10 ∧ y &gt; 4.4. (3)</p>
      <p>When 30% of the expert trajectories originate from agents ignoring the constraints: x &lt; 10 ∧ y &gt; 13.8. (4)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <article-title>Human compatible: Artificial intelligence and the problem of control</article-title>
          ,
          <source>Penguin</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tessler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Mankowitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mannor</surname>
          </string-name>
          ,
          <article-title>Reward constrained policy optimization</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2019</year>
          . URL: https://openreview.net/forum?id=SkfrvsA9FX.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Scobee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <article-title>Maximum likelihood constraint inference for inverse reinforcement learning</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2020</year>
          . URL: https://openreview.net/forum?id=BJliakStvH.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Anwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aghasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <article-title>Inverse constrained reinforcement learning</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>7390</fpage>
          -
          <lpage>7399</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gaurav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rezaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Poupart</surname>
          </string-name>
          ,
          <article-title>Benchmarking constraint inference in inverse reinforcement learning</article-title>
          ,
          <source>in: The Eleventh International Conference on Learning Representations</source>
          ,
          <year>2023</year>
          . URL: https://openreview.net/forum?id=vINj_Hv9szL.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gaurav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rezaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Poupart</surname>
          </string-name>
          ,
          <article-title>Learning soft constraints from constrained expert demonstrations</article-title>
          ,
          <source>in: The Eleventh International Conference on Learning Representations</source>
          ,
          <year>2023</year>
          . URL: https://openreview.net/forum?id=8sSnD78NqTN.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Glazier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Loreggia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mattei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rahgooy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Venable</surname>
          </string-name>
          ,
          <article-title>Learning behavioral soft constraints from demonstrations</article-title>
          ,
          <source>in: Workshop on Safe and Robust Control of Uncertain Systems at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Baert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Leroux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Simoens</surname>
          </string-name>
          ,
          <article-title>Inverse reinforcement learning through logic constraint inference</article-title>
          ,
          <source>Machine Learning</source>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kusters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Collery</surname>
          </string-name>
          , C. d. S. Marie,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Differentiable rule induction with learned relational features</article-title>
          ,
          <source>in: NeSy 22: 16th International Workshop on Neural-Symbolic Learning and Reasoning</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Schulman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Klimov</surname>
          </string-name>
          ,
          <article-title>Proximal policy optimization algorithms</article-title>
          ,
          <source>CoRR</source>
          abs/1707.06347 (
          <year>2017</year>
          ). URL: http://dblp.uni-trier.de/db/journals/corr/corr1707.html#SchulmanWDRK17.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Clausse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kümmerle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Königshof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>de La Fortelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tomizuka</surname>
          </string-name>
          ,
          <article-title>INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps</article-title>
          , arXiv:1910.03088 [cs, eess] (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krajewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Moers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Runde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vater</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Eckstein</surname>
          </string-name>
          ,
          <article-title>The inD dataset: A drone dataset of naturalistic road user trajectories at German intersections</article-title>
          ,
          <source>in: 2020 IEEE Intelligent Vehicles Symposium (IV)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1929</fpage>
          -
          <lpage>1934</lpage>
          . doi:10.1109/IV47402.2020.9304839.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Krajewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kloeker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Eckstein</surname>
          </string-name>
          ,
          <article-title>The highD dataset: A drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems</article-title>
          ,
          <source>in: 2018 21st International Conference on Intelligent Transportation Systems (ITSC)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2118</fpage>
          -
          <lpage>2125</lpage>
          . doi:10.1109/ITSC.2018.8569552.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Krasowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Althoff</surname>
          </string-name>
          ,
          <article-title>CommonRoad-RL: A configurable reinforcement learning environment for motion planning of autonomous vehicles</article-title>
          ,
          <source>in: IEEE International Conference on Intelligent Transportation Systems (ITSC)</source>
          ,
          <year>2021</year>
          . doi:10.1109/ITSC48978.2021.9564898.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Neural logic machines</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2019</year>
          . URL: https://openreview.net/forum?id=B1xY-hRctX.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Riegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Luus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Makondo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. Y.</given-names>
            <surname>Akhalwaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Barahona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , et al.,
          <article-title>Logical neural networks</article-title>
          ,
          <source>arXiv preprint arXiv:2006.13155</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Julius</surname>
          </string-name>
          ,
          <article-title>Neural network for weighted signal temporal logic</article-title>
          ,
          <source>arXiv preprint arXiv:2104.05435</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Teso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Van den Broeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vergari</surname>
          </string-name>
          ,
          <article-title>Semantic probabilistic layers for neuro-symbolic learning</article-title>
          , in:
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Belgrave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <year>2022</year>
          . URL: https://openreview.net/forum?id=o-mxIWAY1T8.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>