<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis,
November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Clock Specifications for Temporal Tasks in Planning and Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe De Giacomo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Favorito</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Patrizi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Banca d'Italia</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Italy</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>7</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Recently, Linear Temporal Logics on finite traces, such as ltl (or ldl ), have been advocated as high-level formalisms to express dynamic properties, such as goals in planning domains or rewards in Reinforcement Learning (RL). This paper addresses the challenge of separating high-level temporal specifications from the low-level details of the underlying environment (domain or MDP), by allowing for expressing the specifications at a diferent time granularity than the environment. We study the notion of a clock which progresses the high-level ltl specification, whose ticks are triggered by dynamic (low-level) properties defined on the underlying environment. The obtained separation enables terse high-level specifications while allowing for very expressive forms of clock expressed as general ltl properties over low-level features, such as counting or occurrence/alternation of special events. We devise an automata-based construction to compile away the clock into a deterministic automaton that is polynomial in the size of the automata characterizing the high-level and clock specifications. We show the correctness of the approach and discuss its application in several contexts, including FOND planning, RL with ltl Restraining Bolts, and Reward Machines.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Temporal Logics</kwd>
        <kwd>Automata Theory</kwd>
        <kwd>Planning and Learning for Temporal Tasks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Linear Temporal Logic on finite traces ( ltl ) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has been advocated as a proper variant of ltl
interpreted over finite traces. Moreover, at no cost of computational complexity but higher
expressive power, the authors propose a novel formalism, Linear Dynamic Logic on finite traces
(ldl ); it is as expressive as regular expressions, while retaining the declarative nature and
intuitive appeal of ltl . Both ltl and ldl have been quite successful in the AI and Formal
Methods communities in recent years. For example, they have been used for finite temporal
synthesis [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ], in Fully-Observable Non-Deterministic (FOND) Planning for ltl Goals
[
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6, 7, 8, 9</xref>
        ], for reward function specification in the theory of Markov Decision Processes (MDP)
[
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] and in Reinforcement Learning (RL) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] with temporal logic rewards [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].
      </p>
      <p>The use of task specification languages, e.g., in the form of ltl /ldl formulas, allowed
greater richness in goal specifications, and improved modularity of the AI system by providing
a clear separation between the goal and the environment. However, despite their successes,
there is a crucial issue that, to the best of our knowledge, has not been studied yet. So far, it
has been implicitly assumed that the time granularity of the task specification and the time
granularity of the acting of the agent in the world are synchronized. In other words, each agent
timestep is in one-to-one correspondence with each task timestep. While this assumption is not
limiting in terms of what specifications can be expressed, we argue that it is limiting in terms
of how. Conceptually, the synchrony assumption between the designer and the agent is not
realistic, as these are two diferent entities which might have diferent cognitive systems, and
therefore diferent perceptions of the world. In particular, the designer and the agent might
have diferent temporal processing capacities. The task desired by the designer is expressed from
the designer’s perspective but has to be executed by the agent, which has its own understanding
of the world and the task.</p>
      <p>
        Consider the following scenario: a RL agent (a computer program) playing the Atari game
Breakout [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and a human designer that assigns the task of breaking the columns of bricks from
left to right (as in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). The designer task can be expressed in ltl in terms of “next” operator,
denoted with “∘ ”: ∘ (1 ∧ ∘ (2 ∧ . . . )). However, these two entities have completely diferent
perceptions of the world. On the one hand, the RL agent observes the pixels of the game screen,
and has access to the Atari Breakout simulator; hence, the timestep of the environment is under
control of the agent itself. On the other hand, the designer has a common-sense understanding
of the environment, and has proposed a task based on its perception. In particular, here we
focus on the notion of what is the “next timestep” for such entities. While for the agent, the
“next timestep” coincides with the “next frame”, for the designer it makes more sense to consider
more abstract or higher-level timesteps, such as “the next removed brick”, or “the next removed
column”. Given this unavoidable discrepancy, the designer should instruct the agent about
how to interpret the designer’s task according to the time resolution of the agent’s perception.
Without any further instruction, the original designer’s task cannot be correctly interpreted by
the agent, because the meaning of the “next” operator is based on the agent’s timestep resolution,
i.e. the next frame. Therefore, the designer is forced to express the goal specification in terms
of stutter-invariant operators [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], e.g. in terms of eventually operators: ◇(1 ∧◇(2 ∧. . . )).
The task specification might be more naturally expressed in a diferent time granularity than
the agent’s, but there must still be a sort of “glue” between the two granularities.
Related works. The topic of diferent temporal abstractions within the same information
system has been investigated for decades in computer science. Several diferent formalisms
to finitely represent infinite-time granularities have been proposed in the literature, based
on algebraic [17, 18, 19, 20], logical [21, 22, 23, 24], string- based [25], and automaton-based
[26, 27, 28, 29] approaches; see [30] for a survey on the topic. However, instead of devising
ad-hoc temporal goal specification languages, or specific automata-based techniques, as the
references above, we would like to keep intact both the ltl formalism and rely on classic
automata theory, while allowing the designer to specify the clock specification and automated
techniques to use it. This would give us broader impact in the wide community that is using
ltl , and better reliance on the wide availability of supporting tools. Another line of research
aimed to extend temporal logic with the so-called clock operator is described in [31, 32]. The
⊤
¬
0
      </p>
      <p>1


¬

0

1
¬
⊤
¬ ∨ ¬
0
1</p>
      <p>∧</p>
      <p>clock  clock ¬ clock  clock ¬ clock¬ clock  clock
clock operator was proposed in the context of modern hardware design, in which there is no
notion of a single clock. Such an operator allows us to disambiguate which clock to use in
order to evaluate a temporal formula or, in other words, what is the “next timestep”. Both LTL@
[31] and PSL [32] extend LTL to support clock operators. Again, our purpose is not to change
the amenable syntax of ltl , but to provide a tool for AI designers to specify the timestep
granularity for semantic evaluation. Moreover, in their case, the clock only depends on the
current instant, while we consider the clock specified using temporal logic formulas.
Contributions. In this work, we are interested in the notion of clock specification , i.e. the
explicit specification of what is “the next step” for the task given by the designer. The core
contribution of this paper is to formalize and study the properties and expressivity of clock
specifications in the context of temporal goal specifications. We formalize our approach by
introducing a clock specification formula for a temporal goal, we show how these two can be
compiled together in order to change the time granularity for the evaluation of the goal formula,
by means of an automata-based construction. This technique can be used to solve the problem
of temporal goal satisfaction, both in planning and in learning, in the presence of clocks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Clock Specifications</title>
      <p>Let  be a set of propositions that capture facets of interest. In the context of clock specifications,
we have a ltl /ldl formula  goal specifying the desired temporal task, i.e. the goal formula. In
addition, we have a ltl /ldl formula  clock, called clock formula, describing the timesteps to
consider when evaluating the goal formula. We call the pair ( goal,  clock) clocked specification
and say that  goal is under clock specification  clock. We assume, without loss of generality,
that both  clock and  goal are defined over . Figure 1 intuitively explains the scenario we are
considering. Circles represent trace timesteps. The bottom trace has finest time granularity .
The formula  clock is evaluated on every prefix of the trace. If the trace prefix at some time
 makes the formula  clock true, then the timestep is passed to the evaluation of  goal, and
becomes a timestep of the coarser-grained timestep sequence ′. On the other hand, if for some
timestep , the trace prefix up to that timestep does not satisfy  clock, then the current timestep
is ignored at the higher level ′.</p>
      <p>We now formalize the semantics of the evaluation of  goal under clock formula  clock. We
start with the notion of trace projection. The projection of  onto clock formula  clock is the trace
 | clock = 0, 1, . . . , , where  =  [], if  (0,  + 1) |=  clock, and  =  , otherwise. We
define the clocked semantics of a ltl /ldl formula  under clock formula  clock in terms of
the original semantics but considering projection of a trace  onto clock formula  clock. That is,
we say that  models  under clock formula  clock, written  |= clock  , if  | clock |=  .</p>
      <p>Now we introduce an automata-based construction to reason over clocked ltl /ldl
specifications. This technique will be useful for automata-based construction in planning and learning
for ltl /ldl goals. Let ( goal,  clock) be a ltl /ldl clocked specification. Firstly, we
compute the dfas goal = ⟨, 2 , 0,  , ⟩ and clock = ⟨, 2 , 0,  , ⟩ of  goal and  clock,
respectively. Then, we compute the clocked product goal× clock = ⟨′, 2 , 0′,  ′,  ′⟩, defined
as follows: ′ =  × , 0′ = (0, 0),  ′ =  × ,  ′((, ), ) = ( (, ),  (, ))
if  (, ) ∈ , otherwise (,  (, )). Intuitively, the clocked product is like the classical
synchronous product between DFAs, except that the state component coming from the goal
automaton  is progressed only if the clock component  transitions into an accepting state
of clock. An example is shown in Figure 2. We have the following result:
Theorem 1. Let ( goal,  clock) be a clocked specification, and goal× clock be clocked product of
goal and clock. For any finite trace  ,  |= clock  goal if  ∈ ℒ(goal× clock)
Theorem 1 tells us that clocked ldl specifications are not more expressive than regular
expressions and, therefore, than ldl . On the other hand, it is easy to see that ldl is not more
expressive than clocked ldl specifications:
Theorem 2. Given a ltl /ldl formula  , the clocked specification (, tt ) is equivalent.
We say that a formula  is unclocked-equivalent to  under clock formula  clock if, for every
trace  , we have  |= clock  if  |=  . Here we show that we can automatically find “unclocked”
ltl /ldl formulas that are semantically equivalent to clocked ltl /ldl specifications.
Theorem 3. Given a clocked specification ( goal,  clock), there exists a ldl formula  that is
unclocked-equivalent to ( goal,  clock).</p>
      <p>Proof sketch. Compute the regular expression  equivalent to goal× clock, and take  = ⟨⟩.
Correctness follows by construction and by Theorem 1.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Discussion</title>
      <p>
        We have sketched the theoretical bases for clock specifications for temporal tasks. This
framework can be applied to FOND planning for ltl /ldl goals [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], by using goal× clock (instead
of goal) in the cross-product with the DFA of the domain, or for specifying non-Markovian
“clocked” rewards in Non-Markovian Reward Decision Processes (NMRDP) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], by means of
the usual product construction between the MDP and the reward specification represented by
goal× clock. The same approach can be combined with logic-based reward specifications in a
Reinforcement Learning setting, as in RL with Restraining Bolts [
        <xref ref-type="bibr" rid="ref14">14, 33</xref>
        ]; the reward is given
only when both the goal formula and the clock formula are satisfied. A similar construction can
be obtained when dealing with Reward Machines [34].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by the EU H2020 project AIPlan4EU (No. 101016442),
the ERC-ADG White- Mech (No. 834228), the EU ICT-48 2020 project TAILOR (No. 952215), the
PRIN project RIPER (No. 20203FFYLK), and the PNRR MUR project FAIR (No. PE0000013).
[17] C. Bettini, S. Jajodia, X. S. Wang, Time granularities in databases, data mining, and temporal
reasoning, Springer, 2000.
[18] B. Leban, D. McDonald, D. Forster, A representation for collections of temporal intervals,
in: AAAI, Morgan Kaufmann, 1986, pp. 367–371.
[19] M. Niezette, J. Stevenne, An eficient symbolic representation of periodic time, in:
Proceedings of the International Conference on Information and Knowledge Management
(CIKM), 1992, pp. 161–168.
[20] P. Ning, X. S. Wang, S. Jajodia, An algebraic representation of calendars, Ann. Math. Artif.</p>
      <p>Intell. 36 (2002) 5–38.
[21] C. Combi, M. Franceschet, A. Peron, Representing and reasoning about temporal
granularities, J. Log. Comput. 14 (2004) 51–77.
[22] S. Demri, LTL over integer periodicity constraints, Theor. Comput. Sci. 360 (2006) 96–123.
[23] H. Bowman, S. J. Thompson, A decision procedure and complete axiomatization of finite
interval temporal logic with projection, J. Log. Comput. 13 (2003) 195–239.
[24] G. Hariharan, B. Kempa, T. Wongpiromsarn, P. H. Jones, K. Y. Rozier, MLTL multi-type
(MLTLM): A logic for reasoning about signals of diferent types, in: NSV/FoMLAS@CAV,
volume 13466 of Lecture Notes in Computer Science, Springer, 2022, pp. 187–204.
[25] J. Wijsen, A string-based model for infinite granularities, in: Proceedings of the AAAI</p>
      <p>Workshop on Spatial and Temporal Granularities, 2000, pp. 9–16.
[26] U. D. Lago, A. Montanari, Calendars, time granularities, and automata, in: SSTD, volume
2121 of Lecture Notes in Computer Science, Springer, 2001, pp. 279–298.
[27] D. Bresolin, A. Montanari, G. Puppis, Time granularities and ultimately periodic automata,
in: JELIA, volume 3229 of Lecture Notes in Computer Science, Springer, 2004, pp. 513–525.
[28] U. D. Lago, A. Montanari, G. Puppis, Compact and tractable automaton-based
representations of time granularities, Theor. Comput. Sci. 373 (2007) 115–141.
[29] U. D. Lago, A. Montanari, G. Puppis, On the equivalence of automaton-based
representations of time granularities, in: TIME, IEEE Computer Society, 2007, pp. 82–93.
[30] J. Euzenat, A. Montanari, Time granularity, in: Handbook of Temporal Reasoning in
Artificial Intelligence, volume 1 of Foundations of Artificial Intelligence , Elsevier, 2005, pp.
59–118.
[31] C. Eisner, D. Fisman, J. Havlicek, A. McIsaac, D. V. Campenhout, The definition of a
temporal clock operator, in: ICALP, volume 2719 of Lecture Notes in Computer Science,
Springer, 2003, pp. 857–870.
[32] C. Eisner, D. Fisman, A Practical Introduction to PSL, Series on Integrated Circuits and</p>
      <p>Systems, Springer, 2006.
[33] G. De Giacomo, M. Favorito, L. Iocchi, F. Patrizi, A. Ronca, Temporal logic monitoring
rewards via transducers, in: KR, 2020, pp. 860–870.
[34] R. T. Icarte, T. Q. Klassen, R. A. Valenzano, S. A. McIlraith, Teaching multiple tasks to an
RL agent using LTL, in: AAMAS, International Foundation for Autonomous Agents and
Multiagent Systems Richland, SC, USA / ACM, 2018, pp. 452–461.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Vardi</surname>
          </string-name>
          ,
          <article-title>Linear temporal logic and linear dynamic logic on finite traces</article-title>
          ,
          <source>in: IJCAI, IJCAI/AAAI</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>854</fpage>
          -
          <lpage>860</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Vardi</surname>
          </string-name>
          ,
          <article-title>Synthesis for LTL and LDL on finite traces</article-title>
          , in: IJCAI, AAAI Press,
          <year>2015</year>
          , pp.
          <fpage>1558</fpage>
          -
          <lpage>1564</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Vardi</surname>
          </string-name>
          ,
          <article-title>Ltlf and ldlf synthesis under partial observability</article-title>
          , in: IJCAI, IJCAI/AAAI Press,
          <year>2016</year>
          , pp.
          <fpage>1044</fpage>
          -
          <lpage>1050</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Camacho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Baier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Muise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>McIlraith</surname>
          </string-name>
          ,
          <string-name>
            <surname>Finite</surname>
            <given-names>LTL</given-names>
          </string-name>
          <article-title>synthesis as planning</article-title>
          , in: ICAPS, AAAI Press,
          <year>2018</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Tabajara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Vardi</surname>
          </string-name>
          , Symbolic ltlf synthesis,
          <source>in: IJCAI</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <article-title>Automata-theoretic foundations of FOND planning for ltlf and ldlf goals, in: IJCAI, ijcai</article-title>
          .org,
          <year>2018</year>
          , pp.
          <fpage>4729</fpage>
          -
          <lpage>4735</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. I.</given-names>
            <surname>Brafman</surname>
          </string-name>
          , G. De Giacomo,
          <article-title>Planning for ltlf /ldlf goals in non-markovian fully observable nondeterministic domains, in: IJCAI, ijcai</article-title>
          .org,
          <year>2019</year>
          , pp.
          <fpage>1602</fpage>
          -
          <lpage>1608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Camacho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>McIlraith</surname>
          </string-name>
          ,
          <article-title>Strong fully observable non-deterministic planning with LTL and ltlf goals, in: IJCAI, ijcai</article-title>
          .org,
          <year>2019</year>
          , pp.
          <fpage>5523</fpage>
          -
          <lpage>5531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Favorito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fuggitti</surname>
          </string-name>
          ,
          <article-title>Planning for temporally extended goals in pure-past linear temporal logic: A polynomial reduction to standard planning</article-title>
          ,
          <source>CoRR abs/2204</source>
          .09960 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R. I.</given-names>
            <surname>Brafman</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. De Giacomo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Patrizi</surname>
          </string-name>
          ,
          <article-title>Ltlf/ldlf non-markovian rewards</article-title>
          , in: AAAI, AAAI Press,
          <year>2018</year>
          , pp.
          <fpage>1771</fpage>
          -
          <lpage>1778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. I.</given-names>
            <surname>Brafman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Giacomo</surname>
          </string-name>
          ,
          <article-title>Regular decision processes: A model for non-markovian domains</article-title>
          ,
          <source>in: IJCAI, ijcai.org</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>5516</fpage>
          -
          <lpage>5522</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Barto</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning - an introduction, Adaptive computation and machine learning</article-title>
          , MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Camacho</surname>
          </string-name>
          , R. T. Icarte,
          <string-name>
            <given-names>T. Q.</given-names>
            <surname>Klassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Valenzano</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>A. McIlraith, LTL and beyond: Formal languages for reward function specification in reinforcement learning</article-title>
          ,
          <source>in: IJCAI, ijcai.org</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>6065</fpage>
          -
          <lpage>6073</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>G. De Giacomo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Iocchi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Favorito</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Patrizi</surname>
          </string-name>
          ,
          <article-title>Foundations for restraining bolts: Reinforcement learning with ltlf/ldlf restraining specifications</article-title>
          , in: ICAPS, AAAI Press,
          <year>2019</year>
          , pp.
          <fpage>128</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mnih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Rusu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Veness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Bellemare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Fidjeland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ostrovski</surname>
          </string-name>
          , et al.,
          <article-title>Human-level control through deep reinforcement learning</article-title>
          , nature
          <volume>518</volume>
          (
          <year>2015</year>
          )
          <fpage>529</fpage>
          -
          <lpage>533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lamport</surname>
          </string-name>
          ,
          <article-title>What good is temporal logic?</article-title>
          , in: IFIP Congress, North-Holland/IFIP,
          <year>1983</year>
          , pp.
          <fpage>657</fpage>
          -
          <lpage>668</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>