
Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis, November 2023

Clock Specifications for Temporal Tasks in Planning and Learning

Giuseppe De Giacomo, Marco Favorito, Fabio Patrizi

Sapienza University of Rome, Italy; University of Oxford, UK; Banca d'Italia, Italy

Recently, Linear Temporal Logics on finite traces, such as LTLf (or LDLf), have been advocated as high-level formalisms to express dynamic properties, such as goals in planning domains or rewards in Reinforcement Learning (RL). This paper addresses the challenge of separating high-level temporal specifications from the low-level details of the underlying environment (domain or MDP) by allowing the specifications to be expressed at a different time granularity than the environment. We study the notion of a clock that progresses the high-level LTLf specification and whose ticks are triggered by dynamic (low-level) properties defined on the underlying environment. The resulting separation enables terse high-level specifications while allowing very expressive forms of clocks, expressed as general LTLf properties over low-level features, such as counting or the occurrence/alternation of special events. We devise an automata-based construction that compiles away the clock into a deterministic automaton whose size is polynomial in the size of the automata characterizing the high-level and clock specifications. We show the correctness of the approach and discuss its application in several contexts, including FOND planning, RL with LTLf Restraining Bolts, and Reward Machines.

Keywords: Temporal Logics, Automata Theory, Planning and Learning for Temporal Tasks

1. Introduction

Linear Temporal Logic on finite traces (LTLf) [1] has been advocated as a proper variant of LTL interpreted over finite traces. In the same work, the authors propose Linear Dynamic Logic on finite traces (LDLf), which offers higher expressive power at no additional computational cost: LDLf is as expressive as regular expressions, while retaining the declarative nature and intuitive appeal of LTLf. Both LTLf and LDLf have been quite successful in the AI and Formal Methods communities in recent years. For example, they have been used for finite temporal synthesis [2, 3, 4, 5], in Fully-Observable Non-Deterministic (FOND) planning for LTLf goals [6, 7, 8, 9], for reward function specification in the theory of Markov Decision Processes (MDPs) [10, 11], and in Reinforcement Learning (RL) [12] with temporal logic rewards [13, 14].

The use of task specification languages, e.g., LTLf/LDLf formulas, has allowed for richer goal specifications and has improved the modularity of AI systems by providing a clear separation between the goal and the environment. However, despite these successes, there is a crucial issue that, to the best of our knowledge, has not been studied yet. So far, it has been implicitly assumed that the time granularity of the task specification and that of the agent acting in the world are synchronized. In other words, each agent timestep is in one-to-one correspondence with a task timestep. While this assumption does not limit what specifications can be expressed, we argue that it limits how they can be expressed. Conceptually, the synchrony assumption between the designer and the agent is unrealistic: they are two different entities that may have different cognitive systems and, therefore, different perceptions of the world. In particular, the designer and the agent may have different temporal processing capacities. The task is expressed from the designer’s perspective but has to be executed by the agent, which has its own understanding of the world and of the task.

Consider the following scenario: an RL agent (a computer program) playing the Atari game Breakout [15], and a human designer who assigns the task of breaking the columns of bricks from left to right (as in [14]). Writing $c_i$ for the proposition “the $i$-th column of bricks has been removed”, the designer’s task can be expressed in LTLf with the “next” operator $\circ$ as $\circ(c_1 \wedge \circ(c_2 \wedge \cdots))$. However, these two entities have completely different perceptions of the world. On the one hand, the RL agent observes the pixels of the game screen and has access to the Atari Breakout simulator; hence, the timestep of the environment is under the control of the agent itself. On the other hand, the designer has a common-sense understanding of the environment and has proposed a task based on that perception. Here we focus on what each entity regards as the “next timestep”. For the agent, the “next timestep” coincides with the “next frame”; for the designer, it makes more sense to consider more abstract, higher-level timesteps, such as “the next removed brick” or “the next removed column”. Given this unavoidable discrepancy, the designer should instruct the agent about how to interpret the designer’s task according to the time resolution of the agent’s perception. Without further instructions, the agent cannot interpret the original task correctly, because it evaluates the “next” operator at its own timestep resolution, i.e., the next frame. Therefore, the designer is forced to express the goal in terms of stutter-invariant operators [16], e.g., the eventually operator: $\Diamond(c_1 \wedge \Diamond(c_2 \wedge \cdots))$. The task might be more naturally expressed at a time granularity different from the agent’s, but there must still be some “glue” between the two granularities.

Related works. The topic of different temporal abstractions within the same information system has been investigated for decades in computer science.
Several different formalisms to finitely represent infinite time granularities have been proposed in the literature, based on algebraic [17, 18, 19, 20], logical [21, 22, 23, 24], string-based [25], and automaton-based [26, 27, 28, 29] approaches; see [30] for a survey on the topic. However, instead of devising ad-hoc temporal goal specification languages or specific automata-based techniques, as in the references above, we keep the LTLf formalism intact and rely on classical automata theory, while allowing the designer to provide a clock specification and offering automated techniques to use it. This gives our approach broader impact in the large community that uses LTLf, and lets it rely on widely available supporting tools. Another line of research, described in [31, 32], extends temporal logic with the so-called clock operator. The clock operator was proposed in the context of modern hardware design, in which there is no notion of a single clock. Such an operator makes it possible to disambiguate which clock to use in order to evaluate a temporal formula or, in other words, what the “next timestep” is. Both LTL@ [31] and PSL [32] extend LTL to support clock operators. Again, our purpose is not to change the convenient syntax of LTLf, but to provide a tool for AI designers to specify the timestep granularity for semantic evaluation. Moreover, in those works the clock depends only on the current instant, whereas we specify the clock with temporal logic formulas.

Contributions. In this work, we are interested in the notion of clock specification, i.e., the explicit specification of what “the next step” is for the task given by the designer. The core contribution of this paper is to formalize and study the properties and expressivity of clock specifications in the context of temporal goal specifications. We formalize our approach by introducing a clock specification formula for a temporal goal, and we show, by means of an automata-based construction, how the two can be compiled together so as to change the time granularity at which the goal formula is evaluated. This technique can be used to solve the problem of temporal goal satisfaction, both in planning and in learning, in the presence of clocks.

2. Clock Specifications

Let $\mathcal{P}$ be a set of propositions that capture facets of interest. In the context of clock specifications, we have an LTLf/LDLf formula $\varphi_{goal}$ specifying the desired temporal task, i.e., the goal formula. In addition, we have an LTLf/LDLf formula $\varphi_{clock}$, called the clock formula, describing the timesteps to consider when evaluating the goal formula. We call the pair $(\varphi_{goal}, \varphi_{clock})$ a clocked specification and say that $\varphi_{goal}$ is under clock specification $\varphi_{clock}$. We assume, without loss of generality, that both $\varphi_{clock}$ and $\varphi_{goal}$ are defined over $\mathcal{P}$. Figure 1 intuitively explains the scenario we are considering. Circles represent trace timesteps. The bottom trace $\pi$ has the finest time granularity. The formula $\varphi_{clock}$ is evaluated on every prefix of the trace. If the trace prefix up to some timestep $i$ makes $\varphi_{clock}$ true, then timestep $i$ is passed to the evaluation of $\varphi_{goal}$ and becomes a timestep of the coarser-grained sequence $\pi'$. Conversely, if the trace prefix up to timestep $i$ does not satisfy $\varphi_{clock}$, then timestep $i$ is ignored at the higher level $\pi'$.

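We now formalize the semantics of the evaluation of $\varphi_{goal}$ under the clock formula $\varphi_{clock}$, starting with the notion of trace projection. The projection of a trace $\pi = \pi[0] \cdots \pi[n]$ onto the clock formula $\varphi_{clock}$ is the trace $\pi|_{\varphi_{clock}} = \pi'_0 \pi'_1 \cdots \pi'_n$, where $\pi'_i = \pi[i]$ if $\pi(0, i+1) \models \varphi_{clock}$, and $\pi'_i = \varepsilon$ (the empty word) otherwise; here $\pi(0, i+1)$ denotes the prefix $\pi[0] \cdots \pi[i]$. We define the clocked semantics of an LTLf/LDLf formula $\varphi$ under the clock formula $\varphi_{clock}$ in terms of the original semantics, evaluated on the projection of the trace onto $\varphi_{clock}$. That is, we say that $\pi$ models $\varphi$ under $\varphi_{clock}$, written $\pi \models_{\varphi_{clock}} \varphi$, if $\pi|_{\varphi_{clock}} \models \varphi$.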
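To make the definitions above concrete, here is a minimal Python sketch of trace projection and clocked satisfaction. It assumes a trace is a list of frozensets of true propositions and represents the clock formula by an arbitrary Python predicate on prefixes (standing in for $\pi(0, i+1) \models \varphi_{clock}$). Goals are checked with a hypothetical toy evaluator for a small LTLf fragment (atoms, conjunction, strong next, eventually), not a full LTLf implementation. The propositions c1, c2 and both clocks are illustrative only, and treating an empty projection as unsatisfying is an additional assumption.

```python
from typing import Callable, FrozenSet, List, Sequence

Letter = FrozenSet[str]                   # one timestep: the set of true propositions
Trace = Sequence[Letter]
ClockPredicate = Callable[[Trace], bool]  # stands in for "prefix |= phi_clock"


def project(trace: Trace, clock: ClockPredicate) -> List[Letter]:
    """pi|clock: keep pi[i] iff the prefix pi[0..i] satisfies the clock, else drop it (epsilon)."""
    return [trace[i] for i in range(len(trace)) if clock(trace[: i + 1])]


def holds(f: tuple, trace: Trace, i: int = 0) -> bool:
    """Toy LTLf evaluator for ("atom", p), ("and", f, g), ("next", f), ("eventually", f)."""
    op = f[0]
    if op == "atom":
        return f[1] in trace[i]
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "next":  # strong next: position i + 1 must exist
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "eventually":
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op!r}")


def clocked_holds(f: tuple, trace: Trace, clock: ClockPredicate) -> bool:
    """pi |=_clock f iff pi|clock |= f (assumption: an empty projection satisfies nothing)."""
    projected = project(trace, clock)
    return len(projected) > 0 and holds(f, projected)


def is_removal(letter: Letter) -> bool:
    # Hypothetical convention: propositions named c1, c2, ... mark column removals.
    return any(p.startswith("c") for p in letter)


def column_clock(prefix: Trace) -> bool:
    # Ticks on the first frame and on every frame where a column is removed.
    return len(prefix) == 1 or is_removal(prefix[-1])


def every_other_removal(prefix: Trace) -> bool:
    # Counting clock: ticks on the 2nd, 4th, ... removal; depends on the whole prefix.
    removals = sum(1 for letter in prefix if is_removal(letter))
    return is_removal(prefix[-1]) and removals % 2 == 0


frames = [frozenset(), frozenset(), frozenset({"c1"}), frozenset(), frozenset({"c2"}), frozenset()]
task_next = ("next", ("and", ("atom", "c1"), ("next", ("atom", "c2"))))
task_eventually = ("eventually", ("and", ("atom", "c1"), ("eventually", ("atom", "c2"))))

print(holds(task_next, frames))                        # False: frame 1 removes no column
print(holds(task_eventually, frames))                  # True
print(clocked_holds(task_next, frames, column_clock))  # True: projection is {}, {c1}, {c2}
```

The first clock only inspects the current frame, as in clock operators for hardware; the second one counts events over the whole prefix, which is the kind of temporal clock the clocked semantics is meant to allow.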

We now introduce an automata-based construction to reason about clocked LTLf/LDLf specifications; it will serve as the basis for automata-based techniques in planning and learning for LTLf/LDLf goals. Let $(\varphi_{goal}, \varphi_{clock})$ be an LTLf/LDLf clocked specification. First, we compute the DFAs $\mathcal{A}_{goal} = \langle Q, 2^{\mathcal{P}}, q_0, \delta, F \rangle$ and $\mathcal{A}_{clock} = \langle S, 2^{\mathcal{P}}, s_0, \eta, G \rangle$ of $\varphi_{goal}$ and $\varphi_{clock}$, respectively. Then, we compute the clocked product $\mathcal{A}_{goal} \times \mathcal{A}_{clock} = \langle Q', 2^{\mathcal{P}}, q'_0, \delta', F' \rangle$, defined as follows: $Q' = Q \times S$, $q'_0 = (q_0, s_0)$, $F' = F \times S$, and $\delta'((q, s), a) = (\delta(q, a), \eta(s, a))$ if $\eta(s, a) \in G$, otherwise $\delta'((q, s), a) = (q, \eta(s, a))$.

Intuitively, the clocked product is like the classical synchronous product of DFAs, except that the state component coming from the goal automaton is progressed only if the clock component transitions into an accepting state of $\mathcal{A}_{clock}$. An example is shown in Figure 2. We have the following result.

Theorem 1. Let $(\varphi_{goal}, \varphi_{clock})$ be a clocked specification, and let $\mathcal{A}_{goal} \times \mathcal{A}_{clock}$ be the clocked product of $\mathcal{A}_{goal}$ and $\mathcal{A}_{clock}$. For any finite trace $\pi$, $\pi \models_{\varphi_{clock}} \varphi_{goal}$ iff $\pi \in \mathcal{L}(\mathcal{A}_{goal} \times \mathcal{A}_{clock})$.

Theorem 1 tells us that clocked LDLf specifications are not more expressive than regular expressions and, therefore, than LDLf. Conversely, it is easy to see that LDLf is not more expressive than clocked LDLf specifications.

Theorem 2. Given an LTLf/LDLf formula $\varphi$, the clocked specification $(\varphi, tt)$ is equivalent to $\varphi$.

Indeed, every prefix satisfies $tt$, so $\pi|_{tt} = \pi$ for every trace $\pi$. We say that a formula $\psi$ is unclocked-equivalent to $\varphi$ under the clock formula $\varphi_{clock}$ if, for every trace $\pi$, we have $\pi \models_{\varphi_{clock}} \varphi$ iff $\pi \models \psi$. We now show that we can automatically find “unclocked” LDLf formulas that are semantically equivalent to clocked LTLf/LDLf specifications.

Theorem 3. Given a clocked specification $(\varphi_{goal}, \varphi_{clock})$, there exists an LDLf formula $\psi$ that is unclocked-equivalent to $(\varphi_{goal}, \varphi_{clock})$.

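Proof sketch. Compute a regular expression $\rho$ equivalent to $\mathcal{A}_{goal} \times \mathcal{A}_{clock}$, i.e., with $\mathcal{L}(\rho) = \mathcal{L}(\mathcal{A}_{goal} \times \mathcal{A}_{clock})$, and take $\psi = \langle \rho \rangle \mathit{end}$, where $\mathit{end}$ denotes the end of the trace. Correctness follows by construction and by Theorem 1.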
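The clocked product can be prototyped in a few lines. The following Python sketch assumes explicit, total DFAs whose letters are frozensets of propositions. The `DFA` class, the helper names, and the two example automata (a goal DFA for $\circ a$ with strong next, and a counting clock that ticks whenever an even number of $b$'s has been read) are illustrative assumptions, not the authors' implementation. Acceptance is decided by the goal component only (accepting states $F \times S$), which is what the projection semantics requires. A brute-force loop then checks the statement of Theorem 1 on all traces up to length 5.

```python
from dataclasses import dataclass
from itertools import combinations, product
from typing import Dict, FrozenSet, Hashable, Iterable, Iterator, List, Tuple

Letter = FrozenSet[str]


@dataclass
class DFA:
    states: FrozenSet[Hashable]
    initial: Hashable
    delta: Dict[Tuple[Hashable, Letter], Hashable]  # total transition function
    accepting: FrozenSet[Hashable]

    def run(self, word: Iterable[Letter]) -> Hashable:
        state = self.initial
        for letter in word:
            state = self.delta[(state, letter)]
        return state

    def accepts(self, word: Iterable[Letter]) -> bool:
        return self.run(word) in self.accepting


def clocked_product(goal: DFA, clock: DFA, alphabet: List[Letter]) -> DFA:
    """Q' = Q x S, q0' = (q0, s0), F' = F x S; the goal moves only when the clock enters G."""
    states = frozenset(product(goal.states, clock.states))
    delta = {}
    for q, s in states:
        for a in alphabet:
            s_next = clock.delta[(s, a)]
            q_next = goal.delta[(q, a)] if s_next in clock.accepting else q
            delta[((q, s), a)] = (q_next, s_next)
    accepting = frozenset((q, s) for q, s in states if q in goal.accepting)
    return DFA(states, (goal.initial, clock.initial), delta, accepting)


def project(word: Iterable[Letter], clock: DFA) -> List[Letter]:
    """pi|clock: keep a letter iff the clock DFA is accepting right after reading it."""
    kept, s = [], clock.initial
    for a in word:
        s = clock.delta[(s, a)]
        if s in clock.accepting:
            kept.append(a)
    return kept


PROPS = ("a", "b")
ALPHABET = [frozenset(c) for r in range(len(PROPS) + 1) for c in combinations(PROPS, r)]

# Goal DFA for "next a" (strong next): the second letter must contain a.
goal = DFA(
    states=frozenset({"q0", "q1", "acc", "rej"}),
    initial="q0",
    delta={
        **{("q0", x): "q1" for x in ALPHABET},
        **{("q1", x): ("acc" if "a" in x else "rej") for x in ALPHABET},
        **{(q, x): q for q in ("acc", "rej") for x in ALPHABET},
    },
    accepting=frozenset({"acc"}),
)

# Counting clock: ticks whenever the number of b's read so far is even.
clock = DFA(
    states=frozenset({"even", "odd"}),
    initial="even",
    delta={
        (s, x): ({"even": "odd", "odd": "even"}[s] if "b" in x else s)
        for s in ("even", "odd")
        for x in ALPHABET
    },
    accepting=frozenset({"even"}),
)


def words(max_len: int) -> Iterator[Tuple[Letter, ...]]:
    for n in range(max_len + 1):
        yield from product(ALPHABET, repeat=n)


clocked = clocked_product(goal, clock, ALPHABET)
w = [frozenset({"b"}), frozenset({"b"}), frozenset({"a"})]
print(goal.accepts(w), clocked.accepts(w))  # False True: the clock skips the first b

# Theorem 1, checked exhaustively on short traces.
assert all(clocked.accepts(x) == goal.accepts(project(x, clock)) for x in words(5))
```

The product is built over all state pairs rather than only reachable ones, so its size is exactly $|Q| \cdot |S|$, in line with the polynomial bound; a practical implementation would likely explore reachable pairs only.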

3. Discussion

We have sketched the theoretical foundations of clock specifications for temporal tasks. This framework can be applied to FOND planning for LTLf/LDLf goals [6], by using $\mathcal{A}_{goal} \times \mathcal{A}_{clock}$ (instead of $\mathcal{A}_{goal}$) in the cross-product with the DFA of the domain, or to specify non-Markovian “clocked” rewards in Non-Markovian Reward Decision Processes (NMRDPs) [10], by means of the usual product construction between the MDP and the reward specification represented by $\mathcal{A}_{goal} \times \mathcal{A}_{clock}$. The same approach can be combined with logic-based reward specifications in a Reinforcement Learning setting, as in RL with Restraining Bolts [14, 33], where the reward is given only when both the goal formula and the clock formula are satisfied. A similar construction can be obtained for Reward Machines [34].
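In an RL setting, the clocked product can be run online alongside the environment, one agent timestep at a time. Below is a minimal Python sketch of such a monitor. It assumes one possible reading of the reward condition above: a reward is emitted at a step exactly when the clock component is accepting (the clock formula holds on the current prefix) and the goal component is accepting. The class name, the callable-based interface, the reward value, and the Breakout-style propositions c1, c2, c3 are hypothetical.

```python
from typing import Callable, FrozenSet, Hashable, Set

Letter = FrozenSet[str]
Transition = Callable[[Hashable, Letter], Hashable]


class ClockedRewardMonitor:
    """Tracks the state (q, s) of a clocked product DFA and emits a reward per step."""

    def __init__(self, goal_delta: Transition, goal_init: Hashable, goal_final: Set[Hashable],
                 clock_delta: Transition, clock_init: Hashable, clock_final: Set[Hashable],
                 reward: float = 1.0) -> None:
        self.goal_delta, self.goal_init, self.goal_final = goal_delta, goal_init, goal_final
        self.clock_delta, self.clock_init, self.clock_final = clock_delta, clock_init, clock_final
        self.reward = reward
        self.reset()

    def reset(self) -> None:
        """Call at the start of every episode."""
        self.q, self.s = self.goal_init, self.clock_init

    def step(self, letter: Letter) -> float:
        self.s = self.clock_delta(self.s, letter)
        ticked = self.s in self.clock_final
        if ticked:  # clock tick: progress the goal component
            self.q = self.goal_delta(self.q, letter)
        return self.reward if ticked and self.q in self.goal_final else 0.0


def goal_delta(q: Hashable, letter: Letter) -> Hashable:
    # Goal over clock ticks: c1 at the first tick, then c2 at the second tick.
    if q == 0:
        return 1 if "c1" in letter else "dead"
    if q == 1:
        return 2 if "c2" in letter else "dead"
    return q  # 2 (success) and "dead" are absorbing


def clock_delta(s: Hashable, letter: Letter) -> Hashable:
    # Clock: tick on frames where some column (c1, c2, ...) is removed.
    return "tick" if any(p.startswith("c") for p in letter) else "idle"


monitor = ClockedRewardMonitor(goal_delta, 0, {2}, clock_delta, "idle", {"tick"})
frames = [frozenset(), frozenset(), frozenset({"c1"}), frozenset(),
          frozenset({"c2"}), frozenset(), frozenset({"c3"})]
rewards = [monitor.step(frame) for frame in frames]
print(rewards)  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```

Because frames between column removals do not tick the clock, the agent is not penalized for the many intermediate frames, while the goal DFA still reads “next” at the designer's granularity. Whether to reward once or at every subsequent tick is a design choice left open here.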

Acknowledgements

This work has been partially supported by the EU H2020 project AIPlan4EU (No. 101016442), the ERC-ADG WhiteMech (No. 834228), the EU ICT-48 2020 project TAILOR (No. 952215), the PRIN project RIPER (No. 20203FFYLK), and the PNRR MUR project FAIR (No. PE0000013).

References

[1] G. De Giacomo, M. Y. Vardi, Linear temporal logic and linear dynamic logic on finite traces, in: IJCAI, IJCAI/AAAI, 2013, pp. 854–860.

[2] G. De Giacomo, M. Y. Vardi, Synthesis for LTL and LDL on finite traces, in: IJCAI, AAAI Press, 2015, pp. 1558–1564.

[3] G. De Giacomo, M. Y. Vardi, LTLf and LDLf synthesis under partial observability, in: IJCAI, IJCAI/AAAI Press, 2016, pp. 1044–1050.

[4] A. Camacho, J. A. Baier, C. J. Muise, S. A. McIlraith, Finite LTL synthesis as planning, in: ICAPS, AAAI Press, 2018, pp. 29–38.

[5] S. Zhu, L. M. Tabajara, J. Li, G. Pu, M. Y. Vardi, Symbolic LTLf synthesis, in: IJCAI, 2017.

[6] G. De Giacomo, S. Rubin, Automata-theoretic foundations of FOND planning for LTLf and LDLf goals, in: IJCAI, ijcai.org, 2018, pp. 4729–4735.

[7] R. I. Brafman, G. De Giacomo, Planning for LTLf/LDLf goals in non-Markovian fully observable nondeterministic domains, in: IJCAI, ijcai.org, 2019, pp. 1602–1608.

[8] A. Camacho, S. A. McIlraith, Strong fully observable non-deterministic planning with LTL and LTLf goals, in: IJCAI, ijcai.org, 2019, pp. 5523–5531.

[9] G. De Giacomo, M. Favorito, F. Fuggitti, Planning for temporally extended goals in pure-past linear temporal logic: A polynomial reduction to standard planning, CoRR abs/2204.09960 (2022).

[10] R. I. Brafman, G. De Giacomo, F. Patrizi, LTLf/LDLf non-Markovian rewards, in: AAAI, AAAI Press, 2018, pp. 1771–1778.

[11] R. I. Brafman, G. De Giacomo, Regular decision processes: A model for non-Markovian domains, in: IJCAI, ijcai.org, 2019, pp. 5516–5522.

[12] R. S. Sutton, A. G. Barto, Reinforcement learning - an introduction, Adaptive computation and machine learning, MIT Press, 1998.

[13] A. Camacho, R. T. Icarte, T. Q. Klassen, R. A. Valenzano, S. A. McIlraith, LTL and beyond: Formal languages for reward function specification in reinforcement learning, in: IJCAI, ijcai.org, 2019, pp. 6065–6073.

[14] G. De Giacomo, L. Iocchi, M. Favorito, F. Patrizi, Foundations for restraining bolts: Reinforcement learning with LTLf/LDLf restraining specifications, in: ICAPS, AAAI Press, 2019, pp. 128–136.

[15] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518 (2015) 529–533.

[16] L. Lamport, What good is temporal logic?, in: IFIP Congress, North-Holland/IFIP, 1983, pp. 657–668.

[17] C. Bettini, S. Jajodia, X. S. Wang, Time granularities in databases, data mining, and temporal reasoning, Springer, 2000.

[18] B. Leban, D. McDonald, D. Forster, A representation for collections of temporal intervals, in: AAAI, Morgan Kaufmann, 1986, pp. 367–371.

[19] M. Niezette, J. Stevenne, An efficient symbolic representation of periodic time, in: Proceedings of the International Conference on Information and Knowledge Management (CIKM), 1992, pp. 161–168.

[20] P. Ning, X. S. Wang, S. Jajodia, An algebraic representation of calendars, Ann. Math. Artif. Intell. 36 (2002) 5–38.

[21] C. Combi, M. Franceschet, A. Peron, Representing and reasoning about temporal granularities, J. Log. Comput. 14 (2004) 51–77.

[22] S. Demri, LTL over integer periodicity constraints, Theor. Comput. Sci. 360 (2006) 96–123.

[23] H. Bowman, S. J. Thompson, A decision procedure and complete axiomatization of finite interval temporal logic with projection, J. Log. Comput. 13 (2003) 195–239.

[24] G. Hariharan, B. Kempa, T. Wongpiromsarn, P. H. Jones, K. Y. Rozier, MLTL multi-type (MLTLM): A logic for reasoning about signals of different types, in: NSV/FoMLAS@CAV, volume 13466 of Lecture Notes in Computer Science, Springer, 2022, pp. 187–204.

[25] J. Wijsen, A string-based model for infinite granularities, in: Proceedings of the AAAI Workshop on Spatial and Temporal Granularities, 2000, pp. 9–16.

[26] U. D. Lago, A. Montanari, Calendars, time granularities, and automata, in: SSTD, volume 2121 of Lecture Notes in Computer Science, Springer, 2001, pp. 279–298.

[27] D. Bresolin, A. Montanari, G. Puppis, Time granularities and ultimately periodic automata, in: JELIA, volume 3229 of Lecture Notes in Computer Science, Springer, 2004, pp. 513–525.

[28] U. D. Lago, A. Montanari, G. Puppis, Compact and tractable automaton-based representations of time granularities, Theor. Comput. Sci. 373 (2007) 115–141.

[29] U. D. Lago, A. Montanari, G. Puppis, On the equivalence of automaton-based representations of time granularities, in: TIME, IEEE Computer Society, 2007, pp. 82–93.

[30] J. Euzenat, A. Montanari, Time granularity, in: Handbook of Temporal Reasoning in Artificial Intelligence, volume 1 of Foundations of Artificial Intelligence, Elsevier, 2005, pp. 59–118.

[31] C. Eisner, D. Fisman, J. Havlicek, A. McIsaac, D. V. Campenhout, The definition of a temporal clock operator, in: ICALP, volume 2719 of Lecture Notes in Computer Science, Springer, 2003, pp. 857–870.

[32] C. Eisner, D. Fisman, A Practical Introduction to PSL, Series on Integrated Circuits and Systems, Springer, 2006.

[33] G. De Giacomo, M. Favorito, L. Iocchi, F. Patrizi, A. Ronca, Temporal logic monitoring rewards via transducers, in: KR, 2020, pp. 860–870.

[34] R. T. Icarte, T. Q. Klassen, R. A. Valenzano, S. A. McIlraith, Teaching multiple tasks to an RL agent using LTL, in: AAMAS, International Foundation for Autonomous Agents and Multiagent Systems Richland, SC, USA / ACM, 2018, pp. 452–461.