1. Overview

Strategic Reasoning over Golog Programs in the Nondeterministic Situation Calculus - Extended Abstract

Giuseppe De Giacomo

0 1

Yves Lespérance

Matteo Mancanelli

1 0 University of Oxford , Oxford , UK 1 University of Rome La Sapienza , Rome , Italy 2 York University , Toronto, ON , Canada

2025

Automata-based synthesis has seen increasing interest in the last decade, mostly focused on declarative specifications. Here, we consider procedural programs that specify high-level agent behaviors over nondeterministic domains. Specifically, we tackle the problem of synthesizing strategies that guarantee the successful execution of a Golog program within a nondeterministic basic action theory (NDBAT). Our approach constructs symbolic program graphs that capture the control flow of independently of the domain, enabling reactive synthesis via their cross product with the domain model. We formally relate graph-based transitions to standard Golog semantics and show how our framework modularly separates agent and environment behaviors. This flexibility supports strategic reasoning and synthesis in complex nondeterministic settings, with programs acting as independent controllers for both agent and environment.

eol>Program Execution Golog Situation Calculus Nondeterministic Domains Strategic Reasoning Reactive Synthesis

1. Overview

Proposed Framework. The problem we address in this paper is a variant of the high-level program execution task, that is, given a program and an action theory , find a strategy for executing that guarantees successful termination from the initial situation 0 in the domain specified by . If the domain is deterministic and the program is situation determined (i.e., the remaining subprogram is uniquely fixed by the resulting situation), then such a strategy can simply specify a sequence of actions ⃗ such that |= (, 0, (⃗, 0)). If the program is not situation determined, the strategy must also specify the remaining program after each step. In this paper, we consider the case where the domain is nondeterministic and specified by a NDBAT. For a situation determined program , we aim to synthesize a strategy (a mapping from situations to actions) such that |= (, 0, ) holds. When is not situation determined, we also generate a strategy mapping situations to subprograms, such that |= (, 0, , ). The same applies at the model level, as in FOND planning, when full information about the initial state is available, i.e., for a model such that |= .

Most prior work on Golog program execution [ 1, 2, 3 ] focuses on deterministic environments, compiling Golog into the domain so classical planners can be used. We instead target nondeterministic domains using a reactive synthesis approach, building on recent advances. A recent example is [ 4 ], which uses the 2 decidable fragment of FOL but sufers from scalability issues. Our framework is simpler and more intuitive, while ofering formal guarantees that relate program executions with classical Golog semantics and synthesis techniques. Our framework is also very flexible, and it can easily be extended to accommodate diferent challenges. To show this, in the last section we allow one to specify separate programs as constraints on agent’s behavior and environment’s behavior only, while [ 4 ] takes the Golog program as a constraint on the behavior of the whole system, i.e., both the agent and environment.

Preliminaries. Our approach builds on the nondeterministic situation calculus [ 5 ], where each agent action (⃗) is paired with an environment reaction outside the agent’s control, determining the outcome (e.g., a flipped coin may land heads or tails). A nondeterministic basic action theory (NDBAT) can be seen as a special kind of action theory, where we have system actions (⃗, ), successor state axioms , describing how predicates and functions change after system actions are performed, and action precondition axioms , stating when each system action can occur. Throughout the paper, we assume standard names for actions and objects of the domain, for which we impose the unique name assumption and domain closure. Golog programs are executed over a (possibly nondeterministic) BAT. We consider a variant of Golog where the test construct yields no transitions, and is final when the condition is satisfied [ 6, 7 ]. The key feature of our variant is that programs instruct agent actions, rather than system actions. Programs are of the form ::= (⃗) | ? | 1; 2 | 1| 2 | . | * , where (⃗) is an action term, 1; 2 is the sequential execution of 1 and 2, 1| 2 is the nondeterministic choice of 1 and 2, . executes program for some nondeterministic choice of the object variable , and * performs zero or more times. is a situation-suppressed formula, and [] the formula obtained by restoring into all fluents in . We use as an abbreviation for True?.

To construct the program graph of a Golog program 0, we define its subprograms, those encountered during partial execution, via the notion of syntactic closure [ 8 ]. They separate the program terms themselves from the assignments to pick variables. Let’s assume all pick variables are renamed apart and ordered. We define a -tuple of object terms ⃗ = ⟨1, ..., ⟩, called environment term, where is the current value of -th pick variable. In the following, denotes the Cartesian product of objects, and is the value assigned to variable .. At the beginning of an execution, the environment term is arbitrarily instantiated, and at each step of the computation it maintains only values. Tracking pick variables separately in ⃗ avoids enumerating all (possibly infinite) bindings from . 0, enabling a finite set of subprograms. The syntactic closure Γ 0 is defined inductively: (i) 0, ∈ Γ 0 , (ii) if 1; 2 ∈ Γ 0 and 1′ ∈ Γ 1 , then 1′; 2 ∈ Γ 0 and Γ 2 ⊆ Γ 0 , (iii) if 1 | 2 ∈ Γ 0 , then Γ 1 , Γ 2 ⊆ Γ 0 , (iv) if . ∈ Γ 0 , then Γ ⊆ Γ 0 , (v) if * ∈ Γ 0 , then ; * ∈ Γ 0 .

A complete configuration is a triple (, ⃗, ), where ∈ Γ 0 , ⃗ is the environment term, and is the current situation. The semantics of Golog is defined in terms of single-steps using two predicates: , indicating a program can terminate in a situation, and , specifying one-step transitions to a new situation and remaining program. Here, we define Trans and Final considering both the specified configuration version and the environment reactions as follows:

Trans(, ⃗, , ′, ⃗′, ′) ≡ ∃ . ([⃗](), ) ∧

′ = ∧ ⃗′ = ⃗ ∧ ′ = ([⃗](), ) Trans( ?, ⃗, , ′, ⃗′, ′) ≡ False Trans( 1; 2, ⃗, , ′, ⃗′, ′) ≡ Trans( 1, ⃗, , 1′, ⃗′, ′) ∧

′ = 1′; 2 ∨ Final( 1, ⃗, ) ∧ Trans( 2, ⃗, , ′, ⃗′, ′) Trans( 1| 2, ⃗, , ′, ⃗′, ′) ≡ Trans( 1, ⃗, , ′, ⃗′, ′) ∨

Trans( 2, ⃗, , ′, ⃗′, ′) Trans(., ⃗, , ′, ⃗′, ′) ≡ ∃ .Trans(, ⃗ , , ′, ⃗′, ′) Trans( * , ⃗, , ′, ⃗′, ′) ≡ Trans(, ⃗, , ′′, ⃗′, ′) ∧ ′ = ′′; * Final(, ⃗, ) ≡ False Final( ?, ⃗, ) ≡ [⃗][] Final( 1; 2, ⃗, ) ≡

Final( 1, ⃗, ) ∧ Final( 2, ⃗, ) Final( 1| 2, ⃗, ) ≡

Final( 1, ⃗, ) ∨ Final( 2, ⃗, ) Final(., ⃗, ) ≡

∃.Final(, ⃗ , ) Final( * , ⃗, ) ≡ True where [⃗] and [⃗] denote the action term and formula with free pick variables replaced by corresponding terms in ⃗, and ⃗ denotes ⃗ updated by binding variable to . We use Trans* to denote the transitive closure of Trans, i.e., Trans* (, ⃗, , ′, ⃗′, ′) means that there exists a sequence of one-step transitions from (, ⃗, ) to ( ′, ⃗′, ′). We define a program situation determined (SD) [ 9 ] if, at any times, the re. maining program is uniquely determined by the resulting situation: (, ⃗, ) = ∀′, ′, ′′, ⃗′, ⃗′′. * (, ⃗, , ′, ⃗′, ′) ∧ * (, ⃗, , ′′, ⃗′′, ′) ⊃ ′ = ′′ ∧ ⃗′ = ⃗′′. We can prove lemmas to provide guarantees that all programs that 0 can evolve into according to and , must be in its syntactic closure Γ 0 .

Lemma 1. Let 0 be a Golog program and Γ 0 its syntactic closure. Let an NDBAT over which 0 is executed and a model of . If ∈ Γ 0 and |= (, ⃗, , ′, ⃗′, ′), then ′ ∈ Γ 0 , and if |= * ( 0, ⃗0, 0, , ⃗, ), then ∈ Γ 0 .

2. First-Order Program Graphs

Constructing the Program Graph. The program graph of a Golog program 0 characterizes all its possible executions, independently of the domain (which is integrated later via a cross product). The program graph is thus a symbolic structure that captures the control flow semantics of Golog at the syntactic level, decoupled from domain dynamics. Each node corresponds to a program in the syntactic closure Γ 0 , i.e. a possible remaining subprogram, and each edge represents a guarded one-step transition between such nodes. We define transitions and termination conditions based on and . Since pick variables are unbound at this stage, the graph tracks which variables need to be bound, deferring the actual binding until domain details are available. Specifically, we define: (, ) = {( (), ∅, )} (, ) = {} ( ?, ) = {} ( 1; 2, ) = {(¬ ( 1) ∧ , , 1′; 2) | (, , 1′) ∈ ( 1, )} ∪

{( ( 1) ∧ , , 2′) | (, , 2′) ∈ ( 2, )} ( 1| 2, ) = ( 1, ) ∪ ( 2, ) (., ) = {(, ∪ , ′) | (, , ′) ∈ (, )} ( * , ) = {(¬ ( ) ∧ , , ′; * ) | (, , ′) ∈ (, )} () = ( ?) = ( 1; 2) = ( 1) ∧ ( 2) ( 1| 2) = ( 1) ∨ ( 2) (. ) = ∃. ( ) ( * ) = Given a program and an action , (, ) returns guarded transitions (, , ′), where is a Boolean guard over () and tests in , is the set of pick variables to be instantiated, and ′ is the remaining program. ( ) gives the condition under which may terminate.

Now, we can provide a definition for the program graph.

Definition 2. Let Φ be a boolean formula over tests and , the set of pick variables and the set of agent actions. If 0 is a Golog program, the program graph is a tuple ⟨Φ × 2 × , , 0, , ℒ⟩, where • Φ × 2 × is the alphabet • = Γ 0 is the set of nodes • 0 = 0 is the initial node • (, , , , ′) if (, , ′) ∈ (, ) • ℒ() = () indicates that the state is assigned a label according to Results about Program Graphs We now summarize key properties of the program graph used to symbolically encode the structure of Golog programs. First, we can talk about the dimensions: Theorem 3. Let 0 be a Golog program and the corresponding program graph. The number of nodes and edges in is linear in the size of 0.

This theorem ensures the graph remains compact wrt the original dimension of the program.

We also relate program graphs with Golog operational semantics. In particular, symbolic transitions in the graph correspond to executable transitions in the model, and vice versa, provided guard conditions are satisfied, and the final condition ( ) characterizes the same terminating configurations as . Theorem 4. Let 0 be a Golog program and the corresponding program graph. Let an NDBAT over which 0 is executed and a model of . Then, for all subprograms , ′ ∈ Γ 0 , environment terms ⃗, ⃗′, agent action , and situation : 1. If (, , , , ′) ∈ , ∀ ̸∈ .′ = and |= [⃗′][], then there exists a reaction such that |= (, ⃗, , ′, ⃗′, ([⃗′](), )); 2. For every reaction , if |= (, ⃗, , ′, ⃗′, ([⃗′](), )), then there exists such that (, , , , ′) ∈ , ∀ ̸∈ .′ = , and |= [⃗′][].

Theorem 5. For all subprograms ∈ Γ 0 , environment term ⃗ and situation , |= (, ⃗, ) if |= ( )[⃗][].

Having the previous theorems in place, we can prove that a full execution trace to a final configuration exists in the model if and only if there is a corresponding path in the program graph. Theorem 6. Let ( 0, ⃗0, 0) be an initial configuration. Then, |= * ( 0, ⃗0, 0, , ⃗, ) ∧ (, ⃗, ) if there exists a sequence of transitions (0, 1, 1, 1, 1), (1, 2, 2, 2, 2), ..., (− 1, , , , ) in , environment terms 0, ..., and reactions 1, ..., such that 0 = 0, = , ⃗ = ⃗ and = , and for each = 1, ..., , (i) ∀ ̸∈ , ′, = ,, (ii) = ([](), − 1), (iii) |= [][− 1], and (iv) |= ()[][].

Finally, if we have a SD program, then the symbolic transition relation becomes functional. Theorem 7. Let 0 be a SD program in 0 wrt . For any transitions (, 1, , , 1′) and (, 2, , , 2′), there exists ⃗, ⃗′, s.t. ∀ ̸∈ .′ = and |= 1[⃗′][] ∧ 2[⃗′][], then 1′ = 2′. Thus, when 0 is situation determined in 0 with respect to , the characteristic graph becomes deterministic, and we can replace the relation (, , , , ′) by the function (, , , ) = ′.

3. Synthesis and Reasoning over Program Graphs.

TS-based Synthesis for FO Domains Consider a nondeterministic domain = ⟨2ℱ , , 0, , ⟩, where ℱ is the set of fluents, the agent actions, 2ℱ the state space, 0 the initial state, () ⊆ the action preconditions, and (, , ′) the transition relation for ∈ (). For any NDBAT and model |= , there exists a corresponding domain . We construct a game arena by taking the cross product of the program graph and domain : Definition 8. Let 0 be a program and a ND domain. The cross product is a tuple ⟨ × × 2ℱ , × × 2ℱ , ( 0, ⃗0, 0), , ⟩, where × • × × × 2ℱ is the alphabet • × × 2ℱ is a set of states • ( 0, ⃗0, 0) is the initial state • ((, ⃗, ), (, ′, ⃗ , ′)) = ( ′, ⃗′, ′) is the transition function where (i) ∃.

∀ ̸∈ .′ = and ∀ ∈ .′ = , (iii) |= [⃗′], (iv) (, [⃗′], ′) • = {(, ⃗, ) | |= ( )[⃗]} is the set of final states (, , , , ′), (ii) This can be viewed as a game arena where the agent controls the action , the remaining program , and the binding of the pick variables , and where the environment controls the next state 2ℱ . A game strategy is a function : → × × mapping states of the game ∈ to agent actions, remaining programs, and bindings for the pick variables. The set of plays induced by a game strategy in a game arena , (, ), is the set of all plays 0, 1, ... ∈ such that 0 is the initial state of and there exists an environment state ′ such that (, ( (), ′)) = +1. A game strategy is winning in if, for every play 0, 1, ... in (, ), there exists some such that (). Agent Control and Strategic Reasoning. To represent the ability of the agent to execute an agent program in a ND domain, [ 5 ] introduce AgtCanForceIf (, , ) as an adversarial version of in the presence of environment reactions. It states that strategy , a function from situations to agent actions (including the special action ), executes SD Golog agent program in situation considering its nondeterminism angelic, as in the standard , but also considering the nondeterminism of environment reactions devilish/adversarial. Here is the version of AgtCanForceIf that considers the presence of environment term:

AgtCanForceIf (, ⃗, , ) = ∀.[. . . ⊃ (, ⃗, )] where . . . stands for [(∃⃗. () = ∧ (, ⃗, )) ⊃ (, ⃗, )] ∧ [∃.( () = ̸= ∧ ∃.∃ ′.∃⃗, ⃗′. (, ⃗, , ′, ⃗′, ([⃗′](), )) ∧ ∀.(∃ ′.∃⃗, ⃗′. (, ⃗, , ′, ⃗′, ([⃗′](), ))) ⊃ ∃ ′.∃⃗, ⃗′. (, ⃗, , ′, ⃗′, ([⃗′](), )) ∧ ( ′, ⃗′, ([⃗′](), )) ⊃ (, ⃗, )] We say that AgtCanForce(, ) holds if there exists a strategy s.t. AgtCanForceIf (, , ) holds. Theorem 9. For any subprogram ∈ Γ 0, situation and strategy , we have that: |= AgtCanForceIf (, ⃗, , ) if (, ) is in the least set such that: if () = and |= (, ⃗, ), then (, ⃗, ) ∈ if () = ̸= and there exists , ′, ⃗ and ⃗′ s.t. |= (, ⃗, , ′, ⃗′, ([⃗′](), )) and for all the existence of ′, ⃗ and ⃗′ s.t. |= (, ⃗, , ′, ⃗′, ([⃗′](), )) implies ( ′, ⃗′, ([⃗′](), )) is in , then (, ⃗, ) ∈ if (, ⃗, ) is in the least set such that if () = and |= ( )[⃗][], then (, ⃗, ) ∈ if () = ̸= and there exists , ′, ⃗ and ⃗′ s.t. (, , , , ′), for all not in , ′ = , and |= [⃗′][], and for all the existence of ′, ⃗ and ⃗′s.t. (, , , , ′) and |= [⃗′][] implies that ( ′, ⃗′, ([⃗′](), )) is in , then (, ⃗, ) ∈ Theorem 10. Let 0 be a Golog program, an NDBAT where the program is executed, a model of , the nondeterministic domain corresponding to , and the game arena generated by program 0 and domain . Then |= AgtCanForce( 0, 0) if there exists a winning strategy in .

4. Using Programs to Constrain the Environment

Our framework naturally extends to scenarios where the environment’s behavior is constrained by its own program. We assume that the environment program contains only one action, namely , taking the reaction as a parameter, and it can contain a special symbol for referring the current action executed by the agent. Here, we need to change the semantics of the programs and define and for both the agent program, denoted , and the environment program, denoted . Below is a sketch of the transition relations:

Trans(, ⃗, , ′, ⃗′, ) ≡

∃. ([⃗], ) ∧ ′ = ∧ ⃗′ = ⃗ Trans( ?, ⃗, , ′, ⃗′, ) ≡ False ...

Trans((), ⃗, , , ′, ⃗′, ) ≡

((), ) ∧ ′ = ∧ ⃗′ = ⃗ Trans( ?, ⃗, , , ′, ⃗′, ) ≡ False ... is the same as in the original definition, but it takes the action performed as last parameter instead of the next situation ′; also is similar, but it takes as input both and (because the reaction depends on the action chosen by the agent), and again it drops ′. and are equal to the original definition, except that has among its parameter, and in the final condition for tests we substitute the symbol with the actual action , i.e. Final( ?, ⃗, , ) ≡ [⃗, /][].

Finally, system transitions result from the interleaved execution of the agent program. More formally, we define the transition relation: ( , ⃗, , ⃗, , ′, ⃗′, ′, ⃗′, ′) ≡ ( , ⃗, , ′, ⃗′, ) ∧ ( , ⃗, [⃗′], , ′, ⃗′, ) ∧ ′ = ([⃗′](), ). A final configuration is reached when both programs have terminated: ( , ⃗, , ⃗, ) ≡ ( , ⃗, ) ∧ ( , ⃗, , ). Note that we could easily construct a program graph for the environment programs, and if we compute the cross product between the program graph of and the program graph of , we still obtain a program graph. This means that everything we have shown carries over even in this setting.

This leads to a joint fixpoint definition of agent-environment interaction. We extend the predicate AgtCanForceIf to capture whether an agent strategy can enforce successful execution of in the presence of an adversarial but constrained environment running .

AgtCanForceIf ( , ⃗, , ⃗, , ) = ∀.[. . . ⊃ ( , ⃗, , ⃗, )] where . . . stands for [( () = ∧ ( , ⃗, , ⃗, )) ⊃ ( , ⃗, , ⃗, )] ∧ [∃.( () = ̸= ∧ ∃.∃⃗′, ⃗′.∃ ′, ′. ( , ⃗, , ⃗, , ′, ⃗′, ′, ⃗′, ([⃗′](), )) ∧ ∀.∀⃗′.(∃ ′, ′.∃⃗. ( , ⃗, , ⃗, , ′, ⃗′, ′, ⃗′, ([⃗′](), )) ⊃ ( ′, ⃗′, ′, ⃗′, ([⃗′](), )) ⊃ ( , ⃗, , ⃗, )] This work has been partially supported by the ERC Advanced Grant WhiteMech (No. 834228), the PRIN project RIPER (No. 20203FFYLK), the PNRR MUR project FAIR (No. PE0000013), the Italian National Ph.D. on Artificial Intelligence at Sapienza University of Rome, the National Science and Engineering Research Council of Canada, and York University.

Declaration on Generative AI

During the preparation of this work, the authors used ChatGPT for formatting assistance. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.

[1]

J. A.

Baier ,

Fritz ,

S. A.

McIlraith , Exploiting procedural domain control knowledge in state-ofthe-art planners ., in: ICAPS , 2007 , pp. 26 - 33 .

[2]

J. A.

Baier ,

Fritz ,

Bienvenu ,

S. A.

McIlraith , Beyond classical planning: Procedural control knowledge and preferences in state-of-the-art planners ., in: AAAI , 2008 , pp. 1509 - 1512 .

[3]

Fritz ,

J. A.

Baier ,

S. A.

McIlraith , Congolog, sin trans: Compiling congolog into basic action theories for planning and beyond , in: Proceedings of the Eleventh International Conference on Principles of Knowledge Representation and Reasoning , 2008 , pp. 600 - 610 .

[4]

Hofmann ,

Claßen , Ltlf synthesis on first-order agent programs in nondeterministic environments , in: Proceedings of the AAAI Conference on Artificial Intelligence , volume 39 , 2025 , pp. 14976 - 14986 .

[5]

De Giacomo , Y. Lespérance, The nondeterministic situation calculus , in: M. Bienvenu , G. Lakemeyer , E. Erdem (Eds.), Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning , KR 2021, Online event , November 3 - 12 , 2021 , 2021 , pp. 216 - 226 . URL: https://doi.org/10.24963/kr.2021/21. doi: 10 .24963/KR. 2021 /21.

[6]

Claßen , G. Lakemeyer, A logic for non-terminating Golog programs , in: KR , 2008 , pp. 589 - 599 .

[7]

De Giacomo ,

Lespérance ,

A. R.

Pearce , Situation calculus based programs for representing and reasoning about game structures , in: KR , 2010 , pp. 445 - 455 .

[8]

De Giacomo ,

Lespérance ,

Patrizi ,

Sardina , Verifying congolog programs on bounded situation calculus theories , in: Proceedings of the AAAI Conference on Artificial Intelligence , volume 30 , 2016 .

[9]

De Giacomo ,

Lespérance ,

C. J.

Muise , On supervising agents in situation-determined ConGolog , in: AAMAS, IFAAMAS , 2012 , pp. 1031 - 1038 .