Towards The Integration of Model Predictive Control into an AI Planning
                                   Framework

                                       Falilat Jimoh and Thomas L. McCluskey
                                                       PARK Research group
                                                      University of Huddersfield


                           Abstract                                   output controls that take into account future events. On the
                                                                      other hand, these approaches are not as flexible as the AIP
     This paper describes a framework for a hybrid algo-              & S approaches.
     rithm that combines both AI Planning and Model Pre-
     dictive Control approaches to reason with processes and             This work aims to inegrate Control Engineering and AI
     events within a domain. This effectively utilises the
                                                                      Planning techniques by formulating plan generation algo-
     strengths of search-based and model-simulation-based
     methods. We explore this control approach and show               rithms which combine the advantages of both approaches
     how it can be embedded into existing, modern AI Plan-            into one unified approach(Jimoh 2015). The algorithms
     ning technology. This preserves the many advantages of           should be able to produce readable plans to achieve goal
     the AI Planning approach, to do with domain indepen-             conditions where the dynamical system model includes con-
     dence through declarative modelling, and explicit rea-           tinuous processes. In this paper, a hybrid approach is pre-
     soning, while leveraging the capability of MPC to deal           sented where the Model Predictive Control (MPC) approach
     with continuous processes computation within such do-            is used in conjunction with a traditional state-based planning
     mains. The developed technique is tested on an urban             system. This allows for the effective planning of both logical
     traffic control application and the results demonstrate          and numerical change in a problem specification.
     the potential in utilising MPC as a heurisic to guide
     planning search.                                                    The resulting approach, Model predictive control Ap-
                                                                      proach to Continuous Planning (MACOP), is able to gen-
                       Introduction                                   erate plans in domains containing actions, events and pro-
                                                                      cesses; discrete, real-valued, and interval-valued fluents; and
The area of domain independent planning involves mod-                 continuous change to quantities. While there has been a
elling a system in a knowledge-based way, with declara-               growing amount of interest in seeking to unite work in con-
tive data structures representing goals, states, resources, ac-       trol engineering and AI hybrid planning (e.g. (Löhr et al.
tions to mention a few, and creating tools that can reason            2012), MACOPs novel contribution is to use the control out-
about them logically. Plans are created as output to achieve          put from an MPC routine as a heuristic within an AI planner
goal conditions in a future state. As well as the flexibil-           - that is to guide forward search planning using a discretized
ity of input language, a characteristic of this approach is           problem and domain model.
that the human user can understand and inspect the output
plan, to help validate the approach, and to promote a mixed-             This paper describes an implementation of the planner
initiative interaction with the system operators. Automated           used to control the light signals in the application area of
planning and scheduling is increasingly been used to solve            urban traffic conntrol using a flow model of traffic. We are
real-world planning problems. Current limitations in state-           targeting those applications which require a plan to be gen-
of-the-art planning algorithms present significant limitations        erated a priori (rather than planning in a tight plan-execution
when trying to plan for domains which contains continuous             loop), such as in Urban Traffic Control (UTC). While given
numeric change. Few planning engines can input models                 the uncertainty in the UTC area, a plan - re-planning loop
described by continuous processes, and the computational              is often necessary, but there is also a requirement to vali-
complexity of the planning problem can be prohibitive with-           date such automated strategies by inspection prior to execu-
out an appropriate planning approach or strong heuristics.            tion (e.g the problem to be solved might be planning for a
   Control Engineering researchers, on the other hand, have           known road closure or predicted saturated road conditions).
developed techniques to control a continuous process by the           This paper shows the feasibility of using this kind of hybrid
varying of a control variable over a fixed time horizon. The          approach to generate plans for such problems involving hy-
advantages of Control Engineering approaches in this area,            brid states. While the paper describes a domain dependent
such as embedded in the popular “Model Predictive Con-                planner, we propose that the framework could be used with
trol”(MPC) technique, is that they can work with dynamical            an existing planner as the basis for a domain independent
systems described as continuously changing processes, and             version.
                       Background                                 2005). It was the first planner to implement the PDDL+ Se-
Predictive controls are a branch of Control Engineering that      mantics and in addition had the capability of handling vari-
are used in adapting and forecasting the future trend of con-     able durative actions. This includes durative actions with
trol processes in other to manipulates it inputs for a desir-     continuous effects and duration-dependent end-effects. It
able result in a future time. There exist different types of      uses PDDL+ semantics to compile a collection of SAT for-
predictive controls, for example, the Receding Horizon Pre-       mulas from a horizon bounded continuous planning prob-
dictive Control(RHC); the Generalised Predictive Control          lem, together with an associated set of linear metric con-
(GPC) and the Model Predictive Controls (MPC). MPC is             straints over numeric variables. The compiled formulation
used to predict the future behaviour of processes or out-         is passed to a SAT-based arithmetic constraint solver, LP-
put of a system over a period of time in the future. This         SAT (Audemard et al. 2002). The SAT-solver parses trig-
is achieved by computing the future input variables at each       gered constraints to the LP-solver, if there is no solution the
step while minimising a cost function under disparity con-        horizon is increased and the process repeats, otherwise the
straints on the manipulated controls and controlled variable.     solution is decoded into a plan. The novelty of TM-LPSAT
MPC applies only the first set of control variables on the        lies in the compilation of the PDDL+ semantics and decod-
controlled system and repeats the previous step with new          ing of the SAT solver into a plan, since both solvers are well-
measured numeric variables (Veselý, Rosinov, and Foltin          established systems. Kongming (Li and Williams 2008; ?)is
2010). MPC has attracted notable attention in the control of      another domain dependent continuous planner that solves a
dynamic systems and has gained an important role in pro-          class of control planning problems with continuous dynam-
cess control. It was developed in the industrial area as an al-   ics. The language used is a version of PDDL2.1 extended
ternative algorithm control to the conventional Proportional      to enable dynamics to be encoded. It is based on the build-
Integrate Derivative (Bennett 1993) (PID) due to it ability       ing of fact and action layers of flow tubes, using the itera-
to reason with the model of a system under consideration.         tive plan graph structure of Graphplan algorithm (Blum and
The MPC formulation integrates optimal control, stochastic        Furst 1995). As the graph is expands, every action produces
control, control of processes with dead time, multi-variable      a flow tube which contains the valid trajectories as they de-
control and future references when available (Camacho and         velop over a period of time. Reachable states at any time can
Bordons 1999). There are various MPC algorithms, these            be computed using the state equations of the system starting
algorithms has been constantly improved and refined to in-        from a feasible region, and applying actions whose precon-
crease its robustness for real time processes (Tay 2007;          ditions intersect with the feasible region. Kongming trans-
Al-Gherwi, Budman, and Elkamel 2011).                             lates a planning problem into Mixed Logical-Quadratic Pro-
   Automated Planning with continuous processes allows us         gram (MLQP) using the plan-graph encoding with the con-
to model actions, process and events as part of domain op-        tinuous dynamics of the system. The planners metric objec-
erators. This extends temporal planning by considering con-       tives function can be defined in terms of quadratic function
stantly changing state variables with respect to time. Auto-      of state variable. Time is discretised to support state update
mated planning in domains which are represented with rich         within the plan - successive layers of the graph are separated
notations has long been a great challenge for AI (Bresina et      by a constant and uniform time increment.
al. 2013). For instance, changes occurring due to fuel con-          UPMurphi is another planner that reasons with continu-
sumption, continuous movement, or environmental condi-            ous processes (Penna et al. 2010). It alternatively refines a
tions may not be adequately modelled through instantaneous        discretisation of continuous changes until the solution to the
or even durative actions; rather these require modelling as       discretised problem validates against the original problem
continuously changing processes. The combination of time          specification. UPMurphi starts by discretising the continu-
dependent problems and numeric optimisation problem cre-          ous representation of the problem. Specific values within
ate a more challenging and hard task of time-dependent met-       feasible ranges are taken as actions, giving rise to several
ric fluents.                                                      version of each action. The current discretisation is then
   One of the earliest works that involves planning with con-     used to explicitly construct and explore the state space. Plans
tinuous processes includes the Zeno system (Penberthy and         are constructed in the form of planning-as-model-checking
Weld 1994). In this case, processes are described using dif-      paradigm (Cimatti et al. 1997) with no heuristic to guide the
ferential equations rather than as continuous update effects,     search (users can insert their own). When a plan is found,
so that simultaneous equations must be consistent with one        it is validated against the original continuous model, using
another rather than accumulating additive effects. McDer-         the plan validator (Fox, Howey, and Long 2005). If it fails
motts OPTOP system (McDermott 2003) is another early              to find a plan at one discretisation, it iterates again at a finer
planner to handle continuous processes. A forward search          grained discretisation. Successive refinements lead to ever
planner that avoids grounding the representation using a          denser feasible regions, which might be increasingly com-
unique approach to generate heuristics(relaxed plan esti-         plex to construct.
mates of the number of actions required to achieve the Goal)         COLIN (Coles et al. 2012) is a forward-chaining heuris-
from a given state.                                               tic search planner is capable of reasoning with continuous
   More recently, the syntax and semantics of a hybrid lan-       linear numeric change. It combines forward chaining search
guage was brought into the PDDL family in the form of             of FF to prune state, with the use of a Linear Program (LP)
PDDL+ (Fox and Long 2006). TM-LPSAT (Shin and Davis               to check the consistency of the interacting temporal and nu-
2005) was built upon the earlier LPSAT (Shin and Davis            meric constraints at each state. The Temporal Relaxed Plan-
ning Graph heuristic of CRIKEY3 (Coles et al. 2008) is also       Definition 4 (Domain Model) A Domain Model DM, con-
extended to support reasoning with continuous change. A           sist of:
mix integer programming is used for post processes to opti-       • A set of Propositions {p1 , ..., pk }
mise the timestamps of the actions in the plan.
   As the diversity of potential planning applications has in-    • A set of numeric Functions {n1 , ..., nk }
creased, so has the complexity of the continuous domain           • A set of Resources {r1 , ..., rk }
knowledge. In order to circumvent this limitation, some re-       • A set of Actions {a1 , ..., ak }
searchers are currently relaxing the complexity of the do-        • A set of Processes {c1 , ..., ck }
main by discretising continuous change into discrete pro-
files of linear change (Piacentini et al. 2013). in a similar     • A set of Events {e1 , ..., ek }
vein, in Domain Predictive Control (Löhr et al. 2012), a            The domain description language syntax and semantics
discrete domain model is derived from the equations gov-          used in this implementation is similar to PDDL+ (Fox and
erning dynamics in the application domain, and AI plan-           Long 2006). To save space, we refer the reader to the defini-
ning is used to generate plans (using durative planning with      tion of PDDL+ for prelimiary definitions. The exact defini-
PDDL 2.1). The application area is continuous (re-)planning       tions (which are at variance a little with PDDL+) are given
in “switched hybrid systems”. The idea is to start with the       in Jimoh’s thesis1 . The example action, process and event
dynamical equations, and generate a discrete domain model         definitions in Figs 1 - 3 below, taken from the flow model of
from that; this contrasts with the work in this paper, which      urban traffic control that we will develop later in the paper,
assumes that we create a symbolic domain model containing         illustrate the idea.
processes, events and actions, then use MPC derived from a
model of dynamical equations as a heuristic to control for-       Algorithm 1 Top level algorithm of MACOP
ward search in a symbolic planning search space. While in         Input:
the DPC work the emphasis is on applications requiring real       (P,R) : initial state
time control, in our work we are more interested in creating      G: Goal Condition
a complete readable plan before execution.                        DM: Domain Model
                                                                  Nc : control horizon window
              The MACOP Framework                                 N p : horizon prediction window
In overview, MACOP inputs a planning domain model and             Output: Plan.
problem, and searches through a space of nodes using a best-
first heuristic to find a complete plan. A node is a point in      1: ℜ := [ R ]; S := [ ]; ℘ := null;
a search space at which search frontiers or pathways inter-        2: n := (P, ℜ, S)
sect or branch. Operators’ preconditions are checked against       3: repeat
propositions and numeric fluents at each node, if one is sat-      4:     Q := {n}
isfied, the operator effect is applied and the new state be-       5:     i := 1
comes the current state. The model-based numeric optimi-           6:     ℘ := SolveMPC(n, DM, Nc , N p ,℘)
sation problem within the domain model is solved at spe-           7:     while Q 6= {} and i ≤ Nc and NoSolution(Q) do
cific nodes during node exploration. The search proceeds by        8:         n := getBest(Q,℘)
applying each applicable operator to the current states in a       9:         N := Expand(n)
receding horizon until a goal state is found or the node set is   10:         Q := AddTo(N, Q)
empty. The following subsection gives a description of the        11:         i := i + 1
details of the algorithm.                                         12:     end while
                                                                  13:     if Q 6= {} and NoSolution(Q) then
MACOP Algorithm Preliminaries                                     14:         n := getBest(Q,℘)
This section contains definitions that are fundamental to the     15:     end if
design of the planner algorithm.                                  16: until Q = {} or SolutionFound(Q)
Definition 1 (State) A state S is a pair hP, Ri, where P is
the set of atomic propositions and R is an assignment of nu-
meric variables to values. A state describes what is true of
some world at a snapshot of time assuming a Closed World          Top level algorithm of MACOP
Assumption on S.                                                  The input to the planner is the initial state, goal condition
Definition 2 (Initial State) An Initial State is a state hP, Ri   and the domain model as defined in the preliminary defini-
that is true at the start of some planning problem.               tions. As well as this, tied to the particular domain that the
                                                                  system is being applied to, are the fixed horizon prediction
Definition 3 (Goal Condition) A Goal Condition is a con-          value N p (the amount of time for which the MPC compo-
dition G = hQ, Ni, where Q is a set of atomic propositions,       nent will form a guiding plan into the future), and the value
and N is a set of conditions on numeric variables. For a goal
to be achieved in some state (P, R), Q must be contained in          1 ”A Synthesis of Automated Planning and Model Predictive
P, and the values of numeric variables in R must satisfy the      Control Techniques and its Use in Solving Urban Traffic Control
conditions in N.                                                  Problem”, F. Jimoh, 2016
(:action switch_to_green
:parameters [at_junction, this_phase, from_road1, to_road2]
:precondition [(intersect at_junction this_phase from_road1 to_road2)]
                [(>(queueLenght (from_road1 0.0))
                (<(interuptLevel (to_road2 7.0))]
:effect ([(JflowActive at_junction this_phase from_road1 to_road2)]) )

                                            Figure 1: Sample of an Action Declaration

 (:process Jtraffic_flow
:parameters [at_junction, this_phase, from_road1, to_road2]
:precondition [(JflowActive at_junction
                this_phase from_road1 to_road2)]
:effect ([(decrease(queueLenght (from_road1 (* #t flowrate)))
        (increase(queueLenght (to_road2 (* #t flowrate)))]) )

                                           Figure 2: Sample of an Process Declaration

 (:event upstreamFilled
:parameters [at_junction, from_road1, to_road2]
:precondition [(>=(queueLenght (to_road2 capacity_of_road2)))]
:effect ([(assign(queueLenght (to_road2 capacity_of_road2))
            (assign(interuptLevel (to_road2 7.0))]) )

                                            Figure 3: Sample of an Event Declaration


of the control horizon window Nc (the number of nodes that         search, and uses the direction from the MPC component to
are searched between MPC prediction episodes). Both these          decide which the best node is to go forward with. Line 16 re-
values are determined a priori for the domain and kinds of         peats the search and optimisation processes from the current
problems that the planner is aimed at.                             node until SolutionFound flag is true or the search cannot
   A node in the search space is made up of 3 components:          find a solution (the open node set is empty).
(1) a set of propositions as previously defined as the “P”
component of a state (2) a sequence of the numerical vari-         Expanding Search Nodes
able components in the “R” component of the state, the se-         An action is an instance of an operator within the domain
quence capturing their history over a fixed time horizon (3)       model. Its preconditions could be logical, or both logical and
a partial plan. Lines 1-2 initialise a node. ℘ is the variable     numeric inequalities and its effect are logical and/or numeric
that stores dynamic prediction values output from the MPC          updates to the current state where the action is executed. For
process over successive horizons, and is initially set to null.    instance, the action ‘switch to green’ in Fig 1 has a logical
   Within the outer loop Line 4-5 initialise the search space,     precondition that the two roads must be intersected at the
and in Line 6 the MPC optimization process is called. This         junction and must be sharing the same green phase. It also
returns numeric values for control variables which can be          has numeric preconditions such as the interrupt level of the
interpreted as a set of predicted actions that if executed will    connected road must be less that interrupt level seven. The
lead towards the objective function.                               effect of this action changes the logical state of the phase
   The inner loop (Line 7 - 12) expands search within a fixed      at the junction of the two roads to be active, which subse-
horizon Nc . Within the loop the best open node n is chosen        quently starts a process at that junction in the next node.
and removed from Q. This choice is informed by the output             Search space expansion of the current node n occurs by
of MPC: the node which is closest to the trajectory given by       applying actions that satisfy the conditions at a node, or
the partial plan in the current ℘ is picked. Currently, MA-        by time passing for a unit of time. The effect of an action
COP uses no other built-in specific heuristics based on the        changes the state at a node, as shown in Algorithm 2. This
goal condition. In Line 9 the node chosen for expansion is         makes certain assumptions on the semantics of events, such
then expanded to return a set of successor nodes, N, which         as different orders of simultaneous events make no differ-
are added to the open node set. Details of the expansion al-       ence.
gorithm is given in Algorithm 2 below. After the inner loop           When a process is initiated at a given node, the process
exits, if no solution has been found, then the best node is        will run while its precondition holds. Whenever there is a
chosen, extracted from Q, and used as the start node for           process in our domain representation there must always be a
a new search within a new horizon. While the selection of          corresponding event to monitor and control the process.
one node creates incompleteness in the system, it limits the          A grounded process within the domain runs for a period
Algorithm 2 Algorithm for Expand(n)                                 prediction table of changes in numeric values over the period
Input: n                                                            of prediction horizon of 30 seconds. The generated values
Output: N                                                           will be passed to the optimiser to compute the best actions
                                                                    for the subsequent search period taking into considerations
  N := {}                                                           all the constraints in the domain.
  E := {e0 |e0 is an instantiation of some event e ∈ DM, and           How this table is generated depends on the application;
  n makes e0 .pre true };                                           below we illustrate this with a UTC domain.
  n := apply all events in E sequentially to n
  O := {o0 |o0 is an instantiation of some operator o ∈ DM,             Application of MACOP to a UTC Domain
  and n makes o0 .pre true }                                        We illustrate the working of the MACOP framework with
  for all o0 ∈ O do                                                 a specific UTC application. The application is extracted
      n0 := apply o0 to n                                           from a town center area in the United Kingdom.The do-
      N := N ∪ {(n0 .I, n0 .ℜ, [o0 ] + +n0 .S)}                     main model consists of both static and dynamic parts. The
  end for                                                           static part represents the road network topology, i.e., roads,
  P := {p0 |p0 is an instantiation of some process p ∈ DM,          their capacity, length and junctions connecting the roads.
  and n makes p0 .pre true }                                        The road network is represented by a directed graph, where
  for all p ∈ P do                                                  edges stand for road links and vertices stand for either junc-
      n := apply p for one time unit to n                           tions, source or sink roads. Sources are roads where vehicles
  end for                                                           enter the network, while sink road are roads where vehicles
  N := N ∪ {n}                                                      exit the network. The dynamic part represents the length of
                                                                    queues (relating to vehicle occupancy on a link) on each road
                                                                    and the flowrate of vehicles on such roads. The dynamic in-
of time once it is initiated within the search node. The time is    formation changes as vehicles are moving through the road
discretised into single step counts(E.g. t = 1, 2, 3...tn ) where   network.
tn is the duration of simulation of the process. Processes             For example, if a predicate (link nLSouth wDStr) is
are started as a result of an action initiating the process or      present in some state, then in this state the road nLSouth is
an event triggering the start of a process. The preconditions       linked to wDStr, thus allowing the flow of traffic from nL-
could be logical or numeric inequalities and their effects are      South to wDStr if all other constraints are satisfied. A UTC
also numeric update to the current state where the process          Planning Problem addresses the problem of effective navi-
is activated. An example of a process initialisation is the re-     gation of vehicles through a given road network from source
sulting effect of an action “switch-to-green” (Figure 1). This      to sink roads while optimising traffic flow. This is equated to
action effect could lead to a process flow of vehicles starting,    minimising the accumulated queues at each of the junctions
from one road link to another, at the rate of flow of traffic       in the network.
through that junction for the duration of active green phase           A “store-and-forward” traffic flow model is used to for-
at the junction (Figure 2). The process would continue to           mulate a state space MPC system for this work. The store-
run until the specified simulation time elapses or there is an      and-forward traffic flow model was proposed in 1963 by
internally generated event that halts the process.                  Gazis and Potts with the desire of getting a balanced trade-
                                                                    off between control accuracy and computational efforts. The
The MPC Solver: generating and using a dynamic                      mathematical treatment of this model is covered by various
prediction table                                                    works, and recently by Guo et al (Guo, Gang, and Zhang
                                                                    2014), hence we will not repeat the mathematics here but
Whenever the horizon control value Nc is reached, the stored        refer the reader to that work. The method sets up:
past numeric fluents in node n are used to generate a dy-
namic prediction table over the period of prediction horizon        1. a set of equations x(k) = x(k + 1) + K which relates the
count N p , via the SolveMPC procedure. The generated val-             number of vehicles in a road link x at time step k, with the
ues are passed to a numeric optimisation procedure to com-             number of vehicles in the same link x at step k+1, given
pute the best control values, ℘, for the next set of alterations       the effect of those flows over one unit of time. K encapsu-
taking into considerations all the constraints in the domain.          lates all the effects of flows into x from other links, all the
   The optimiser is a procedure which works as a planning as           flows out of x into other links, as well as the traffic sig-
satisfiability(SAT) problem solver (originally used in (Au-            nal condition at that step. Flow rates are estimated using
demard et al. 2002; Shin and Davis 2005)). The continu-                values that are obtained from historical traffic databases.
ous variables with their corresponding constraint values are           The equations gives the basic dynamical relations in the
translated at a given search node into a linear programming            system: they take into account all the processes affecting
problem. The best combination of inputs that satisfies the             link x at once (this is somewhat different to the domain
given numeric constraint are returned to the node.                     model’s process specification in Figure 2, which specifies
   For instance, assume Nc is set to 500 node count and N p is         one flow between two road links in isolation).
set to 30 seconds. The planner keeps track of the node counts       2. relations capturing all the constraints in the system: the
and retrieves past numeric fluents at every 500 node count. It         maximum and minimum green times, the maximum num-
will use this numeric information to generate a new dynamic            ber of vehicles that can occpy each link, etc.
   The series of linear equations/relations making up (i) and
(ii) are then input to a linear optimiser. The variables that the
optimiser can change are the green light times, at each junc-
tion, and the objective to be optimised (minimised) is the
sum of vehicles in queues. The output is the timings over
the look-a-head period of changes to the lights in each junc-
tion, which minimises the overall occupancy. These timings
equate to actions which change the lights over the horizon
period, and hence form heuristics to the search process in
the planner.

                  Setup and Evaluation
The MACOP algorithm, and the embedded MPC solver, are                  Figure 4: Run-times of MACOP with Fixed signal (right)
implemented in Netbeans Java 8.0. The domain and problem               and controlled (left). The y-axis shows run-time in microsec-
representation (in this case the traffic domain description)           onds (time taken to output a complete plan), the x-axis rep-
were also developed in Java syntax to ease the parser and              resents the size of the queue length. Thus, problems which
integration issues. The experiments were carried out on an             start with an increased queue length indicate a more con-
Intel(R) Core(TM) i7-4702MQ CPU, 2.20GHz, with 16GB                    gested network, and hence a more complex problem.
RAM.
   We evaluated the performance of MACOP on the UTC
domain previously introduced: the UTC problem is a tafffic               the green phase at a junction based on traffic demands
network problem, comprising of 12 roads connected by                     within the network.
junctions, 2 linked roads and 3 junctions. The number of ve-
hicles on road links is meansured by the queue length. Each               We constructed several problems within the UTC domain
junction is designed to have more than one signal stage to             with increasing complexities. These problems are useful for
test the ability of MACOP to split the green time between the          our evaluation as it highlights the benefits of the MPC in-
two stages at a junction based on current state of the queue           tegration (controlled signal) over the numeric AI searching
length associated to each road at the junction. The network            (fixed signal) mechanism. Fig 4 shows the times taken by
model also has connected roads linking other roads without             MACOP to solve problems in our test suite.
a signaled junction, in order to test the ability of MACOP to             The same time discretisation is used t = 1.0, for all prob-
reason with the dynamic state of those connected roads that            lems in our UTC domain.
are not directly linked to a junction in the network.                     We investigated the speed of MACOP on different vol-
   The plans that MACOP generates contains a list of switch            umes of traffic and bottlenecks to test the performance of
actions over time representing signal changes in the junc-             MACOP during traffic congestion. We also evaluated the
tions in the region. The switch action is executed at a time           quality of plans generated by comparing the number of ac-
when it anticipates a better optimum green stage than the              tions, processes and plan length in the fix signalled plan to
fixed value during search space.                                       the controlled signalled plan. We generated several traffic
   We evaluated the effectiveness of the embedded MPC ap-              flows by increasing the percentage of queues to create heav-
proach in the MACOP algorithm to optimise traffic flow dur-            ier traffic flow in the experiment.
ing changes in traffic situation. We tested the performance
of the planner based on its ability at controlling the signaled                                Discussion
junctions to accommodate for the changes in traffic situa-             The results show that both fixed and controlled strategies
tion. As no benchmark set exists yet for PDDL+ problems,               perform well in lesser traffic conditions (problems with less
we created a variation of MACOP, a version without MPC                 complexities). However, there is a vast difference between
integrated (Fixed Signal) that reasons with numerics like a            the two instances when the traffic condition becomes heav-
numeric planner (Hoffmann 2003).                                       ier with increasing bottlenecks (i.e increasing complexities).
   Both instances used the same formulation of the given do-           The planning time of the controlled instance is steady while
main and problems:                                                     the performance of the fixed time strategy gets worse with
                                                                       increasing problem complexities. The total number of ac-
Fixed The signal duration at the junctions are fixed from
                                                                       tions and processes in the fix instance is 45% more than the
  the initial state to the goal state: the planner decides what
                                                                       total number of actions and processes in the controlled in-
  flows are turned on/off by the signals. Hence, an action
                                                                       stance. Also, the plan length of the controlled instance is
  could be to switch a (fixed duration) stage from red to
                                                                       less than the plan length of the fixed instance. This gives
  green.
                                                                       evidence that MACOP generates quality plans. In terms of
Controlled The signal is fully controlled by the planner. In           coverage, both configurations are able to solve the problems
  this case, the signal is fixed at the initial state, but the plan-   in the domain, but the runtime is less in the controlled in-
  ner can change the signal duration during search space us-           stance than the fixed instance. This shows the advantage of
  ing the embedded MPC approach in MACOP to optimise                   the more informed MACOP reaching goal conditions in less
 time.                                                             neering 1930-1955. Hitchin, Herts., UK, UK: Peter Peregri-
    The ability to create rich representations of UTC do-          nus, 1st edition.
 main makes it easier to reason with some logical constraints     [Blum and Furst 1995] Blum, A., and Furst, M. 1995. Fast
 within the road network, in contrast to a classic MPC con-        planning through planning graph analysis. In Proceedings of
 troller might not be able to handle them. The MPC approach,       the International Joint Conference on Artificial Inteligence
 on the other hand, utilising domain-dependent knowledge,          (IJCAI-95).
 helps to dynamically control the green split which the
                                                                  [Bresina et al. 2013] Bresina, J. L.; Dearden, R.; Meuleau,
 searching mechanism might not be able to simulate. Com-
                                                                   N.; Ramakrishnan, S.; Smith, D. E.; and Washington, R.
 bining the two approaches this way helps to control a sig-
                                                                   2013. Planning under continuous time and resource uncer-
 nalled junction while still taking care of the logical reason-
                                                                   tainty: A challenge for ai. CoRR abs/1301.0559.
 ing within the network of roads.
    At the moment, the planner works offline. The genera-         [Camacho and Bordons 1999] Camacho, E., and Bordons, C.
 tion of plans is only based on the formal description of the      1999. Model predictive control. London: Springer.
 environment. The state of the system at the time of execut-      [Cimatti et al. 1997] Cimatti,       A.;     Giunchiglia,      F.;
 ing the plan is assumed to be adequately modelled. Hence,         Giunchiglia, E.; and Traverso, P. 1997. Planning via
 a change in a state prior to plan execution is not taken into     model checking: A decision procedure for r. In Steel, S.,
 consideration. Thus, the domain model is robust enough to         and Alami, R., eds., ECP, volume 1348 of Lecture Notes in
 take care of the gaps or differences between the conceptual       Computer Science, 130–142. Springer.
 model and the real world. In a dynamic environment where         [Coles et al. 2008] Coles, A. I.; Fox, M.; Long, D.; and
 unpredictable changes are likely to occur, however, the plan-     Smith, A. J. 2008. Planning with problems requiring tem-
 ner will need to have constant feedback from the effect of its    poral coordination. In Proc. 23rd AAAI Conf. on Artificial
 actions on the domain environment. Thus, online planning          Intelligence.
 will be employed in this situation.
                                                                  [Coles et al. 2012] Coles, A. J.; Coles, A.; Fox, M.; and
                                                                   Long, D. 2012. Colin: Planning with continuous linear nu-
                        Conclusion                                 meric change. J. Artif. Intell. Res. (JAIR) 44:1–96.
 In this paper we have described a UTC dependent, but sce-        [Fox and Long 2006] Fox, M., and Long, D. 2006. Mod-
 nario and configuration independent planning system called        elling mixed discrete-continuous domains for planning. J.
 MACOP. It supports the encoding of domains containing             Art. Int. Res. (JAIR) 27:235–297.
 continuously changing processes, events and actions. We          [Fox, Howey, and Long 2005] Fox, M.; Howey, R.; and
 presented a new approach for such problems areas, integrat-       Long, D. 2005. Validating plans in the context of processes
 ing MPC and search space planning, by using MPC as a con-         and exogenous events. In Veloso, M. M., and Kambhampati,
 trol heuristic for a discretised forward search space. We de-     S., eds., AAAI, 1151–1156. AAAI Press / The MIT Press.
 scribed the implementation and performance of our MACOP
 hybrid algorithm, when tested on a network of connected          [Guo, Gang, and Zhang 2014] Guo, C.; Gang, X.; and
 roads. While our implementation is not yet competitive with       Zhang, M. 2014. Model predictive control implementation
 state of the art hybrid planners, we have used the application    and simulation for urban traffic networks. In Proceedings of
 to UTC to show the feasibility and promise of this partticu-      2014 IEEE International Conference on Service Operations
 lar type of hybrid integration. Future work includes integrat-    and Logistics, and Informatics, 334–340.
 ing state-of-the-art heuristics into the search space planning   [Hoffmann 2003] Hoffmann, J. 2003. The Metric-FF Plan-
 process to improve the performance, stability and robustness      ning System: Translating “Ignoring Delete Lists” to Nu-
 of the planner. Also, a more efficient optimiser needs to be      meric State Variables. J. Art. Int. Res. (JAIR) 20:291–341.
 employed to improve the robustness to larger networks of         [Jimoh 2015] Jimoh, F. 2015. A synthesis of automated plan-
 constraints, rather than the simple solver that was employed      ning and model predictive control techniques and its use in
 in the current implementation.                                    solving urban traffic control problem.
                                                                  [Li and Williams 2008] Li, H., and Williams, B. 2008. Gen-
                        References                                 erative systems for hybrid planning based on flow tubes.
[Al-Gherwi, Budman, and Elkamel 2011] Al-Gherwi, W.;               In Proc. 18th Int. Conf. on Aut. Planning and Scheduling
 Budman, H.; and Elkamel, A. 2011. A robust distributed            (ICAPS).
 model predictive control algorithm. Journal of Process           [Löhr et al. 2012] Löhr, J.; Eyerich, P.; Keller, T.; and Nebel,
 Control 21(8):1127–1137.                                          B. 2012. A planning based framework for controlling hybrid
[Audemard et al. 2002] Audemard, G.; Bertoli, P.; Cimatti,         systems. In ICAPS.
 A.; Kornilowicz, A.; and Sebastiani, R. 2002. A SAT-based        [McDermott 2003] McDermott, D. 2003. Reasoning about
 approach for solving formulas over boolean and linear math-       Autonomous Processes in an Estimated Regression Planner.
 ematical propositions. In Proc. 18th Int. Conf. on Automated      In Proc. 13th Int. Conf. on Aut. Planning and Scheduling
 Deduction, volume 2392, 193–208. Springer-Verlag, LNAI            (ICAPS).
 Series.                                                          [Penberthy and Weld 1994] Penberthy, S., and Weld, D.
[Bennett 1993] Bennett, S. 1993. A History of Control Engi-        1994. Temporal Planning with Continuous Change. In
 Proc. 12th Nat. Conf. on AI (AAAI), 1010–1015. AAAI/MIT
 Press.
[Penna et al. 2010] Penna, G. D.; Intrigila, B.; Magazzeni,
 D.; and Mercorio, F. 2010. A pddl+ benchmark problem:
 The batch chemical plant. In ICAPS’10, 222–225.
[Piacentini et al. 2013] Piacentini, C.; Alimisis, V.; Fox, M.;
 and Long, D. 2013. Combining a temporal planner with
 an external solver for the power balancing problem in an
 electricity network. In ICAPS.
[Shin and Davis 2005] Shin, J., and Davis, E. 2005. Pro-
 cesses and Continuous Change in a SAT-based Planner. Art.
 Int. (AIJ) 166:194–253.
[Tay 2007] Tay, M. 2007. Model predictive cost control.
 Control Engineering 54(8):IE9.
[Veselý, Rosinov, and Foltin 2010] Veselý, V.; Rosinov, D.;
 and Foltin, M. 2010. Robust model predictive control design
 with input constraints. ISA Transactions 49(1):114–120.