Towards The Integration of Model Predictive Control into an AI Planning Framework Falilat Jimoh and Thomas L. McCluskey PARK Research group University of Huddersfield Abstract output controls that take into account future events. On the other hand, these approaches are not as flexible as the AIP This paper describes a framework for a hybrid algo- & S approaches. rithm that combines both AI Planning and Model Pre- dictive Control approaches to reason with processes and This work aims to inegrate Control Engineering and AI events within a domain. This effectively utilises the Planning techniques by formulating plan generation algo- strengths of search-based and model-simulation-based methods. We explore this control approach and show rithms which combine the advantages of both approaches how it can be embedded into existing, modern AI Plan- into one unified approach(Jimoh 2015). The algorithms ning technology. This preserves the many advantages of should be able to produce readable plans to achieve goal the AI Planning approach, to do with domain indepen- conditions where the dynamical system model includes con- dence through declarative modelling, and explicit rea- tinuous processes. In this paper, a hybrid approach is pre- soning, while leveraging the capability of MPC to deal sented where the Model Predictive Control (MPC) approach with continuous processes computation within such do- is used in conjunction with a traditional state-based planning mains. The developed technique is tested on an urban system. This allows for the effective planning of both logical traffic control application and the results demonstrate and numerical change in a problem specification. the potential in utilising MPC as a heurisic to guide planning search. The resulting approach, Model predictive control Ap- proach to Continuous Planning (MACOP), is able to gen- Introduction erate plans in domains containing actions, events and pro- cesses; discrete, real-valued, and interval-valued fluents; and The area of domain independent planning involves mod- continuous change to quantities. While there has been a elling a system in a knowledge-based way, with declara- growing amount of interest in seeking to unite work in con- tive data structures representing goals, states, resources, ac- trol engineering and AI hybrid planning (e.g. (Löhr et al. tions to mention a few, and creating tools that can reason 2012), MACOPs novel contribution is to use the control out- about them logically. Plans are created as output to achieve put from an MPC routine as a heuristic within an AI planner goal conditions in a future state. As well as the flexibil- - that is to guide forward search planning using a discretized ity of input language, a characteristic of this approach is problem and domain model. that the human user can understand and inspect the output plan, to help validate the approach, and to promote a mixed- This paper describes an implementation of the planner initiative interaction with the system operators. Automated used to control the light signals in the application area of planning and scheduling is increasingly been used to solve urban traffic conntrol using a flow model of traffic. We are real-world planning problems. Current limitations in state- targeting those applications which require a plan to be gen- of-the-art planning algorithms present significant limitations erated a priori (rather than planning in a tight plan-execution when trying to plan for domains which contains continuous loop), such as in Urban Traffic Control (UTC). While given numeric change. Few planning engines can input models the uncertainty in the UTC area, a plan - re-planning loop described by continuous processes, and the computational is often necessary, but there is also a requirement to vali- complexity of the planning problem can be prohibitive with- date such automated strategies by inspection prior to execu- out an appropriate planning approach or strong heuristics. tion (e.g the problem to be solved might be planning for a Control Engineering researchers, on the other hand, have known road closure or predicted saturated road conditions). developed techniques to control a continuous process by the This paper shows the feasibility of using this kind of hybrid varying of a control variable over a fixed time horizon. The approach to generate plans for such problems involving hy- advantages of Control Engineering approaches in this area, brid states. While the paper describes a domain dependent such as embedded in the popular “Model Predictive Con- planner, we propose that the framework could be used with trol”(MPC) technique, is that they can work with dynamical an existing planner as the basis for a domain independent systems described as continuously changing processes, and version. Background 2005). It was the first planner to implement the PDDL+ Se- Predictive controls are a branch of Control Engineering that mantics and in addition had the capability of handling vari- are used in adapting and forecasting the future trend of con- able durative actions. This includes durative actions with trol processes in other to manipulates it inputs for a desir- continuous effects and duration-dependent end-effects. It able result in a future time. There exist different types of uses PDDL+ semantics to compile a collection of SAT for- predictive controls, for example, the Receding Horizon Pre- mulas from a horizon bounded continuous planning prob- dictive Control(RHC); the Generalised Predictive Control lem, together with an associated set of linear metric con- (GPC) and the Model Predictive Controls (MPC). MPC is straints over numeric variables. The compiled formulation used to predict the future behaviour of processes or out- is passed to a SAT-based arithmetic constraint solver, LP- put of a system over a period of time in the future. This SAT (Audemard et al. 2002). The SAT-solver parses trig- is achieved by computing the future input variables at each gered constraints to the LP-solver, if there is no solution the step while minimising a cost function under disparity con- horizon is increased and the process repeats, otherwise the straints on the manipulated controls and controlled variable. solution is decoded into a plan. The novelty of TM-LPSAT MPC applies only the first set of control variables on the lies in the compilation of the PDDL+ semantics and decod- controlled system and repeats the previous step with new ing of the SAT solver into a plan, since both solvers are well- measured numeric variables (Veselý, Rosinov, and Foltin established systems. Kongming (Li and Williams 2008; ?)is 2010). MPC has attracted notable attention in the control of another domain dependent continuous planner that solves a dynamic systems and has gained an important role in pro- class of control planning problems with continuous dynam- cess control. It was developed in the industrial area as an al- ics. The language used is a version of PDDL2.1 extended ternative algorithm control to the conventional Proportional to enable dynamics to be encoded. It is based on the build- Integrate Derivative (Bennett 1993) (PID) due to it ability ing of fact and action layers of flow tubes, using the itera- to reason with the model of a system under consideration. tive plan graph structure of Graphplan algorithm (Blum and The MPC formulation integrates optimal control, stochastic Furst 1995). As the graph is expands, every action produces control, control of processes with dead time, multi-variable a flow tube which contains the valid trajectories as they de- control and future references when available (Camacho and velop over a period of time. Reachable states at any time can Bordons 1999). There are various MPC algorithms, these be computed using the state equations of the system starting algorithms has been constantly improved and refined to in- from a feasible region, and applying actions whose precon- crease its robustness for real time processes (Tay 2007; ditions intersect with the feasible region. Kongming trans- Al-Gherwi, Budman, and Elkamel 2011). lates a planning problem into Mixed Logical-Quadratic Pro- Automated Planning with continuous processes allows us gram (MLQP) using the plan-graph encoding with the con- to model actions, process and events as part of domain op- tinuous dynamics of the system. The planners metric objec- erators. This extends temporal planning by considering con- tives function can be defined in terms of quadratic function stantly changing state variables with respect to time. Auto- of state variable. Time is discretised to support state update mated planning in domains which are represented with rich within the plan - successive layers of the graph are separated notations has long been a great challenge for AI (Bresina et by a constant and uniform time increment. al. 2013). For instance, changes occurring due to fuel con- UPMurphi is another planner that reasons with continu- sumption, continuous movement, or environmental condi- ous processes (Penna et al. 2010). It alternatively refines a tions may not be adequately modelled through instantaneous discretisation of continuous changes until the solution to the or even durative actions; rather these require modelling as discretised problem validates against the original problem continuously changing processes. The combination of time specification. UPMurphi starts by discretising the continu- dependent problems and numeric optimisation problem cre- ous representation of the problem. Specific values within ate a more challenging and hard task of time-dependent met- feasible ranges are taken as actions, giving rise to several ric fluents. version of each action. The current discretisation is then One of the earliest works that involves planning with con- used to explicitly construct and explore the state space. Plans tinuous processes includes the Zeno system (Penberthy and are constructed in the form of planning-as-model-checking Weld 1994). In this case, processes are described using dif- paradigm (Cimatti et al. 1997) with no heuristic to guide the ferential equations rather than as continuous update effects, search (users can insert their own). When a plan is found, so that simultaneous equations must be consistent with one it is validated against the original continuous model, using another rather than accumulating additive effects. McDer- the plan validator (Fox, Howey, and Long 2005). If it fails motts OPTOP system (McDermott 2003) is another early to find a plan at one discretisation, it iterates again at a finer planner to handle continuous processes. A forward search grained discretisation. Successive refinements lead to ever planner that avoids grounding the representation using a denser feasible regions, which might be increasingly com- unique approach to generate heuristics(relaxed plan esti- plex to construct. mates of the number of actions required to achieve the Goal) COLIN (Coles et al. 2012) is a forward-chaining heuris- from a given state. tic search planner is capable of reasoning with continuous More recently, the syntax and semantics of a hybrid lan- linear numeric change. It combines forward chaining search guage was brought into the PDDL family in the form of of FF to prune state, with the use of a Linear Program (LP) PDDL+ (Fox and Long 2006). TM-LPSAT (Shin and Davis to check the consistency of the interacting temporal and nu- 2005) was built upon the earlier LPSAT (Shin and Davis meric constraints at each state. The Temporal Relaxed Plan- ning Graph heuristic of CRIKEY3 (Coles et al. 2008) is also Definition 4 (Domain Model) A Domain Model DM, con- extended to support reasoning with continuous change. A sist of: mix integer programming is used for post processes to opti- • A set of Propositions {p1 , ..., pk } mise the timestamps of the actions in the plan. As the diversity of potential planning applications has in- • A set of numeric Functions {n1 , ..., nk } creased, so has the complexity of the continuous domain • A set of Resources {r1 , ..., rk } knowledge. In order to circumvent this limitation, some re- • A set of Actions {a1 , ..., ak } searchers are currently relaxing the complexity of the do- • A set of Processes {c1 , ..., ck } main by discretising continuous change into discrete pro- files of linear change (Piacentini et al. 2013). in a similar • A set of Events {e1 , ..., ek } vein, in Domain Predictive Control (Löhr et al. 2012), a The domain description language syntax and semantics discrete domain model is derived from the equations gov- used in this implementation is similar to PDDL+ (Fox and erning dynamics in the application domain, and AI plan- Long 2006). To save space, we refer the reader to the defini- ning is used to generate plans (using durative planning with tion of PDDL+ for prelimiary definitions. The exact defini- PDDL 2.1). The application area is continuous (re-)planning tions (which are at variance a little with PDDL+) are given in “switched hybrid systems”. The idea is to start with the in Jimoh’s thesis1 . The example action, process and event dynamical equations, and generate a discrete domain model definitions in Figs 1 - 3 below, taken from the flow model of from that; this contrasts with the work in this paper, which urban traffic control that we will develop later in the paper, assumes that we create a symbolic domain model containing illustrate the idea. processes, events and actions, then use MPC derived from a model of dynamical equations as a heuristic to control for- Algorithm 1 Top level algorithm of MACOP ward search in a symbolic planning search space. While in Input: the DPC work the emphasis is on applications requiring real (P,R) : initial state time control, in our work we are more interested in creating G: Goal Condition a complete readable plan before execution. DM: Domain Model Nc : control horizon window The MACOP Framework N p : horizon prediction window In overview, MACOP inputs a planning domain model and Output: Plan. problem, and searches through a space of nodes using a best- first heuristic to find a complete plan. A node is a point in 1: ℜ := [ R ]; S := [ ]; ℘ := null; a search space at which search frontiers or pathways inter- 2: n := (P, ℜ, S) sect or branch. Operators’ preconditions are checked against 3: repeat propositions and numeric fluents at each node, if one is sat- 4: Q := {n} isfied, the operator effect is applied and the new state be- 5: i := 1 comes the current state. The model-based numeric optimi- 6: ℘ := SolveMPC(n, DM, Nc , N p ,℘) sation problem within the domain model is solved at spe- 7: while Q 6= {} and i ≤ Nc and NoSolution(Q) do cific nodes during node exploration. The search proceeds by 8: n := getBest(Q,℘) applying each applicable operator to the current states in a 9: N := Expand(n) receding horizon until a goal state is found or the node set is 10: Q := AddTo(N, Q) empty. The following subsection gives a description of the 11: i := i + 1 details of the algorithm. 12: end while 13: if Q 6= {} and NoSolution(Q) then MACOP Algorithm Preliminaries 14: n := getBest(Q,℘) This section contains definitions that are fundamental to the 15: end if design of the planner algorithm. 16: until Q = {} or SolutionFound(Q) Definition 1 (State) A state S is a pair hP, Ri, where P is the set of atomic propositions and R is an assignment of nu- meric variables to values. A state describes what is true of some world at a snapshot of time assuming a Closed World Top level algorithm of MACOP Assumption on S. The input to the planner is the initial state, goal condition Definition 2 (Initial State) An Initial State is a state hP, Ri and the domain model as defined in the preliminary defini- that is true at the start of some planning problem. tions. As well as this, tied to the particular domain that the system is being applied to, are the fixed horizon prediction Definition 3 (Goal Condition) A Goal Condition is a con- value N p (the amount of time for which the MPC compo- dition G = hQ, Ni, where Q is a set of atomic propositions, nent will form a guiding plan into the future), and the value and N is a set of conditions on numeric variables. For a goal to be achieved in some state (P, R), Q must be contained in 1 ”A Synthesis of Automated Planning and Model Predictive P, and the values of numeric variables in R must satisfy the Control Techniques and its Use in Solving Urban Traffic Control conditions in N. Problem”, F. Jimoh, 2016 (:action switch_to_green :parameters [at_junction, this_phase, from_road1, to_road2] :precondition [(intersect at_junction this_phase from_road1 to_road2)] [(>(queueLenght (from_road1 0.0)) (<(interuptLevel (to_road2 7.0))] :effect ([(JflowActive at_junction this_phase from_road1 to_road2)]) ) Figure 1: Sample of an Action Declaration (:process Jtraffic_flow :parameters [at_junction, this_phase, from_road1, to_road2] :precondition [(JflowActive at_junction this_phase from_road1 to_road2)] :effect ([(decrease(queueLenght (from_road1 (* #t flowrate))) (increase(queueLenght (to_road2 (* #t flowrate)))]) ) Figure 2: Sample of an Process Declaration (:event upstreamFilled :parameters [at_junction, from_road1, to_road2] :precondition [(>=(queueLenght (to_road2 capacity_of_road2)))] :effect ([(assign(queueLenght (to_road2 capacity_of_road2)) (assign(interuptLevel (to_road2 7.0))]) ) Figure 3: Sample of an Event Declaration of the control horizon window Nc (the number of nodes that search, and uses the direction from the MPC component to are searched between MPC prediction episodes). Both these decide which the best node is to go forward with. Line 16 re- values are determined a priori for the domain and kinds of peats the search and optimisation processes from the current problems that the planner is aimed at. node until SolutionFound flag is true or the search cannot A node in the search space is made up of 3 components: find a solution (the open node set is empty). (1) a set of propositions as previously defined as the “P” component of a state (2) a sequence of the numerical vari- Expanding Search Nodes able components in the “R” component of the state, the se- An action is an instance of an operator within the domain quence capturing their history over a fixed time horizon (3) model. Its preconditions could be logical, or both logical and a partial plan. Lines 1-2 initialise a node. ℘ is the variable numeric inequalities and its effect are logical and/or numeric that stores dynamic prediction values output from the MPC updates to the current state where the action is executed. For process over successive horizons, and is initially set to null. instance, the action ‘switch to green’ in Fig 1 has a logical Within the outer loop Line 4-5 initialise the search space, precondition that the two roads must be intersected at the and in Line 6 the MPC optimization process is called. This junction and must be sharing the same green phase. It also returns numeric values for control variables which can be has numeric preconditions such as the interrupt level of the interpreted as a set of predicted actions that if executed will connected road must be less that interrupt level seven. The lead towards the objective function. effect of this action changes the logical state of the phase The inner loop (Line 7 - 12) expands search within a fixed at the junction of the two roads to be active, which subse- horizon Nc . Within the loop the best open node n is chosen quently starts a process at that junction in the next node. and removed from Q. This choice is informed by the output Search space expansion of the current node n occurs by of MPC: the node which is closest to the trajectory given by applying actions that satisfy the conditions at a node, or the partial plan in the current ℘ is picked. Currently, MA- by time passing for a unit of time. The effect of an action COP uses no other built-in specific heuristics based on the changes the state at a node, as shown in Algorithm 2. This goal condition. In Line 9 the node chosen for expansion is makes certain assumptions on the semantics of events, such then expanded to return a set of successor nodes, N, which as different orders of simultaneous events make no differ- are added to the open node set. Details of the expansion al- ence. gorithm is given in Algorithm 2 below. After the inner loop When a process is initiated at a given node, the process exits, if no solution has been found, then the best node is will run while its precondition holds. Whenever there is a chosen, extracted from Q, and used as the start node for process in our domain representation there must always be a a new search within a new horizon. While the selection of corresponding event to monitor and control the process. one node creates incompleteness in the system, it limits the A grounded process within the domain runs for a period Algorithm 2 Algorithm for Expand(n) prediction table of changes in numeric values over the period Input: n of prediction horizon of 30 seconds. The generated values Output: N will be passed to the optimiser to compute the best actions for the subsequent search period taking into considerations N := {} all the constraints in the domain. E := {e0 |e0 is an instantiation of some event e ∈ DM, and How this table is generated depends on the application; n makes e0 .pre true }; below we illustrate this with a UTC domain. n := apply all events in E sequentially to n O := {o0 |o0 is an instantiation of some operator o ∈ DM, Application of MACOP to a UTC Domain and n makes o0 .pre true } We illustrate the working of the MACOP framework with for all o0 ∈ O do a specific UTC application. The application is extracted n0 := apply o0 to n from a town center area in the United Kingdom.The do- N := N ∪ {(n0 .I, n0 .ℜ, [o0 ] + +n0 .S)} main model consists of both static and dynamic parts. The end for static part represents the road network topology, i.e., roads, P := {p0 |p0 is an instantiation of some process p ∈ DM, their capacity, length and junctions connecting the roads. and n makes p0 .pre true } The road network is represented by a directed graph, where for all p ∈ P do edges stand for road links and vertices stand for either junc- n := apply p for one time unit to n tions, source or sink roads. Sources are roads where vehicles end for enter the network, while sink road are roads where vehicles N := N ∪ {n} exit the network. The dynamic part represents the length of queues (relating to vehicle occupancy on a link) on each road and the flowrate of vehicles on such roads. The dynamic in- of time once it is initiated within the search node. The time is formation changes as vehicles are moving through the road discretised into single step counts(E.g. t = 1, 2, 3...tn ) where network. tn is the duration of simulation of the process. Processes For example, if a predicate (link nLSouth wDStr) is are started as a result of an action initiating the process or present in some state, then in this state the road nLSouth is an event triggering the start of a process. The preconditions linked to wDStr, thus allowing the flow of traffic from nL- could be logical or numeric inequalities and their effects are South to wDStr if all other constraints are satisfied. A UTC also numeric update to the current state where the process Planning Problem addresses the problem of effective navi- is activated. An example of a process initialisation is the re- gation of vehicles through a given road network from source sulting effect of an action “switch-to-green” (Figure 1). This to sink roads while optimising traffic flow. This is equated to action effect could lead to a process flow of vehicles starting, minimising the accumulated queues at each of the junctions from one road link to another, at the rate of flow of traffic in the network. through that junction for the duration of active green phase A “store-and-forward” traffic flow model is used to for- at the junction (Figure 2). The process would continue to mulate a state space MPC system for this work. The store- run until the specified simulation time elapses or there is an and-forward traffic flow model was proposed in 1963 by internally generated event that halts the process. Gazis and Potts with the desire of getting a balanced trade- off between control accuracy and computational efforts. The The MPC Solver: generating and using a dynamic mathematical treatment of this model is covered by various prediction table works, and recently by Guo et al (Guo, Gang, and Zhang 2014), hence we will not repeat the mathematics here but Whenever the horizon control value Nc is reached, the stored refer the reader to that work. The method sets up: past numeric fluents in node n are used to generate a dy- namic prediction table over the period of prediction horizon 1. a set of equations x(k) = x(k + 1) + K which relates the count N p , via the SolveMPC procedure. The generated val- number of vehicles in a road link x at time step k, with the ues are passed to a numeric optimisation procedure to com- number of vehicles in the same link x at step k+1, given pute the best control values, ℘, for the next set of alterations the effect of those flows over one unit of time. K encapsu- taking into considerations all the constraints in the domain. lates all the effects of flows into x from other links, all the The optimiser is a procedure which works as a planning as flows out of x into other links, as well as the traffic sig- satisfiability(SAT) problem solver (originally used in (Au- nal condition at that step. Flow rates are estimated using demard et al. 2002; Shin and Davis 2005)). The continu- values that are obtained from historical traffic databases. ous variables with their corresponding constraint values are The equations gives the basic dynamical relations in the translated at a given search node into a linear programming system: they take into account all the processes affecting problem. The best combination of inputs that satisfies the link x at once (this is somewhat different to the domain given numeric constraint are returned to the node. model’s process specification in Figure 2, which specifies For instance, assume Nc is set to 500 node count and N p is one flow between two road links in isolation). set to 30 seconds. The planner keeps track of the node counts 2. relations capturing all the constraints in the system: the and retrieves past numeric fluents at every 500 node count. It maximum and minimum green times, the maximum num- will use this numeric information to generate a new dynamic ber of vehicles that can occpy each link, etc. The series of linear equations/relations making up (i) and (ii) are then input to a linear optimiser. The variables that the optimiser can change are the green light times, at each junc- tion, and the objective to be optimised (minimised) is the sum of vehicles in queues. The output is the timings over the look-a-head period of changes to the lights in each junc- tion, which minimises the overall occupancy. These timings equate to actions which change the lights over the horizon period, and hence form heuristics to the search process in the planner. Setup and Evaluation The MACOP algorithm, and the embedded MPC solver, are Figure 4: Run-times of MACOP with Fixed signal (right) implemented in Netbeans Java 8.0. The domain and problem and controlled (left). The y-axis shows run-time in microsec- representation (in this case the traffic domain description) onds (time taken to output a complete plan), the x-axis rep- were also developed in Java syntax to ease the parser and resents the size of the queue length. Thus, problems which integration issues. The experiments were carried out on an start with an increased queue length indicate a more con- Intel(R) Core(TM) i7-4702MQ CPU, 2.20GHz, with 16GB gested network, and hence a more complex problem. RAM. We evaluated the performance of MACOP on the UTC domain previously introduced: the UTC problem is a tafffic the green phase at a junction based on traffic demands network problem, comprising of 12 roads connected by within the network. junctions, 2 linked roads and 3 junctions. The number of ve- hicles on road links is meansured by the queue length. Each We constructed several problems within the UTC domain junction is designed to have more than one signal stage to with increasing complexities. These problems are useful for test the ability of MACOP to split the green time between the our evaluation as it highlights the benefits of the MPC in- two stages at a junction based on current state of the queue tegration (controlled signal) over the numeric AI searching length associated to each road at the junction. The network (fixed signal) mechanism. Fig 4 shows the times taken by model also has connected roads linking other roads without MACOP to solve problems in our test suite. a signaled junction, in order to test the ability of MACOP to The same time discretisation is used t = 1.0, for all prob- reason with the dynamic state of those connected roads that lems in our UTC domain. are not directly linked to a junction in the network. We investigated the speed of MACOP on different vol- The plans that MACOP generates contains a list of switch umes of traffic and bottlenecks to test the performance of actions over time representing signal changes in the junc- MACOP during traffic congestion. We also evaluated the tions in the region. The switch action is executed at a time quality of plans generated by comparing the number of ac- when it anticipates a better optimum green stage than the tions, processes and plan length in the fix signalled plan to fixed value during search space. the controlled signalled plan. We generated several traffic We evaluated the effectiveness of the embedded MPC ap- flows by increasing the percentage of queues to create heav- proach in the MACOP algorithm to optimise traffic flow dur- ier traffic flow in the experiment. ing changes in traffic situation. We tested the performance of the planner based on its ability at controlling the signaled Discussion junctions to accommodate for the changes in traffic situa- The results show that both fixed and controlled strategies tion. As no benchmark set exists yet for PDDL+ problems, perform well in lesser traffic conditions (problems with less we created a variation of MACOP, a version without MPC complexities). However, there is a vast difference between integrated (Fixed Signal) that reasons with numerics like a the two instances when the traffic condition becomes heav- numeric planner (Hoffmann 2003). ier with increasing bottlenecks (i.e increasing complexities). Both instances used the same formulation of the given do- The planning time of the controlled instance is steady while main and problems: the performance of the fixed time strategy gets worse with increasing problem complexities. The total number of ac- Fixed The signal duration at the junctions are fixed from tions and processes in the fix instance is 45% more than the the initial state to the goal state: the planner decides what total number of actions and processes in the controlled in- flows are turned on/off by the signals. Hence, an action stance. Also, the plan length of the controlled instance is could be to switch a (fixed duration) stage from red to less than the plan length of the fixed instance. This gives green. evidence that MACOP generates quality plans. In terms of Controlled The signal is fully controlled by the planner. In coverage, both configurations are able to solve the problems this case, the signal is fixed at the initial state, but the plan- in the domain, but the runtime is less in the controlled in- ner can change the signal duration during search space us- stance than the fixed instance. This shows the advantage of ing the embedded MPC approach in MACOP to optimise the more informed MACOP reaching goal conditions in less time. neering 1930-1955. Hitchin, Herts., UK, UK: Peter Peregri- The ability to create rich representations of UTC do- nus, 1st edition. main makes it easier to reason with some logical constraints [Blum and Furst 1995] Blum, A., and Furst, M. 1995. Fast within the road network, in contrast to a classic MPC con- planning through planning graph analysis. In Proceedings of troller might not be able to handle them. The MPC approach, the International Joint Conference on Artificial Inteligence on the other hand, utilising domain-dependent knowledge, (IJCAI-95). helps to dynamically control the green split which the [Bresina et al. 2013] Bresina, J. L.; Dearden, R.; Meuleau, searching mechanism might not be able to simulate. Com- N.; Ramakrishnan, S.; Smith, D. E.; and Washington, R. bining the two approaches this way helps to control a sig- 2013. Planning under continuous time and resource uncer- nalled junction while still taking care of the logical reason- tainty: A challenge for ai. CoRR abs/1301.0559. ing within the network of roads. At the moment, the planner works offline. The genera- [Camacho and Bordons 1999] Camacho, E., and Bordons, C. tion of plans is only based on the formal description of the 1999. Model predictive control. London: Springer. environment. The state of the system at the time of execut- [Cimatti et al. 1997] Cimatti, A.; Giunchiglia, F.; ing the plan is assumed to be adequately modelled. Hence, Giunchiglia, E.; and Traverso, P. 1997. Planning via a change in a state prior to plan execution is not taken into model checking: A decision procedure for r. In Steel, S., consideration. Thus, the domain model is robust enough to and Alami, R., eds., ECP, volume 1348 of Lecture Notes in take care of the gaps or differences between the conceptual Computer Science, 130–142. Springer. model and the real world. In a dynamic environment where [Coles et al. 2008] Coles, A. I.; Fox, M.; Long, D.; and unpredictable changes are likely to occur, however, the plan- Smith, A. J. 2008. Planning with problems requiring tem- ner will need to have constant feedback from the effect of its poral coordination. In Proc. 23rd AAAI Conf. on Artificial actions on the domain environment. Thus, online planning Intelligence. will be employed in this situation. [Coles et al. 2012] Coles, A. J.; Coles, A.; Fox, M.; and Long, D. 2012. Colin: Planning with continuous linear nu- Conclusion meric change. J. Artif. Intell. Res. (JAIR) 44:1–96. In this paper we have described a UTC dependent, but sce- [Fox and Long 2006] Fox, M., and Long, D. 2006. Mod- nario and configuration independent planning system called elling mixed discrete-continuous domains for planning. J. MACOP. It supports the encoding of domains containing Art. Int. Res. (JAIR) 27:235–297. continuously changing processes, events and actions. We [Fox, Howey, and Long 2005] Fox, M.; Howey, R.; and presented a new approach for such problems areas, integrat- Long, D. 2005. Validating plans in the context of processes ing MPC and search space planning, by using MPC as a con- and exogenous events. In Veloso, M. M., and Kambhampati, trol heuristic for a discretised forward search space. We de- S., eds., AAAI, 1151–1156. AAAI Press / The MIT Press. scribed the implementation and performance of our MACOP hybrid algorithm, when tested on a network of connected [Guo, Gang, and Zhang 2014] Guo, C.; Gang, X.; and roads. While our implementation is not yet competitive with Zhang, M. 2014. Model predictive control implementation state of the art hybrid planners, we have used the application and simulation for urban traffic networks. In Proceedings of to UTC to show the feasibility and promise of this partticu- 2014 IEEE International Conference on Service Operations lar type of hybrid integration. Future work includes integrat- and Logistics, and Informatics, 334–340. ing state-of-the-art heuristics into the search space planning [Hoffmann 2003] Hoffmann, J. 2003. The Metric-FF Plan- process to improve the performance, stability and robustness ning System: Translating “Ignoring Delete Lists” to Nu- of the planner. Also, a more efficient optimiser needs to be meric State Variables. J. Art. Int. Res. (JAIR) 20:291–341. employed to improve the robustness to larger networks of [Jimoh 2015] Jimoh, F. 2015. A synthesis of automated plan- constraints, rather than the simple solver that was employed ning and model predictive control techniques and its use in in the current implementation. solving urban traffic control problem. [Li and Williams 2008] Li, H., and Williams, B. 2008. Gen- References erative systems for hybrid planning based on flow tubes. [Al-Gherwi, Budman, and Elkamel 2011] Al-Gherwi, W.; In Proc. 18th Int. Conf. on Aut. Planning and Scheduling Budman, H.; and Elkamel, A. 2011. A robust distributed (ICAPS). model predictive control algorithm. Journal of Process [Löhr et al. 2012] Löhr, J.; Eyerich, P.; Keller, T.; and Nebel, Control 21(8):1127–1137. B. 2012. A planning based framework for controlling hybrid [Audemard et al. 2002] Audemard, G.; Bertoli, P.; Cimatti, systems. In ICAPS. A.; Kornilowicz, A.; and Sebastiani, R. 2002. A SAT-based [McDermott 2003] McDermott, D. 2003. Reasoning about approach for solving formulas over boolean and linear math- Autonomous Processes in an Estimated Regression Planner. ematical propositions. In Proc. 18th Int. Conf. on Automated In Proc. 13th Int. Conf. on Aut. Planning and Scheduling Deduction, volume 2392, 193–208. Springer-Verlag, LNAI (ICAPS). Series. [Penberthy and Weld 1994] Penberthy, S., and Weld, D. [Bennett 1993] Bennett, S. 1993. A History of Control Engi- 1994. Temporal Planning with Continuous Change. In Proc. 12th Nat. Conf. on AI (AAAI), 1010–1015. AAAI/MIT Press. [Penna et al. 2010] Penna, G. D.; Intrigila, B.; Magazzeni, D.; and Mercorio, F. 2010. A pddl+ benchmark problem: The batch chemical plant. In ICAPS’10, 222–225. [Piacentini et al. 2013] Piacentini, C.; Alimisis, V.; Fox, M.; and Long, D. 2013. Combining a temporal planner with an external solver for the power balancing problem in an electricity network. In ICAPS. [Shin and Davis 2005] Shin, J., and Davis, E. 2005. Pro- cesses and Continuous Change in a SAT-based Planner. Art. Int. (AIJ) 166:194–253. [Tay 2007] Tay, M. 2007. Model predictive cost control. Control Engineering 54(8):IE9. [Veselý, Rosinov, and Foltin 2010] Veselý, V.; Rosinov, D.; and Foltin, M. 2010. Robust model predictive control design with input constraints. ISA Transactions 49(1):114–120.