Urban Traffic Control Assisted by AI Planning and Relational Learning Alberto Pozanco and Susana Fernández and Daniel Borrajo Departamento de Informática, Universidad Carlos III de Madrid Avda. de la Universidad, 30. 28911 Leganes (Madrid). Spain apozanco@pa.uc3m.es, sfarregu@inf.uc3m.es, dborrajo@ia.uc3m.es Abstract a street density is going to be high in the near future. In those cases, our system anticipates future problems by gener- Urban Traffic Control is a key problem for most big ating new goals to the planning module and starts a planning- cities. An inefficient traffic control system can lead execution-monitoring process. The proposed system can be to increased traffic congestions that degrade city seen as an instance of a full autonomic (autonomous) sys- quality metrics such as average travel time or city tem, given that it incorporates many self-* properties, as pollution. Most common approaches focus on con- self-monitoring (continuous observation), self-diagnosis (de- trolling traffic by appropriately setting traffic lights. tects undesired behavior), self-optimization (planning), self- Current systems in operation range from static con- healing (executes actions) and self-adaptation (learning). trol of traffic light phases to adaptive systems based The paper is organized as follows: the next section de- on numeric models. In this paper, we propose an au- scribes the system architecture that integrates learning with tonomic approach based on declarative automated AP; the third section formally defines AP tasks and describes planning to generate control plans only when the the traffic-control domain; the fourth section briefly describes default behavior should be overridden. Planning the learning system; the fifth section presents the experimen- is complemented with plan execution control and tal results; and the last section draws conclusions and outlines monitoring, replanning, as well as self-adaptive be- future work. havior using Relational Learning. Learning is used to anticipate the appearance of congestions and cor- rectly solve them. Our system outperforms static 2 Architecture approaches as well as a planning-based system that We propose to use a planning-execution-monitoring architec- recently won a competition on autonomic behavior ture called PELEA to provide a framework that can integrate in Urban Traffic Control. the various components of our system [Guzmán et al., 2012]. Figure 1 shows a sketch of the architecture. At start, the Ex- ecution module receives an AP domain and problem. Then, 1 Introduction it captures the current state of the world, state, and sets the Traffic efficient management and control in urban networks is problem initial state. The initial goal set could be also set an important challenge for city authorities. They usually want by the Goal&Metrics Generation module. The Monitoring to achieve a variety of policy-based objectives, such as re- module calls the Planning module to obtain a plan whose ducing atmospheric pollution or mitigating the effects of un- actions are sent back to the Execution module. Once the ac- expected situations like accidents or road closure. There are tions are executed, the Monitoring module receives the nec- many ways to set the traffic lights programs, ranging from essary knowledge (current state, problem and domain) from early static off-line approaches, to most recent adaptive ap- the Execution module to initialize a new planning-execution- proaches that change the programs according to the state of monitoring cycle. If the execution did not produce the ex- the city. The reader is directed to surveys in the area [Papa- pected changes (reduction in traffic density in some streets), georgiou et al., 2007; Hamilton et al., 2013]. it will result in the generation of new goals and a new initial From a centralized perspective, Automated Planning (AP) state for a new call to the planner. The Goal&Metrics Gen- has been recently shown to perform well in this kind of eration module combines these goals with possible external tasks [Gulic̀ et al., 2015; Vallati et al., 2016]. The main ad- ones (as the ones given directly by traffic controllers) to up- vantage of using AP is that the domain and problem descrip- date the problem. The environment can be substituted by a tions are specified in a declarative language. Thus, even traf- Simulator in some domains, as the one we focus in this paper. fic engineers can easily include new actions, sensor informa- One of the greatest challenges in the proposed architecture tion or metrics. Also, these models can be automatically up- is the generation of new goals. Here, we propose to apply ma- dated by using learning techniques. In this paper we propose chine learning techniques to infer when new goals should be an approach that integrates a planning system for control- generated to anticipate future problematic streets. In a train- ling traffic lights with a learning system that predicts when ing step, examples are generated by observing the traffic be- Inputs Training Examples SIMULATOR/ plan problem, domain ENVIRONMENT EXECUTION state LEARNING domain plan problem state problem state GOALS&METRICS state MONITORING Learned GENERATION Model new goals problem domain plan External goals problem PLANNING Figure 1: Planning and execution architecture that includes learning capabilities. havior during some time periods, under different traffic con- entering the city is going to be in the following time steps). ditions. Then, a learning algorithm can generate a model from There have mainly been two ways to handle uncertainty. In those examples, such that given any new state it returns new the first type of models, uncertainty is represented explicitly goals. We are assuming here that the learning process is per- in the planning model and planners reason with those stochas- formed off-line, prior to the actual use of the AP-based sys- tic models [Bonet and Geffner, 2005]. In the second, planners tem, but it could also be done on-line. The following section reason with deterministic world models and when execution formally defines AP tasks and describes the Urban Traffic of some actions fails, the agent replans [Yoon et al., 2007]. In Control (UTC) domain we are using on this work. this paper, we will use the second alternative given that, from a practical perspective, it is good enough for the domain we 3 Planning Tasks are focusing on. In order to represent planning tasks compactly, the AP com- 4 Learning Traffic Behavior munity uses the standard language PDDL (Planning Domain Description Language) [Fox and Long, 2003]. Most planners In this section we define the task of learning when goals will automatically generate an instantiated planning task from the arrive; that is, predicting the density level of the streets so PDDL declarative description of a domain D and a problem we can anticipate their congestion, generating the appropiate P . The domain defines the predicates for representing states goals for the planner. We formulate this problem as a time and the actions that agents can perform. Figure 2 shows an series prediction one, using Relational Learning in this case. example of an action in the domain definition. The problem Relational Learning is a Machine Learning technique that can describes the task to be solved at each reasoning step; i.e., the capture the correlations between connected elements. In our objects involved (e.g., streets, traffic lights), the initial state case, we conjecture that the structured layout of a city can in- and the set of goals to achieve. Figure 3 shows a subset of a fluence the density levels of some streets based on the ones problem definition. The planner will receive both the domain that are connected to some others. Thus, it is a relational do- and the problem files as input and it will try to find a solu- main. Relational Learning also suits AP, because it allows in- tion plan for the given problem. In this case, the output of the duction over structured examples that can include first-order planner will be a set of actions to be performed over the traf- logical representations, like the ones used in PDDL. fic lights, such that these actions override the default control program for a certain time period. If the planner has solved 4.1 Representation the congestion at the next reasoning step, the default program The representation is based on a subset of the predicates we will take the control again. Otherwise, the next actions of the use in the planning traffic domain. In order to represent the previously generated plan are executed. time steps, we modify some of these predicates, adding the This planning model assumes the world is deterministic corresponding time steps. The predicates used for the learning and the agent has full observability, among other assump- task are shown in Table 1. tions. In most real-world environments, this is not the case. We distinguish two types of predicates: the static and the Actions have stochastic outcomes (the traffic density is not dynamic ones. The static part of the city is represented by the always reduced in the same way when setting a longer green connection predicate, that indicates that a vehicle can move phase in a traffic light), and agents have partial observabil- from one street section to another. All the connection predi- ity (they do not know what the density due to new vehicles cates together represent the entire city network. The dynamic (:action hm-green-to-all-ways :parameters (?t - traffic-light ?c - crossing ?sin - street ?sout1 - street ?sout2 - street ?sout3 - street) :precondition (and (goes-into ?sin ?c) (goes-out ?sout1 ?c) (traffic-lights-from-street ?t ?c ?sin) (not (opposite-direction ?sin ?sout1)) (densityLevel ?sout1 moderate)...) :effect (and (not (state-to-street ?t ?sout1 red)) (densityLevel ?sin low)...) Figure 2: Part of an example description of a PDDL action. (define (problem traffic1) (:domain traffic) (:objects s1 ... s566 - street c1 ... c30 - crossing tl1 ... tl10 - traffic-light) (:init (goes-into s1 c3) (opposite-directions s5 s7) (state-from-street tl1 s7 green) (densityLevel s1 high)...) (:goal (and (densityLevel s4 low) (densityLevel s35 low) ...))) Figure 3: Part of an example PDDL problem file. Predicate Type where A represents the example id and the other letters the density(st,l) Dynamic predicates’ arguments (B is the street whose density level, connection(st,st) Static C, we want to predict). A minus symbol predating a vari- openX(tl,st) Dynamic able means that it is new in the tree, while when the variable densityLX(st) Dynamic appears alone, it has to be referenced before. The classes to predict appear in the leaf nodes of the tree between brackets. Table 1: Predicates used in the learning task. X represents the For example, in the model shown in Figure 4, a high den- time step. L represents the density level. sity would be predicted for a street B in two cases: (1) if its density was low two time steps ago, but there exists another street D connected to B whose density was high three time part of the city is formed by the state of the traffic lights and steps ago and was not low in the last time step; and (2) if its the density of the streets. The openX(tl,st) predicate repre- density was not low neither two time steps ago nor one time sents a green traffic light tl located at street st at time step X. step ago. In our approach, X can take the values from one to three (X previous time steps, or time windows), but it is a parameter that can be modified to extend or reduce the prediction hori- zon. The densityLX(st) predicate indicates that a street st has a density(-A,-B,-C) density level L at time step X. L can take the values veryhigh, densityLow2(A,B)? high, moderate, low and verylow. The last predicate of each +-yes: densityHigh3(A,-D)? example, density(st,l), represents the current density level l of +-yes: connection(A,B,D)? the street st. This will represent the class of each example. +-yes: densityLow1(A,D)? +-yes:[low] 4.2 Algorithms +-no:[high] We are using T ILDE [Blockeel and De Raedt, 1998] to learn +-no:[low] relational decision trees. It receives two files as input: the set- +-no:[low] tings file, where the user can specify the algorithm parame- +-no: densityLow1(A,B)? ters, as well as defining the predicates and classes; and the +-yes: [low] knowledge base file, where both the training and test data +-no: [high] are included. The output of the learning algorithm is a file containing the resulting relational tree and its translation into Figure 4: Example of T ILDE output. rules. It also contains the confusion matrix for the training and test sets. An example output of T ILDE is shown in Figure 4, 5 Experiments and results 7 On this work we use SUMO [Behrisch et al., 2011], an open 6 source traffic simulator developed by the German Aerospace Center (DLR). It allows to import or generate not only road 5 networks, but also traffic demand. And it also allows users Miles of vehicles to define traffic lights control programs. We want to test first 4 if we are able to build a model to predict the appearance of goals in advance, and then we try to apply the created model 3 to several urban traffic control scenarios. 2 5.1 Results on Learning Goals Week Weekend 1 We are using a real city network in our learning experiments; a grid-like section of Houston downtown, shown in Figure 5. 0 00:00 05:00 10:00 15:00 20:00 It is composed of 35 junctions, 140 traffic lights and 164 street Hours sections. We have selected five particular street sections to learn from (A to E). We chose these city points due to their Figure 6: Summary of the generated traffic flows on weekdays different traffic characteristics. C and D are street sections and weekends. The y axis represents the number of vehicles close to a Job Center. B is a point between the Job Center and that enter the network at each hour, in thousands, and the x the main exit of the city. E represents a street section far from axis represents the hours. the main traffic, while A is a random point with no specific features. Data is collected every five minutes for the learning task, which means 2013 instances for the whole week. Five min- utes is what we call “time step”, the sample frequency. We have chosen this sample time as we want to collect traffic data from an entire week, and, at the same time, we want to keep a not very high number of instances so that T ILDE is able to handle them. In our experimental setting, a step in the simu- lation corresponds to a second. Each instance stores the static part of the city previously described, as well as the dynamic component of the state in the last three time steps. We learn one relational model for each street section shown in Figure 5, and then we test with data of the other street sections. We have also varied the density levels, both in the classes to predict and the predicates used on each instance. We have used two approaches. One is based on five density levels: veryhigh, high, moderate, low and verylow. A second version uses only two: high and low. All the generated models are pre-pruned, limiting the creation of new branches when the node has less than 10 instances. In the first experiment, we generated five different models using data from the five selected street sections and the five Figure 5: Benchmark network in SUMO. Models are created density levels approach. And we tested these models in the for points A, B, C, D and E. We assume that a Job Center is five street sections to check accuracy and generality of the located on D. F corresponds to the main exit point of the city. learned models. The results for this first configuration are on Table 2. We have also defined a traffic demand that tries to emulate We can observe that the accuracy is similar for all the street the real traffic flow of a city for an entire week. So, we define sections except for B, whose behaviour seems to be more dif- lower vehicles traffic at night, more traffic at rush hours, and ficult to predict. A and E, the two points away from downtown higher traffic during week days than in the weekend. The Job and the Job Center, present a similar behaviour as expected. Center is included, where most of the cars want to go dur- In the second experiment, the problem is simplified with ing the work hours and also a main exit point, to go out of only two density levels both for the class and the state predi- the city at the end of the workday. The rest of the routes are cates. The results for this last configuration are on Table 3. randomly generated. The vehicles may enter the city by any We can observe that as we decrease the number of den- street section and can finish their trip in an inner (parking, sity levels, the complexity of the problem decreases too and mall, office...) or outer point of the network. A summary of the prediction task becomes easier. With only two levels, the the full traffic demand specification is shown in Figure 6. density of a street knowing the state of the city in the last time A B C D E whose their corresponding street density is currently high. A 0.90 0.68 0.85 0.77 0.83 We also compare with the AP approach proposed in [Gulic̀ B 0.82 0.72 0.79 0.77 0.80 et al., 2015], co-winner of the ARTS-COST competition on C 0.83 0.66 0.88 0.77 0.81 Increasing the resilience of road traffic support systems by the D 0.80 0.66 0.83 0.85 0.81 use of autonomics1 . That planning system does not have any E 0.87 0.66 0.85 0.78 0.89 learning component and only calls the planner when a vehicle has been stopped for a long time. We will call it Planning. Table 2: Accuracy results using the model obtained with five This system is the starting point of our approach, so we use density levels. Each cell (i, j) represents the estimated accu- the same planning domain and planner, LAMA [Richter and racy of learning a model with the data extracted at point i in Westphal, 2010]. The last system we introduce in the tests the city and testing that model against the data collected at combines the Planning approach and the Learning one. point j. It calls the planner when a goal (high density) is predicted or the current density of a street is high. We will refer to it as A B C D E Combined. A 0.99 0.94 0.99 0.97 0.99 We use the following metrics to measure the performance B 0.99 0.95 0.99 0.98 0.99 of each system: the number of steps it takes all cars to reach C 0.96 0.93 0.99 0.97 0.99 their destination; the total amount of C02 emitted by the vehi- D 0.96 0.93 0.99 0.98 0.99 cles; the average waiting time (AWT); the average travel time E 0.96 0.93 0.99 0.97 0.99 (ATT); and, if it applies, the number of planner executions (PE) and the mean planner execution time (MPE). We choose Table 3: Accuracy results using two density levels for the them simply for comparison, none of the systems explicitly class and the predicates. Each cell (i, j) represents the esti- reasons on optimizing these metrics. mated accuracy of learning a model with the data extracted at point i in the city and testing that model against the data Experiments in a Medium-Sized City Network collected at point j. We created a fluent traffic scenario for the first experiment by introducing 5300 cars in 3600 steps in the same city network we used in the learning goals experiments. The simulation steps can be predicted with a high accuracy, even in street sec- finishes if all cars reach their destination, or after 5000 steps. tions that have very different behavior. The final model that The results are shown in Table 4. We can see that there is will be used in our architecture corresponds to the one learned no substantial difference when the traffic is fluid among the with the data of point B, which on average performs best. The different systems. But the Learning approach outperforms relational tree was shown in Figure 4. the others on most metrics. So, when the traffic is fluent, one 5.2 Results on Traffic Management expects that even the Static control program will perform well. In this traffic situation, the time spent on average per Finally, we want to test whether a traffic control system would vehicle in a traffic light (AWT) is approximately half of the improve its performance if it had some predictive model of total time spent in their complete travel (ATT). Given the size the traffic. To do so, we will use several simulation scenarios of the example network, ATT is around three minutes, while where we vary the size of the network (medium and large), AWT is around a minute and a half. The number of plan- the fluency of traffic (fluent or congested) and the evaluated ner executions is low in the Planning and Learning sys- time period (an hour and a day). tems, and it becomes very high when using the Combined When using the learned model, it predicts the density at approach. The number of times it calls the planner is much each street at each time step, using the previous X time steps higher than in the two other approaches, as expected. as input. If it detects a high density at any subset of the street sections, it generates goals to lower the density of those Steps C02 AWT ATT PE MPE street sections. These new goals, together with the current Static 3969 1103 93 172 state of the traffic, create a PDDL planning problem that is Reactive 4059 1137 100 181 given as input to the planner. Therefore, the system is pre- Planning 4070 1117 95 175 22 10 dicting the appearance of goals in the next X time steps, and Learning 3881 1090 88 167 15 10 the planning process can anticipate to the congestions. We Combined 4104 1193 115 197 61 10 will call this new approach Learning. In [Pozanco et al., 2016], we show that if the system uses a short-horizon predic- Table 4: Performance of the different control systems with tion, having the same time steps for both building the model a fluent traffic situation in a medium-sized city. Steps, AWT and checking for goals is not that important. So, our system and ATT are given in steps (seconds), while C02 is in kg. checks for new goals every fifty seconds using the predic- MPE is in seconds. tion model built with the five minutes time step previously described. In the second experiment, we test the systems performance We compare our system with a Static one, that corre- on a very congested traffic scenario using the same city net- sponds to the default system used by SUMO. We also com- work. It was created by introducing 6000 cars in one hour pare our approach with a Reactive system, that acts locally 1 on each traffic light and sets a longer green phase on those https://helios.hud.ac.uk/cost/comp2.php (3600 steps). The results are reported in Table 5. The columns report the same metrics as the one before. Steps C02 AWT ATT PE MPE Static - 2553 582 638 Reactive 4106 1262 119 202 Planning - 2187 435 506 48 11 Learning 4070 1265 121 204 46 10 Combined 4244 1301 128 212 68 11 Table 5: Performance of the different control systems with a very congested traffic situation in a medium-sized city. Steps, AWT and ATT are given in steps (seconds), while C02 is in kg. MPE is in seconds. As we can see, even if the Planning approach out- performs the Static system, it performs worse than the Reactive mechanism and the two other autonomic ap- proaches. Both Learning and Combined can completely solve the traffic congestion. The vehicles spend much more time waiting on average than travelling in this scenario (rela- tion between ATT and AWT). However, the Learning sys- tem is able to reduce the waiting time to half of the travel time, as in a fluent traffic situation. Thus, it is effectively con- Figure 7: Large city network used in the second type of ex- verting a congested situation into a fluent traffic scenario. The periments. reduction of the pollution achieved by Learning is quite substantial too: half of the C02 levels of the static approach. In fact, they are close to those generated in a fluent traffic sce- is able to generalize to this larger city. Our system scales nario. Reactive obtains practically the same results than quite well even in a large network; it can find a plan in less the Learning approach, even if it only acts locally at each than fifty seconds, the checking-for-goals sample period. The traffic light without considering the whole network. performance of Planning is quite good in this case and it almost solves the congestion. Thus, this only-planning ap- Dense Traffic in a Large Size City Network proach works well when we have a reasonably high traffic This experiment tests the scalability of the proposed model to density (as in this experiment or in the first one), but not too larger city networks. The benchmark network in this case is high (as in the previous experiment). The Reactive method composed of 130 junctions, 520 traffic lights and 566 streets. does not scale up well to the large city network. When trying This can be considered as a large network in relation to most to locally reduce the congestion, it ends up generating traffic papers in the field, specially considering that our approaches jams and performing even worse than the default, Static. perform centralized planning. The network is shown in Fig- ure 7. We introduce 13,000 cars in one hour in order to create Full day experiment a dense traffic situation. As the city is bigger than the previous The last experiment focuses not only on trying to handle a one, a experiment will finish when all cars reach their desti- traffic peak, but also to test whether a system can deal with a nation or after 6,000 time steps. Table 6 reports the results. full day traffic flow. In these cases, the decisions spread over time. We use the medium-sized city network and a traffic de- Steps C02 AWT ATT PE MPE mand specification similar to the one presented on Figure 6 Static - 6649 439 549 for the week days. In this experiment we only measure the Reactive - 7676 605 709 AWT per hour. The other metrics could be irrelevant for the Planning - 5520 341 468 50 46 24 hours case. The results are reported on Figure 8. Vehi- Learning 5837 5231 321 445 47 44 cles routes remain static in SUMO. A car will always try to Combined - 6279 518 633 64 54 reach its destination following the shortest path. If this route is congested, the vehicle will not choose another one, but it Table 6: Performance of the different control systems with will stand still waiting for the route to be free. That is the a dense traffic situation in a large-sized city network. Steps, reason why, when using some systems, the network can get AWT and ATT are given in steps (seconds), while C02 is in congested at some time point and become congested for the kg. MPE is in seconds. whole day. We can see this effect when a given curve in the graphic reaches 200 s. When using this metric, a traffic sys- In this case, Learning outperforms the rest and it is the tem performs better if the area under its curve is smaller. only one that can finish the simulation before 6,000 steps. As we can see, only our Learning system is able to fin- The model we learned with the medium-sized urban network ish the simulation properly. The AWT grows up in the morn- 200 traffic light. A weak point of these systems is that they cannot predict incidents and they do not deal well with them. Also, 180 their models are not defined declaratively. Thus, our models are easier to update with new types of information, or new 160 metrics to be taken into account when optimizing. Other AI-related approaches have appeared in recent years. 140 Learning Static The main goal is to build semi- or fully autonomous systems AWT Planning 120 Reactive Combined with little human assistance. Most of them address traffic management from a multi-agent perspective. A single agent 100 acts over a single junction or subset of junctions and then several agents collaborate, discuss and negotiate with the 80 rest [Ossowski et al., 1998]. In [Box and Waterson, 2012], the authors propose a model based on logistic regression and 60 neural networks to learn over time how to better control the 00:00 05:00 10:00 15:00 20:00 Hour traffic signals. Other approaches focus on multi-agent rein- forcement learning [Kuyer et al., 2008], distributed geomet- Figure 8: Average waiting time in the city network per hour. ric fuzzy systems [Gokulan and Srinivasan, 2010] or creat- ing a multi-agent model predictive control [de Oliveira and Camponogara, 2010]. New approaches for efficient UTC are ing, when the cars go to the Job Center, but it does not get arising in the last years using vehicle communication as the fully congested. The AWT remains around 80 s throughout core of the control process [Ferreira et al., 2010]. But, these the morning and it starts growing again by the end of the methods are still far from being implemented in real cities and workday. The metric reaches a peak around 18:00 where the controlling traffic lights remains the most widespread way to AWT is 103 seconds at the most congested traffic situation handle urban traffic. of the day, which is still a reasonable behavior. After that time period, the system is able to reduce the congestion and 7 Conclusions and Future work the AWT starts to decrease. The Reactive system, which showed good performance in the medium-sized city network, In this paper we have presented a dynamic approach for can solve the early morning traffic problem. It obtains similar UTC based on Automated Planning and Relational Learn- results to the ones of Learning until the end of the work- ing. As we have shown, by adding a learning component day. However, it cannot deal correctly with the end of the day that can predict the city state to a planning system, we can traffic. The other systems can not face the morning rush hour. highly increase its autonomy. It can automatically generate Even if the Planning system is still better than the other its own goals, in addittion to letting the planner starts the two, it does not solve the congestion. planning process sooner. We have tested our model in sev- eral traffic control scenarios, showing that the ability to an- ticipate goals can lead to better control performance than us- 6 Related work ing only static traffic lights programns. Our system also out- The first UTC models in the 1950s and 1960s, were based performs the Planning system and overcomes its limita- on fixed-time traffic lights control mechanisms. Actions were tions, as Planning needs to know when a vehicle has been predefined following an off-line optimization using historical stopped for a long time. Instead, our model only needs the data of demand levels. TRANSYT [Robertson, 1969] is one of street density levels, which are easier to obtain from current the most well developed and widely used control systems that sensor systems. By just knowing density levels, we are able uses these techniques. These approaches could even generate to model a wide variety of circumstances that affect traffic “green waves”, simple coordination of neighbouring traffic behavior such as adverse weather conditions or different days lights in order to increase the traffic fluidity. The problem of and hours. Also, since other types of incidents (e.g., road- early systems is that they can age rapidly due to the continu- blocking or big accidents) indirectly affect the density levels, ous evolution of the traffic flows in a city. The benefits may we believe our approach could also work to alleviate conges- be lost in some years if the control plans are not updated. Our tions caused by them. proposed system overcomes this situation, as it not only can In future work, we would like to integrate the ability to react to the current traffic scenario, but it can anticipate and learn how to anticipate goals with externally supplied goals adapt to future ones. (e.g., by traffic controllers), reactively generated ones (e.g., In the last years, the use of new and better sensor systems reactively generating goals), or internally supplied ones (e.g., has allowed engineers to implement traffic-responsive sys- generated by internal motivations of the system). Although tems that use the data provided by the detectors in an on-line the proposed system scales up, we would also like to apply way. These techniques range from centralized approaches, as a multi-agent approach by dividing the city in sections in SCOOT [Bretherton et al., 1998] and SCATS [Lowrie, 1990] which an agent can apply the system in an autonomous way. to distributed ones as UTOPIA[Donati et al., 1984]. As most We think this could lead to similar performance with lower other traffic-responsive systems, they use a mathematical execution times. We would also like to compare our system framework to compute the optimal time allocation of each with other state of the art methods on traffic control, such as model predictive control (e.g., SCOOT), or other AI-based ap- [Gulic̀ et al., 2015] Matija Gulic̀, Ricardo Olivares, and proaches (e.g., reinforcement learning). Finally, we want to Daniel Borrajo. Using automated planning for traffic sig- test the proposed system in irregular city networks such as nals control. In Working Notes of ARTS-COST 2nd com- European ones and build the learning model on-line in order petition, 2015. to show the system’s real-world applicability. [Guzmán et al., 2012] César Guzmán, Vidal Alcázar, David Prior, Eva Onaindı́a, Daniel Borrajo, Juan Fdez-Olivares, Acknowledgements and Ezequiel Quintero. PELEA: a domain-independent This work has been partially supported by MINECO project architecture for planning, execution and learning. In Pro- TIN2014-55637-C2-1-R. ceedings of ICAPS’12 Scheduling and Planning Applica- tions woRKshop (SPARK), pages 38–45, Atibaia (Brazil), 2012. AAAI Press. References [Hamilton et al., 2013] Andrew Hamilton, Ben Waterson, [Behrisch et al., 2011] Michael Behrisch, Laura Bieker, Tom Cherrett, Andrew Robinson, and Ian Snell. The Jakob Erdmann, and Daniel Krajzewicz. Sumo–simulation evolution of urban traffic control: changing policy and of urban mobility. In The Third International Confer- technology. Transportation planning and technology, ence on Advances in System Simulation (SIMUL 2011), 36(1):24–43, 2013. Barcelona, Spain, 2011. [Kuyer et al., 2008] Lior Kuyer, Shimon Whiteson, Bram [Blockeel and De Raedt, 1998] Hendrik Blockeel and Luc Bakker, and Nikos Vlassis. Multiagent reinforcement De Raedt. Top-down induction of first-order logical de- learning for urban traffic control using coordination cision trees. Artificial intelligence, 101(1):285–297, 1998. graphs. In Machine learning and knowledge discovery in [Bonet and Geffner, 2005] Blai Bonet and Héctor Geffner. databases, pages 656–671. Springer, 2008. mGPT: A probabilistic planner based on heuristic search. [Lowrie, 1990] PR Lowrie. Scats, sydney co-ordinated adap- JAIR, 24:933–944, 12 2005. tive traffic system: A traffic responsive method of control- [Box and Waterson, 2012] Simon Box and Ben Waterson. ling urban traffic. 1990. An automated signalized junction controller that learns [Ossowski et al., 1998] Sascha Ossowski, José Cuena, and strategies from a human expert. Engineering applications Ana Garcı́a-Serrano. A case of multiagent decision sup- of artificial intelligence, 25(1):107–118, 2012. port: Using autonomous agents for urban traffic control. In [Bretherton et al., 1998] R. Bretherton, K. Wood, and G.T. Progress in Artificial Intelligence—IBERAMIA 98, pages Bowen. Scoot version 4. In Proceedings of 9th Inter- 100–111. Springer, 1998. national Conference on Road Transport Information and [Papageorgiou et al., 2007] M Papageorgiou, M Ben-Akiva, Control, 1998. Jon Bottom, Piet HL Bovy, SP Hoogendoorn, Nick B [de Oliveira and Camponogara, 2010] Lucas Barcelos Hounsell, Apostolos Kotsialos, and M McDonald. Its and de Oliveira and Eduardo Camponogara. Multi-agent traffic management. Handbooks in Operations Research model predictive control of signaling split in urban traffic and Management Science, 14:715–774, 2007. networks. Transportation Research Part C: Emerging [Pozanco et al., 2016] Alberto Pozanco, Susana Fernández, Technologies, 18(1):120–139, 2010. and Daniel Borrajo. On learning planning goals for traffic [Donati et al., 1984] F Donati, Vito Mauro, G Roncolini, and control. In 4th Workshop on Goal Reasoning (IJCAI’16), M Vallauri. A hierarchical decentralized traffic light con- 2016. trol system. the first realisation ”progetto torino”. In [Richter and Westphal, 2010] Silvia Richter and Matthias Proceedings of the 9th World Congress of the Interna- Westphal. The lama planner: Guiding cost-based anytime tional Federation of Automotive Control, pages 2853– planning with landmarks. Journal of Artificial Intelligence 2858, 1984. Research, 39(1):127–177, 2010. [Ferreira et al., 2010] Michel Ferreira, Ricardo Fernandes, [Robertson, 1969] Dennis I Robertson. Transyt: a traffic net- Hugo Conceição, Wantanee Viriyasitavat, and Ozan K work study tool. 1969. Tonguz. Self-organized traffic control. In Proceedings of [Vallati et al., 2016] M. Vallati, D. Magazzeni, B. De Schut- the seventh ACM international workshop on VehiculAr In- ter, L. Chrpa, and T.L. McCluskey. Efficient macroscopic terNETworking, pages 85–90. ACM, 2010. urban traffic models for reducing congestion: a pddl+ plan- [Fox and Long, 2003] Maria Fox and Derek Long. ning approach. In Proceedings of the Thirtieth AAAI Con- PDDL2.1: An extension to PDDL for expressing ference on Artificial Intelligence (AAAI-16), 2016. temporal planning domains. Journal of AI Research, [Yoon et al., 2007] Sungwook Yoon, Alan Fern, and Robert 20:61–124, 2003. Givan. FF-replan: A baseline for probabilistic planning. In [Gokulan and Srinivasan, 2010] Balaji Parasumanna Goku- ICAPS, pages 352–360, 2007. lan and Dipti Srinivasan. Distributed geometric fuzzy mul- tiagent urban traffic signal control. Intelligent Transporta- tion Systems, IEEE Transactions on, 11(3):714–727, 2010.