Urban Traffic Control Assisted by AI Planning and Relational Learning

                         Alberto Pozanco and Susana Fernández and Daniel Borrajo
                         Departamento de Informática, Universidad Carlos III de Madrid
                           Avda. de la Universidad, 30. 28911 Leganes (Madrid). Spain
                       apozanco@pa.uc3m.es, sfarregu@inf.uc3m.es, dborrajo@ia.uc3m.es


Abstract

Urban Traffic Control is a key problem for most big cities. An inefficient traffic control system can lead to increased traffic congestion that degrades city quality metrics such as average travel time or city pollution. The most common approaches focus on controlling traffic by appropriately setting traffic lights. Current systems in operation range from static control of traffic light phases to adaptive systems based on numeric models. In this paper, we propose an autonomic approach based on declarative automated planning to generate control plans only when the default behavior should be overridden. Planning is complemented with plan execution control and monitoring, replanning, as well as self-adaptive behavior using Relational Learning. Learning is used to anticipate the appearance of congestion and correctly solve it. Our system outperforms static approaches as well as a planning-based system that recently won a competition on autonomic behavior in Urban Traffic Control.

1   Introduction

Efficient traffic management and control in urban networks is an important challenge for city authorities. They usually want to achieve a variety of policy-based objectives, such as reducing atmospheric pollution or mitigating the effects of unexpected situations like accidents or road closures. There are many ways to set traffic light programs, ranging from early static off-line approaches to more recent adaptive approaches that change the programs according to the state of the city. The reader is directed to surveys in the area [Papageorgiou et al., 2007; Hamilton et al., 2013].

From a centralized perspective, Automated Planning (AP) has recently been shown to perform well in this kind of task [Gulić et al., 2015; Vallati et al., 2016]. The main advantage of using AP is that the domain and problem descriptions are specified in a declarative language. Thus, even traffic engineers can easily include new actions, sensor information or metrics. Also, these models can be automatically updated by using learning techniques. In this paper we propose an approach that integrates a planning system for controlling traffic lights with a learning system that predicts when the density of a street is going to be high in the near future. In those cases, our system anticipates future problems by generating new goals for the planning module and starts a planning-execution-monitoring process. The proposed system can be seen as an instance of a full autonomic (autonomous) system, given that it incorporates many self-* properties, such as self-monitoring (continuous observation), self-diagnosis (detects undesired behavior), self-optimization (planning), self-healing (executes actions) and self-adaptation (learning).

The paper is organized as follows: the next section describes the system architecture that integrates learning with AP; the third section formally defines AP tasks and describes the traffic-control domain; the fourth section briefly describes the learning system; the fifth section presents the experimental results; and the last section draws conclusions and outlines future work.

2   Architecture

We propose to use a planning-execution-monitoring architecture called PELEA to provide a framework that can integrate the various components of our system [Guzmán et al., 2012]. Figure 1 shows a sketch of the architecture. At start, the Execution module receives an AP domain and problem. Then, it captures the current state of the world, state, and sets the initial state of the problem. The initial goal set can also be set by the Goal&Metrics Generation module. The Monitoring module calls the Planning module to obtain a plan whose actions are sent back to the Execution module. Once the actions are executed, the Monitoring module receives the necessary knowledge (current state, problem and domain) from the Execution module to initialize a new planning-execution-monitoring cycle. If the execution did not produce the expected changes (a reduction in traffic density in some streets), new goals and a new initial state are generated for a new call to the planner. The Goal&Metrics Generation module combines these goals with possible external ones (such as those given directly by traffic controllers) to update the problem. The environment can be substituted by a Simulator in some domains, such as the one we focus on in this paper.

[Figure 1 (diagram). Modules: Simulator/Environment, Execution, Monitoring, Planning, Goals&Metrics Generation and Learning (with its Learned Model). Labelled flows: inputs (problem, domain), training examples, state, plan, problem, domain, new goals and external goals.]

Figure 1: Planning and execution architecture that includes learning capabilities.

One of the greatest challenges in the proposed architecture is the generation of new goals. Here, we propose to apply machine learning techniques to infer when new goals should be generated to anticipate future problematic streets. In a training step, examples are generated by observing the traffic behavior during some time periods, under different traffic conditions. Then, a learning algorithm can generate a model from those examples, such that given any new state it returns new goals. We are assuming here that the learning process is performed off-line, prior to the actual use of the AP-based system, but it could also be done on-line. The following section formally defines AP tasks and describes the Urban Traffic Control (UTC) domain we are using in this work.

3   Planning Tasks

In order to represent planning tasks compactly, the AP community uses the standard language PDDL (Planning Domain Definition Language) [Fox and Long, 2003]. Most planners automatically generate an instantiated planning task from the PDDL declarative description of a domain D and a problem P. The domain defines the predicates for representing states and the actions that agents can perform. Figure 2 shows an example of an action in the domain definition. The problem describes the task to be solved at each reasoning step; i.e., the objects involved (e.g., streets, traffic lights), the initial state and the set of goals to achieve. Figure 3 shows a subset of a problem definition. The planner receives both the domain and the problem files as input and tries to find a solution plan for the given problem. In this case, the output of the planner is a set of actions to be performed over the traffic lights, such that these actions override the default control program for a certain time period. If the congestion has been solved by the next reasoning step, the default program takes control again. Otherwise, the next actions of the previously generated plan are executed.

This planning model assumes the world is deterministic and the agent has full observability, among other assumptions. In most real-world environments, this is not the case. Actions have stochastic outcomes (the traffic density is not always reduced in the same way when setting a longer green phase in a traffic light), and agents have partial observability (they do not know what the density due to new vehicles entering the city is going to be in the following time steps). There have mainly been two ways to handle uncertainty. In the first type of models, uncertainty is represented explicitly in the planning model and planners reason with those stochastic models [Bonet and Geffner, 2005]. In the second, planners reason with deterministic world models and, when the execution of some action fails, the agent replans [Yoon et al., 2007]. In this paper, we use the second alternative given that, from a practical perspective, it is good enough for the domain we are focusing on.

4   Learning Traffic Behavior

In this section we define the task of learning when goals will arrive; that is, predicting the density level of the streets so we can anticipate their congestion, generating the appropriate goals for the planner. We formulate this problem as a time series prediction one, using Relational Learning in this case. Relational Learning is a Machine Learning technique that can capture the correlations between connected elements. In our case, we conjecture that the structured layout of a city makes the density level of a street depend on the density levels of the streets connected to it. Thus, it is a relational domain. Relational Learning also suits AP, because it allows induction over structured examples that can include first-order logical representations, like the ones used in PDDL.

4.1   Representation

The representation is based on a subset of the predicates we use in the planning traffic domain. In order to represent the time steps, we modify some of these predicates, adding the corresponding time steps. The predicates used for the learning task are shown in Table 1.

We distinguish two types of predicates: the static and the dynamic ones. The static part of the city is represented by the connection predicate, which indicates that a vehicle can move from one street section to another. All the connection predicates together represent the entire city network. The dynamic
                    (:action hm-green-to-all-ways
                     :parameters (?t - traffic-light ?c - crossing ?sin - street
                                  ?sout1 - street ?sout2 - street ?sout3 - street)
                     :precondition (and (goes-into ?sin ?c)
                                        (goes-out ?sout1 ?c)
                                        (traffic-lights-from-street ?t ?c ?sin)
                                        (not (opposite-direction ?sin ?sout1))
                                        (densityLevel ?sout1 moderate)...)
                     :effect (and (not (state-to-street ?t ?sout1 red))
                                  (densityLevel ?sin low)...)


                                     Figure 2: Part of an example description of a PDDL action.

                           (define (problem traffic1) (:domain traffic)
                             (:objects s1 ... s566 - street
                                       c1 ... c30 - crossing
                                       tl1 ... tl10 - traffic-light)
                             (:init (goes-into s1 c3)
                                    (opposite-directions s5 s7)
                                    (state-from-street tl1 s7 green)
                                    (densityLevel s1 high)...)
                             (:goal (and (densityLevel s4 low)
                                         (densityLevel s35 low) ...)))


                                          Figure 3: Part of an example PDDL problem file.
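
As an illustration of how a problem file shaped like the one in Figure 3 might be assembled at each reasoning step, the following Python sketch builds the PDDL text from an observed state and a list of streets whose density should become low. It is only a sketch under stated assumptions, not the implementation used in the experiments: the function name make_problem, its arguments and the example objects are hypothetical, and only the predicate names are taken from Figure 3.

```python
from typing import Dict, List


def make_problem(
    name: str,
    objects: Dict[str, List[str]],  # PDDL type -> object names (hypothetical input)
    static_facts: List[str],        # e.g. "(goes-into s1 c3)", copied verbatim
    densities: Dict[str, str],      # street -> current density level
    congested: List[str],           # streets whose density should become low
) -> str:
    """Return a PDDL problem string shaped like the file in Figure 3."""
    object_lines = [f"{' '.join(names)} - {ptype}" for ptype, names in objects.items()]
    # The initial state is the static network plus the observed densities.
    init = static_facts + [
        f"(densityLevel {street} {level})" for street, level in densities.items()
    ]
    # One goal per street that should be brought back to low density.
    goals = [f"(densityLevel {street} low)" for street in congested]
    return "\n".join([
        f"(define (problem {name}) (:domain traffic)",
        "  (:objects " + "\n            ".join(object_lines) + ")",
        "  (:init " + "\n         ".join(init) + ")",
        "  (:goal (and " + " ".join(goals) + ")))",
    ])


# Hypothetical toy instance, far smaller than the networks used in the paper.
problem = make_problem(
    "traffic1",
    {"street": ["s1", "s4"], "crossing": ["c3"], "traffic-light": ["tl1"]},
    ["(goes-into s1 c3)"],
    {"s1": "high", "s4": "moderate"},
    ["s4"],
)
print(problem)
```

Keeping the goal list as a separate argument reflects the fact that goals may come from different sources (observed or predicted high densities, or external requests), while the rest of the problem is derived from the current state.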

                   Predicate             Type
                   density(st,l)         Dynamic
                   connection(st,st)     Static
                   openX(tl,st)          Dynamic
                   densityLX(st)         Dynamic

Table 1: Predicates used in the learning task. X represents the time step. L represents the density level.

part of the city is formed by the state of the traffic lights and the density of the streets. The openX(tl,st) predicate represents a green traffic light tl located at street st at time step X. In our approach, X can take the values from one to three (X previous time steps, or time windows), but it is a parameter that can be modified to extend or reduce the prediction horizon. The densityLX(st) predicate indicates that a street st has a density level L at time step X. L can take the values veryhigh, high, moderate, low and verylow. The last predicate of each example, density(st,l), represents the current density level l of the street st. This is the class of each example.

4.2   Algorithms

We are using TILDE [Blockeel and De Raedt, 1998] to learn relational decision trees. It receives two files as input: the settings file, where the user can specify the algorithm parameters and define the predicates and classes; and the knowledge base file, where both the training and test data are included. The output of the learning algorithm is a file containing the resulting relational tree and its translation into rules. It also contains the confusion matrix for the training and test sets. An example output of TILDE is shown in Figure 4, where A represents the example id and the other letters the predicates' arguments (B is the street whose density level, C, we want to predict). A minus symbol preceding a variable means that it is new in the tree, while a variable that appears alone must have been introduced earlier. The classes to predict appear in the leaf nodes of the tree between brackets. For example, in the model shown in Figure 4, a high density would be predicted for a street B in two cases: (1) if its density was low two time steps ago, but there exists another street D connected to B whose density was high three time steps ago and was not low in the last time step; and (2) if its density was low neither two time steps ago nor one time step ago.

    density(-A,-B,-C)
    densityLow2(A,B)?
    +-yes: densityHigh3(A,-D)?
           +-yes: connection(A,B,D)?
                  +-yes: densityLow1(A,D)?
                         +-yes: [low]
                         +-no:  [high]
                  +-no:  [low]
           +-no:  [low]
    +-no:  densityLow1(A,B)?
           +-yes: [low]
           +-no:  [high]

Figure 4: Example of TILDE output.
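
To make the reading of the tree in Figure 4 concrete, the following Python sketch walks it as nested conditionals for a single example. It is a hedged illustration rather than TILDE output or TILDE's API: the function predict_density and the encoding of facts as tuples are hypothetical, the example id A is dropped because one set of facts stands for one example, and each test is assumed to be evaluated together with the tests on the yes-path above it, as is usual for first-order decision trees.

```python
from typing import Set, Tuple

Fact = Tuple[str, ...]  # e.g. ("densityLow2", "s1") or ("connection", "s1", "s2")


def predict_density(street: str, facts: Set[Fact]) -> str:
    """Classify `street` as 'high' or 'low' by walking the tree of Figure 4."""
    if ("densityLow2", street) in facts:
        # densityHigh3(A,-D): streets D whose density was high three steps ago.
        high3 = {f[1] for f in facts if f[0] == "densityHigh3"}
        if not high3:
            return "low"
        # connection(A,B,D): keep only the D linked to the target street.
        linked = {d for d in high3 if ("connection", street, d) in facts}
        if not linked:
            return "low"
        # densityLow1(A,D): succeeds if some such D was low one step ago.
        if any(("densityLow1", d) in facts for d in linked):
            return "low"
        return "high"
    # Right branch: the street itself was not low two steps ago.
    if ("densityLow1", street) in facts:
        return "low"
    return "high"


# Hypothetical example: s1 was low two steps ago, but its neighbour s2 was
# high three steps ago and is not recorded as low in the last step.
example = {
    ("densityLow2", "s1"),
    ("densityHigh3", "s2"),
    ("connection", "s1", "s2"),
}
print(predict_density("s1", example))
```

The first branch corresponds to case (1) in the text and the final two lines of the function to case (2).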
5   Experiments and results

In this work we use SUMO [Behrisch et al., 2011], an open source traffic simulator developed by the German Aerospace Center (DLR). It allows users to import or generate not only road networks, but also traffic demand, and to define traffic light control programs. We first test whether we are able to build a model that predicts the appearance of goals in advance, and then we apply the learned model to several urban traffic control scenarios.

5.1   Results on Learning Goals

We are using a real city network in our learning experiments: a grid-like section of downtown Houston, shown in Figure 5. It is composed of 35 junctions, 140 traffic lights and 164 street sections. We have selected five particular street sections to learn from (A to E). We chose these city points due to their different traffic characteristics. C and D are street sections close to a Job Center. B is a point between the Job Center and the main exit of the city. E represents a street section far from the main traffic, while A is a random point with no specific features.

Figure 5: Benchmark network in SUMO. Models are created for points A, B, C, D and E. We assume that a Job Center is located on D. F corresponds to the main exit point of the city.

We have also defined a traffic demand that tries to emulate the real traffic flow of a city for an entire week. So, we define lighter traffic at night, more traffic at rush hours, and higher traffic during week days than at the weekend. We include a Job Center, where most of the cars want to go during work hours, and a main exit point, used to leave the city at the end of the workday. The rest of the routes are randomly generated. The vehicles may enter the city by any street section and can finish their trip in an inner (parking, mall, office...) or outer point of the network. A summary of the full traffic demand specification is shown in Figure 6.

[Figure 6: line plot of vehicles entering the network (thousands, 0 to 7) against hour of day (ticks from 00:00 to 20:00), with one series for Week and one for Weekend.]

Figure 6: Summary of the generated traffic flows on weekdays and weekends. The y axis represents the number of vehicles that enter the network at each hour, in thousands, and the x axis represents the hours.

Data is collected every five minutes for the learning task, which means 2013 instances for the whole week. Five minutes is what we call the "time step", i.e., the sampling period. We have chosen this sampling period because we want to collect traffic data from an entire week and, at the same time, keep the number of instances low enough for TILDE to handle them. In our experimental setting, a step in the simulation corresponds to a second. Each instance stores the static part of the city previously described, as well as the dynamic component of the state in the last three time steps. We learn one relational model for each street section shown in Figure 5, and then we test with data of the other street sections.

We have also varied the density levels, both in the classes to predict and in the predicates used in each instance. We have used two approaches. One is based on five density levels: veryhigh, high, moderate, low and verylow. A second version uses only two: high and low. All the generated models are pre-pruned, limiting the creation of new branches when the node has fewer than 10 instances.

In the first experiment, we generated five different models using data from the five selected street sections and the five density levels approach. We tested these models in the five street sections to check the accuracy and generality of the learned models. The results for this first configuration are in Table 2.

We can observe that the accuracy is similar for all the street sections except for B, whose behavior seems to be more difficult to predict. A and E, the two points away from downtown and the Job Center, present a similar behavior, as expected.

In the second experiment, the problem is simplified with only two density levels both for the class and the state predicates. The results for this last configuration are in Table 3.

We can observe that as we decrease the number of density levels, the complexity of the problem decreases too and the prediction task becomes easier. With only two levels, the density of a street knowing the state of the city in the last time
steps can be predicted with a high accuracy, even in street sections that have very different behavior. The final model that will be used in our architecture corresponds to the one learned with the data of point B, which on average performs best. The relational tree was shown in Figure 4.

                  A      B       C       D       E
           A     0.90   0.68    0.85    0.77    0.83
           B     0.82   0.72    0.79    0.77    0.80
           C     0.83   0.66    0.88    0.77    0.81
           D     0.80   0.66    0.83    0.85    0.81
           E     0.87   0.66    0.85    0.78    0.89

Table 2: Accuracy results using the model obtained with five density levels. Each cell (i, j) represents the estimated accuracy of learning a model with the data extracted at point i in the city and testing that model against the data collected at point j.

                  A      B       C       D       E
           A     0.99   0.94    0.99    0.97    0.99
           B     0.99   0.95    0.99    0.98    0.99
           C     0.96   0.93    0.99    0.97    0.99
           D     0.96   0.93    0.99    0.98    0.99
           E     0.96   0.93    0.99    0.97    0.99

Table 3: Accuracy results using two density levels for the class and the predicates. Each cell (i, j) represents the estimated accuracy of learning a model with the data extracted at point i in the city and testing that model against the data collected at point j.

5.2   Results on Traffic Management

Finally, we want to test whether a traffic control system would improve its performance if it had some predictive model of the traffic. To do so, we use several simulation scenarios where we vary the size of the network (medium and large), the fluency of traffic (fluent or congested) and the evaluated time period (an hour and a day).

When using the learned model, the system predicts the density at each street at each time step, using the previous X time steps as input. If it detects a high density at any subset of the street sections, it generates goals to lower the density of those street sections. These new goals, together with the current state of the traffic, form a PDDL planning problem that is given as input to the planner. Therefore, the system is predicting the appearance of goals in the next X time steps, and the planning process can anticipate congestion. We will call this new approach Learning. In [Pozanco et al., 2016], we show that if the system uses a short-horizon prediction, having the same time steps for both building the model and checking for goals is not that important. So, our system checks for new goals every fifty seconds using the prediction model built with the five-minute time step previously described.

We compare our system with a Static one, which corresponds to the default system used by SUMO. We also compare our approach with a Reactive system, which acts locally on each traffic light and sets a longer green phase on those whose corresponding street density is currently high. We also compare with the AP approach proposed in [Gulić et al., 2015], co-winner of the ARTS-COST competition on Increasing the resilience of road traffic support systems by the use of autonomics¹. That planning system does not have any learning component and only calls the planner when a vehicle has been stopped for a long time. We will call it Planning. This system is the starting point of our approach, so we use the same planning domain and planner, LAMA [Richter and Westphal, 2010]. The last system we introduce in the tests combines the Planning approach and the Learning one. It calls the planner when a goal (high density) is predicted or the current density of a street is high. We will refer to it as Combined.

¹ https://helios.hud.ac.uk/cost/comp2.php

We use the following metrics to measure the performance of each system: the number of steps it takes all cars to reach their destination; the total amount of CO2 emitted by the vehicles; the average waiting time (AWT); the average travel time (ATT); and, if it applies, the number of planner executions (PE) and the mean planner execution time (MPE). We choose them simply for comparison; none of the systems explicitly reasons on optimizing these metrics.

Experiments in a Medium-Sized City Network

We created a fluent traffic scenario for the first experiment by introducing 5300 cars in 3600 steps in the same city network we used in the learning goals experiments. The simulation finishes if all cars reach their destination, or after 5000 steps. The results are shown in Table 4. We can see that there is no substantial difference among the different systems when the traffic is fluent, although the Learning approach outperforms the others on most metrics. When the traffic is fluent, one expects that even the Static control program will perform well. In this traffic situation, the time spent on average per vehicle at a traffic light (AWT) is approximately half of the total time spent in their complete travel (ATT). Given the size of the example network, ATT is around three minutes, while AWT is around a minute and a half. The number of planner executions is low in the Planning and Learning systems, and it becomes much higher with the Combined approach, as expected.

                    Steps    CO2      AWT      ATT     PE     MPE
        Static      3969     1103      93      172
       Reactive     4059     1137      100     181
       Planning     4070     1117      95      175      22     10
       Learning     3881     1090      88      167      15     10
       Combined     4104     1193      115     197      61     10

Table 4: Performance of the different control systems with a fluent traffic situation in a medium-sized city. Steps, AWT and ATT are given in steps (seconds), while CO2 is in kg. MPE is in seconds.

In the second experiment, we test the systems' performance on a very congested traffic scenario using the same city network. It was created by introducing 6000 cars in one hour (3600 steps). The results are reported in Table 5. The columns report the same metrics as in the previous table.

                 Steps    CO2      AWT      ATT      PE     MPE
  Static           -      2553     582      638
  Reactive       4106     1262     119      202
  Planning         -      2187     435      506      48      11
  Learning       4070     1265     121      204      46      10
  Combined       4244     1301     128      212      68      11

Table 5: Performance of the different control systems with a
very congested traffic situation in a medium-sized city. Steps,
AWT and ATT are given in steps (seconds), while CO2 is in
kg. MPE is in seconds.
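
The relations between the columns of Table 5 can be checked
directly from its rows. The following is a minimal sketch that
recomputes the AWT-to-ATT ratio of each system and its CO2 level
relative to Static. The dictionary layout and the names TABLE_5,
waiting_share and co2_relative_to are illustrative assumptions and
are not part of the evaluated system.

```python
# Values copied from Table 5: (CO2 in kg, AWT in s, ATT in s).
# The data layout and the function names are illustrative assumptions.
TABLE_5 = {
    "Static":   (2553, 582, 638),
    "Reactive": (1262, 119, 202),
    "Planning": (2187, 435, 506),
    "Learning": (1265, 121, 204),
    "Combined": (1301, 128, 212),
}


def waiting_share(system: str) -> float:
    """Ratio of average waiting time (AWT) to average travel time (ATT)."""
    _, awt, att = TABLE_5[system]
    return awt / att


def co2_relative_to(system: str, baseline: str = "Static") -> float:
    """CO2 of `system` divided by the CO2 of `baseline`."""
    return TABLE_5[system][0] / TABLE_5[baseline][0]


if __name__ == "__main__":
    for name in TABLE_5:
        print(f"{name:<9} AWT/ATT = {waiting_share(name):.2f}  "
              f"CO2 vs Static = {co2_relative_to(name):.2f}")
```

With these values, the AWT of Static is about 91% of its ATT,
while for Learning the ratio is about 59% and its CO2 level is
roughly half that of Static.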

   As we can see, even if the Planning approach outperforms
the Static system, it performs worse than the Reactive
mechanism and the two other autonomic approaches. Both Learning
and Combined can completely solve the traffic congestion. The
vehicles spend much more time waiting on average than travelling
in this scenario (relation between ATT and AWT). However, the
Learning system is able to reduce the waiting time to half of
the travel time, as in a fluent traffic situation. Thus, it is
effectively converting a congested situation into a fluent
traffic scenario. The reduction of the pollution achieved by
Learning is quite substantial too: half of the CO2 levels of the
static approach. In fact, they are close to those generated in a
fluent traffic scenario. Reactive obtains practically the same
results as the Learning approach, even if it only acts locally
at each traffic light without considering the whole network.

Dense Traffic in a Large Size City Network
This experiment tests the scalability of the proposed model to
larger city networks. The benchmark network in this case is
composed of 130 junctions, 520 traffic lights and 566 streets.
This can be considered a large network in relation to most
papers in the field, especially considering that our approaches
perform centralized planning. The network is shown in Figure 7.
We introduce 13,000 cars in one hour in order to create a dense
traffic situation. As the city is bigger than the previous one,
an experiment will finish when all cars reach their destination
or after 6,000 time steps. Table 6 reports the results.

Figure 7: Large city network used in the second type of
experiments.

                 Steps    CO2      AWT      ATT      PE     MPE
  Static           -      6649     439      549
  Reactive         -      7676     605      709
  Planning         -      5520     341      468      50      46
  Learning       5837     5231     321      445      47      44
  Combined         -      6279     518      633      64      54

Table 6: Performance of the different control systems with
a dense traffic situation in a large-sized city network. Steps,
AWT and ATT are given in steps (seconds), while CO2 is in
kg. MPE is in seconds.

   In this case, Learning outperforms the rest and it is the
only one that can finish the simulation before 6,000 steps.
The model we learned with the medium-sized urban network is
able to generalize to this larger city. Our system scales quite
well even in a large network; it can find a plan in less than
fifty seconds, the checking-for-goals sample period. The
performance of Planning is quite good in this case and it almost
solves the congestion. Thus, this only-planning approach works
well when we have a reasonably high traffic density (as in this
experiment or in the first one), but not too high (as in the
previous experiment). The Reactive method does not scale up well
to the large city network. When trying to locally reduce the
congestion, it ends up generating traffic jams and performing
even worse than the default, Static.

Full day experiment
The last experiment focuses not only on handling a traffic
peak, but also on testing whether a system can deal with a
full-day traffic flow. In these cases, the decisions spread over
time. We use the medium-sized city network and a traffic demand
specification similar to the one presented in Figure 6 for the
week days. In this experiment we only measure the AWT per hour.
The other metrics could be irrelevant for the 24 hours case. The
results are reported in Figure 8. Vehicle routes remain static
in SUMO. A car will always try to reach its destination
following the shortest path. If this route is congested, the
vehicle will not choose another one, but it will stand still
waiting for the route to be free. That is the reason why, when
using some systems, the network can get congested at some time
point and remain congested for the whole day. We can see this
effect when a given curve in the graphic reaches 200 s. When
using this metric, a traffic system performs better if the area
under its curve is smaller.

[Plot: AWT (vertical axis, 60 to 200) against Hour (horizontal
axis, 00:00 to 20:00), with one curve each for Learning, Static,
Planning, Reactive and Combined.]

Figure 8: Average waiting time in the city network per hour.

   As we can see, only our Learning system is able to finish
the simulation properly. The AWT grows in the morning, when the
cars go to the Job Center, but it does not get fully congested.
The AWT remains around 80 s throughout the morning and it starts
growing again by the end of the workday. The metric reaches a
peak around 18:00 where the AWT is 103 seconds at the most
congested traffic situation of the day, which is still a
reasonable behavior. After that time period, the system is able
to reduce the congestion and the AWT starts to decrease. The
Reactive system, which showed good performance in the
medium-sized city network, can solve the early morning traffic
problem. It obtains similar results to the ones of Learning
until the end of the workday. However, it cannot deal correctly
with the end of the day traffic. The other systems cannot face
the morning rush hour. Even if the Planning system is still
better than the other two, it does not solve the congestion.

6   Related work
The first UTC models, in the 1950s and 1960s, were based on
fixed-time traffic light control mechanisms. Actions were
predefined following an off-line optimization using historical
data of demand levels. TRANSYT [Robertson, 1969] is one of the
most well developed and widely used control systems that uses
these techniques. These approaches could even generate
“green waves”, simple coordination of neighbouring traffic
lights in order to increase the traffic fluidity. The problem of
early systems is that they can age rapidly due to the continuous
evolution of the traffic flows in a city. The benefits may be
lost in some years if the control plans are not updated. Our
proposed system overcomes this situation, as it not only can
react to the current traffic scenario, but it can anticipate and
adapt to future ones.
   In recent years, the use of new and better sensor systems
has allowed engineers to implement traffic-responsive systems
that use the data provided by the detectors in an on-line way.
These techniques range from centralized approaches, such as
SCOOT [Bretherton et al., 1998] and SCATS [Lowrie, 1990], to
distributed ones such as UTOPIA [Donati et al., 1984]. Like most
other traffic-responsive systems, they use a mathematical
framework to compute the optimal time allocation of each
traffic light. A weak point of these systems is that they cannot
predict incidents and they do not deal well with them. Also,
their models are not defined declaratively. Thus, our models
are easier to update with new types of information, or new
metrics to be taken into account when optimizing.
   Other AI-related approaches have appeared in recent years.
The main goal is to build semi- or fully autonomous systems
with little human assistance. Most of them address traffic
management from a multi-agent perspective. A single agent
acts over a single junction or subset of junctions and then
several agents collaborate, discuss and negotiate with the
rest [Ossowski et al., 1998]. In [Box and Waterson, 2012],
the authors propose a model based on logistic regression and
neural networks to learn over time how to better control the
traffic signals. Other approaches focus on multi-agent
reinforcement learning [Kuyer et al., 2008], distributed
geometric fuzzy systems [Gokulan and Srinivasan, 2010] or
creating a multi-agent model predictive control [de Oliveira and
Camponogara, 2010]. New approaches for efficient UTC have arisen
in recent years using vehicle communication as the core of the
control process [Ferreira et al., 2010]. However, these methods
are still far from being implemented in real cities, and
controlling traffic lights remains the most widespread way to
handle urban traffic.

7   Conclusions and Future work
In this paper we have presented a dynamic approach for UTC
based on Automated Planning and Relational Learning. As we have
shown, by adding a learning component that can predict the city
state to a planning system, we can highly increase its autonomy.
It can automatically generate its own goals, in addition to
letting the planner start the planning process sooner. We have
tested our model in several traffic control scenarios, showing
that the ability to anticipate goals can lead to better control
performance than using only static traffic light programs. Our
system also outperforms the Planning system and overcomes its
limitations, as Planning needs to know when a vehicle has been
stopped for a long time. Instead, our model only needs the
street density levels, which are easier to obtain from current
sensor systems. By just knowing density levels, we are able to
model a wide variety of circumstances that affect traffic
behavior such as adverse weather conditions or different days
and hours. Also, since other types of incidents (e.g.,
road-blocking or big accidents) indirectly affect the density
levels, we believe our approach could also work to alleviate
congestions caused by them.
   In future work, we would like to integrate the ability to
learn how to anticipate goals with externally supplied goals
(e.g., by traffic controllers), reactively generated ones (e.g.,
reactively generating goals), or internally supplied ones (e.g.,
generated by internal motivations of the system). Although
the proposed system scales up, we would also like to apply
a multi-agent approach by dividing the city in sections in
which an agent can apply the system in an autonomous way.
We think this could lead to similar performance with lower
execution times. We would also like to compare our system
with other state of the art methods on traffic control, such as
model predictive control (e.g., SCOOT), or other AI-based
approaches (e.g., reinforcement learning). Finally, we want to
test the proposed system in irregular city networks such as
European ones and build the learning model on-line in order
to show the system’s real-world applicability.

Acknowledgements
This work has been partially supported by MINECO project
TIN2014-55637-C2-1-R.

References
[Behrisch et al., 2011] Michael Behrisch, Laura Bieker,
   Jakob Erdmann, and Daniel Krajzewicz. Sumo–simulation
   of urban mobility. In The Third International Conference
   on Advances in System Simulation (SIMUL 2011),
   Barcelona, Spain, 2011.
[Blockeel and De Raedt, 1998] Hendrik Blockeel and Luc
   De Raedt. Top-down induction of first-order logical
   decision trees. Artificial intelligence, 101(1):285–297, 1998.
[Bonet and Geffner, 2005] Blai Bonet and Héctor Geffner.
   mGPT: A probabilistic planner based on heuristic search.
   JAIR, 24:933–944, 12 2005.
[Box and Waterson, 2012] Simon Box and Ben Waterson.
   An automated signalized junction controller that learns
   strategies from a human expert. Engineering applications
   of artificial intelligence, 25(1):107–118, 2012.
[Bretherton et al., 1998] R. Bretherton, K. Wood, and G.T.
   Bowen. Scoot version 4. In Proceedings of 9th
   International Conference on Road Transport Information and
   Control, 1998.
[de Oliveira and Camponogara, 2010] Lucas Barcelos
   de Oliveira and Eduardo Camponogara. Multi-agent
   model predictive control of signaling split in urban traffic
   networks. Transportation Research Part C: Emerging
   Technologies, 18(1):120–139, 2010.
[Donati et al., 1984] F Donati, Vito Mauro, G Roncolini, and
   M Vallauri. A hierarchical decentralized traffic light
   control system. the first realisation "progetto torino". In
   Proceedings of the 9th World Congress of the International
   Federation of Automotive Control, pages 2853–2858, 1984.
[Ferreira et al., 2010] Michel Ferreira, Ricardo Fernandes,
   Hugo Conceição, Wantanee Viriyasitavat, and Ozan K
   Tonguz. Self-organized traffic control. In Proceedings of
   the seventh ACM international workshop on VehiculAr
   InterNETworking, pages 85–90. ACM, 2010.
[Fox and Long, 2003] Maria Fox and Derek Long.
   PDDL2.1: An extension to PDDL for expressing
   temporal planning domains. Journal of AI Research,
   20:61–124, 2003.
[Gokulan and Srinivasan, 2010] Balaji Parasumanna Gokulan
   and Dipti Srinivasan. Distributed geometric fuzzy
   multiagent urban traffic signal control. Intelligent
   Transportation Systems, IEEE Transactions on,
   11(3):714–727, 2010.
[Gulić et al., 2015] Matija Gulić, Ricardo Olivares, and
   Daniel Borrajo. Using automated planning for traffic
   signals control. In Working Notes of ARTS-COST 2nd
   competition, 2015.
[Guzmán et al., 2012] César Guzmán, Vidal Alcázar, David
   Prior, Eva Onaindía, Daniel Borrajo, Juan Fdez-Olivares,
   and Ezequiel Quintero. PELEA: a domain-independent
   architecture for planning, execution and learning. In
   Proceedings of ICAPS’12 Scheduling and Planning
   Applications woRKshop (SPARK), pages 38–45, Atibaia
   (Brazil), 2012. AAAI Press.
[Hamilton et al., 2013] Andrew Hamilton, Ben Waterson,
   Tom Cherrett, Andrew Robinson, and Ian Snell. The
   evolution of urban traffic control: changing policy and
   technology. Transportation planning and technology,
   36(1):24–43, 2013.
[Kuyer et al., 2008] Lior Kuyer, Shimon Whiteson, Bram
   Bakker, and Nikos Vlassis. Multiagent reinforcement
   learning for urban traffic control using coordination
   graphs. In Machine learning and knowledge discovery in
   databases, pages 656–671. Springer, 2008.
[Lowrie, 1990] PR Lowrie. Scats, sydney co-ordinated
   adaptive traffic system: A traffic responsive method of
   controlling urban traffic. 1990.
[Ossowski et al., 1998] Sascha Ossowski, José Cuena, and
   Ana García-Serrano. A case of multiagent decision
   support: Using autonomous agents for urban traffic control.
   In Progress in Artificial Intelligence - IBERAMIA 98, pages
   100–111. Springer, 1998.
[Papageorgiou et al., 2007] M Papageorgiou, M Ben-Akiva,
   Jon Bottom, Piet HL Bovy, SP Hoogendoorn, Nick B
   Hounsell, Apostolos Kotsialos, and M McDonald. Its and
   traffic management. Handbooks in Operations Research
   and Management Science, 14:715–774, 2007.
[Pozanco et al., 2016] Alberto Pozanco, Susana Fernández,
   and Daniel Borrajo. On learning planning goals for traffic
   control. In 4th Workshop on Goal Reasoning (IJCAI’16),
   2016.
[Richter and Westphal, 2010] Silvia Richter and Matthias
   Westphal. The lama planner: Guiding cost-based anytime
   planning with landmarks. Journal of Artificial Intelligence
   Research, 39(1):127–177, 2010.
[Robertson, 1969] Dennis I Robertson. Transyt: a traffic
   network study tool. 1969.
[Vallati et al., 2016] M. Vallati, D. Magazzeni, B. De Schutter,
   L. Chrpa, and T.L. McCluskey. Efficient macroscopic
   urban traffic models for reducing congestion: a pddl+
   planning approach. In Proceedings of the Thirtieth AAAI
   Conference on Artificial Intelligence (AAAI-16), 2016.
[Yoon et al., 2007] Sungwook Yoon, Alan Fern, and Robert
   Givan. FF-replan: A baseline for probabilistic planning. In
   ICAPS, pages 352–360, 2007.