<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Graph-based Sparse Neural Networks for Traffic Signal Optimization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lukasz Skowronek</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pawel Gora</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcin Mozejko</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arkadiusz Klemenko</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics</institution>
          ,
          <addr-line>Informatics and Mechanics</addr-line>
          ,
          <institution>University of Warsaw</institution>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>We investigate the performance of sparsely connected neural networks, with connectivity determined by road network graphs, for solving the Traffic Signal Setting optimization problem. We conducted experiments on three realistic road network topologies and found these types of graph neural networks superior to fully connected ones, both in terms of generalization properties on fixed test sets and, more importantly, near target function minima obtained in the gradient descent optimization process. We additionally confirm the soundness of our method by showing that random perturbations of the actual graph lead to a consistent deterioration of model performance.</p>
      </abstract>
      <kwd-group>
        <kwd>traffic optimization</kwd>
        <kwd>graph neural networks</kwd>
        <kwd>Traffic Signal Setting problem</kwd>
        <kwd>surrogate modelling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Traffic optimization problems have a natural underlying graph structure, determined by the
topology of the corresponding road network. In this paper, we introduce a neural network
architecture based on a road network graph adjacency matrix to solve the so-called Traffic
Signal Setting (TSS) problem, in which the goal is to find the optimal traffic signal settings for
given traffic conditions (as defined in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). Some variants of this problem were proven to be
NP-hard even for very simple traffic models ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), and therefore heuristics and approximations
have been used to solve it ([
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), but the existing approaches still have some drawbacks. For
example, evaluating the quality of traffic signal settings using accurate traffic simulations (which
is a standard evaluation method) can be too time-consuming, especially for large-scale road
networks and/or online evaluation ([
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]). Also, the space of possible solutions is so large that it is
infeasible to obtain, in any reasonable time, global minima (or even relatively good signal
settings) of the simulator output by checking all possible solutions or
doing a random search ([
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), as most points in the input space are far from the optimal solutions.
      </p>
      <p>
        A strategy for overcoming these two difficulties was presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and consists of
generating a reasonably sized training set using a traffic simulator and then fitting a machine learning
model that approximates the outcomes of traffic simulations quickly and accurately. The output of
these models can then be minimized using an optimization algorithm, such as gradient descent,
in the hope of obtaining close-to-optimal traffic signal settings. This strategy turned out to be quite
successful ([
        <xref ref-type="bibr" rid="ref1 ref4 ref5">1, 5, 4</xref>
        ]), yet the models’ accuracy degraded close to the points considered minima
by the optimization algorithm and the model, making further optimization far more difficult
([
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]).
      </p>
      <p>In this paper, we show that the graph-based neural networks (GNN) that we use, built from
road network graphs, can outperform feed-forward fully connected neural networks (FCNN) on
this task. Compared to standard FCNNs, the introduced GNNs have most of the connections
removed, keeping only the crucial connections between neurons (so that the information flow
corresponds to the traffic flow in the road network), which makes the architecture relatively
sparse, easier to train, and better at generalizing. As a consequence, GNNs achieve better accuracy
on the test set, as well as close to the local optima found by gradient descent optimization applied to
the TSS problem. We also show that GNN architectures work better than analogous ones built
from perturbed adjacency matrices.</p>
      <p>The rest of the paper is organized as follows. Section 2 puts our work in the context of
research on building surrogate models for complex processes, solving the TSS
problem, and using graph neural networks. Section 3 presents the two types of graph neural
network architectures we used. In Section 4, we describe the setup of our main experiments,
including a description of the datasets used and their generation. Section 5 summarizes our
main experimental results, showing that the neural network architectures introduced in this paper
outperform other models used for this task. In Section 6, we summarize the results of several
’sanity checks’ we performed in order to confirm that our results were not obtained by chance.
Moreover, the results in Section 6 may be especially interesting for researchers in graph neural
networks; e.g., we show that the out-of-sample performance of graph-based sparse neural nets
decreases (almost) monotonically as a function of the distance between the adjacency matrix used
for constructing the network and the true adjacency matrix. We summarize the presented
research in Section 7, outlining some possible future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        Complex processes, such as road traffic in cities, are difficult to study due to the large number
of interacting components (e.g., vehicles), nondeterminism, or sensitive dependence on initial
conditions. Very often, the only reasonable method to accurately predict the behaviour of such
systems is to run computer simulations, which can be time-consuming and usually cannot be
simplified due to computational irreducibility. However, in many tasks related to complex
processes it is not necessary to obtain very accurate predictions; it can be sufficient to get
approximate outcomes as fast as possible (due to stochasticity or sensitive dependence on
initial conditions, it may be impossible to predict the exact value anyway). Therefore, in such
cases it is natural to build so-called surrogate models (metamodels) that approximate the outcomes
of simulations quickly and with good accuracy ([
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). Such applications are especially common
in optimization tasks, in which it is often necessary to run multiple simulations
in order to evaluate many different input settings ([
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ]). This is the case for traffic optimization
problems [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], such as the TSS problem. Many such surrogate models are based on machine
learning methods, such as neural networks [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], and some previous works on solving
TSS [
        <xref ref-type="bibr" rid="ref1 ref3 ref4 ref5">1, 5, 3, 4</xref>
        ] use various machine learning techniques (e.g., based on neural networks or
gradient-boosted decision trees) to build metamodels of traffic simulations, which were used to
evaluate the quality of traffic signal settings. Such metamodels were able to approximate the outputs of
simulations (the total times of waiting on red signals) with very good accuracy (e.g., values
of the MAPE metric were at the level of 1 − 2%) and a few orders of magnitude faster than by
running microscopic simulations [
        <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
        ]. Thanks to that, it was possible to use optimization
algorithms, such as genetic algorithms or gradient descent, to find heuristically optimal signal
settings without performing extensive parameter space searches that would take weeks to
complete [
        <xref ref-type="bibr" rid="ref1 ref3 ref4 ref5">1, 5, 3, 4</xref>
        ]. However, information about the road network structure has never been
used in those experiments, even though it should naturally be relevant when optimizing traffic.
      </p>
      <p>
        Introducing a direct connection between the network architecture and the graph structure
can help leverage the additional information represented by the graph. Similarly to our work, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
introduces a graph NN layer in which each vertex has specific parameters assigned to combine
information from its neighbors. However, unlike our method, this layer uses only the
original graph matrix and skips the dual graph structure when performing computations. A
notable usage of a dual structure can be found in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], where it is compressed to a PPMI matrix
using aggregated statistics from a random graph walk procedure. This aggregation is used to
introduce a vertex neighborhood context, similarly to the popular t-SNE method [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] provides
an extensive overview of different graph neural network architectures and applications.
      </p>
      <p>
        Due to their capability to capture road-network structure, GNNs have been used in multiple
traffic applications. In [
        <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
        ], the authors used spatio-temporal GNNs for traffic situation
prediction, whereas [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] used the same technique to predict taxi demand. However,
our application of graph neural networks to the traffic optimization domain and the Traffic
Signal Setting problem seems to be the first such approach.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Network architecture</title>
      <p>
        The key idea in defining our sparse graph-based neural network architecture is the intuitively
compelling rule that information/signals should propagate locally between the network layers. By
locality, we mean the presence of only those neuron connections that have a corresponding
nonzero entry in the adjacency matrix of the corresponding graph. In the case of a road network,
in order to implement such a rule, the neurons in the successive layers of the neural network
should be linked to the neurons corresponding to vertices and/or edges of the corresponding
graph. Thus, we propose the following general ways to build a graph neural network (see
Section 1 of the Supplementary materials ([
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]) for mathematical formulas):
1. Neurons in the even-numbered layers, starting with the input layer as layer 0, correspond
to graph vertices (in our case, road crossings). Neurons in the odd-numbered layers
correspond to graph edges (in our case, road segments). An exception is the final
layer, with just one neuron. Connections from a vertex-localized layer to an edge-localized
layer should only be present if the given vertex is an endpoint of the given edge in the
corresponding road network graph; there are exactly two such connections for every edge
neuron. Connections from an edge-localized layer to a vertex-localized layer should only
be present if the edge has the vertex as its endpoint in the corresponding road network graph;
the number of such connections equals the number of the particular vertex’s neighbors.
or
2. Neurons in all layers, with the exception of the output layer, correspond to road network
graph vertices. Connections from a neuron in one layer to a neuron in the next one
should only be present if the corresponding vertices are neighbors in the road network
graph. The number of connections for a vertex node equals the number of the
vertex’s neighbors.
      </p>
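      <p>As an illustration of architecture type 1, the two connectivity masks can be derived mechanically from the adjacency matrix. The Python sketch below is our own illustrative code, not taken from the paper or its repository, and ‘incidence_masks’ is a hypothetical helper name:</p>
      <preformat>
```python
import numpy as np

def incidence_masks(adjacency):
    """Build the two binary connectivity masks of architecture type 1
    from a symmetric 0/1 adjacency matrix (illustrative sketch)."""
    n = adjacency.shape[0]
    # Undirected edges: pairs (i, j) with i < j and adjacency[i, j] == 1.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if adjacency[i, j]]
    m = len(edges)
    # Vertex layer -> edge layer: each edge neuron connects to exactly
    # its two endpoint vertices.
    v2e = np.zeros((m, n), dtype=np.float32)
    for k, (i, j) in enumerate(edges):
        v2e[k, i] = v2e[k, j] = 1.0
    # Edge layer -> vertex layer: each vertex neuron connects to every
    # incident edge, i.e. one connection per neighbor.
    e2v = v2e.T.copy()
    return v2e, e2v
```
      </preformat>
      <p>Multiplying a dense weight matrix element-wise by such a mask keeps exactly the local connections described above and removes all the others.</p>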
      <p>
        Although architecture 2 might seem more basic, architecture 1 appears to model traffic
flow through the road network more naturally (see the Supplementary materials ([
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]), Section 2,
for a detailed explanation). In the rest of this paper, we focus solely on GNNs of architecture
type 1.
      </p>
      <p>It should also be pointed out that GNNs can have multiple channels at each edge/vertex. The
number of channels in each layer is a hyperparameter of the network. In what follows, we
always assume the number of channels to be constant across the hidden layers of the network.</p>
      <p>
        One may also notice a similarity between our GNN architecture 2 and the graph neural
networks proposed by Thomas Kipf [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. However, we do not share any weights in our model,
as we aim to focus on local patterns connected to roads / crossroads. Theoretically, we could
introduce some weight sharing in the ’edge’ layers of GNNs of type 1, but our first experiments
using this approach led to highly disappointing results.
      </p>
      <p>In typical ML literature terminology, our GNNs should likely be called ‘NNs with a fixed
sparse connectivity mask’. In the case of multi-channel networks, sparsity is applied in the ‘spatial’,
but not in the ‘channel’, dimension.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment setup</title>
      <p>
        In order to train the surrogate models, it was necessary to generate datasets first. For this task,
we simulated vehicular traffic on 3 realistic road networks, corresponding to selected districts
in Warsaw: Centrum, Ochota and Mokotów, comprising 11, 21 and 42 intersections with traffic
signals, respectively. The simulations were run using a microscopic traffic simulator, the Traffic
Simulation Framework [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], for which a road network description for Warsaw was obtained
from the OpenStreetMap service [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The inputs to the simulator were vectors of lengths 11,
21 and 42, respectively. Each position in a vector represented an offset of a traffic signal at the
corresponding intersection. The offsets are shifts with respect to a global two-minute traffic
signal cycle start: the times from the beginning of the simulation to the first switch from the green
signal state to the red signal state (it was assumed for simplicity that the duration of a green
signal phase is always equal to 58 seconds, while the duration of a red signal phase is equal to 62 seconds,
constituting a 120-second cycle). The offsets were provided as integers, measured in seconds,
hence they ranged from 0 to 119 (note the periodicity of these variables). The simulator output
in each case was the total waiting time on red signals, summed over all the cars participating in
the simulation in the considered area (finding the inputs minimizing this output value was the
optimization goal of the considered TSS problem instance).
      </p>
      <p>
        Each simulation lasted 10 minutes and involved 42000 cars on the whole road network of
Warsaw. The datasets for Ochota, Mokotów and Centrum were generated using approximately
100000 randomly selected inputs for the TSF simulator (the input offset values from the set
{0, 1, . . . , 119} were sampled independently from the uniform distribution). These datasets are
publicly available to enable further research ([
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]).
      </p>
      <p>
        After preparing the datasets, we trained GNN and FCNN networks as metamodels to solve
TSS using gradient descent. Before training, we scaled the inputs to [−1, 1] using the
mappings x ↦ sin(2πx/120) and x ↦ cos(2πx/120), thus doubling the input size (actually,
increasing the number of input channels). This is motivated by the periodicity of the problem:
the neural network may learn that the offsets are periodic and that the values 0 and 120 correspond to
the same setting, which can improve the training [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. For the output, we used a standard
scaler that divides the data by its standard deviation and shifts the mean to zero.
      </p>
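      <p>The periodic input scaling can be sketched as follows (illustrative Python; the function name ‘encode_offsets’ is our own, not from the paper):</p>
      <preformat>
```python
import numpy as np

def encode_offsets(offsets):
    """Map integer offsets in {0, ..., 119} into [-1, 1] with one sin
    and one cos channel per offset (illustrative sketch)."""
    x = np.asarray(offsets, dtype=np.float64)
    angle = 2.0 * np.pi * x / 120.0
    # Offsets 0 and 120 map to the same point, reflecting the
    # 120-second signal cycle periodicity.
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)
```
      </preformat>
      <p>For example, an offset of 30 seconds maps to (sin(π/2), cos(π/2)) = (1, 0), and offsets 0 and 120 produce identical encodings.</p>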
      <p>For each of the 3 considered road networks, we tested 9 different GNN architectures and 9
FCNN architectures. The 9 selected GNN architectures corresponded to all combinations of
values from the following parameter sets:
• number of hidden GNN layers: 2, 3, 4 (not counting the input and output layers);
• number of channels per layer: 3, 4, 5.</p>
      <p>
        The activation function we used was tanh, indicated as superior to ReLU in preliminary
experiments and in previous works [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>For comparison, we also tested 9 FCNN architectures with tanh activation function, covering
all combinations of parameter values from the following sets:
• number of hidden layers: 2, 3, 4
• number of neurons per layer: 20, 40, 100</p>
      <p>
        For each of the 3 datasets, we used the same 90/10 train/test split for each of the considered
18 hyperparameter settings (9 GNN architectures and 9 FCNN architectures). For each of the
architectures, we ran the following procedure:
1. Train a model on the training set for about 1100 epochs (concretely, minimize on 10<sup>5</sup>
random mini-batches of size 997 (997 is the closest prime to 1000 - a prime number was
chosen to assure better randomization, although it was not expected to have any real
effect) using the Adam optimizer ([
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]) and a learning rate of 0.0035).
2. Evaluate the trained model on the test set using the mean relative error with respect to
the original outputs as the core metric.
3. Generate 100 gradient descent trajectories of the trained model output with respect to its
inputs (in the original input space, backpropagating through the sin/cos transformation).
Gradients were evaluated at inputs rounded in the original parameter space (our traffic
simulator (TSF) accepts only integer inputs). The Nesterov updater ([
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]) with a learning rate
of 0.01 and momentum of 0.9 was used. Each trajectory had 3000 steps. This is similar to the
approach used in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
4. Every 30 steps, transform the current trajectory point to the original parameter space,
round it and send it to the TSF simulator. Save the inputs and the simulator outputs to a new
‘simulation’ test set.
5. Evaluate the trained model on the ‘simulation’ test set using various metrics (cf. the
discussion in Section 5).
      </p>
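      <p>Step 3 of the procedure above can be sketched as follows (illustrative Python; ‘grad_fn’ is a hypothetical stand-in for backpropagating the trained surrogate model output to its inputs, and the function itself is our own, not the authors’ code):</p>
      <preformat>
```python
import numpy as np

def nesterov_descent(grad_fn, x0, lr=0.01, momentum=0.9,
                     steps=3000, record_every=30):
    """Minimize a model output with Nesterov momentum, recording the
    (rounded) trajectory point every `record_every` steps."""
    x = np.asarray(x0, dtype=np.float64)
    v = np.zeros_like(x)
    recorded = []
    for step in range(1, steps + 1):
        # Nesterov update: evaluate the gradient at the look-ahead point.
        g = grad_fn(x + momentum * v)
        v = momentum * v - lr * g
        x = x + v
        if step % record_every == 0:
            # Round to integer offsets modulo the 120-second cycle, as
            # the simulator accepts only integer inputs.
            recorded.append(np.round(x) % 120)
    return x, recorded
```
      </preformat>
      <p>With 3000 steps and recording every 30 steps, each trajectory contributes 100 points to the ‘simulation’ test set.</p>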
      <p>
        All the experiments were run on virtual machines in the Azure cloud (NC6, with an NVIDIA
Tesla K80 ([
        <xref ref-type="bibr" rid="ref26">26</xref>
        ])). The code used in the experiments can be found at [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. All of the models
trained for the main experiment and all the out-of-sample simulation datasets can be found at
[
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. The core dataset can be obtained at [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Experiment results</title>
      <p>
        The key results of our experiments with GNNs are shown in Table 1, as well as in Figure 1,
complemented by the tables and figures in Section 3 of the Supplementary Materials [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Table 1
shows a summary of the core performance measures, calculated for the three top GNNs and three top FCNNs,
ranked by average accuracy on the test set (MAPE). The core measures presented are:
• The minimum MAPE (mean absolute percentage error) on the test set. This number can be
obtained before doing gradient descent. The minimum is taken among the 3 top-ranked
GNNs or FCNNs (according to the row description). Because of the model selection
criterion we use for Table 1, this minimum is global within the respective 9-element
model universe (GNN or FCNN).
• The minimum simulation output obtained when doing gradient descent (note that while
interesting from a traffic optimization perspective, this measure lacks robustness, as it
can be distorted by a single data point).
• The average MAPE on the k% (for k = 5, 10, 15) gradient descent trajectory ends with the lowest
corresponding simulator output values (sorted lowest first). An average
is taken over the three models selected, GNN or FCNN, according to the row description.
      </p>
      <p>First, let us note that the results in Table 1 show better performance of GNNs compared to
FCNNs, particularly in terms of minimum MAPE on the test set and average MAPE on the lowest
points from the gradient descent trajectories according to the simulation. The improvement is
visible for all 3 road maps (Ochota, Mokotów, Centrum) and all 5 core measures (with
the exception of the minimum simulation output value obtained for Centrum, where one FCNN
turned out to yield a slightly (less than 0.1%) lower result than all the GNNs).</p>
      <p>To summarize, the core improvement areas are:
• A much lower error on the test set.
• A lower minimum simulator output value obtained when doing the gradient descent (except
for Centrum, which we can count as a draw).
• A much lower approximation error obtained on the trajectory ends corresponding to the 5%,
10% and 15% lowest simulator output values.</p>
      <p>
        Figure 1, as well as similar figures for Ochota and Mokotów (see Section 3 of the Supplementary
Materials [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]), shows the density of gradient descent trajectory points as heatmap plots. The
horizontal axis corresponds to the gradient descent trajectory point number (each trajectory had
3000 steps, but points were recorded every 30 steps), and the vertical axis corresponds to the
simulator output. The plots show a heatmap of these points on the (point number, simulator output)
plane; the more points in some area, the brighter the color.
Also, if one architecture reaches a lower minimum than another, the resulting heatmap is taller.
      </p>
      <p>
        Besides confirming some of the quantitative conclusions from Table 1, the heatmaps also
show that in many cases the gradient descent is less ‘noisy’ for GNNs, suggesting a smoother
function surface, less prone to overfitting noise (this is best visible in the plots in Section 3 of [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]).
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Consistency checks</title>
      <p>The findings of the previous section call for some careful consistency checks before reaching
final conclusions. In particular, it is not fully clear that the actual adjacency matrix adds any
value. Perhaps any similar graph, even one unrelated to the problem at hand, could do equally well.</p>
      <p>To address that question, we decided to fix the number of layers at 3 and the number of
channels at 4 per layer (for a GNN of type 1), and built our nets using random graphs with
various degrees of resemblance to the true problem graph (we repeated this for all three
road networks we considered). As a measure of graph similarity, we used the symmetric
difference between the sets of graph edges. The random graphs were generated in two ways.
The first method (referred to later as ‘Edge/Non-edge switching’) used random edge insertions
and deletions, with the desired value of the symmetric difference kept fixed. The second method
(referred to later as ‘Vertex label permutation’) used random permutations of the vertex labels
while keeping the connection graph structure exactly the same. Graphs generated by this
method were isomorphic, but not identical, to the original one.</p>
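      <p>The distance measure and the first perturbation method can be sketched as follows (illustrative Python; the function names are our own, not the authors’ code):</p>
      <preformat>
```python
import random

def edge_set(edges):
    # Normalize undirected edges to sorted vertex pairs.
    return {tuple(sorted(e)) for e in edges}

def symmetric_difference_distance(edges_a, edges_b):
    """Graph distance used here: the size of the symmetric difference
    of the two edge sets."""
    return len(edge_set(edges_a) ^ edge_set(edges_b))

def perturb_edges(edges, n_vertices, distance, rng=None):
    """Edge/non-edge switching: delete `distance // 2` random edges and
    insert the same number of random non-edges, so the symmetric
    difference to the original graph equals `distance`."""
    rng = rng or random.Random(0)
    current = edge_set(edges)
    non_edges = [(i, j) for i in range(n_vertices)
                 for j in range(i + 1, n_vertices)
                 if (i, j) not in current]
    removed = rng.sample(sorted(current), distance // 2)
    added = rng.sample(non_edges, distance // 2)
    return (current - set(removed)) | set(added)
```
      </preformat>
      <p>Vertex label permutation, by contrast, relabels the vertices with a random permutation, which preserves the graph up to isomorphism while scrambling which crossroads the neurons correspond to.</p>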
      <p>It is worth mentioning that although the first method generates truly random graphs similar
to the original one, the new graph might not represent a plausible road network. The second
method, on the contrary, always keeps the same realistic road network graph structure, but it
provides spurious insights to the training algorithm, as crossroads are switched.</p>
      <p>Results obtained by the two methods are shown in Figure 2, covering Ochota, Mokotów and
Centrum. The plots show the mean relative error achieved on the test sets by neural nets
based on random graphs, after roughly 330 epochs of training, as a function of the distance
from the graph used for constructing the net to the true graph. The distance was measured using the
symmetric difference between the respective edge sets.</p>
      <p>As we can see, the median of the mean relative error, denoted with a red dot, grows almost
monotonically as a function of the distance from the graph we use to the actual problem graph.
This is visible for both graph sampling methods. The minimum average relative error attained
for a particular value of the distance also grows, perhaps with a bit more noise.
(Figure 2: (a) Edge/Non-edge switching; (b) Vertex label permutation.)</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>We demonstrated the usefulness of sparsely connected neural networks, with sparsity based on
an adjacency graph, in a problem from the traffic optimization domain. GNNs consistently
outperformed FCNNs on fixed test sets for the three realistic road networks we considered (the Ochota,
Mokotów and Centrum districts of Warsaw). More importantly, GNNs achieved approximation
quality far superior to FCNNs near unseen simulator output value minima. By using randomly
perturbed graphs, we also showed that the choice of the proper graph when constructing a
GNN is important for achieving good results on a test set.</p>
      <p>The kind of NN sparsity considered in this paper, where only some of the connections are
allowed, may be regarded as a kind of a regularizer based on the problem graph. It is similar
to L1 regularization of a fully connected neural network in that it keeps only some weights
non-zero in the trained model. The resulting networks have much fewer parameters than
analogous fully connected networks and turn out to generalize significantly better than any
architecture we considered so far for solving the TSS problem.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The presented research was supported by Microsoft’s “AI for Earth” computational grant.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kurach</surname>
          </string-name>
          ,
          <article-title>Approximating traffic simulation using neural networks and its application in traffic optimization</article-title>
          ,
          <source>in: NIPS 2016 Workshop on Nonconvex Optimization for Machine Learning: Theory and Practice</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <article-title>The model and properties of the traffic light problem</article-title>
          ,
          <source>in: Proc. of International Conference on Algorithms</source>
          ,
          <year>1996</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brzeski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Możejko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Klemenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kochański</surname>
          </string-name>
          ,
          <article-title>Investigating performance of neural networks and gradient boosting models approximating microscopic traffic simulations in traffic optimization tasks</article-title>
          ,
          <source>in: NeurIPS 2018 Workshop "Machine Learning for Intelligent Transportation Systems"</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Możejko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brzeski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Madry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Skowronek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gora</surname>
          </string-name>
          ,
          <article-title>Traffic signal settings optimization using gradient descent</article-title>
          ,
          <source>Schedae Informaticae</source>
          <volume>27</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bardoński</surname>
          </string-name>
          ,
          <article-title>Training neural networks to approximate traffic simulation outcomes</article-title>
          ,
          <source>in: 2017 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>889</fpage>
          -
          <lpage>894</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chowdhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Messac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <article-title>Adaptive hybrid surrogate modeling for complex systems</article-title>
          ,
          <source>AIAA Journal</source>
          (
          <year>2013</year>
          )
          <fpage>643</fpage>
          -
          <lpage>656</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rijnen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rhuggenaath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R. d. O.</given-names>
            <surname>da Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Machine learning based simulation optimisation for trailer management</article-title>
          ,
          <source>in: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3687</fpage>
          -
          <lpage>3692</lpage>
          . doi:10.1109/SMC.2019.8914329.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Hurrion</surname>
          </string-name>
          ,
          <article-title>A sequential method for the development of visual interactive metasimulation models using neural networks</article-title>
          ,
          <source>The Journal of the Operational Research Society</source>
          <volume>51</volume>
          (
          <year>2000</year>
          )
          <fpage>712</fpage>
          -
          <lpage>719</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Barton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Meckesheimer</surname>
          </string-name>
          ,
          <article-title>Chapter 18: Metamodel-based simulation optimization</article-title>
          ,
          <source>in: S. G. Henderson, B. L. Nelson (Eds.), Simulation</source>
          , volume
          <volume>13</volume>
          of Handbooks in Operations Research and Management Science, Elsevier,
          <year>2006</year>
          , pp.
          <fpage>535</fpage>
          -
          <lpage>574</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Osorio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bierlaire</surname>
          </string-name>
          ,
          <article-title>A surrogate model for traffic optimization of congested networks: an analytic queueing network approach</article-title>
          ,
          <source>in: EPFL-REPORT-152480</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Micheli</surname>
          </string-name>
          ,
          <article-title>Neural network for graphs: A contextual constructive approach</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          <volume>20</volume>
          (
          <year>2009</year>
          )
          <fpage>498</fpage>
          -
          <lpage>511</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>Dual graph convolutional networks for graph-based semi-supervised classification</article-title>
          ,
          <source>in: Proceedings of the 2018 World Wide Web Conference</source>
          , WWW '18,
          International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE,
          <year>2018</year>
          , pp.
          <fpage>499</fpage>
          -
          <lpage>508</lpage>
          . URL: https://doi.org/10.1145/3178876.3186116. doi:10.1145/3178876.3186116.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L. van der</given-names>
            <surname>Maaten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Visualizing data using t-SNE</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <year>2008</year>
          )
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          . URL: http://www.jmlr.org/papers/v9/vandermaaten08a.html.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey on graph neural networks</article-title>
          ,
          <source>CoRR abs/1901.00596</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1901.00596. arXiv:1901.00596.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shahabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Graph convolutional recurrent neural network: Data-driven traffic forecasting</article-title>
          ,
          <source>CoRR abs/1707.01926</source>
          (
          <year>2017</year>
          ). URL: http://arxiv.org/abs/1707.01926. arXiv:1707.01926.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Spatio-temporal graph convolutional neural network: A deep learning framework for traffic forecasting</article-title>
          ,
          <source>CoRR abs/1709.04875</source>
          (
          <year>2017</year>
          ). URL: http://arxiv.org/abs/1709.04875. arXiv:1709.04875.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <article-title>Attention based spatial-temporal graph convolutional networks for traffic flow forecasting</article-title>
          ,
          <source>in: AAAI</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Deep multi-view spatial-temporal network for taxi demand prediction</article-title>
          ,
          <source>CoRR abs/1802.08714</source>
          (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1802.08714. arXiv:1802.08714.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          Supplementary materials,
          <year>2021</year>
          . URL: https://drive.google.com/file/d/1sba_cunGhao4z4-loIfQYV7u4cXCdnWk/view?usp=sharing.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kipf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          ,
          <source>in: 5th International Conference on Learning Representations, ICLR 2017, Conference Track Proceedings</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gora</surname>
          </string-name>
          ,
          <article-title>Traffic simulation framework - a cellular automaton based tool for simulating and investigating real city traffic</article-title>
          ,
          <source>in: Recent Advances in Intelligent Information Systems</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>641</fpage>
          -
          <lpage>653</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          OSM, OpenStreetMap,
          <year>2021</year>
          . URL: https://www.openstreetmap.org.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          Dataset used for experiments,
          <year>2021</year>
          . URL: https://drive.google.com/file/d/1aLUL3QPxGxeUVmqds6HWeGnVnQ5O4Mxr/view?usp=sharing.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>in: Proceedings of the 3rd International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nesterov</surname>
          </string-name>
          ,
          <article-title>A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)</article-title>
          ,
          <source>Doklady AN USSR</source>
          <volume>269</volume>
          (
          <year>1983</year>
          )
          <fpage>543</fpage>
          -
          <lpage>547</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          VMs, Description of virtual machines used in experiments,
          <year>2021</year>
          . URL: https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-gpu.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          Code,
          <article-title>Zipped repository of the code used in our experiments</article-title>
          ,
          <year>2021</year>
          . URL: https://drive.google.com/file/d/1FF6q8GTJljkYjKSNMYsL5neoXbOqPIcm/view?usp=sharing.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          Models,
          <article-title>Models trained for the main experiment and all the out-of-sample simulation datasets</article-title>
          ,
          <year>2021</year>
          . URL: https://drive.google.com/file/d/1mPnFt1Y1wGLGE-ha2_JiYsuJEebnfBYx/view?usp=sharing.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>