<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ConvLSTM Neural Network based on Hexagonal Inputs for Spatio-Temporal Forecasting of Traffic Velocities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francisco Bahamondes</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Billy Peralta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Orietta Nicolis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andres Bronfman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alvaro Soto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pontificia Universidad Católica de Chile, Departamento de Ciencias de Computación</institution>
          ,
          <addr-line>Santiago, 7820436</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Andres Bello, Facultad de Ingeniería</institution>
          ,
          <addr-line>Santiago, 7500971</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The spatio-temporal prediction of transit speeds is of great importance today, as it allows vehicular congestion to be anticipated and mitigated, thereby improving traffic efficiency. In machine learning, models such as ConvLSTM or Transformers enable reasonable predictions at the spatio-temporal level. However, these models typically assume a square grid configuration, which can limit the use of configurations that are more convenient in transportation, such as hexagonal grids. We propose a ConvLSTM neural network adapted to hexagonal grid sequences for transit speed prediction, incorporating a transformation of the hexagonal input that allows the use of standard spatio-temporal architectures based on square grids. We validate the proposed model through experiments comparing our approach with baseline methods using traffic data from freight transportation in the Metropolitan Region of Santiago, Chile. The results indicate that using hexagonal sequences improves the mean absolute error (MAE) in predicting freight traffic speeds by 2.7% compared to the base spatio-temporal ConvLSTM prediction model. For future work, we propose using larger databases and adapted transformers.</p>
      </abstract>
      <kwd-group>
        <kwd>Spatio-temporal prediction</kwd>
        <kwd>Hexagonal inputs</kwd>
        <kwd>ConvLSTM</kwd>
        <kwd>Traffic velocities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The prediction of transit speeds is a critical component in addressing road congestion, offering a way to anticipate and mitigate real-time setbacks [1], which is essential for the distribution industry and last-mile delivery. By projecting transit speeds at different times and locations, transportation companies can fine-tune the routes of their fleets, minimizing delays and cutting operational costs [2, 3]. This knowledge also enables drivers to make better decisions regarding their itineraries, avoiding bottlenecks and ensuring more agile and effective deliveries. This has a tangible impact on customer satisfaction and overall supply chain efficiency.</p>
      <p>In recent years, there has been a notable increase in the application of machine learning (ML) techniques to traffic speed prediction. Thanks to the availability of real-time data, such as GPS information from vehicles, sensor data, and online traffic, ML algorithms can analyze complex and dynamic patterns in traffic, allowing for more accurate speed predictions. For this problem, classical techniques such as Multiple Linear Regression [4], ARIMA [5], Random Forests [4], Support Vector Machines (SVM) [6], and MLP neural networks [7] have been applied. However, more modern models often utilize deep learning techniques.</p>
      <p>Deep learning (DL) models have been employed for diverse tasks such as crowd mobility prediction [8, 9, 10] and traffic prediction [11, 12, 13, 14]. In the traffic prediction task, commonly used networks include Long Short-Term Memory (LSTM) neural networks and Gated Recurrent Unit (GRU) networks. These models are well suited to sequential data, such as time series, and capture both short- and long-term dependencies efficiently. Although current models are increasingly powerful, they naturally assume a square grid, meaning the information is represented by matrices or tensors. However, in the context of transportation, hexagonal grids offer significant advantages over traditional square inputs, particularly in terms of processing efficiency and accuracy in representing spatial patterns. The hexagonal geometry allows for greater connectivity and uniform coverage of the input space with fewer sampling points, reducing information distortion. This is because each hexagonal point has six equidistant neighbors, unlike the four or eight neighbors in a square grid, which facilitates better data interpolation and a more accurate representation of shapes and patterns. While existing spatio-temporal prediction models can approximate hexagonal inputs, this often results in a loss of performance.</p>
      <p>In this work, we propose processing a sequence of hexagons, generated with a specialized library, where each cell contains the traffic speed of vehicles. The adaptation to a standard ConvLSTM network, designed to work with square data, involves transforming the hexagonal data into a compatible square structure. To achieve this, the operations of upsampling, padding, and shifting are applied in series to preserve the original neighborhood of the hexagons in the square structure. Then, a custom kernel is applied to convolve the data and extract relevant features, which allows maintaining the hexagonal structure. The use of hexagonal inputs allows greater efficiency in terms of computation, according to [15], since they require fewer parameters to achieve comparable coverage of the input domain. This can translate into faster training and lower resource consumption.</p>
      <p>The contributions of our article are the following:</p>
      <list list-type="bullet">
        <list-item><p>We present a hexagonal grid-based representation for spatial-temporal data corresponding to vehicle speeds;</p></list-item>
        <list-item><p>We conduct comparative experiments that include standard baseline machine learning models along with the technique proposed in this work;</p></list-item>
        <list-item><p>We make the source code of this work available to facilitate the replicability of experiments.</p></list-item>
      </list>
      <p>Section 2 outlines relevant prior work. Section 3 details the proposed methodology. Section 4 presents and discusses the results of our experiments. Lastly, Section 5 summarizes our main conclusions.</p>
      <p>STRL'24: Third International Workshop on Spatio-Temporal Reasoning and Learning, 5 August 2024, Jeju, South Korea. * Corresponding author. † These authors contributed equally. Email: f.bahamondesscholtba@uandresbello.edu (F. Bahamondes); billy.peralta@unab.cl (B. Peralta); orietta.nicolis@unab.cl (O. Nicolis); abronfman@unab.cl (A. Bronfman); asoto@ing.puc.cl (A. Soto). ORCID: 0000-0002-0877-7063 (F. Bahamondes); 0000-0002-5457-2157 (B. Peralta); 0000-0001-8046-6983 (O. Nicolis); 0000-0002-3122-3237 (A. Bronfman); 0000-0001-9378-397X (A. Soto). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
    </sec>
    <sec>
      <title>2. Related work</title>
      <p>Spatial-temporal prediction often uses a combination of recurrent and convolutional networks such as ConvLSTM (Convolutional Long Short-Term Memory), which merges the spatial analysis capabilities of CNNs with the ability of LSTMs to capture temporal relationships. Recently, Transformer neural models have been applied [16, 17]. A notable feature of these networks is their ability to model long-range dependencies in sequential data.</p>
      <p>
        In the literature, numerous works focus on the spatial-temporal prediction of transit speeds, congestion, and transportation using deep neural networks. Lai et al. [18] used an improved ConvLSTM model (eConvLSTM), which incorporates advanced linear features. A Traffic Pattern Attention (TPA) block and a Squeeze-and-Excitation (SE) block are introduced to optimize the accuracy in predicting traffic matrices, thus surpassing existing baseline models. Bogaerts et al. [14] presented Graph CNN-LSTM, a hybrid architecture composed of three components: a CNN, an LSTM neural network, and an FFNN. This structure succeeds in predicting traffic over short temporal horizons (5 minutes) as well as long-term horizons (up to 4 hours) through multi-stage predictions using data provided by the DiDi Chuxing Gaia open data initiative, and demonstrated superiority over cutting-edge ITS algorithms, such as k-NN, SVM, or LSTM. DeepSTCL [19] implements a ConvLSTM network within a deep learning framework for travel demand prediction, standing out for its ability to capture spatial-temporal dynamics and surpass traditional methods like AR and ARIMA. Its focus on analyzing proximity, period, and trend patterns results in more accurate predictions and better interpretation of complex travel demand data, proving its superiority with real data from DiDi in Chengdu. Zhang et al. [20] introduced an LSTM-XGBoost model for short-term traffic flow prediction, addressing challenges such as periodicity and overfitting by combining LSTM with dropout layers and XGBoost to enhance accuracy and generalization. Validated with traffic data from Shenzhen, the model shows significant improvements in accuracy and scalability, highlighting its contribution to optimizing traffic prediction and efficient control. Duan et al. [21] introduced a hybrid CNN-LSTM model enhanced through a greedy algorithm for urban traffic flow prediction using GPS data from taxis. This work combines spatial and temporal feature extraction to improve prediction accuracy and efficiency. Validated with data from Xi'an, the model achieves shorter training times and greater accuracy compared to previous methods, offering an effective solution to the complexity of urban traffic data. Dai et al. [
        <xref ref-type="bibr" rid="ref1">22</xref>
        ] proposed a spatio-temporal deep learning framework, integrating ConvLSTM and Graph Convolutional Network (GCN), for precise traffic speed prediction. By extracting temporal features with ConvLSTM and spatial features with GCN, the framework significantly improves predictive performance against baseline methods, demonstrating its efficacy in the advanced analysis of large traffic data collected through the Internet of Things (IoT). Hu et al. [
        <xref ref-type="bibr" rid="ref2">23</xref>
        ] present the AB-ConvLSTM model, designed to accurately predict large-scale traffic speed in urban road networks. This model combines the ConvLSTM network, an attention mechanism, and Bi-LSTM networks to extract spatial-temporal and periodic features. The results show that AB-ConvLSTM consistently outperforms other models in predicting urban traffic speed, highlighting its ability to capture historical significance and effectively extract daily and weekly periodic functions.
      </p>
      <p>
        Regarding hexagonal models, they have typically been applied to spatial prediction tasks. Hexagdly [
        <xref ref-type="bibr" rid="ref3">24</xref>
        ] facilitates the use of convolutional neural networks (CNNs) in this field without the need for data preprocessing. The main advantage of this approach lies in its adaptation to hexagonal grids through specific convolution and pooling operations, overcoming the limitations of traditional square convolution kernels.
      </p>
      <p>
        Previous works focus on the combination of different techniques and architectures to improve accuracy and generalization in traffic prediction considering square inputs, while works considering hexagonal inputs propose prediction at a spatial level. In this work, the geometric and topological advantages of hexagonal inputs are exploited [
        <xref ref-type="bibr" rid="ref4">25</xref>
        ]. These allow for better coverage and connectivity in capturing the spatial characteristics of traffic, resulting in a more efficient and accurate representation of temporal and spatial dynamics.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Proposed method</title>
      <p>
        The general approach to processing grid sequences with spatio-temporal neural networks assumes that the data are represented by square grids. However, it is not clear how to apply these models to data represented by hexagonal grids. In particular, the neighborhood of a cell is different: while a hexagonal cell has six neighbors, a square cell has eight neighbors. Moreover, the use of hexagonal grids in convolutional networks enhances prediction accuracy [
        <xref ref-type="bibr" rid="ref5">26</xref>
        ] due to the reduced anisotropy of hexagonal filters [
        <xref ref-type="bibr" rid="ref6">27</xref>
        ]. Despite this, the reviewed spatio-temporal neural models do not consider this type of configuration.
      </p>
      <p>In this work, we propose a ConvLSTM-based method for spatial-temporal prediction utilizing hexagonal grids, applied specifically to cargo vehicle speed data. This method comprises three key steps: (i) Initially, we transform the transit speed data onto hexagonal grids represented in Cartesian coordinates. (ii) Subsequently, we sequence the data in hexagonal patterns while preserving the hexagonal constraint by considering equivalent square grids. (iii) Lastly, we employ a ConvLSTM network with a hexagonal constraint (HexConvLSTM) to train on the preprocessed speed data.</p>
      <p>Now we will detail these steps.</p>
      <sec id="sec-2-1">
        <title>3.1. Cartesian Representation</title>
        <p>
          In this work, we first group the traffic speed data into regular hexagons using a methodology that generates a hexagonal grid. The implementation of this method results in the generation of N regular hexagons, where N is determined by a spatial resolution parameter. This generation produces a hexagonal grid where each hexagon contains the measurements that fall within its area. In Fig. 1 we show an example of a hexagonal mesh considering 21 hexagons within the experimental region. This hexagonal organization is typically facilitated by specialized libraries; in our case, we used the H3 library from Uber [
          <xref ref-type="bibr" rid="ref7">28</xref>
          ]. This library generates a georeferenced hexagonal grid from boundary coordinates, where the number of hexagons depends on the H3 resolution parameter.
        </p>
        <p>Figure panels: (a) Hexagonal Grid; (b) Cartesian Grid.</p>
        <p>Each hexagon is identified by a unique index that encodes its position. When mapping these indices to a Cartesian coordinate system for visualization or computational purposes, hexagons sharing a common coordinate will form a line that traverses the grid in a diagonal direction. This is due to the nature of hexagonal packing, where each hexagon touches six others in an arrangement that naturally forms diagonals when represented in a 2D coordinate system (see Fig. 2).</p>
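        <p>The six-neighbor structure discussed in this section can be illustrated with a minimal sketch. It assumes axial coordinates (q, r), a common convention for hexagonal grids that is not necessarily the indexing produced by H3; the names HEX_OFFSETS, SQUARE_OFFSETS, hex_neighbors, and excluded_diagonals are hypothetical. Under this assumption, the hexagonal neighborhood equals the eight-cell square neighborhood minus one pair of opposite diagonal cells.</p>
        <preformat><![CDATA[
```python
# Assumption: axial coordinates (q, r); these are illustrative, not H3 indices.
# Six offsets of the neighbors of a hexagonal cell.
HEX_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

# Eight offsets of the neighbors of a square cell (sides plus diagonals).
SQUARE_OFFSETS = [
    (dq, dr)
    for dq in (-1, 0, 1)
    for dr in (-1, 0, 1)
    if (dq, dr) != (0, 0)
]


def hex_neighbors(q, r):
    """Return the six axial coordinates adjacent to cell (q, r)."""
    return [(q + dq, r + dr) for dq, dr in HEX_OFFSETS]


def excluded_diagonals():
    """Square-grid offsets that are not hexagonal neighbors."""
    return sorted(set(SQUARE_OFFSETS) - set(HEX_OFFSETS))


if __name__ == "__main__":
    print(hex_neighbors(0, 0))
    print(excluded_diagonals())
```
]]></preformat>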
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Square Preprocessing</title>
        <p>Given that the data from hexagonal cells are represented as ordered pairs of Cartesian coordinates, the hexagonal grid can be represented as a square grid, that is, in the form of matrices.</p>
        <p>However, in a square grid a cell typically has eight direct neighbors (up, down, left, right, and the four diagonals), while in a hexagonal grid each cell is adjacent to six neighbors. The hexagonal neighborhood structure therefore significantly alters the spatial distances between cells, and it is necessary to prepare the data so that the convolution operation provided by ConvLSTM respects the hexagonal constraint.</p>
        <p>This pre-processing is performed through a sequence of matrix operations involving upsampling, padding, and shifting. This approach results in a representation where it is feasible for a convolution to respect the hexagonal arrangement through a kernel constraint of a ConvLSTM.</p>
        <sec>
          <title>3.2.1. UpSampling</title>
          <p>The first step in data preprocessing is UpSampling. The goal of this operation is to increase the vertical resolution of the matrix by duplicating each row, while keeping the horizontal content unchanged. Assuming that the original matrix <italic>M</italic> has size <italic>m</italic> × <italic>n</italic> and that the result of upsampling is <italic>M</italic>′, the relationship between the elements of these matrices can be expressed as:</p>
          <disp-formula><tex-math><![CDATA[M'_{i,j} = M_{\left\lfloor \frac{i+1}{2} \right\rfloor,\, j}, \quad \forall i \in [1, 2m],\ \forall j \in [1, n].]]></tex-math></disp-formula>
          <p>Visually, if we consider <italic>M</italic> as the original matrix, then, after applying the UpSampling process, <italic>M</italic>′ results as follows:</p>
          <disp-formula><tex-math><![CDATA[M = \begin{bmatrix} M_{1,1} & \cdots & M_{1,n} \\ M_{2,1} & \cdots & M_{2,n} \\ \vdots & \ddots & \vdots \\ M_{m,1} & \cdots & M_{m,n} \end{bmatrix}, \qquad M' = \begin{bmatrix} M_{1,1} & \cdots & M_{1,n} \\ M_{1,1} & \cdots & M_{1,n} \\ M_{2,1} & \cdots & M_{2,n} \\ M_{2,1} & \cdots & M_{2,n} \\ \vdots & \ddots & \vdots \\ M_{m,1} & \cdots & M_{m,n} \\ M_{m,1} & \cdots & M_{m,n} \end{bmatrix}.]]></tex-math></disp-formula>
        </sec>
        <sec>
          <title>3.2.2. Padding</title>
          <p>The second step, Padding, adds additional rows to the matrix to prepare the data for the Shifting process, which requires a specific number of rows to operate correctly. Specifically, this step adds <italic>n</italic> rows of zeros at the bottom of <italic>M</italic>′, resulting in a new matrix <italic>M</italic>′′ with size (2<italic>m</italic> + <italic>n</italic>) × <italic>n</italic>. The transformation from <italic>M</italic>′ to <italic>M</italic>′′ can be described as follows:</p>
          <disp-formula><tex-math><![CDATA[M''_{i,j} = \begin{cases} M'_{i,j}, & \text{if } 1 \le i \le 2m \\ 0, & \text{if } 2m < i \le 2m + n \end{cases}, \quad \forall j \in [1, n].]]></tex-math></disp-formula>
          <p>This equation specifies how <italic>n</italic> rows of zeros are added at the bottom of <italic>M</italic>′.</p>
          <p>Visually, while <italic>M</italic>′ is a 2<italic>m</italic> × <italic>n</italic> matrix resulting from the UpSampling process, the result of the Padding, <italic>M</italic>′′, has its last <italic>n</italic> rows composed of zeros:</p>
          <disp-formula><tex-math><![CDATA[M'' = \begin{bmatrix} M'_{1,1} & \cdots & M'_{1,n} \\ \vdots & \ddots & \vdots \\ M'_{2m,1} & \cdots & M'_{2m,n} \\ 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{bmatrix}.]]></tex-math></disp-formula>
          <p>In this matrix <italic>M</italic>′′, the elements <italic>M</italic>′<sub><italic>i</italic>,<italic>j</italic></sub> represent the values of <italic>M</italic>′, and the last <italic>n</italic> rows are zeros, creating a final matrix of size (2<italic>m</italic> + <italic>n</italic>) × <italic>n</italic>. This adjustment in the padding process ensures that the extended matrix has the appropriate size for the Shifting operation.</p>
        </sec>
        <sec>
          <title>3.2.3. Shifting</title>
          <p>The final step in the preprocessing is the Shifting, which shifts each column of the matrix upwards by a number of positions equal to the column index. This procedure introduces a shift that depends on the column position, achieving the necessary configuration to apply the hexagonal constraint kernel. For the matrix <italic>M</italic>′′, the resulting matrix <italic>M</italic>′′′ is obtained as follows:</p>
          <disp-formula><tex-math><![CDATA[M'''_{i,j} = M''_{(i+j) \bmod 2m,\, j}, \quad \forall i \in [1, 2m],\ \forall j \in [1, n].]]></tex-math></disp-formula>
          <p>Under this operation, the elements of each column shift upwards, and those exceeding the upper limit of the matrix reappear at the bottom. The outcome of this final step enables the use of a kernel constraint in any ConvLSTM neural network implementation, ensuring strict adherence to the original hexagonal grid neighborhood.</p>
        </sec>
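        <p>The three preprocessing steps of this section can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: matrices are lists of rows with 0-based indices, column j (1-based) is shifted upwards by j positions, and the wrap-around is taken over all rows of the padded matrix, which is only one possible reading of the modulo in the text. The function names upsample_rows, pad_bottom, shift_columns_up, and hex_to_square are hypothetical.</p>
        <preformat><![CDATA[
```python
def upsample_rows(matrix):
    """UpSampling: duplicate every row, so an m x n grid becomes 2m x n."""
    return [list(row) for row in matrix for _ in range(2)]


def pad_bottom(matrix, fill=0):
    """Padding: append n rows of `fill`, where n is the number of columns."""
    n_cols = len(matrix[0])
    padding = [[fill] * n_cols for _ in range(n_cols)]
    return [list(row) for row in matrix] + padding


def shift_columns_up(matrix):
    """Shifting: move column j (1-based) upwards by j positions.

    Assumption: values pushed past the top wrap around to the bottom,
    with the wrap taken over every row of the padded matrix.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    shifted = [[None] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        offset = j + 1  # 1-based column index
        for i in range(n_rows):
            shifted[i][j] = matrix[(i + offset) % n_rows][j]
    return shifted


def hex_to_square(matrix):
    """Apply UpSampling, Padding, and Shifting in series."""
    return shift_columns_up(pad_bottom(upsample_rows(matrix)))


if __name__ == "__main__":
    grid = [[1, 2],
            [3, 4]]
    for row in hex_to_square(grid):
        print(row)
```
]]></preformat>
        <p>For a 2 × 2 input, the sketch produces a 6 × 2 matrix: four rows from the upsampling plus two rows of padding, with each column rotated by its own index.</p>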
      </sec>
      <sec id="sec-2-3">
        <title>3.3. HexConvLSTM Architecture</title>
        <p>Assuming that the data were preprocessed into a square grid according to Section 3.2, we propose using a ConvLSTM neural network with a kernel constraint. Next, we describe the kernel constraint mask that enables adherence to the hexagonal arrangement in the grid, followed by the neural network used.</p>
        <sec id="sec-2-3-1">
          <title>3.3.1. Kernel constraint</title>
          <p>The kernel constraint is defined by a binary mask given by:</p>
          <disp-formula><tex-math><![CDATA[\begin{bmatrix} 0 & Ne(P) & 0 \\ Ne(P) & 0 & Ne(P) \\ 0 & P & 0 \\ Ne(P) & 0 & Ne(P) \\ 0 & Ne(P) & 0 \end{bmatrix} \quad \Rightarrow \quad \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}]]></tex-math></disp-formula>
          <p>In this matrix, the positions where there is a 1 indicate the cells that will be active, allowing convolution at those specific positions; otherwise, the cells are not processed. The positions are represented in the left matrix, where P is the target cell and Ne(P) is a neighbor of the target P. In this way, the 6-neighborhood of a hexagonal cell is recovered in the square grid when using standard convolution operations.</p>
        </sec>
        <sec>
          <title>3.3.2. The HexConvLSTM network</title>
          <p>
            By introducing the kernel constraint described in Section 3.3.1 into a standard ConvLSTM-2D layer, we can recover the hexagonal neighborhood in a matrix tensor. We refer to this network as HexConvLSTM; a diagram of it can be seen in Fig. 3. The variables and parameters of the ConvLSTM network are well known and are detailed in [
            <xref ref-type="bibr" rid="ref8">29</xref>
            ]. The difference from a standard ConvLSTM lies in the application of the kernel constraint, which allows the network to consider only the neighbors provided by the original hexagonal configuration.
          </p>
          <p>In a nutshell, our proposal entails representing a hexagonal grid in a Cartesian representation (see Section 3.1), preprocessing to preserve the hexagonal neighborhood (see Section 3.2), and ultimately applying a ConvLSTM neural network (see Section 3.3). Subsequent experiments aim to evaluate the efficacy of our approach on a real dataset.</p>
        </sec>
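        <p>The effect of a binary kernel constraint can be illustrated with a small sketch. The 5 × 3 mask below is an assumption: it is one layout with a target cell and six active neighbors that is consistent with the row-doubled representation of Section 3.2, and it may differ from the mask used by the authors. The names HEX_MASK, apply_mask, and masked_response are hypothetical. The sketch computes the response of a single kernel at a single location, showing that weights outside the mask never contribute.</p>
        <preformat><![CDATA[
```python
# Assumed 5 x 3 binary mask: the centre (target cell) plus six neighbours.
HEX_MASK = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]


def apply_mask(kernel, mask=HEX_MASK):
    """Zero every kernel weight whose mask entry is 0."""
    return [
        [weight * keep for weight, keep in zip(kernel_row, mask_row)]
        for kernel_row, mask_row in zip(kernel, mask)
    ]


def masked_response(patch, kernel, mask=HEX_MASK):
    """Response of a masked kernel on one 5 x 3 input patch."""
    masked = apply_mask(kernel, mask)
    return sum(
        value * weight
        for patch_row, kernel_row in zip(patch, masked)
        for value, weight in zip(patch_row, kernel_row)
    )


if __name__ == "__main__":
    ones = [[1.0] * 3 for _ in range(5)]
    patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
    print(masked_response(patch, ones))
```
]]></preformat>
        <p>In a framework such as Keras, a comparable effect could be obtained by passing a callable that multiplies the weights by such a mask as the kernel_constraint argument of a ConvLSTM2D layer; this is a suggestion rather than a description of the authors' code.</p>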
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <sec>
        <title>4.1. Data</title>
        <p>The database used in this work corresponds to data extracted from the Transportation and Logistics Center of Andrés Bello University, a center dedicated to researching routing problems, last-mile delivery, and logistics optimization, among others. The raw data includes 22 million GPS measurements of last-mile cargo vehicle speeds in the Metropolitan Region of Santiago, Chile.</p>
        <p>For this work, we have limited our data to a specific subregion (see Fig. 4), considering a particular area with the highest data density in the city of Santiago de Chile, the capital of Chile. A high data density is considered to minimize missing data, since cargo vehicles tend to prefer certain streets. The boundaries of the chosen area are between latitudes -33.4331 and -33.4524, and longitudes -70.6253 and -70.6655, forming a rectangle that includes the Santiago Centro commune and parts of its neighboring communes.</p>
        <p>Figure 4: The upper image corresponds to the city of Santiago, while the lower image corresponds to the study area.</p>
        <p>The measurements for this subregion span from January 4th to July 25th, 2020. All measurements recording a speed of zero were removed, since they indicate that the vehicle was stopped or out of operation. Additionally, records outside the time range of 8:00 a.m. to 7:00 p.m. were excluded due to their low frequency, as this interval has the highest concentration of measurements. Similarly, measurements from Sundays were discarded as they also showed low frequency. It should be noted that there were no measurements during the month of April.</p>
        <p>Regarding temporality, the measurements are treated as an hourly time series covering 157 days, with each day having 12 hours of measurement (from 8:00 a.m. to 7:00 p.m.), resulting in a total of 1,884 time steps. Each of these intervals is treated as a grid with values imputed according to Section 4.2.</p>
        <p>
          In terms of experimental design, the HoldOut method for time series [
          <xref ref-type="bibr" rid="ref9">30</xref>
          ] was followed, where data were sequentially divided into training (70%), validation (15%), and testing (15%) sets, with MinMax scaling applied to each set. All methods were evaluated considering mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R<sup>2</sup>). Furthermore, to ensure replicability, the demo source code for this work is available at: https://github.com/Francisco0178/HexConvLSTM. Our method is generic, and in future work we will test it on public datasets [
          <xref ref-type="bibr" rid="ref10">31</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-1">
        <title>4.2. Data Imputation</title>
        <p>Our dataset consists of a time series with 1,884 temporal steps, each representing one hour between 8:00 a.m. and 7:00 p.m. over 157 days. Using a fixed grid of 110 hexagonal cells, each time step contains average traffic speed information for each hexagonal cell. These 110 cells are derived from the hexagonal preprocessing described in Section 3.1 using the H3 library at a resolution of 9.</p>
        <p>However, it is worth noting that the data originates from geolocated sensor data of cargo vehicles. Upon analyzing this data, it becomes apparent that these vehicles tend to favor certain routes and schedules, resulting in some regions being underrepresented in the data. For instance, at 8 a.m., the few vehicles that do transit may predominantly use main roads, leaving certain areas unmeasured. Consequently, in the representation used, there are hexagonal cells with missing measurement information, with the percentage of missing data depending on the H3 resolution parameter.</p>
        <p>In our implementation using the H3 library, we opted for an H3 resolution of 9, which generates 110 hexagons. When represented in a square format, it yields 15x15 matrices (225 cells) with 54% missing data. Although this is a high percentage of missing values, using the next H3 resolution, 10, results in grids of 5x6, which are too small for the use of convolutional models; however, using an H3 resolution of 8 results in 500 hexagons leading to 90% missing data, which complicates the training of neural models.</p>
        <p>
          In this study, we experimented with various imputation methods and found that the PPCA method performs better than Gaussian-based or MICE imputations. It is worth noting that in [
          <xref ref-type="bibr" rid="ref11">32</xref>
          ], PPCA also emerges as a competitive imputation model for traffic prediction tasks.
        </p>
      </sec>
      <sec>
        <title>4.3. Experimental Results</title>
        <p>Comparative experiments were conducted among an MLP network, GRU, LSTM, ConvLSTM, and our HexConvLSTM network. The MLP network comprises two layers with 256 and 128 neurons, while the LSTM and GRU networks consider 128 and 50 recurrent units, respectively. For the ConvLSTM and HexConvLSTM networks, 128 ConvLSTM units are employed. In all neural networks, Mean Squared Error (MSE) was used as the loss function.</p>
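        <p>The evaluation protocol used in these experiments (a sequential 70/15/15 split, MinMax scaling, and the MAE, MSE, RMSE, and R² metrics) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the series is a flat list of values, scaling is computed independently on whichever list it receives, and the names chronological_split, min_max_scale, and regression_metrics are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def chronological_split(series, train=0.70, val=0.15):
    """Split a time-ordered sequence into train/validation/test without shuffling."""
    n = len(series)
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return series[:i_train], series[i_train:i_val], series[i_val:]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [(v - lo) / span for v in values]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and the coefficient of determination R2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


if __name__ == "__main__":
    train, val, test = chronological_split(list(range(1884)))
    print(len(train), len(val), len(test))
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 6]))
```
]]></preformat>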
        <p>In the first experiment, we trained the networks using
data imputed by the three methods described in Section
4.2. The second experiment involved training the
models with data imputed using the method that yielded the
best results, but with a reshaping of the time series. This
reshaping involved grouping the averages of two
consecutive hourly periods, which resulted in halving the total
dimension of our time series.</p>
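        <p>The reshaping used in the second experiment, averaging two consecutive hourly grids, can be sketched as below. The sketch assumes that each time step is a grid stored as a list of rows and that an unpaired trailing step is dropped; the function name average_consecutive_pairs is hypothetical.</p>
        <preformat><![CDATA[
```python
def average_consecutive_pairs(frames):
    """Average frames 2k and 2k+1 cell by cell.

    Assumption: a trailing frame without a partner is dropped.
    """
    paired = []
    for k in range(0, len(frames) - 1, 2):
        first, second = frames[k], frames[k + 1]
        paired.append([
            [(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)
        ])
    return paired


if __name__ == "__main__":
    hourly = [[[10.0, 20.0]], [[30.0, 40.0]], [[50.0, 60.0]], [[70.0, 80.0]]]
    print(average_consecutive_pairs(hourly))
```
]]></preformat>
        <p>Because each day contributes an even number of hourly steps (12), pairs formed this way never straddle two days, and 1,884 hourly steps become 942 two-hour steps.</p>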
        <sec>
          <title>4.3.1. One-Hour Granularity Experiment</title>
          <p>
            Table 2 shows that the proposed HexConvLSTM network achieved the best values across all metrics, surpassing ConvLSTM with relative improvements of 1.3%, 1.3%, 0.7%, and 0.9% in MAE, MSE, RMSE, and R<sup>2</sup>, respectively. This indicates that the hexagonal constraint better captures the dynamics between the cells, leading to improved performance of a ConvLSTM network. When comparing all models, HexConvLSTM yielded the best results, outperforming its closest competitor, MLP. We believe the MLP performs well due to the low resolution of the 15x15 grid. The competitiveness of MLP on small images, such as on the MNIST dataset, is shown in [
            <xref ref-type="bibr" rid="ref12">33</xref>
            ]. However, in the context of transportation in large cities we need to increase the size of the grids to improve the spatial resolution of prediction.
          </p>
        </sec>
        <sec>
          <title>4.3.2. Two-Hour Granularity Experiment</title>
          <p>Another experiment involved aggregating our data into the average of 2 consecutive time steps. The resulting sequences still contain 12 steps, but each step now averages two consecutive hours, so a sequence spans 2 consecutive days (6 steps per day) instead of a single day. This grouping approach effectively reduces the temporal resolution of our data but enriches each time step with a more integrated view of temporal features.</p>
          <p>Table 3 presents the results of each tested method. The HexConvLSTM network has once again achieved the best values across all metrics, surpassing ConvLSTM with relative improvements of 2.7%, 1.3%, 0.7%, and 2.8% in MAE, MSE, RMSE, and R<sup>2</sup>, respectively. This reaffirms that the hexagonal constraint effectively captures the dynamics between the cells. Moreover, the results are globally better than those from the one-hour granularity due to less variability, since two-hour averages are considered, which appear to be more predictable for all models in general. In this experiment, HexConvLSTM further increases its advantage over the other models.</p>
        </sec>
      </sec>
    </sec>
    <sec>
      <title>5. Conclusions</title>
      <p>This work demonstrates that the proposed HexConvLSTM model outperforms ConvLSTM across all metrics, indicating superior capture of transit dynamics. This advantage is expected to increase as larger grids and longer temporal intervals are used in the sequence of input grids.</p>
      <p>The temporal grouping experiment shed light on another critical aspect: efficiency in data representation can be as crucial as the quality of the data itself. In this context, HexConvLSTM not only handled the imputed data well but also benefited significantly from the grouping, enhancing its predictive capacity. This result underscores how HexConvLSTM can extract value from adjustments in data preparation, a considerable advantage for any practical application.</p>
      <p>As future work, we plan to use databases with more records, include larger study regions, and incorporate self-attention layers to improve the model's performance.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>B. Peralta and A. Soto appreciate the support of the National Center for Artificial Intelligence CENIA FB210017, Basal ANID.</p>
    </sec>
  </body>
  <back>
  </back>
</article>