<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MOBI-AID: A Big Data Platform for Real-Time Analysis of On Board Unit Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karl Determe</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnau Dillen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluca Bontempi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Buroni</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yann-Aël Le Borgne</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Machine Learning Group, Université Libre de Bruxelles</institution>
          , Brussels,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Machine Learning Group, Université Libre de Bruxelles</institution>
          , Brussels,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Machine Learning Group, Université Libre de Bruxelles</institution>
          , Brussels,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Machine Learning Group, Université Libre de Bruxelles</institution>
          , Brussels,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Brussels Mobility</institution>
          , Brussels,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Every day, large amounts of goods are transported by heavy-goods vehicles over the road network. Being able to monitor and analyse heavy-goods vehicle traffic is essential for defining policies that minimize its negative effects. However, this requires dealing with large amounts of data and often a dense road network, especially in an urban setting. This paper introduces a platform that uses state-of-the-art big data technologies to process data on the positions and properties of heavy-goods vehicles. The platform aims to provide policy-makers and other stakeholders with tools for large-scale analysis of heavy-goods vehicle data in near real-time. Additionally, the platform allows future traffic conditions to be forecast from historical data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <label>1</label>
      <title>INTRODUCTION</title>
      <p>
        Road freight transport is an essential aspect of any country’s
infrastructure policy due to its economic, environmental and
social impact. Among other issues, freight vehicles are
responsible for a large part of the congestion on urban road networks
(economic impact), pollutant emissions such as carbon dioxide
(environmental impact) and physical consequences of pollutant
emissions on public health (social impact) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Urban planners and policy-makers therefore demand
Intelligent Transportation Systems (ITS) which are able to foresee
the mobility behavior and support the definition of appropriate
policies [
        <xref ref-type="bibr" rid="ref26 ref29">26, 29</xref>
        ]. Tools such as accurate traffic forecasting models
[
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], advanced mobility indicators of freight transport [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and
more general mobility models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] can assist policy makers in
making appropriate decisions.
      </p>
      <p>
        Traffic on a road network exhibits features that are
common to most complex systems, such as self-organization and the emergence of
transient space-time patterns based on local and global feedback
loops, which make these types of data difficult to analyse. As a
result, few studies [
        <xref ref-type="bibr" rid="ref18 ref29 ref3 ref31">3, 18, 29, 31</xref>
        ] address a complete
transportation network including both freeways and urban contexts, and others
limit themselves to offline analysis [
        <xref ref-type="bibr" rid="ref15 ref25">15, 25</xref>
        ]. One of the main
reasons is the scarce availability of data gathered from point
detectors or interval detectors and the lack of methods able to tackle
the traffic prediction problem at a larger scale [
        <xref ref-type="bibr" rid="ref2 ref29">2, 29</xref>
        ]. However,
thanks to the increasingly widespread availability of new information
and communication technologies, more traffic data, especially
data from moving sensors, are collected and made openly available by
both public bodies and private companies, allowing the development of
data-intensive approaches for traffic analysis.
      </p>
      <p>In Belgium, traffic data on heavy-goods vehicles
(HGVs) is gathered by Bruxelles Mobilité<sup>1</sup>, the public administration
responsible for equipment and infrastructure related to mobility
in the Brussels Capital Region (BCR). It continuously receives
data on HGV positions, which is normally used to charge HGVs
for kilometers driven on toll roads in Belgium. As a result, an
average of 19 gigabytes of data accumulates every day and
needs to be processed in a timely manner in order to monitor
HGV traffic in Brussels.</p>
      <p>
        Bruxelles Mobilité currently stores this data in a centralized
PostgreSQL [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] database which is set up to handle geographical
data through the PostGIS [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] extension (see figure 1). However,
this solution is unable to cope with the massive amounts of data
that are ingested on a daily basis. While it would be possible
to optimize queries and create database indices to reduce the
time it takes to answer a query, the main issue with
a classical relational database system lies in the constant updates
and additions of rows. Even the most performant database on the
fastest hardware will become a bottleneck. Additionally, reading
and writing such volumes of data from and to a regular file
system is too slow.
        <fn><label>1</label><p>https://mobilite-mobiliteit.brussels/en</p></fn>
      </p>
      <p>The Machine Learning Group (MLG) of the Université Libre
de Bruxelles (ULB) collaborates with Bruxelles Mobilité to design
a big data architecture able to provide near real-time processing
and querying of the incoming data. For example, a query that
retrieves the number of trucks on each street is required to make
forecasts of future traffic conditions.</p>
      <p>
        An initial version of the architecture was implemented in
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and has been collecting data on the MLG cluster for some
time now. We were able to successfully collect and process large
amounts of data thanks to the joint use of an Apache Hadoop
cluster [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and Apache Spark [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. However, big data
technologies are evolving fast, and an appropriate interface to visualize
and analyze traffic-related data is still needed. Data aggregation is
necessary to get a high-level view, and loading large amounts of
data into the interface client is slow and impedes the responsiveness
of the interface. These are important aspects to take into account
when deciding which visualizations the interface should provide
and which data should be loaded to the client.
      </p>
      <p>
        The aim of this research is to perform
network-scale analysis and forecasting in near real-time. The presented
architecture makes it possible to produce real-time forecasts based on
incoming data using both well-established [
        <xref ref-type="bibr" rid="ref29 ref30">29, 30</xref>
        ] and state-of-the-art
[
        <xref ref-type="bibr" rid="ref18 ref2">2, 18</xref>
        ] methods on a network-wide scale. It also enables
analyses that were previously computed offline, such as identifying important
points of congestion caused by HGV traffic under changing conditions, to be
performed in real-time. Besides the
previously mentioned models, a fair amount of related work
proposes forecasting models that could be
candidates for real-time forecasting on road networks and
their different sections [
        <xref ref-type="bibr" rid="ref13 ref23 ref28">13, 23, 28</xref>
        ]. A large corpus of literature
discusses this issue.
      </p>
      <p>
        The main contributions of this paper are twofold. First,
it introduces an extension to the big data architecture that
was implemented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which enables near real-time
processing of the incoming data. Second, it proposes a design for a
dashboard that enables analysis and visualization of data and
is implemented as a web interface. Together, these make up a
platform that provides the tools Bruxelles
Mobilité needs to monitor HGV traffic in Brussels and to gain
insights that should be useful in establishing future policies
related to the transportation of goods within the BCR. The platform
was named the MOBIlity Advanced Indicators Dashboard (MOBI-AID),
after the project that supports this research. Additionally,
the work done in this research could serve as an example for
other cities, and potentially whole countries, to deploy their own
platforms to assist in decision making on policies with regard
to road freight transport.
      </p>
    </sec>
    <sec id="sec-2">
      <label>2</label>
      <title>METHODS AND IMPLEMENTATION</title>
      <p>The data that are gathered concern all HGVs that are currently
present in the Belgian territory. At this time we are only
interested in HGVs that are present in the Brussels Capital Region,
which still concerns thousands of HGVs on a daily basis. To get
useful insights from this data, a platform is necessary that can
handle such large amounts of data and present forecasts or the
results of analyses in a meaningful way. For this purpose, next to
the data, two essential components were identified to implement
the envisioned platform.</p>
      <p>The remainder of this section is structured as follows. Firstly,
we will describe the gathered data. Secondly, we discuss the
architecture that allows processing and storage of such large
amounts of data. Finally, we present a prototype web interface
that would be used by policy makers and data scientists to get
insights on traffic conditions from the processed data. This would
assist policy makers in making informed decisions regarding
urban planning with relation to road infrastructure and freight
transport.
</p>
    </sec>
    <sec id="sec-3">
      <label>2.1</label>
      <title>Viapass and On Board Unit (OBU) Data</title>
      <p>Since 1 April 2016, heavy-goods vehicles with a
Maximum Authorized Mass (MAM) exceeding 3.5 tonnes must
pay a kilometer charge for driving on certain toll roads
in Belgium. Any vehicle that is not exempt from the toll must
have an On Board Unit (OBU) installed. The public organization
in charge of supervising the kilometer charge is called Viapass<sup>2</sup>.
With the aid of GPS/GNSS satellite technology and mobile data,
the OBU records the distance that an HGV travels on Belgian public
roads. Mobile wireless technology is used to send the number
of kilometers charged to the Viapass data center, after which an
invoice is issued to the owner of the vehicle.</p>
      <p>Because of their evident value as a mobility indicator, the OBU
data are also made available to several mobility agencies,
including Bruxelles Mobilité, which uses this data to analyze freight
traffic in the Brussels Capital Region (BCR). The BCR is a
region separate from the Flanders region, within which it is geographically
located, and consists of 19 administrative districts named
communes. These districts will be referred to as such for the remainder
of this article. The models and analyses in this paper
use OBU data from HGVs within the BCR and its communes.</p>
      <p>
        On average more than nine thousand HGVs are recorded every
working day in the larger Brussels Metropolitan Area [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Each
OBU device sends an update to the server approximately every
30 seconds. An OBU record contains an anonymous identifier,
which is reset every day at 4 a.m., the timestamp at which the
position was recorded, the GPS coordinates (latitude, longitude),
the speed (km/h) and the direction (degrees). Additionally, the
data includes vehicle characteristics such as the weight category
(MAM), country code and European emission standards
classification of the engine (EURO value). This results in an average of
19 GB of data incoming on a daily basis and several terabytes of
data being generated every year.
      </p>
    </sec>
    <sec id="sec-4">
      <label>2.2</label>
      <title>Design of The Big Data Architecture</title>
      <p>
        Handling such large amounts of data requires an architecture that
can process the incoming data fast enough and store processed
data in an efficient manner. A well-known architecture that meets
these requirements is the Lambda architecture [
        <xref ref-type="bibr" rid="ref19 ref20 ref27">19, 20, 27</xref>
        ] which
has proven itself in several settings [
        <xref ref-type="bibr" rid="ref10 ref16">10, 16</xref>
        ] and is used in
practice by Twitter among others [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. An overview of our current
implementation of the architecture can be found in figure 2.
      </p>
      <p>With this architecture, three separate layers can be
distinguished, each handling a different aspect of the platform. The
speed layer processes incoming data in a timely
manner and sends the processed data to the serving layer for
visualization and analysis. This layer handles the real-time aspect
of the platform. The batch layer stores immutable data (i.e.
observations) and processes it for later user queries on historical
data. The serving layer consists of multiple views that are each
used to fulfill a specific type of user query. Examples are data
stored in the format required by a specific
visualization, or predetermined queries that retrieve the data
required for a specific analysis. This layer can also merge the
information that comes from both the speed and batch layers, for example
to expose discrepancies between the real-time traffic conditions and the
typical case. In our current implementation there
are two views available. The real-time view provides data that
comes directly from the speed layer. The historical view uses the
data from the batch layer to query for events and states that have
been observed in the past.
        <fn><label>2</label><p>https://www.viapass.be/</p></fn></p>
      <p>
        The initial implementation of this architecture was deployed
on an Apache Hadoop [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] cluster, which is an open-source
framework for distributed computing that is widely used for big data
processing [
        <xref ref-type="bibr" rid="ref24 ref27 ref32 ref7">7, 24, 27, 32</xref>
        ]. The data are collected with a Python
script that queries the Viapass servers for new data at a fixed time
interval, which is currently set to two minutes. The script loads
the data in a GeoPandas [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] DataFrame (a data structure with
named columns and index-based rows), which is an extension
of the well-known Pandas library for the Python programming
language, to support geometric data types and functions. The
DataFrame contains all observations that were collected by
Viapass since the last data request.
      </p>
      <p>Observations consist of an HGV’s current position as a
geometry point, which is represented by a given latitude and longitude,
together with the unique ID that was assigned to the HGV for
that day. Additionally, an observation contains a timestamp of
when it was recorded by the OBU and the HGV’s characteristics,
which were described in section 2.1. Observations are augmented
with the current date and time to indicate when the observation
was received by our servers. This is done because there is no
guarantee that the observations within the retrieved batch will
all be for the current day, as it is not uncommon for
observations from previous days to come in. As it cannot be known when
all observations for a day have been received, the system needs
to take this into account.</p>
      <p>The observations that were retrieved by the script are
subsequently split by the day on which they were recorded
and then saved to CSV files on the local file system. The files
are stored in a folder that corresponds to the day on which the
observations were recorded. These CSV files are used to run
simulations of the Lambda architecture by reading batches of data
that represent incoming data from Viapass and sending them to
the appropriate layers. In real-world scenarios, the incoming data
would be sent directly to the appropriate layers of the Lambda
architecture.</p>
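      <p>The splitting step described above can be sketched as follows. This is a minimal illustration using only the Python standard library rather than GeoPandas; the field names (id, timestamp, speed, received_at) and the sample values are assumptions made for the example and are not taken from the Viapass data format.</p>
      <preformat>
```python
from collections import defaultdict
from datetime import datetime, timezone


def split_by_recording_day(observations, received_at):
    """Group observations by the calendar day on which the OBU recorded them.

    Each observation is a dict with an ISO-8601 "timestamp" key (hypothetical
    field name). A "received_at" field is added so that late arrivals from
    previous days remain identifiable.
    """
    per_day = defaultdict(list)
    for obs in observations:
        recorded = datetime.fromisoformat(obs["timestamp"])
        annotated = dict(obs, received_at=received_at.isoformat())
        per_day[recorded.date().isoformat()].append(annotated)
    return dict(per_day)


# Hypothetical batch that straddles midnight.
batch = [
    {"id": "a1", "timestamp": "2018-09-24T23:59:40+00:00", "speed": 42.0},
    {"id": "b7", "timestamp": "2018-09-25T00:00:10+00:00", "speed": 38.5},
    {"id": "a1", "timestamp": "2018-09-25T00:00:40+00:00", "speed": 40.0},
]
received = datetime(2018, 9, 25, 0, 2, tzinfo=timezone.utc)
groups = split_by_recording_day(batch, received)
```
      </preformat>
      <p>Each key of the resulting dictionary corresponds to one per-day folder; writing each group to a CSV file is omitted here.</p>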
      <p>
        For the currently deployed implementation of the batch layer,
we aggregate the CSV files per day and store them on the Hadoop
Distributed File System (HDFS) in Parquet format. HDFS
allows distributed storage with replication and improved read and
write speeds compared to regular file systems. Parquet
is a column-oriented file format that provides efficient data
compression and fast query access. To process the raw CSV files,
Apache Spark [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is used to deduplicate the observations and
store them in HDFS as Parquet files. HDFS takes care of
distributing file data over the different nodes of the cluster. With this
approach, these operations can be processed in parallel and
distributed over multiple compute nodes thanks to the integration
of Hadoop and Spark. Using Spark, we can efficiently run SQL
queries and advanced analytics on the data by parallelizing a
large part of the computations. An overview of this process is
shown in figure 3.
      </p>
      <p>In experiments with an alternative implementation of the
batch layer, the CSV data is read into a PostGIS database that
stores the daily route of an HGV with a given ID. The route is
stored as a LineString object (i.e. a sequence of points)
constructed from all available observations for a given HGV ID on
a given day. The same database stores information on Brussels
communes, both geographical (e.g. commune boundaries)
and non-geographical (e.g. name and population). Using the
geographical operations that are provided by PostGIS, information
such as the number of HGVs in a given commune at a given time
can be queried efficiently. This alternative batch layer
implementation was created because the current approach lacks data types
and functions that are optimized for operations on
space and time. Ideally, we would like to use both approaches in
conjunction, for example by storing raw observations in Parquet
format and aggregating these observations over a day to form the
route of a truck over that day, to take advantage of the strengths
of both approaches.</p>
      <p>
        However, while PostGIS introduces the concept of space with
geographic data types and functions, it offers no native way to treat
space and time together and is not optimized for queries that
involve both dimensions. This
means that while the sequence of HGV positions can be stored
for a certain day, the time at which the HGV was at
each position cannot be stored without introducing additional
fields or dimensions and making certain assumptions
about the data. This results in a loss of speed and data efficiency,
which is one of the essential aspects of this platform. For this
reason, we are currently investigating MobilityDB, a further extension of
PostGIS that introduces data types representing
a position at a certain time [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ].
This would allow us to perform the necessary queries without
being concerned with the underlying representation of the data
and optimization of the geographic functions. We are currently
in the process of experimenting with the mentioned alternatives
to identify the most appropriate approach for the batch layer.
      </p>
      <p>Figure 4: (a) State updates according to incoming data stream. (b) Transition from the 3 a.m. hour-of-the-day window to the 4 a.m. window when data sampled at 4 a.m. comes in.</p>
      <p>
        The speed layer of our Lambda architecture implementation
uses the Apache Kafka [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] streaming platform to store
incoming data from queries to Viapass as a continuous stream of data.
For the purpose of initial simulations, a Python script reads a
batch of observations from the stored CSV files into a GeoPandas
DataFrame. A second DataFrame, which
is loaded in memory beforehand as a preprocessing step, contains the geographic
information of a set of Brussels street segments. We used a subset
of Brussels streets for testing; in practice this would
contain all streets in Brussels. By performing a join of the two
DataFrames with the within geographic function provided by
GeoPandas, we obtain a new DataFrame in which every
observation also contains the internal ID of the street segment the HGV
was on at that time. These data are sent to Kafka for processing
in the next step of the streaming pipeline.
      </p>
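      <p>To illustrate what the spatial join computes, the following sketch replaces the GeoPandas within predicate with a plain ray-casting point-in-polygon test in standard-library Python. It assumes that each street segment is available as a simple polygon (for example a buffered centre line); the segment IDs, coordinates and field names are hypothetical.</p>
      <preformat>
```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) strictly inside the polygon?

    `polygon` is a list of (x, y) vertices; the last vertex is implicitly
    connected back to the first. Points exactly on an edge are not handled
    specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def assign_segments(observations, segments):
    """Inner join: keep observations that fall within a segment polygon."""
    joined = []
    for obs in observations:
        for segment_id, polygon in segments.items():
            if point_in_polygon(obs["lon"], obs["lat"], polygon):
                joined.append(dict(obs, segment_id=segment_id))
                break
    return joined


# Hypothetical segment polygons and observations in arbitrary planar units.
segments = {
    101: [(0, 0), (2, 0), (2, 2), (0, 2)],
    102: [(2, 0), (4, 0), (4, 2), (2, 2)],
}
observations = [
    {"id": "a1", "lon": 1.0, "lat": 1.0},
    {"id": "b7", "lon": 3.0, "lat": 1.0},
    {"id": "c3", "lon": 9.0, "lat": 9.0},
]
joined = assign_segments(observations, segments)
```
      </preformat>
      <p>Observations that fall outside every segment are dropped, which mirrors an inner join. A real deployment would rely on a spatial index rather than the nested loop shown here.</p>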
      <p>At the receiving end of the data stream, the streaming facilities
provided by Spark, which can be integrated directly with a Kafka stream,
are used to process the data. Incoming data is
used to update the current state of all
street segments that are being tracked. This approach, which
is referred to as stateful streaming, is illustrated in figure 4a. The
state of a street segment is represented by the average number of
HGVs and the average velocity of passing HGVs for every
hour-of-the-day of the current day. At midnight of every new day,
the state of each street segment is re-initialized to zero values
for all properties. Values are subsequently updated continuously
with a running mean for the current hour-of-the-day. Values for
past hours-of-the-day contain the mean observed statistics
for that day, and future values remain zero until the current time
falls within the window for that hour-of-the-day. This process is
illustrated by figure 4b.</p>
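      <p>The state update can be sketched in plain Python as shown below. This is not the Spark implementation: it only illustrates the daily reset and the incremental running mean for one street segment. The class name, its fields and the choice to track an observation count alongside the mean speed are assumptions made for the example.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class SegmentState:
    """Per-segment state: one slot per hour-of-the-day of the current day."""
    day: str = ""
    counts: list = field(default_factory=lambda: [0] * 24)
    mean_speed: list = field(default_factory=lambda: [0.0] * 24)

    def update(self, day, hour, speed):
        # A new day re-initializes every slot to zero.
        if day != self.day:
            self.day = day
            self.counts = [0] * 24
            self.mean_speed = [0.0] * 24
        # Incremental (running) mean: m_n = m_(n-1) + (x - m_(n-1)) / n
        self.counts[hour] += 1
        n = self.counts[hour]
        self.mean_speed[hour] += (speed - self.mean_speed[hour]) / n


state = SegmentState()
for day, hour, speed in [
    ("2018-09-24", 3, 40.0),
    ("2018-09-24", 3, 50.0),
    ("2018-09-24", 4, 30.0),
]:
    state.update(day, hour, speed)
```
      </preformat>
      <p>After these three updates, the 3 a.m. slot holds the mean of its two observations, the 4 a.m. slot holds its single observation, and all later slots are still zero.</p>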
      <p>In addition to keeping track of the observed values, forecasts
are also made for future hours-of-the-day. Currently, predictions
are made using a persistence
model, more specifically a sliding window persistence model.
With this type of model, a forecast is based directly on previously
observed values for the same day-of-the-week and hour-of-the-day.
In this implementation, the data is divided into one-week seasons,
meaning that forecasts rely on data from previous weeks
rather than on the currently observed values.
As an example, if the data is hourly and the forecasting target is
9 a.m. on Monday, then given a window size of 1, the observation
of last Monday at 9 a.m. will be returned as the forecast.
A window of size 2 means returning the average of the
observations of the last two Mondays at the same hour, and so on for
larger window sizes. However, while simple and explainable, this
approach is rather naive, as it does not take into account the current traffic
conditions or information that is known in advance, such as a
planned special event. More
advanced machine learning methods could incorporate this type
of additional information for improved forecasting.</p>
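      <p>A sliding window persistence forecast for a single weekly slot can be sketched as follows. The function name and the sample counts are hypothetical and serve only to illustrate the window sizes discussed above.</p>
      <preformat>
```python
def persistence_forecast(history, window=1):
    """Average of the last `window` observations for the same weekly slot.

    `history` holds past values for one (day-of-week, hour-of-day) slot,
    ordered oldest to newest, e.g. HGV counts of previous Mondays at 9 a.m.
    """
    if window not in range(1, len(history) + 1):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / len(recent)


monday_9am_counts = [120, 132, 128]  # hypothetical values, oldest first
forecast_w1 = persistence_forecast(monday_9am_counts, window=1)
forecast_w2 = persistence_forecast(monday_9am_counts, window=2)
```
      </preformat>
      <p>With a window of 1 the forecast equals last Monday's value; with a window of 2 it is the mean of the last two Mondays.</p>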
      <p>
        The final results are written to a JSON file, which is formatted
according to the GeoJSON [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] specification. In this format, every
street segment is described by a LineString instance that
corresponds to the path of the street segment. In addition,
each street segment is annotated with HGV counts and average
velocities for each hour-of-the-day as properties. The output
file serves as the real-time view for the considered street
segments and can be read by the dashboard for display on a map or
for further analysis, such as identifying
the busiest streets at the current time.
      </p>
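      <p>The shape of such an output file can be sketched as follows. The property names (segment_id, hgv_count, mean_speed_kmh), the coordinates and the values are illustrative assumptions; only the Feature, FeatureCollection and LineString structure follows the GeoJSON specification, which orders coordinates as longitude, latitude.</p>
      <preformat>
```python
import json


def segment_feature(segment_id, path, counts, mean_speeds):
    """Build a GeoJSON Feature for one street segment.

    `path` is a list of (longitude, latitude) pairs; GeoJSON orders
    coordinates as longitude first. Property names are illustrative.
    """
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [list(point) for point in path],
        },
        "properties": {
            "segment_id": segment_id,
            "hgv_count": counts,
            "mean_speed_kmh": mean_speeds,
        },
    }


# Hypothetical segment with statistics for two hours-of-the-day.
feature = segment_feature(
    segment_id=17,
    path=[(4.3517, 50.8466), (4.3530, 50.8471)],
    counts={"08": 12, "09": 15},
    mean_speeds={"08": 31.5, "09": 27.0},
)
collection = {"type": "FeatureCollection", "features": [feature]}
realtime_view = json.dumps(collection, indent=2)
```
      </preformat>
      <p>The resulting string can be written to a file and loaded by any GeoJSON-aware map client.</p>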
    </sec>
    <sec id="sec-5">
      <label>2.3</label>
      <title>Implementation of The MOBI-AID Dashboard</title>
      <p>
        To allow stakeholders to monitor
the current traffic situation for HGVs in Brussels or perform
historical analyses for future planning, a dashboard was
implemented as a web interface
with the Django [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] web framework, additionally
making use of the first-party GeoDjango extension. This
extension provides direct integration with databases such as
PostGIS and other useful geographical tools. These technologies
were chosen for their flexibility and maturity, and because
they required minimal additional learning given our
computer science backgrounds. The fact that these components are
also very low level allows us to easily experiment with
alternative approaches.
      </p>
      <p>The web interface comprises three pages: Home, Dashboard
and About. The Home page provides an overview of the available
features and displays a map that shows real-time HGV counts
for the different communes that make up the Brussels Capital
Region. Hovering over a specific commune shows the total
number of HGVs most recently observed in that commune.
The HGV counts per commune are also shown in a table beside
the map, where they are broken down by weight category. Figure
5 shows a prototype implementation for the home page with the
user hovering over the Brussels City commune. The About page
provides more detailed information on the web interface and
contains the documentation on the dashboard. It also mentions
the sources of our funding and the project supporters.</p>
      <p>The Dashboard page provides the core functionality of the
web application. This page consists of several tabs, each of which provides
a certain type of visualization or allows specific analyses to be
performed. In its current implementation, the dashboard consists
of the following tabs: Real-time, Maps, Charts, Analytics and
Predictions.</p>
      <p>The Real-time tab is composed of several panels that display
different types of real-time information, which are retrieved from
the Lambda architecture’s real-time view. In this tab, users can
select the type of information they want to see, which is then
displayed on the map. A table next to the map displays a user-selected
overview of the information that is displayed on the map.
For example, the ten busiest streets can be displayed in this
table. Figure 6 shows the current prototype for the Real-time
tab.</p>
      <p>Note that in this figure the time window for collecting
statistics is 15 minutes, as opposed to the one-hour window that is used
for the state of a street. This window corresponds to the interval
between consecutive updates of the state rather than the
hour-of-the-day window that is being updated in the state. Note also
that streets in the table are identified by IDs; the final
implementation would use street names.</p>
      <p>The Maps tab contains a large map that shows historical data
about the observed HGV traffic as selected by the user. We
distinguish two ways to look at historical data in this situation.
The user can either look at the data at a specific time on
a specific date, or look at data that is typical
for a certain hour-of-the-day on a certain day-of-the-week. The
user can also select the level of aggregation at which
information is displayed on the map. The currently provided levels
of aggregation are commune level, street level and the level
of individual HGVs. Individual HGVs cannot be shown when
looking at the typical traffic situation, as concrete HGV positions
evidently vary with time. In this case, clusters would instead be
shown at locations where HGVs are often present at the chosen
hour-of-the-day and day-of-the-week. Figure 7 shows the
work-in-progress Maps tab, without the website header, footer and
tab-selection menu. Note that the selection controls should be
separated based on the previously selected type of visualization.
These controls would also be shown on the map rather than
above it, as is currently the case.</p>
    </sec>
    <sec id="sec-7">
      <label>3</label>
      <title>EVALUATION OF THE INITIAL PLATFORM</title>
      <p>For the MOBI-AID dashboard to provide an optimal user experience
and be a useful contribution to the field of big mobility data, two
main aspects are of particular importance. These essential
features are adequate performance of the real-time data processing
pipeline and the usability of the web interface. To evaluate
performance, scalability tests were performed with a simulated stream
that is read from the data which is currently being collected from
Bruxelles Mobilité. The user interface was evaluated through user
testing and feedback.
</p>
    </sec>
    <sec id="sec-9">
      <label>3.1</label>
      <title>Experimental setting</title>
      <p>
        Scalability testing was already performed with a previous version
of the architecture in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These experiments were performed
on the Hadoop big data cluster of the MLG. This cluster is made
up of 10 slave nodes, each with 24 CPU cores, managed by a
master node which is the point of access for users and handles
user interaction (interactive node). The resource manager YARN,
which is an integral part of the Hadoop ecosystem, allocated 150
cores and 805 GB of RAM for the purpose of these tests.
      </p>
      <p>Preliminary experiments with the new real-time architecture
were run on a local machine with a 2.3 GHz Intel Core i5 CPU
with 4 cores and 16 GB of RAM. This hardware setup falls far short of
the processing power that is available on the cluster and has
much slower I/O due to the absence of Hadoop. However, it
should give an initial insight into the potential real-time capabilities
of the implemented pipeline. Note that the code used in
these simulations has not yet been optimized, as implementing
the architecture was the priority in this phase. The simulation
environment also introduces some overhead, such
as running Docker containers and local applications on the
testing machine that share CPU cycles.</p>
      <p>The implemented simulation uses previously collected data
that was stored in CSV files. These files contain the
observations collected over three days: the 23rd, 24th and 25th of September
2018. As the simulation was performed on limited
hardware and accelerates the ingestion of data compared to the
real situation, these files were filtered beforehand to only contain
observations concerning three predetermined streets. New data
is sampled from these files to simulate incoming data over one-hour
windows. This is a much larger sampling window than in the
real case, as we want to accelerate the simulations and are mostly
interested in the correct functioning of the pipeline. The batch
interval within which the processing should be completed was
set to 10 seconds. Since each batch carries one hour (3600 seconds)
of data, the simulation has to process
the incoming batches 360 times faster than in the real case. This
is one of the main reasons why the number of observed streets
was so severely limited for the simulation. To evaluate the
simulation, the output provided by the SparkUI interface, which is
used to inspect the state of Spark execution, was analyzed.
Snapshots of SparkUI after running the simulation are shown in
figures 8 and 9.</p>
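      <p>The replay of stored observations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the column names <monospace>timestamp</monospace> and <monospace>street</monospace>, all function names and the <monospace>send</monospace> callback are hypothetical, and the rows are assumed to be sorted by time.</p>
      <preformat>
```python
import time
from datetime import datetime, timedelta
from itertools import groupby

WINDOW = timedelta(hours=1)   # simulated time covered by one batch
BATCH_INTERVAL_S = 10         # wall-clock seconds available per batch


def speedup(window=WINDOW, batch_interval_s=BATCH_INTERVAL_S):
    """Simulated seconds replayed per wall-clock second."""
    return window.total_seconds() / batch_interval_s


def hour_of(row):
    """Truncate a row's timestamp (hypothetical ISO 8601 column) to the hour."""
    stamp = datetime.fromisoformat(row["timestamp"])
    return stamp.replace(minute=0, second=0, microsecond=0)


def hourly_batches(rows, streets):
    """Yield (hour, rows) pairs for the selected streets.

    `rows` must be sorted by time; each row is a dict such as those
    produced by csv.DictReader.
    """
    kept = (row for row in rows if row["street"] in streets)
    for hour, group in groupby(kept, key=hour_of):
        yield hour, list(group)


def replay(rows, streets, send, sleep=time.sleep):
    """Emit one one-hour batch per batch interval."""
    for hour, batch in hourly_batches(rows, streets):
        send(hour, batch)
        sleep(BATCH_INTERVAL_S)
```
      </preformat>
      <p>With a one-hour window and a 10-second batch interval, <monospace>speedup()</monospace> evaluates to 3600 / 10 = 360, which is the acceleration factor stated above. In a real setup the <monospace>send</monospace> callback would publish each batch to the ingestion layer of the pipeline.</p>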
      <p>The web interface was evaluated informally. Stakeholders from Bruxelles Mobilité
were shown the work-in-progress interface and asked to provide
feedback on the application. Additionally, colleagues
with expertise in data visualization, especially
regarding mobility data, gave their initial feedback on the currently
provided functionalities.</p>
    </sec>
    <sec id="sec-10">
      <title>Results</title>
      <p>Figure 8 shows an overview of some relevant statistics
collected by SparkUI. Here, the most informative charts are the
top one (input rate) and the third from the top (processing time).
The variation in input rate shows that data ingestion peaks at
certain points in the day, which illustrates how HGV traffic varies
with the hour of the day. The most important aspect of this
figure is that the processing time for a batch is below the batch
interval. As can be seen in the figure, the average batch
processing time is 1.6 seconds, which is well below the batch interval
of 5 seconds. The second chart from the top shows the scheduling
delay, i.e. the delay between scheduling of the job and the start of
processing, which always remained 0 as batches were always
processed within the batch interval. For this reason the bottom
chart (total delay) is the same as the processing time chart, since
processing time is the only source of delay.</p>
      <p>Figure 9: (a) Table showing the different tasks of the job, distributed over 4
cores. (b) Event timeline of the parallel execution of the job.</p>
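      <p>The relation between processing time, batch interval and scheduling delay discussed for figure 8 can be illustrated with a small queueing sketch. It assumes a single worker that handles batches strictly in order and is only a simplified model, not Spark's actual scheduler; the function names are hypothetical.</p>
      <preformat>
```python
def scheduling_delays(processing_times, batch_interval):
    """Delay between a batch being scheduled and its processing starting.

    Batch i is scheduled at time i * batch_interval and starts as soon
    as the previous batch has finished.
    """
    delays = []
    free_at = 0  # time at which the worker becomes idle again
    for i, duration in enumerate(processing_times):
        scheduled = i * batch_interval
        start = max(scheduled, free_at)
        delays.append(start - scheduled)
        free_at = start + duration
    return delays


def total_delays(processing_times, batch_interval):
    """Total delay per batch: scheduling delay plus processing time."""
    waits = scheduling_delays(processing_times, batch_interval)
    return [wait + duration for wait, duration in zip(waits, processing_times)]
```
      </preformat>
      <p>As long as every processing time stays below the batch interval, all scheduling delays are zero and the total delay equals the processing time, which matches the behaviour observed in SparkUI. Once processing times exceed the interval, the delays accumulate from batch to batch.</p>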
      <p>Figure 9 shows essential information which SparkUI provides
on a specific Spark job. Figure 9a shows that the job which
processes a batch was parallelized over four tasks that are each
handled by a different CPU core. Figure 9b shows the timeline of
events that are part of handling a Spark job. The blue parts of
the timeline correspond to scheduling of the job, the red parts
to deserialization of the data and the green parts to actually
processing the incoming records. The timeline shows that most of
the processing time is actually spent on scheduling and deserialization
of the tasks. This is because the number of records in a batch
in this experiment is much smaller than in the real-world data
stream. Figure 10 shows the same timeline as figure 9b when
running the same task on the full dataset, i.e. with significantly
more records in the processed batch. In this experiment 8 cores
were allocated.</p>
      <p>Regarding user evaluation of the web interface, the general
consensus was that the current interface can already provide
some basic insights, but requires more advanced tools and
visualizations to provide added value to our potential users
compared to equivalent tools that are currently available.</p>
    </sec>
    <sec id="sec-11">
      <title>Discussion</title>
      <p>
        The results from the performed experiments indicate that the
current architecture is promising for use in a real-life scenario.
Taking the results from the previous experiments in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the
well-known reliability of the used technologies into account, it
is expected that given appropriate hardware and optimization,
there should be no issue in dealing with the amounts of data we
are working with.
      </p>
      <p>Initial tests with the full data set were also performed on
the same hardware as the preliminary experiments. Results are
promising given the single-node setting, but further experiments
are needed to assess the architecture in a cluster setting.
However, these preliminary results let us anticipate that no
performance issues should be expected when using the full processing
power of a big data cluster.</p>
      <p>SparkUI was an important tool in debugging and analyzing
the performance of the implemented pipeline. The insights it
provides into the execution of jobs enable detailed monitoring of
how well the implemented code for a big data project performs
in the Hadoop + Spark environment. These insights are
especially useful for assessing whether the implemented pipeline will
perform well, even without the use of big-data-capable hardware.
For example, it is with the help of SparkUI that we can clearly
see that the scheduling and deserialization overheads
seen in figure 9b become insignificant when working with larger
data batches, as shown by the results in figure 10.</p>
    </sec>
    <sec id="sec-12">
      <title>FUTURE WORK</title>
      <p>Future work consists of finalizing the pipeline architecture and
connecting the different components of the MOBI-AID big data
platform together. One possible extension that is currently
envisioned is to add a merged view that uses data from both the speed
and batch layers to, for example, show discrepancies between
the real-time traffic conditions and typical conditions. Figure 11
visualizes this extension of our current implementation.</p>
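      <p>As an illustration of what such a merged view could compute, the following sketch compares real-time counts with a typical value per street and hour of the day. It is only a possible design under stated assumptions: the data layout, the use of a plain mean as the "typical" value and all function names are hypothetical and are not part of the current implementation.</p>
      <preformat>
```python
from statistics import mean


def typical_by_hour(history):
    """Batch-layer view: mean HGV count per (street, hour-of-day).

    `history` is an iterable of (street, hour, count) tuples.
    """
    buckets = {}
    for street, hour, count in history:
        buckets.setdefault((street, hour), []).append(count)
    return {key: mean(values) for key, values in buckets.items()}


def discrepancies(realtime, typical):
    """Merged view: real-time count minus typical count per key.

    `realtime` maps (street, hour) to the current count from the speed
    layer. Keys without a historical baseline are skipped.
    """
    return {
        key: count - typical[key]
        for key, count in realtime.items()
        if key in typical
    }
```
      </preformat>
      <p>A positive value indicates more HGV traffic than usual for that street and hour, a negative value less. A production version would likely replace the mean with a more robust baseline and run both steps inside the batch and speed layers respectively.</p>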
      <p>Given this finalized implementation, we will perform
extensive experiments on the MLG big data cluster, which is
powered by Apache Hadoop, as opposed to a regular office machine.
MLG is currently in the process of migrating to a new cluster
which should provide the necessary facilities for large-scale
experiments. The goal of these experiments would be to move
beyond simulation. Concretely, we would hook up the implemented
pipeline to the actual stream of incoming data.</p>
      <p>Implementing and experimenting with more advanced
machine learning approaches for forecasting will also be an
important task in providing more nuanced predictions. Additionally,
integrating existing mobility indicators and advanced ITS models
from related research will provide appropriate metrics to policy
makers. The platform should be able to perform such processing
in real time and use the forecasts to simulate the impact of a
policy.</p>
      <p>In addition, a finalized web interface will provide stakeholders
with the necessary tools to make informed decisions on how to
optimize the traffic of goods in the Brussels Capital Region. Further
extending the current interface with feedback from the users
should allow us to provide this ideal interface. Concretely, further
versions of the real-time tab will also include other visualizations
besides the map, such as relevant charts and differences with the
typical traffic situation at this hour of the day. The final version
of this tab should allow users to easily spot anomalies in the
current traffic situation compared to historical observations.</p>
      <p>Prototypes for the Charts, Analytics and Predictions tabs
have not been implemented yet. It is currently under review
whether these should be separate tabs or be
combined into a single general Analysis tab. Conceptually, the Charts
tab would contain several types of charts that show useful
information, such as the typical distribution of HGVs over communes.
The Analytics tab would contain tools that allow the
user to perform a specific analysis, such as constructing a model
of traffic flow based on the available data. The Predictions tab
would put more emphasis on training and using the previously
mentioned forecasting methods to predict future states of the
HGV traffic in Brussels. These models could then be used by
policy makers to simulate the effects of certain decisions, such as
modifying existing roads. Determining where the
envisioned functionality should live will be one of the
next steps in the design of the interface.</p>
      <p>After the full prototype of the web interface has been
implemented, extensive user studies and formal gathering of user
requirements will be done to get a better insight into what the
final web interface should provide. Iterating further and using
agile software development methods should allow us to provide
the end-users with the tools they need in a user-friendly manner.</p>
      <p>Finally, packaging the platform for deployment will give the
different stakeholders the envisioned platform that fits their
requirements and allow them to easily deploy it on their own
hardware. This platform should also scale to be used for the whole
country and, given appropriate data, it could also be used for
other countries.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGMENTS</title>
      <p>Arnau Dillen, Giovanni Buroni, Yann-Aël Le Borgne and
Gianluca Bontempi acknowledge the support of Programme
Opérationnel FEDER 2014-2020 de la Région de Bruxelles Capitale
(ICITY MOBI-AID project). The authors are also grateful to
Bruxelles Mobilité for having provided the OBU data necessary for
this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Julian</given-names>
            <surname>Allen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Browne</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Urban logistics--how can it meet policy makers' sustainability objectives</article-title>
          ?
          <source>Journal of Transport Geography</source>
          <volume>13</volume>
          ,
          <issue>1</issue>
          (
          <year>2005</year>
          ),
          <fpage>71</fpage>
          -
          <lpage>81</lpage>
          . https://doi.org/10.1016/j.jtrangeo.2004.11.002 Sustainability and the Interaction Between External Effects of Transport (Part Special Issue, pp. 23-99).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Angarita-Zapata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Masegosa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Triguero</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A Taxonomy of Traffic Forecasting Regression Problems From a Supervised Learning Perspective</article-title>
          .
          <source>IEEE Access</source>
          <volume>7</volume>
          (
          <year>2019</year>
          ),
          <fpage>68185</fpage>
          -
          <lpage>68205</lpage>
          . https://doi.org/10.1109/ACCESS.2019.2917228
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Barbosa</surname>
          </string-name>
          , Marc Barthelemy, Gourab Ghoshal, Charlotte R. James, Maxime Lenormand, Thomas Louail, Ronaldo Menezes, José J. Ramasco, Filippo Simini, and
          <string-name>
            <given-names>Marcello</given-names>
            <surname>Tomasini</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Human mobility: Models and applications</article-title>
          .
          <source>Physics Reports</source>
          <volume>734</volume>
          (
          <year>2018</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>74</lpage>
          . https://doi.org/10.1016/j.physrep.2018.01.001 Human mobility: Models and applications.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Buroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yann-Aël</given-names>
            <surname>Le Borgne</surname>
          </string-name>
          , Gianluca Bontempi, and
          <string-name>
            <given-names>Karl</given-names>
            <surname>Determe</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Cluster Analysis of On-Board-Unit Truck Big Data from the Brussels Capital Region</article-title>
          .
          <source>21st IEEE International Conference on Intelligent Transportation Systems</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Buroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yann-Aël</given-names>
            <surname>Le Borgne</surname>
          </string-name>
          , Gianluca Bontempi, and
          <string-name>
            <given-names>Karl</given-names>
            <surname>Determe</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>On-Board-Unit Data: A Big Data Platform for Scalable Storage and Processing</article-title>
          .
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . https://doi.org/10.1109/CloudTech.2018.8713342
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Howard</given-names>
            <surname>Butler</surname>
          </string-name>
          , Martin Daly, Allan Doyle, Sean Gillies, Stefan Hagen, and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Schaub</surname>
          </string-name>
          .
          <year>2016</year>
          . GeoJSON. Internet Engineering Task Force. https://tools.ietf.org/html/rfc7946
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Carcillo</surname>
          </string-name>
          , Andrea Dal Pozzolo,
          <string-name>
            <given-names>Yann-Aël</given-names>
            <surname>Le Borgne</surname>
          </string-name>
          , Olivier Caelen, Yannis Mazzer, and
          <string-name>
            <given-names>Gianluca</given-names>
            <surname>Bontempi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>SCARFF: A scalable framework for streaming credit card fraud detection with spark</article-title>
          .
          <source>Information Fusion</source>
          <volume>41</volume>
          (
          <year>2018</year>
          ),
          <fpage>182</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Apache</given-names>
            <surname>Kafka Committers</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Apache Kafka. Apache Software Foundation</article-title>
          . https://kafka.apache.org/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Apache</given-names>
            <surname>Spark Committers</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Apache Spark. Apache Software Foundation</article-title>
          . https://spark.apache.org/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Konstantinos</given-names>
            <surname>Demertzis</surname>
          </string-name>
          , Lazaros Iliadis, and
          <string-name>
            <given-names>Vardis-Dimitris</given-names>
            <surname>Anezakis</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A Machine Hearing Framework for Real-Time Streaming Analytics Using Lambda Architecture</article-title>
          .
          <source>In Engineering Applications of Neural Networks</source>
          , John Macintyre, Lazaros Iliadis, Ilias Maglogiannis, and Chrisina Jayne (Eds.). Springer International Publishing, Cham,
          <fpage>246</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <article-title>GeoPandas developers</article-title>
          .
          <year>2019</year>
          . GeoPandas. GeoPandas developers. http://geopandas.org/index.html#
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>PostgreSQL</given-names>
            <surname>Developers</surname>
          </string-name>
          .
          <year>2019</year>
          . PostgreSQL. The PostgreSQL Global Development Group. https://www.postgresql.org
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Anzhelika</given-names>
            <surname>Dombalyan</surname>
          </string-name>
          , Viktor Kocherga, Elena Semchugova, and
          <string-name>
            <given-names>Nikolai</given-names>
            <surname>Negrov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Traffic Forecasting Model for a Road Section</article-title>
          .
          <source>Transportation Research Procedia</source>
          <volume>20</volume>
          (
          <year>2017</year>
          ),
          <fpage>159</fpage>
          -
          <lpage>165</lpage>
          . https://doi.org/10.1016/j.trpro.2017.01.040 12th International Conference on Organization and Traffic Safety Management in large cities, SPbOTSIC-2016, 28-30 September 2016, St. Petersburg, Russia.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] PostGIS Development Group.
          <year>2019</year>
          .
          <article-title>PostGIS. The Open Source Geospatial Foundation</article-title>
          . https://postgis.net/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hadavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Verlinde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Verbeke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macharis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Guns</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Monitoring Urban-Freight Transport Based on GPS Trajectories of Heavy-Goods Vehicles</article-title>
          .
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>20</volume>
          , 10 (Oct
          <year>2019</year>
          ),
          <fpage>3747</fpage>
          -
          <lpage>3758</lpage>
          . https://doi.org/10.1109/TITS.2018.2880949
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kiran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Monga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dugan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Baveja</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Lambda architecture for cost-effective batch and speed big data processing</article-title>
          .
          <source>In 2015 IEEE International Conference on Big Data (Big Data)</source>
          .
          <fpage>2785</fpage>
          -
          <lpage>2792</lpage>
          . https://doi.org/10.1109/BigData.2015.7364082
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Narayan</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Twitter's tweets analysis using Lambda Architecture</article-title>
          . https://blog.knoldus.com/twitters-tweets-analysis-using-lambdaarchitecture/.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>I.</given-names>
            <surname>Lana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Del</given-names>
            <surname>Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Velez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. I.</given-names>
            <surname>Vlahogianni</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Road Traffic Forecasting: Recent Advances and New Challenges</article-title>
          .
          <source>IEEE Intelligent Transportation Systems Magazine</source>
          <volume>10</volume>
          ,
          <issue>2</issue>
          (Summer
          <year>2018</year>
          ),
          <fpage>93</fpage>
          -
          <lpage>109</lpage>
          . https://doi.org/10.1109/MITS.2018.2806634
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          , Anand Rajaraman, and Jeffrey David Ullman.
          <year>2014</year>
          .
          <article-title>Mining of massive datasets</article-title>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Nathan</given-names>
            <surname>Marz</surname>
          </string-name>
          and
          <string-name>
            <given-names>James</given-names>
            <surname>Warren</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Big Data: Principles and best practices of scalable real-time data systems</article-title>
          . New York; Manning Publications Co.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Apache</given-names>
            <surname>Hadoop Project Members</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Apache Hadoop. Apache Software Foundation</article-title>
          . https://hadoop.apache.org/
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Django</surname>
            <given-names>Team</given-names>
          </string-name>
          <string-name>
            <surname>Members</surname>
          </string-name>
          .
          <year>2019</year>
          . Django. Django Software Foundation. https://www.djangoproject.com/
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>David</given-names>
            <surname>Myr</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Real time vehicle guidance and traffic forecasting system</article-title>
          .
          <source>US Patent 6,615,130</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Daiga</given-names>
            <surname>Plase</surname>
          </string-name>
          , Laila Niedrite, and
          <string-name>
            <given-names>Romans</given-names>
            <surname>Taranovs</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Accelerating data queries on Hadoop framework by using compact data formats</article-title>
          .
          <source>In Advances in Information, Electronic and Electrical Engineering (AIEEE)</source>
          ,
          <source>2016 IEEE 4th Workshop on. IEEE</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Mohammed A.</given-names>
            <surname>Quddus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chao</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stephen G.</given-names>
            <surname>Ison</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Road Traffic Congestion and Crash Severity: Econometric Analysis Using Ordered Response Models</article-title>
          .
          <source>Journal of Transportation Engineering</source>
          <volume>136</volume>
          ,
          <issue>5</issue>
          (
          <year>2010</year>
          ),
          <fpage>424</fpage>
          -
          <lpage>435</lpage>
          . https://doi.org/10.1061/(ASCE)TE.1943-5436.0000044 arXiv:https://ascelibrary.org/doi/pdf/10.1061/%28ASCE%29TE.1943-5436.0000044
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>John</given-names>
            <surname>Ratcliffe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ela</given-names>
            <surname>Krawczyk</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Imagineering city futures: The use of prospective through scenarios in urban planning</article-title>
          .
          <source>Futures</source>
          <volume>43</volume>
          ,
          <issue>7</issue>
          (
          <year>2011</year>
          ),
          <fpage>642</fpage>
          -
          <lpage>653</lpage>
          . https://doi.org/10.1016/j.futures.2011.05.005 Alternative City Futures.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Dilpreet</given-names>
            <surname>Singh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chandan K</given-names>
            <surname>Reddy</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A survey on platforms for big data analytics</article-title>
          .
          <source>Journal of Big Data</source>
          <volume>2</volume>
          ,
          <issue>1</issue>
          (
          <year>2015</year>
          ),
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Hongyu</given-names>
            <surname>Sun</surname>
          </string-name>
          , Henry X. Liu, Heng Xiao, Rachel R. He, and
          <string-name>
            <given-names>Bin</given-names>
            <surname>Ran</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Use of Local Linear Regression Model for Short-Term Traffic Forecasting</article-title>
          .
          <source>Transportation Research Record</source>
          <volume>1836</volume>
          ,
          <issue>1</issue>
          (
          <year>2003</year>
          ),
          <fpage>143</fpage>
          -
          <lpage>150</lpage>
          . https://doi.org/10.3141/1836-18
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
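          <string-name>
            <given-names>CP</given-names>
            <surname>Van Hinsbergen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>JW</given-names>
            <surname>Van Lint</surname>
          </string-name>
          , and
          <string-name>
            <given-names>FM</given-names>
            <surname>Sanders</surname>
          </string-name>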
          .
          <year>2007</year>
          .
          <article-title>Short term traffic prediction models</article-title>
          .
          <source>In Proceedings of the 14th World Congress on Intelligent Transport Systems (ITS)</source>
          , held Beijing, October
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>JWC</given-names>
            <surname>Van Lint</surname>
          </string-name>
          and
          <string-name>
            <given-names>CPIJ</given-names>
            <surname>Van Hinsbergen</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Short-term traffic and travel time prediction models</article-title>
          .
          <source>Artificial Intelligence Applications to Critical Transportation Issues</source>
          <volume>22</volume>
          ,
          <issue>1</issue>
          (
          <year>2012</year>
          ),
          <fpage>22</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Eleni I.</given-names>
            <surname>Vlahogianni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthew G.</given-names>
            <surname>Karlaftis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>John C.</given-names>
            <surname>Golias</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Short-term traffic forecasting: Where we are and where we're going</article-title>
          .
          <source>Transportation Research Part C: Emerging Technologies</source>
          <volume>43</volume>
          (
          <year>2014</year>
          ),
          <fpage>3</fpage>
          -
          <lpage>19</lpage>
          . https://doi.org/10.1016/j.trc.2014.01.005 Special Issue on Short-term Traffic Flow Forecasting.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
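          <string-name>
            <given-names>Matei</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Reynold S</given-names>
            <surname>Xin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Wendell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tathagata</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Armbrust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ankur</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiangrui</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Josh</given-names>
            <surname>Rosen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shivaram</given-names>
            <surname>Venkataraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael J</given-names>
            <surname>Franklin</surname>
          </string-name>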
          , et al.
          <year>2016</year>
          .
          <article-title>Apache Spark: a unified engine for big data processing</article-title>
          .
          <source>Commun. ACM</source>
          <volume>59</volume>
          ,
          <issue>11</issue>
          (
          <year>2016</year>
          ),
          <fpage>56</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
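          <string-name>
            <given-names>Esteban</given-names>
            <surname>Zimányi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mahmoud</given-names>
            <surname>Sakr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Arthur</given-names>
            <surname>Lesuisse</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mohamed</given-names>
            <surname>Bakli</surname>
          </string-name>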
          .
          <year>2019</year>
          .
          <article-title>MobilityDB: A Mainstream Moving Object Database System</article-title>
          .
          <source>In Proceedings of the 16th International Symposium on Spatial and Temporal Databases (SSTD '19)</source>
          . ACM, New York, NY, USA,
          <fpage>206</fpage>
          -
          <lpage>209</lpage>
          . https://doi.org/10.1145/3340964.3340991
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>