<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Joint Conference (March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Heterogeneous Industrial Vehicle Usage Predictions: A Real Case</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dena Markudova</string-name>
          <email>dena.markudova@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Mellia</string-name>
          <email>marco.mellia@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Baralis</string-name>
          <email>elena.baralis@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Vassio</string-name>
          <email>luca.vassio@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Cagliero</string-name>
          <email>luca.cagliero@polito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elvio Amparore</string-name>
          <email>eamparore@topcon.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Loti</string-name>
          <email>rloti@tierratelematics.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucia Salvatori</string-name>
          <email>lsalvatori@topcon.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Torino</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tierra Spa</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>26</volume>
      <issue>2019</issue>
      <abstract>
<p>Predicting future vehicle usage based on the analysis of CAN bus data is a popular data mining application. Many of the usage indicators, like the utilization hours, are non-stationary time series. To predict their values, recent approaches based on Machine Learning combine multiple data features describing engine status, travels, and roads. While most of the proposed solutions address cars and trucks usage prediction, a smaller body of work has been devoted to industrial and construction vehicles, which are usually characterized by more complex and heterogeneous usage patterns. This paper describes a real case study performed on a 4-year CAN bus dataset collecting usage data about 2 250 construction vehicles of various types and models. We apply a statistics-based approach to select the most discriminating data features. Separately for each vehicle, we train regression algorithms on historical data enriched with contextual information. The achieved results demonstrate the effectiveness of the proposed solution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>1 INTRODUCTION</title>
      <p>
        Vehicle usage prediction is an established data mining problem,
which has found application in both industrial and academic
contexts. Since approximately one third of the energy usage is
due to transportation [
        <xref ref-type="bibr" rid="ref4">4</xref>
], predicting vehicle usage is particularly
useful for optimizing resources, thus reducing the emissions of
CO2 and other pollutants. Forecasting key vehicle usage
indicators (e.g., utilization hours and fuel consumption levels)
is a related problem, deemed crucial to optimize many
industrial processes [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] such as (i) managing fleets of vehicles
in construction sites, (ii) planning periodic maintenance actions
on the vehicles of a company, and (iii) optimizing truck routes.
      </p>
      <p>
        Thanks to the advent of Controller Area Network (CAN)
standards and Internet-of-Things technologies, vehicles are nowadays
equipped with smart sensors and tracking systems that capture
and transmit high-resolution and multivariate time series data
regarding fuel consumption, vehicle movements (e.g., accelerations
and drifts), engine conditions (e.g., oil temperature), and route
characteristics (e.g., slope). A substantial research effort has been
devoted to analyzing CAN bus data by means of supervised
Machine Learning techniques [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to predict the main usage indicator
values associated with vehicles. The problem can be modelled as
a multivariate time series forecasting task. For example, in [
        <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
        ]
the authors applied regression models to predict the future fuel
consumption values of trucks based on the past time series values
as well as based on the values taken by correlated time series (e.g.,
travelled distance, average speed, average road slope). The time
series describing travel characteristics appeared to be the most
discriminating features. Similarly, the studies presented in [
        <xref ref-type="bibr" rid="ref13 ref2">2, 13</xref>
        ]
focused their analyses on CAN Bus and trip data to predict the
fuel consumption of cars and trucks. The empirical comparisons
reported in the aforesaid studies showed that Support Vector
Regression models appeared to be the best performing regression
algorithms in the considered scenario. Different types of on-road
vehicles were considered in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. They proposed to
use Random Forests to learn predictive models for public buses,
waste collectors, and heavy duty trucks, respectively. A more
extensive review of the literature on on-road vehicle consumption
models is given in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Due to the peculiar characteristics of the environment (e.g.,
construction sites, rural areas) and to their context of use,
industrial and construction vehicles show fairly heterogeneous and
hardly predictable usage patterns. Hence, a smaller body of work
has been devoted to predicting their future usage. Most of the
presented works in this field targeted very specific challenges.
For example, the works presented in [
        <xref ref-type="bibr" rid="ref1 ref12">1, 12</xref>
        ] focused on modelling
fuel consumption of mining trucks, where a large portion of fuel
consumption was due to avoidable idle times. To the best of our
knowledge, extensive studies considering and comparing various
construction and industrial vehicle types and models with each
other have not been presented in literature.
      </p>
      <p>This paper addresses the automatic prediction of the daily
utilization hours of industrial and construction vehicles
belonging to multiple types and models. It presents a real case study
performed on a 4-year real dataset collecting CAN bus data of
2 239 industrial vehicles with various characteristics working in
construction sites placed all over the world. The aim of the study
is to help site managers to properly schedule short-term fleet
management and maintenance actions (e.g., schedule refueling).
To address utilization hours prediction, we apply regression
models trained on past vehicle data. Training data consist of CAN
bus data (engine rpm, oil temperature, fuel level) enriched with
contextual information (e.g., day of the week, season, location).
Since the analyzed time series shows non-stationary and rather
heterogeneous usage trends (independently of vehicle type and
model), we train regression models separately on each vehicle.
To tailor prediction models to the most discriminating vehicle
characteristics, we apply a statistics-based approach to select
the most relevant data features. This reduces the bias due to the
presence of uncorrelated variables and thus allows to focus the
model training on the most salient information.</p>
      <p>The rest of the paper is organized as follows. Section 2
describes the dataset. Section 3 formalizes the problem addressed
in the paper and describes the methodology we adopt. Section 4
summarizes the main experimental results, while Section 5 draws
conclusions and discusses the future research directions.
</p>
    </sec>
    <sec id="sec-2">
      <title>2 DATA OVERVIEW</title>
      <p>This study focuses on analyzing telematics data acquired from
industrial vehicles used in heterogeneous contexts (e.g.,
construction sites). The data was provided by Tierra S.p.A. Tierra is a
company that provides telematics solutions for tracking vehicles
of multiple vendors.</p>
      <p>Data description. The study encompasses a large set
of heterogeneous industrial vehicles. Overall, we analyzed data
related to 2 239 vehicles belonging to 10 different types and
located in 151 different countries spread all over the world. The
dataset covers approximately 4 years of data (from January 2015
to September 2018). For each vehicle we consider different data
characteristics. The main classes of data features are enumerated
below.</p>
      <p>• CAN bus information (e.g., Engine ON/OFF, CAN parametric
messages, diagnostic messages, and status reports);
• Vendor information (e.g., unit/asset info, maintenance services);
• Information provided by embedded devices installed on board
(e.g., digital inputs report);
• Contextual information:
– Spatial information (geographical location of the vehicle,
region, country);
– Temporal information (time stamp, day of the week,
holiday/working day depending on the country, week
of the year, month of the year, season, year).</p>
      <p>CAN messages are generated by the vehicle sensors and the
Machine Control Systems at a high frequency (up to 100 Hz)
and gathered by a controller, where they are collected and
preprocessed. The system then sends an aggregated report to a
centralized server every 10 minutes. For each vehicle, the report
contains a set of data features describing the engine and vehicle
statuses, e.g., fuel level, engine oil pressure, engine coolant
temperature, engine fuel rate usage, speed, working hours, percent
load, digging press, pump drive temp, oil tank temperature, etc.</p>
      <p>Based on the acquisition time and the number of acquired samples,
we derive the daily utilization hours of each vehicle.</p>
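<p>As a rough sketch of this derivation, assuming each report received while the engine is on covers a fixed 10-minute acquisition interval (the function name and input format below are illustrative, not the production pipeline):</p>

```python
from collections import defaultdict
from datetime import datetime

def daily_utilization_hours(report_timestamps, minutes_per_report=10):
    """Aggregate engine-on reports into hours of use per day.

    `report_timestamps` is a list of ISO-8601 strings, one per report
    received while the engine was on (hypothetical input format).
    """
    counts = defaultdict(int)
    for ts in report_timestamps:
        day = datetime.fromisoformat(ts).date()
        counts[day] += 1
    # Each report accounts for one fixed-length acquisition interval.
    return {day: n * minutes_per_report / 60.0 for day, n in counts.items()}

reports = ["2017-05-02T08:00:00", "2017-05-02T08:10:00",
           "2017-05-02T08:20:00", "2017-05-03T09:00:00"]
hours = daily_utilization_hours(reports)
# → 0.5 hours on 2017-05-02, one 10-minute interval on 2017-05-03
```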
      <p>Data preparation. To prepare CAN bus data for the Machine
Learning process, we apply the following preparation steps: (i) Data
cleaning, to handle missing values, make data formats uniform,
and verify the absence of data inconsistencies; this is
particularly important since vehicles operate in remote regions where
a sudden absence of connectivity may affect data collection.
(ii) Normalization, to normalize the values of continuous features
in order to make them comparable with each other. (iii)
Aggregation, to aggregate feature values on a daily basis. (iv) Enrichment,
to enrich CAN bus data with multiple-level and multi-faceted
contextual information. (v) Transformation, to tailor input
data to a relational data format.</p>
      <p>Vehicles are characterized by a unique identifier (the Vehicle
id) and are classified based on the type of construction vehicle
(e.g., refuse compactor, single drum roller, tandem roller, coring
machine, paver, recycler, cold planer, and grader). Each type
is then split into several models (i.e., type subcategories). For
example, the dataset contains vehicles of 44 different models of
refuse compactors, 65 models of single drum rollers, 10 models
of recyclers, etc. Finally, the dataset contains data for multiple
units of each model. CAN bus data has been enriched with
contextual information, to allow the exploitation of seasonality
and short-term periodic trends to enhance prediction accuracy.</p>
      <p>For example, for most of the vehicles located in the northern
hemisphere, the number of days in which they were unused was
maximal in December and January due to Christmas holidays
and unfavourable weather.</p>
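<p>The calendar part of the enrichment step can be sketched with the standard library alone. The season mapping below assumes the northern hemisphere and a weekend-plus-holidays notion of non-working days; both are simplifications of the per-country logic described above:</p>

```python
from datetime import date

# Hypothetical month-to-season mapping (northern hemisphere only;
# southern-hemisphere vehicles would need the mapping shifted).
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def calendar_features(day: date, holidays=frozenset()):
    """Derive the contextual features used to enrich one daily record."""
    return {
        "day_of_week": day.isoweekday(),        # 1 = Monday ... 7 = Sunday
        "week_of_year": day.isocalendar()[1],
        "month": day.month,
        "season": SEASONS[day.month],
        "year": day.year,
        # Working day: not a weekend and not a (country-specific) holiday.
        "is_working_day": day.isoweekday() not in (6, 7)
                          and day not in holidays,
    }

feats = calendar_features(date(2017, 12, 25), holidays={date(2017, 12, 25)})
# → Christmas 2017 is a Monday, season 'winter', is_working_day False
```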
      <p>Data characterization. Data characterization is instrumental
to discover similarities and differences among vehicle usage
patterns. We focus on the number of hours a vehicle is active each
day, over the whole considered time period. We remove the days
in which we did not record any usage. In a Cumulative
Distribution Function (CDF), a curve value F(x) indicates the fraction of
days in which the number of daily utilization hours is less than
or equal to x. Figure 1(a) shows the empirical CDF of vehicle
usage. The plot highlights the heterogeneity of the vehicle usage
distributions across different types. For instance, graders and
refuse compactors are used more than 6 hours per day in the median,
whereas coring machines show the opposite usage pattern,
with a median usage of less than one hour. Some vehicle types
expose a long tail in the CDF, meaning that they sometimes
work up to 24 hours per day.</p>
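<p>The empirical CDF used above is a simple construction; a minimal sketch (with illustrative utilization values, not taken from the dataset):</p>

```python
def ecdf(samples):
    """Empirical CDF: returns sorted values xs and F where F[i] is the
    fraction of observations no greater than xs[i]."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def median_from_ecdf(samples):
    """Median read off the sorted samples (the x where F crosses 0.5)."""
    xs, _ = ecdf(samples)
    mid = len(xs) // 2
    return xs[mid] if len(xs) % 2 == 1 else 0.5 * (xs[mid - 1] + xs[mid])

# Hypothetical daily utilization hours for one vehicle, inactive days removed.
hours = [0.5, 2.0, 6.5, 7.0, 8.0, 7.5, 24.0]
xs, F = ecdf(hours)
# → F ends at 1.0; the long tail (24 h) sits at the top of the curve
```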
      <p>Within specific vehicle types and models, vehicles still show
highly variable usage patterns. For example, Figure 1(b) shows
the boxplots of the utilization hours for all the 44 models of
type refuse compactor (the most used vehicle type).</p>
      <p>Models are sorted in ascending order according to their median
utilization. The boxplots display the full range of variation (from
minimum to maximum), and the first, second and third quartiles.</p>
      <p>Values with + marker are classified as outliers (deviation of more
than 1.5 times interquartile range from the first and third
quartiles). The plot confirms a large variance of the utilization hours
across vehicle types. Figure 1(c) deepens the analysis into specific
vehicle units of the refuse compactor model. For each vehicle,
we analyze the utilization hour series across single units. The
results indicate that even within the same model, units have very
diferent usage patterns. Finally, Figure 1(d) plots the weekly
utilization hours series for five vehicle units at random. Daily patters
are even more uncorrelated and noisy. Despite the units being of
the same type and model, usage patterns shows non-stationary
and uncorrelated trends.</p>
      <p>This large variability suggests addressing the prediction
problem by training a per-vehicle regression model. Building a
model per vehicle type or model would result in too generic an
approach.</p>
      <p>Figure 1: (a) Cumulative Distribution Functions of the number of daily utilization hours per vehicle type (inactive days are removed). (b) Boxplots of the number of daily utilization hours for different models of a single vehicle type (all refuse compactors). (c) Boxplots of the number of daily utilization hours for single units of a specific refuse compactor model. (d) Time series of weekly utilization hours for 5 different vehicles of a specific refuse compactor model.</p>
    </sec>
    <sec id="sec-3">
      <title>3 METHODOLOGY</title>
      <p>This section formalizes the problem we address in this study.
We describe the methodologies to generate per-vehicle training
datasets, filter uncorrelated features from training data, and train
the regression models.</p>
      <p>Problem statement. Let Htx be the daily utilization hours of
vehicle x on day t. The classical univariate series forecasting
problem entails predicting the utilization hours on the next day, Htx+1,
based on the series of historical values Htx, Htx−1, . . . , Htx−w+1
within a time window of size w. Since utilization hours are likely
to be temporally correlated with each other, we consider them
as a function f of the most recent values within a specified time
window TW, hereafter denoted as training window. Formally
speaking,</p>
      <p>Htx+1 = f (Htx, Htx−1, . . . , Htx−w+1)
where Htx+1 is the value of the target variable and f (·) is the
prediction function we need to define.</p>
      <p>Let F be the set of CAN bus and contextual features stored
in the relational dataset (according to the data description in
Section 2), and let Ftx be the value of an arbitrary feature F ∈ F
(e.g., engine oil pressure) associated with vehicle x on day t . We
can extend the classical forecasting problem to a multivariate
context by considering not only the historical values of the
utilization hours themselves, but also those of the other features in
F . The problem can then be formalized as follows:
Htx+1 = f (Htx, Htx−1, . . . , Htx−w+1, Ftx, Ftx−1, . . . , Ftx−w+1, . . .)
According to our preliminary data exploration, most vehicles
are used only a few days a week (e.g., vehicles of type refuse
compactor were used on 36% of the days in 2017). For this reason,
we investigate two variants of the multivariate problem:
• Predict the utilization hours on the next day (next-day
scenario, in short),
• Predict the utilization hours on the next working day, i.e.,
the next day on which the vehicle will be used at least 1
hour (next-working-day scenario).</p>
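<p>Deriving the next-working-day series from the raw daily series is a simple filter. A minimal sketch, with illustrative day labels and the 1-hour threshold from the definition above:</p>

```python
def next_working_day_series(daily_hours, threshold=1.0):
    """Keep only the days on which the vehicle was used at least
    `threshold` hours, preserving chronological order, so that the
    model predicts the next *working* day rather than the next day."""
    return [(day, h) for day, h in daily_hours if h >= threshold]

series = [("Mon", 6.0), ("Tue", 0.0), ("Wed", 0.2), ("Thu", 7.5), ("Fri", 5.0)]
working = next_working_day_series(series)
# → [('Mon', 6.0), ('Thu', 7.5), ('Fri', 5.0)]
```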
      <p>Training data generation. Given dataset D, target vehicle x ,
and training window TW , in this phase we build a per-vehicle
training dataset Tx by applying the sliding window approach.
More specifically, a smaller window SW slides over the entire
training window TW to capture interesting temporal correlations
among close utilization hours. Each slide generates a record in the
training dataset. For instance, if we set the window size w = |SW |
to 7 to capture weekly periodicity, we obtain |TW | − 7 training
samples, i.e., we can have up to |TW | − 7 windows containing
the next targeted day and the previous 7 days. In a nutshell, each
record contains a target value of utilization hours on arbitrary
day t + 1, and the feature values associated with each of the
previous 7 days, i.e., t − 6, . . ., t .</p>
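<p>The record-generation step above can be sketched as follows for a univariate series (the real datasets also carry the per-day CAN bus and contextual features alongside the utilization hours):</p>

```python
def make_training_samples(series, w=7):
    """Slide a window of size w over the daily series; each record pairs
    the values on days t-w+1 ... t with the target on day t+1."""
    samples = []
    for t in range(w - 1, len(series) - 1):
        features = series[t - w + 1: t + 1]   # the w most recent days
        target = series[t + 1]                # next day's utilization hours
        samples.append((features, target))
    return samples

hours = [5, 0, 6, 7, 6, 0, 0, 5, 6, 7]
records = make_training_samples(hours, w=7)
# 10 days and w = 7 yield 10 - 7 = 3 training records
```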
      <p>
        Statistics-based feature selection. Since the considered vehicles
show variable usage patterns, the correlation between the
features in the training dataset and the target feature Htx+1 is likely
to change from vehicle to vehicle. To tailor prediction models
to the most discriminating features, we adopt a statistics-based
approach to filter training features prior to regression model
learning. Given a vehicle x , we compute the autocorrelation
function [
        <xref ref-type="bibr" rid="ref6">6</xref>
] of the original daily utilization hour time series to decide
which of the past series values in the sliding window (among
t − w, . . ., t) are actually correlated with the target one (t + 1).
The autocorrelation function estimates the correlation of a series
with a delayed copy of itself (adding a time lag). The larger the
autocorrelation value associated with an arbitrary lag l, the more
correlated the target day t + 1 is with the previous day t + 1 − l.
      </p>
      <p>Figure 2 shows an example of autocorrelation function
associated with the utilization hours of a specific refuse compactor unit.
Obviously, the autocorrelation value is maximal with lag equal
to 0, i.e., comparing the series with itself. In the example, the
autocorrelation function is able to capture a weekly periodicity
(i.e., high autocorrelation values at l = 7, 14, 21, . . .). Similarly, the
lags one day after (l = 1, 8, 15, . . .) and one day before (l = 6, 13, . . .)
also exhibit a high correlation with the target day. The other days
are instead only marginally correlated.</p>
      <p>To exploit this correlation, we pick the K lags l ∈ [1, w] with
the maximal autocorrelation values, which correspond to the K days
that are most correlated with the target day. Then, we select
the features in the training dataset corresponding to these days.
Formally speaking, let t∗, t∗∗, . . . be the subset of selected days.
The problem is reformulated as follows:</p>
      <p>Htx+1 = f (Ht∗x, Ht∗∗x, . . . , Ft∗x, Ft∗∗x, . . .)</p>
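<p>The lag-selection step can be sketched with a plain sample-autocorrelation estimate (the paper relies on a standard definition [6]; the function names and the synthetic weekly series below are ours):</p>

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with a copy of itself
    shifted by `lag` days."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

def top_k_lags(series, w, k):
    """Pick the k lags in [1, w] with the largest autocorrelation; the
    matching past days supply the features kept for model training."""
    ranked = sorted(range(1, w + 1),
                    key=lambda lag: autocorrelation(series, lag),
                    reverse=True)
    return sorted(ranked[:k])

# Synthetic series with a strong weekly pattern: lag 7 should rank first.
hours = [8, 7, 8, 1, 0, 0, 0] * 8
selected = top_k_lags(hours, w=10, k=3)
```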
      <p>
        Regression model learning. To finally build the regression model
f (·), we rely on standard approaches. Specifically, we focus on
the following algorithms: (i) Linear Regression (LR), (ii) Lasso
Regression, (iii) Support Vector Regression (SVR), and (iv)
Gradient Boosting (GB). The last approach is an ensemble method (i.e.,
an ensemble of decision tree models) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. To train the models we
use the scikit-learn implementations [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We also consider two
naive baseline methods: (i) using the last observed value (LV),
and (ii) using a moving average value (MA) over the past w days.
      </p>
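<p>The two baselines are straightforward; a minimal sketch (function names are ours):</p>

```python
def last_value_prediction(history):
    """LV baseline: tomorrow's utilization equals the last observed value."""
    return history[-1]

def moving_average_prediction(history, period=30):
    """MA baseline: average utilization over the last `period` days
    (or over the whole history, if shorter)."""
    window = history[-period:]
    return sum(window) / len(window)

past = [6.0, 7.0, 0.0, 5.0, 6.5]
# LV predicts 6.5; MA over the available 5 days predicts 24.5 / 5 = 4.9
```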
    </sec>
    <sec id="sec-4">
      <title>4 EXPERIMENTAL RESULTS</title>
      <p>This section summarizes the most relevant results obtained by
applying the described methodology to the Tierra dataset. We
conducted an extensive experimental campaign to (i) analyze
the impact of feature selection on the prediction outcome, (ii) test
multiple algorithm configurations to identify the best settings in
different scenarios, (iii) estimate the prediction errors to get
confidence intervals for the estimations, and (iv) use the best obtained
models on vehicles of different models and types. Due to lack of space,
we hereafter report a selection of the most significant results.</p>
      <p>The experiments are performed on an Intel(R) Core(TM)
i7-8550U CPU with 16 GB of RAM running Ubuntu 18.04 server.</p>
    </sec>
    <sec id="sec-5">
      <title>4.1 Experimental design</title>
      <p>
        To evaluate the performance of the proposed approach we apply a
hold-out validation. We adopt two established strategies for time
series forecasting, i.e., sliding window and expanding window
methods [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The sliding window strategy entails the following
steps:
(1) Define a fixed-size sliding window SL of up to w = 150
days sliding over the whole time period (approximately
4 years of data).
(2) For each window slide, prepare the relational dataset using
the windowing approach (see Training data generation in
Section 3) and apply the feature selection step.
(3) Separately for each vehicle, train different regression
models on the prepared training dataset.
(4) Apply the regression models to predict the utilization
hours of the vehicles either on the next day (Next-day
scenario) or on the next working day (Next-working-day
scenario).
(5) Evaluate the per-vehicle prediction errors by averaging
the errors over the entire period.
(6) Evaluate the overall prediction error by averaging the
prediction errors over all the vehicles.
      </p>
      <p>The expanding window strategy differs from the sliding
window one in that the training window at Step (1) is not fixed-size,
but includes all the preceding days in the original dataset (see
Figure 3).</p>
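<p>The two splitting strategies can be sketched over day indices. This is a simplification (it slides one day at a time and tests a single day per split; function names and parameters are ours):</p>

```python
def sliding_splits(n_days, train_size):
    """Hold-out splits with a fixed-size training window that slides
    one day at a time; each split trains on `train_size` days and
    tests on the day right after the window."""
    return [(list(range(t - train_size, t)), t)
            for t in range(train_size, n_days)]

def expanding_splits(n_days, min_train):
    """Same, but the training window grows to include all preceding days."""
    return [(list(range(0, t)), t) for t in range(min_train, n_days)]

slid = sliding_splits(n_days=10, train_size=7)
expa = expanding_splits(n_days=10, min_train=7)
# Both produce 3 test days (7, 8, 9); the last expanding-window split
# trains on all 9 preceding days instead of a fixed 7.
```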
      <p>To assess the quality of the prediction outcomes at Steps (5) and
(6), we computed the Percentage Error (PE) between predicted
and actual utilization hours:</p>
      <p>PE = 100 · ( Σni=1 |Hipred − Hiactual| ) / ( Σni=1 |Hiactual| )</p>
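<p>The PE metric translates directly into code; a minimal sketch with illustrative values:</p>

```python
def percentage_error(predicted, actual):
    """PE = 100 * sum(|pred_i - actual_i|) / sum(|actual_i|)."""
    num = sum(abs(p - a) for p, a in zip(predicted, actual))
    den = sum(abs(a) for a in actual)
    return 100.0 * num / den

pred = [6.0, 5.0, 7.0]
act = [5.0, 5.0, 10.0]
pe = percentage_error(pred, act)
# → 100 * (1 + 0 + 3) / 20 = 20.0
```

Note that normalizing by the sum of the actual values (rather than averaging per-day ratios) keeps the metric well defined when single days have zero utilization.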
    </sec>
    <sec id="sec-6">
      <title>4.2 Algorithm settings</title>
      <p>
        For each algorithm we run a grid search to fit the model to the
analyzed data distribution. The selected settings are summarized
below. More details on the algorithm parameters are provided
by [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>• Lasso: α =0.1
• Support Vector Regressor (SVR): kernel = rbf, C = 10, ϵ =
0.1, γ = 1
• Gradient Boosting (GB): learning rate=0.1, n_estimators =
100, max_depth = 1, loss = lad
• Baseline - Predict the Moving Average (MA): moving
average period = 30.
4.3</p>
    </sec>
    <sec id="sec-7">
      <title>4.3 Effect of the feature selection step</title>
      <p>We first analyze the effect of varying the number of considered
previous days (K) on the average Prediction Error. Since its effect
is influenced by the sliding window size w, we jointly analyzed
the two effects. Figure 4 reports the results, with one curve per
value of the window size w. The results show that:
• A smart feature selection yields up to a 10% improvement
in terms of PE.
• The optimal number of considered previous days K ranges
between 10 and 30.
• The more previous days are considered, the more features
are added to the training dataset.
• Focusing on a limited number of days (&lt; 10) makes the
prediction models sensitive to noise and prone to overfitting.
• Including a very large number of features in the training
dataset may significantly increase the complexity of the
model learning phase.
• The more training data is available (i.e., the larger w), the more
robust the model (except for very small K values, for which
the robustness of the generated models is not guaranteed).
• Expanding the training window (i.e., training on all past
data) performs better, but at the cost of additional
computational complexity (not reported - training time
grows with the data size).</p>
      <p>To balance model accuracy and complexity, hereafter, we will
set K = 20 and w = 140 (unless otherwise specified).</p>
    </sec>
    <sec id="sec-8">
      <title>4.4 Analysis of prediction errors</title>
      <p>We now compare the performance of different regression
models. Figure 5 summarizes the results in the two considered scenarios
(Next-day and Next-working-day). As expected, Machine
Learning approaches perform better than the baseline strategies in both
cases. Within each strategy, single and ensemble methods achieve
similar percentage errors, with SVR performing comparably to
Gradient Boosting. Learning an ensemble of multiple models as
such does not yield significant performance improvements.</p>
      <p>In the Next-working-day scenario prediction errors are much
lower, approaching 15% vs. 30% average error in the Next-day
scenario. This is because removing non-working days, which occur
almost at random, simplifies the forecasting problem.
To exemplify this effect, Figures 6(a) and 6(b) plot the actual
and predicted series in the two scenarios. In the Next-working-day
scenario the predicted curve fits the actual series better,
thanks to the absence of the less predictable non-working days.
Hence, in contexts in which holidays/non-working days are
known in advance, adapting the raw data to this simpler scenario
yields significant advantages in terms of model accuracy.</p>
    </sec>
    <sec id="sec-9">
      <title>4.5 Prediction time</title>
      <p>We measured the execution time of the applied methodology,
which includes (i) Data preparation and feature selection, (ii)
Model training, and (iii) Model application. Learning the
regression models, i.e., Step (ii), turned out to be the most
computationally expensive task. According to the performed experiments, the
time spent in accomplishing the other tasks is negligible.</p>
      <p>The overall execution time taken by the single regression models
(trained with the settings recommended in Section 4.2) varied
between a few seconds for the simpler models (MA, LV, LR, Lasso)
and a few tens of seconds for the most complex one (SVR). The
time spent by the ensemble method (i.e., Gradient Boosting) was
approximately one order of magnitude higher than for single models.
However, as discussed in Section 4.4, combining multiple models
did not provide significant performance improvements.</p>
    </sec>
    <sec id="sec-10">
      <title>5 CONCLUSIONS AND FUTURE WORK</title>
      <p>The paper describes our experience in using supervised
regression techniques to predict industrial vehicle usage based on the
analysis of CAN bus and contextual data. The presented case
study corroborates and extends previous studies focused on specific
construction vehicle types by considering a broader set of vehicle types
and models. To effectively cope with a heterogeneous set of
vehicles, we selected ad hoc feature sets tailored to each vehicle prior
to model learning. The generated models appeared to be quite
effective (i.e., with a 15% error) in predicting vehicle utilization
hours on the next working day. Without any a priori knowledge
about the days of idleness, prediction errors on average double,
but for many vehicle types and models it was still possible to
accurately forecast non-stationary trends. Future developments
of this research will entail the integration of additional contextual
information (e.g., weather) and the use of classification models
to predict discrete usage levels.</p>
    </sec>
    <sec id="sec-11">
      <title>Acknowledgements</title>
      <p>The research leading to these results has been funded by the
SmartData@PoliTO center for Big Data and Machine Learning
technologies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.M.</given-names>
            <surname>Bajany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A MILP Model for Truck-shovel Scheduling to Minimize Fuel Consumption</article-title>
          .
          <source>Energy Procedia</source>
          <volume>105</volume>
          (
          <year>2017</year>
          ),
          <fpage>2739</fpage>
          -
          <lpage>2745</lpage>
          . https://doi.org/10.1016/j.egypro.2017.03.925. 8th International
Conference on Applied Energy, ICAE2016, 8-11 October 2016, Beijing, China.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ahmet Gurcan</given-names>
            <surname>Capraz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pinar</given-names>
            <surname>Ozel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mehmet</given-names>
            <surname>Sevkli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Omer Faruk</given-names>
            <surname>Beyca</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Fuel Consumption Models Applied to Automobiles Using Real-time Data: A Comparison of Statistical Models</article-title>
          .
          <source>Procedia Computer Science</source>
          <volume>83</volume>
          (
          <year>2016</year>
          ),
          <fpage>774</fpage>
          -
          <lpage>781</lpage>
          . https://doi.org/10.1016/j.procs.
          <year>2016</year>
          .
          <volume>04</volume>
          .166
          <source>The 7th International Conference on Ambient Systems, Networks and Technologies (ANT 2016) / The 6th International Conference on Sustainable Energy Information Technology (SEIT-</source>
          <year>2016</year>
          ) / Afiliated Workshops.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Oscar F.</given-names> <surname>Delgado</surname></string-name>,
          <string-name><given-names>Nigel N.</given-names> <surname>Clark</surname></string-name>, and
          <string-name><given-names>Gregory J.</given-names> <surname>Thompson</surname></string-name>.
          <year>2012</year>.
          <article-title>Heavy Duty Truck Fuel Consumption Prediction Based on Driving Cycle Properties</article-title>.
          <source>International Journal of Sustainable Transportation</source>
          <volume>6</volume>, <issue>6</issue> (2012),
          <fpage>338</fpage>-<lpage>361</lpage>.
          https://doi.org/10.1080/15568318.2011.613978
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Kebin</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Hong</given-names> <surname>Huo</surname></string-name>,
          <string-name><given-names>Qiang</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>Dongquan</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Feng</given-names> <surname>An</surname></string-name>,
          <string-name><given-names>Michael</given-names> <surname>Wang</surname></string-name>, and
          <string-name><given-names>Michael P.</given-names> <surname>Walsh</surname></string-name>.
          <year>2005</year>.
          <article-title>Oil consumption and CO2 emissions in China's road transport: current status, future trends, and policy implications</article-title>.
          <source>Energy Policy</source>
          <volume>33</volume>, <issue>12</issue> (2005),
          <fpage>1499</fpage>-<lpage>1507</lpage>.
          https://doi.org/10.1016/j.enpol.2004.01.007
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Erik</given-names> <surname>Hellström</surname></string-name>,
          <string-name><given-names>Maria</given-names> <surname>Ivarsson</surname></string-name>,
          <string-name><given-names>Jan</given-names> <surname>Åslund</surname></string-name>, and
          <string-name><given-names>Lars</given-names> <surname>Nielsen</surname></string-name>.
          <year>2009</year>.
          <article-title>Look-ahead control for heavy trucks to minimize trip time and fuel consumption</article-title>.
          <source>Control Engineering Practice</source>
          <volume>17</volume>, <issue>2</issue> (2009),
          <fpage>245</fpage>-<lpage>254</lpage>.
          https://doi.org/10.1016/j.conengprac.2008.07.005
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>Gareth</given-names> <surname>James</surname></string-name>,
          <string-name><given-names>Daniela</given-names> <surname>Witten</surname></string-name>,
          <string-name><given-names>Trevor</given-names> <surname>Hastie</surname></string-name>, and
          <string-name><given-names>Robert</given-names> <surname>Tibshirani</surname></string-name>.
          <year>2014</year>.
          <source>An Introduction to Statistical Learning: With Applications in R</source>.
          Springer Publishing Company, Incorporated.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>Jure</given-names> <surname>Leskovec</surname></string-name>,
          <string-name><given-names>Anand</given-names> <surname>Rajaraman</surname></string-name>, and
          <string-name><given-names>Jeffrey David</given-names> <surname>Ullman</surname></string-name>.
          <year>2014</year>.
          <source>Mining of Massive Datasets</source> (2nd ed.).
          Cambridge University Press, New York, NY, USA.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>Thuy T.T.</given-names> <surname>Nguyen</surname></string-name> and
          <string-name><given-names>Bruce G.</given-names> <surname>Wilson</surname></string-name>.
          <year>2010</year>.
          <article-title>Fuel consumption estimation for kerbside municipal solid waste (MSW) collection activities</article-title>.
          <source>Waste Management &amp; Research</source>
          <volume>28</volume>, <issue>4</issue> (2010),
          <fpage>289</fpage>-<lpage>297</lpage>.
          https://doi.org/10.1177/0734242X09337656
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>Federico</given-names> <surname>Perrotta</surname></string-name>,
          <string-name><given-names>Tony</given-names> <surname>Parry</surname></string-name>, and
          <string-name><given-names>Luis C.</given-names> <surname>Neves</surname></string-name>.
          <year>2017</year>.
          <article-title>Application of machine learning for fuel consumption modelling of trucks</article-title>.
          In <source>2017 IEEE International Conference on Big Data (BigData 2017)</source>, Boston, MA, USA, December 11-14, 2017.
          <fpage>3810</fpage>-<lpage>3815</lpage>.
          https://doi.org/10.1109/BigData.2017.8258382
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>Chotirat Ann</given-names> <surname>Ratanamahatana</surname></string-name>,
          <string-name><given-names>Jessica</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>Dimitrios</given-names> <surname>Gunopulos</surname></string-name>,
          <string-name><given-names>Eamonn J.</given-names> <surname>Keogh</surname></string-name>,
          <string-name><given-names>Michail</given-names> <surname>Vlachos</surname></string-name>, and
          <string-name><given-names>Gautam</given-names> <surname>Das</surname></string-name>.
          <year>2010</year>.
          <article-title>Mining Time Series Data</article-title>.
          In <source>Data Mining and Knowledge Discovery Handbook</source>, 2nd ed.
          <fpage>1049</fpage>-<lpage>1077</lpage>.
          https://doi.org/10.1007/978-0-387-09823-4_56
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>Elnaz</given-names> <surname>Siami-Irdemoosa</surname></string-name> and
          <string-name><given-names>Saeid R.</given-names> <surname>Dindarloo</surname></string-name>.
          <year>2015</year>.
          <article-title>Prediction of fuel consumption of mining dump trucks: A neural networks approach</article-title>.
          <source>Applied Energy</source>
          <volume>151</volume> (2015),
          <fpage>77</fpage>-<lpage>84</lpage>.
          https://doi.org/10.1016/j.apenergy.2015.04.064
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>Weiliang</given-names> <surname>Zeng</surname></string-name>,
          <string-name><given-names>Tomio</given-names> <surname>Miwa</surname></string-name>,
          Wakita, and
          <string-name><given-names>Takayuki</given-names> <surname>Morikawa</surname></string-name>.
          <year>2015</year>.
          <article-title>Exploring Trip Fuel Consumption by Machine Learning from GPS and CAN Bus Data</article-title>.
          <source>Journal of the Eastern Asia Society for Transportation Studies</source>
          <volume>11</volume> (2015),
          <fpage>906</fpage>-<lpage>921</lpage>.
          https://doi.org/10.11175/easts.11.906
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Sandareka</given-names>
            <surname>Wickramanayake</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. M. N. Dilum</given-names>
            <surname>Bandara</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Fuel consumption prediction of fleet vehicles using Machine Learning: A comparative study</article-title>
          .
          <source>2016 Moratuwa Engineering Research Conference (MERCon)</source>
          (
          <year>2016</year>
          ),
          <fpage>90</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>Min</given-names> <surname>Zhou</surname></string-name>,
          <string-name><given-names>Hui</given-names> <surname>Jin</surname></string-name>, and
          <string-name><given-names>Wenshuo</given-names> <surname>Wang</surname></string-name>.
          <year>2016</year>.
          <article-title>A review of vehicle fuel consumption models to evaluate eco-driving and eco-routing</article-title>.
          <source>Transportation Research Part D: Transport and Environment</source>
          <volume>49</volume> (2016),
          <fpage>203</fpage>-<lpage>218</lpage>.
          https://doi.org/10.1016/j.trd.2016.09.008
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>