<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>S. Viswanathan, J. Appel, L. Chang, I. V. Man, R. Saba, A. Gamel, Development of an assessment model for predicting public electric vehicle charging stations, European Transport Research Review</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3390/electronics8101190</article-id>
      <title-group>
        <article-title>Data-driven energy demand forecasting for electric vehicle charging infrastructure</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Silvia Meddi</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Cavaglion</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tania Cerquitelli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuele Manfredi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Regalia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafaele Menolascino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Zardo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Accenture S.p.A</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Control and Computer Engineering</institution>
          ,
          <addr-line>Politecnico di Torino</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>F2M eSolutions</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Politecnico di Torino</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>10</volume>
      <issue>2018</issue>
      <fpage>763</fpage>
      <lpage>771</lpage>
      <abstract>
        <p>In the era of Big Data and of a growing electric vehicle market, data-driven methodologies play a crucial role in creating valuable information. The focus is on supporting the decision-making process for the development of an accurate charging infrastructure. Forecast analysis allows prediction of energy demand over the network. This supports growing trends with a consequent increase in customer satisfaction. By anticipating potential breakdowns due to infrastructure overloads, maintenance costs are reduced. In this paper, we focus on analyzing charging session data together with external data (weather and population information and energy/fuel prices) collected from different sources. The proposed methodology, named GEORGE (enerGy dEmand fOrecasting foR charGing infrastructurE), offers a building-blocks based approach for monthly energy demand forecasting. The approach is both generalisable and data-specific. We discuss the results of a classification learning approach that predicts the kWh range to which a charge point belongs. In particular, the most promising model performs well in predicting high utilization and is therefore more advantageous for supporting the company's decision-making process. Many possible developments are discussed to improve the prediction.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Mining</kwd>
        <kwd>Electric Vehicle</kwd>
        <kwd>Predictive Model</kwd>
        <kwd>Applied Data Science</kwd>
        <kwd>Interpretable Model</kwd>
        <kwd>Energy Demand</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>To reduce transport emissions in Europe and to meet, for example, the objectives of the Paris Agreement [<xref ref-type="bibr" rid="ref1">1</xref>] and of Agenda 2030 [<xref ref-type="bibr" rid="ref2">2</xref>], many goals need to be achieved over the next twenty to forty years. Two of the 17 goals set by the Agenda, to be completed by 2030 at European level, are the fight against climate change and the creation of sustainable communities and cities. These objectives include the following targets: the integration of national policies, strategies and plans against climate change, and the reduction of the negative environmental impact of cities with regard to air quality. The transport sector in general, and the circulation of cars in particular, directly influences greenhouse gas emissions, being responsible for 12% of total emissions at European level [<xref ref-type="bibr" rid="ref3">3</xref>]. Thus, the promotion of e-mobility through an interconnected and optimized network of charging stations will help to reduce CO<sub>2</sub>. Moreover, the overproduction of energy due to a disproportionate installation of charging stations will decrease [<xref ref-type="bibr" rid="ref4">4</xref>].</p>
      <p>A relevant player in this scenario is F2M eSolutions, which designs innovative technologies to lead the transition to electric mobility. It offers charging solutions and services that will make this change intuitive and seamless. Additionally, with the anticipation of widespread adoption of Electric Vehicles (EVs) in Europe, both public and private developers of charging infrastructure heavily rely on proper data collection and usage to make informed decisions [<xref ref-type="bibr" rid="ref5">5</xref>]. Electric mobility is currently a topic of great interest for Europe and for researchers. Many studies focus on how to take advantage of data to support Low-Carbon Road Transport Policies in Europe [<xref ref-type="bibr" rid="ref6">6</xref>]. Both the development of charging infrastructures [<xref ref-type="bibr" rid="ref7">7</xref>] and the promotion of e-mobility to improve the user experience [<xref ref-type="bibr" rid="ref8">8</xref>] depend on an accurate utilization of data. Finding a general method to examine the impact of external factors on the energy demand for charging infrastructure in Italy is currently an unresolved topic in the literature.</p>
      <p>From a business side, this paper aims to answer two main needs. First, to support and improve the management of charging stations through a forecast of monthly energy demand. Second, to let the client identify targeted interventions in specific areas. These actions can be supported by interpretable and accessible results that allow instant interaction with data. To reach these goals we used F2M eSolutions’ charging session historical data integrated with external data sources. Data collection and data preprocessing were the most time-consuming activities. The collection goes from the company’s internal data (charge point ID, city, country, kWh, ...) to datasets containing information about the population, weather data, gas oil/LPG/fuel and energy prices and characteristics of the territory, collected from third parties. Different data science models, like Random Forest [<xref ref-type="bibr" rid="ref9">9</xref>] and XGBoost [<xref ref-type="bibr" rid="ref10">10</xref>], have been trained to determine the approach that better represents the phenomenon under analysis. In the considered case study, we followed a classification approach as it was the most consistent with business needs: predicting the kWh range of monthly energy demand. In fact, unlike a daily forecast, it provided a general overview of the infrastructure status. This approach showed good results in predicting the observations belonging to the class with the highest demand, which was the major focus from a business side. The final output is thus able to support business decisions in order to achieve a more efficient charging infrastructure usage. The use case on which the project has been developed is the 2022 Italian scenario.</p>
      <p>The paper is organized as follows. The first section is a brief literature review. It introduces the state of the art on the promotion and support of e-mobility through data. Section two details the methodology, from the preprocessing step to the evaluation of the predictive results. Section three shows the results obtained by using the proposed methodology on the Italian use case. Finally, section four summarizes the content of our work, providing conclusions and suggestions for future improvements.</p>
      <p>Published in the Workshop Proceedings of the EDBT/ICDT 2023 Joint Conference (March 28-March 31, 2023, Ioannina, Greece). Contacts: s282109@studenti.polito.it (S. Meddi); sara.cavaglion@accenture.com (S. Cavaglion); tania.cerquitelli@polito.it (T. Cerquitelli); emanuele.manfredi@accenture.com (E. Manfredi); andrea.regalia@accenture.com (A. Regalia); rafaele.menolascino@accenture.com (R. Menolascino); guido.zardo@f2m-esolutions.com (G. Zardo). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <sec>
        <title>1. Literature Review</title>
        <p>Nowadays, collecting vast amounts of data has become a widespread practice in many scenarios. If done properly, it can be a powerful tool that can offer significant benefits to companies. With the growing market of electric vehicles, interconnection and data transfer technologies, many research works orient their analysis towards data-centric approaches.</p>
        <p>Studies cited in this section can be grouped in four macro-categories: (i) application of data science to the world of EVs, (ii) support for the transition to electric, (iii) monitoring of infrastructure utilisation through Key Performance Indicators (KPIs) and (iv) study of the exogenous factors that affect energy demand.</p>
        <p>Applications of data science methods to the electric scenario already include a wide spectrum of supervised and unsupervised learning approaches. Several works have already been proposed in this context. The work in [<xref ref-type="bibr" rid="ref5">5</xref>] analyses the economic benefits of applying data-driven models, i.e. data mining and machine learning techniques, in a business context. Paper [11] provides a comprehensive overview of use cases that link this scenario to data science, through machine learning algorithms. The authors emphasise the scientific interest of the study in the field of e-mobility. A data-driven approach to extract useful information from electric vehicle charging events is also suggested in paper [12]. In this case, a framework was developed to characterise the demand for electric vehicle charging in a specific geographical area. A key step in this methodology is data cleaning and formatting, which is defined specifically for dealing with EV charging data.</p>
        <p>Investing in realistic technological solutions, empowering citizens and aligning action in key areas, i.e. industrial policy, ensure a fair transition to electric. The authors of [13] and [<xref ref-type="bibr" rid="ref8">8</xref>] focus their studies on predicting charging point occupancy, to support this shift. In [13] the aim is to understand whether the city of San Diego has sufficient charging points to meet the energy demand of EVs through a quantitative and qualitative analysis. In [<xref ref-type="bibr" rid="ref8">8</xref>], the goal is to support users in planning their charging processes. In particular, the authors provide a double approach: classification to predict individual charging point occupancy and regression to predict overall charging station occupancy, both in a public and in a workplace site. The work in [14] compares the performance of two different approaches. The first bases the installation of new charging points on requests by electric vehicle drivers. The second places charge points (CPs) near strategic locations, following the decision of a local government. Results show that neither rollout strategy is favorable over the other. Moreover, the best strategy needs to be chosen according to municipal objectives, the maturity of the market and the technologies available.</p>
        <p>The large amount of data collected brings with it the need to understand how to extract the most meaningful knowledge. Monitoring properly constructed KPIs usually makes it possible to achieve this goal. The work in [15] presents a study developed using data from the vast public charging infrastructure of the Dutch metropolitan area. The researchers want to identify different charging patterns between five areas and relevant related KPIs. Thanks to forecasting and simulation models, they answer significant questions like where, when and what type of new charge points should be installed. In the study [<xref ref-type="bibr" rid="ref7">7</xref>], the authors propose web-based dashboards to explain particularly well- or ill-performing charging stations. The platform aims to support the projected growth of electric mobility through the extraction of relevant knowledge. The performance of the existing charging infrastructure, measured by the KPIs developed, drives the know-how.</p>
        <p>Many studies confirm that exogenous factors influence charging behaviors. Just to name a few, [<xref ref-type="bibr" rid="ref6">6</xref>] presents a data processing platform, TEMA (Transport tEchnology and Mobility Assessment), designed for supporting EU transport policies through big data. This study shows the implementation of a method capable of managing a significant amount of data from various sources. Thanks to a data-driven model, TEMA is able to recognize subtle connections and hidden patterns, performing customized analyses. Many governments started to base their charging network definition on data-driven roll-out strategies. The study [<xref ref-type="bibr" rid="ref1">1</xref>] identifies and interprets the most impactful characteristics that are correlated with energy consumption. The authors offer useful perspectives on what data need to be utilized to create prediction models and to guide the planning and implementation of charging infrastructure.</p>
        <p>Literature shows that several prediction models enable relevant knowledge to be extracted from the data. Most of the studies are very objective-specific and do not allow generalization. The main contributions brought by our study are:</p>
        <list list-type="order">
          <list-item><p>bring the analysis to Italian level. The growing circulation of electric vehicles in Italy makes it possible to start analysing the Italian scenario. Until now this panorama has been little mentioned in the literature;</p></list-item>
          <list-item><p>introduce a Sliding Window based approach to align the input structure to the final need;</p></list-item>
          <list-item><p>propose a general approach that can be expanded, thanks to the integration of external data sources, to investigate the influence of different factors on future behaviors.</p></list-item>
        </list>
      </sec>
      <sec>
        <title>2. Methodology</title>
        <p>Here we present the GEORGE methodology, whose building-blocks representation is shown in Figure 1. It combines a solid theoretical background with the necessity to solve real business needs. It uses historical data to monitor charging stations and develop forecasting models to predict the monthly energy demand. It consists of a KDD process (Knowledge Discovery process from Data) adapted to business needs: support more efficient development and utilization of the charging infrastructure.</p>
        <p>GEORGE consists of four building blocks:</p>
        <list list-type="bullet">
          <list-item><p>data preprocessing step to clean and merge data from different sources;</p></list-item>
          <list-item><p>data transformation block to adapt the input data structure to the final purpose;</p></list-item>
          <list-item><p>predictive analytics step to derive the most suitable descriptive model to perform accurate predictions;</p></list-item>
          <list-item><p>evaluation and interpretation of results to assess the goodness of the model through specific metrics.</p></list-item>
        </list>
        <p>In the following sub-sections, we report a detailed description of each building block.</p>
      </sec>
      <sec id="sec-1-1">
        <title>2.1. Data Preprocessing</title>
        <p>Data preprocessing is usually the first step in the KDD pipeline because it prepares data for analysis in the most suitable way. For internal data, the JSON files recorded for each charging session were stored in an AWS (Amazon Web Services) S3 bucket. Data was transformed into tables via Athena, an analytics service of AWS, downloaded and saved locally as CSV files. The external data was extracted from open source databases in CSV format. All input data was processed through Python code using the Anaconda environment (e.g. Spyder). GEORGE preprocessing was customized in three specific steps: (i) Data Ingestion and Cleaning, (ii) Data Integration from different sources and Alignment to the same granularity, (iii) Data Aggregation and Creation of new KPIs.</p>
        <p>The objective of the Data Preprocessing was to obtain two macro categories of features:</p>
        <sec id="sec-1-1-1">
          <title>Monthly Based Features</title>
          <p>Monthly Based Features (named Type 1 in the following) are information available on a monthly basis and include charging session information, weather data, and energy, gas oil, fuel and LPG prices.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>Social-Economic and Geographical Features</title>
          <p>Social-Economic and Geographical Features (named Type 2 in the following) refer to economic and social aspects of the population and to characteristics of the territory. This set includes, for example, the average age of the population, the employment rate and the population density.</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <p>Since environmental factors can influence energy demand, we included external data in addition to charging behavior information. Discussion with domain experts led to the following considerations. External temperatures can affect the performance of electric vehicle batteries and therefore the number of charging sessions. Moreover, energy prices have a direct influence on the demand. On the other hand, since an electric car owner also owns a traditional vehicle, we decided to include gas oil, fuel and LPG prices. Social-economic and geographical factors are able to model the wealth of the population and therefore, indirectly, the ownership of electric vehicles.</p>
        <p>Data ingestion for Type 1 features originated from various sources:</p>
        <list list-type="bullet">
          <list-item><p>F2M eSolutions, for charge points and charging sessions data;</p></list-item>
          <list-item><p>Meteo.it, for weather information;</p></list-item>
          <list-item><p>Italian Government web site, for price trends.</p></list-item>
        </list>
        <p>For this reason they had different granularities:</p>
        <list list-type="bullet">
          <list-item><p>geographical level: state, region, city, longitude and latitude for charge points and charging sessions; region and province for weather information; state for prices;</p></list-item>
          <list-item><p>time level: year/month/day hour:minutes:seconds for charging sessions; year/month/day for weather information; month for prices.</p></list-item>
        </list>
      </sec>
      <sec id="sec-1-4">
        <p>Data cleaning was managed in a specific way for each data source. In the presence of an internal anomaly, e.g., a transmission or estimation error, the corresponding data was removed. When external information was not available, different strategies were implemented, such as replacing the missing values with the closest geographical and temporal information.</p>
        <p>During data integration, different data sources were aligned to the same granularity using supporting datasets, e.g., for geographical mapping.</p>
        <p>Once the collections had the same granularity, we proceeded with the aggregation and definition of new KPIs. For each pair (charge point, month) with at least one charging session, we computed the following metrics through a monthly level aggregation: total number of sessions, total session time, kWh provided and main statistics for weather values.</p>
        <p>From them we computed the following additional indicators, average session time and charge point Occupation Rate (OR):</p>
        <disp-formula><tex-math>\text{avg. session time [min]} = \frac{\text{tot. session time [min]}}{\text{tot. \# of sessions}}</tex-math></disp-formula>
        <disp-formula><tex-math>\text{OR [\%]} = \frac{\text{tot. session time [min]}}{\text{tot. \# of days} \times 24 \times 60\ \text{[min]}} \times 100</tex-math></disp-formula>
      </sec>
      <sec id="sec-1-3">
        <p>From the Type 2 features perspective, the preprocessing was easier. The only data ingestion source was the Italian National Institute of Statistics (ISTAT), and data granularities were at Italian region and annual level. It was not possible to decrease the data aggregation level and therefore this set of information was simply added to the dataset, skipping steps (ii) and (iii).</p>
      </sec>
      <sec id="sec-1-5">
        <title>2.2. Data transformation</title>
        <p>The data transformation step can be essential for the success of the entire KDD process and it is typically very project-specific.</p>
        <p>In order to forecast energy consumption using historical data, we opted for a Sliding Window based approach. This method uses a moving window to analyse sequential data and predict future values over a time period. The approach is justified by the capacity of historical trends to affect current needs. We defined two variables, one to identify the time window width, X, and the other one to define the time horizon for the prediction, or time forecast window, Y.</p>
        <p>The data transformation process consisted of four steps to modify the dataset structure, plus a preliminary step for the encoding of all nominal variables.</p>
        <p>Step 1. We divided the dataset into smaller ones, each with information on historical data and prediction window, i.e. X + Y months. The first X months, indexed from −X to −1, are used for the construction of the time window, while the last month, indexed 0, refers to the time horizon.</p>
        <p>Step 2. Not all charge points had data for the entire time window. In the absence of useful information to reconstruct the missing charging session data, we decided to fill missing information with zeros. Thus, we modelled in the same way a hypothetical failure of charge points and a lack of charging sessions due to other reasons.</p>
        <p>Step 3. Separately, each sub-dataset was transformed to obtain the Sliding Window based structure. Figure 2 shows the configuration with a time window of two months and a time horizon of one month.</p>
        <p>Step 4. Finally, all sub-datasets were merged together, resulting in a single dataset where rows contained predictors for X + Y consecutive months.</p>
      </sec>
      <sec>
        <title>2.3. Predictive analytics</title>
        <p>To obtain the data-driven model that best represented the analysed phenomenon, GEORGE required the validation of different algorithms. Therefore, the optimal model was the one with the best combination of performance and ability to provide results.</p>
        <p>Monthly energy forecasting could be approached using two different strategies: punctual or categorical value forecasting. While the first one is more specific, it would require a large amount of data to obtain accurate models. Therefore, a multi-class classification task would be of wide interest in several case studies where, especially initially, data might be limited. In fact, from a business side it was definitely more functional to have a range of kWh rather than a point value.</p>
        <p>We defined classes based on different levels of percentiles, a statistical approach that grants balanced classes. This procedure allowed the identification of different amounts of load for the charging infrastructure.</p>
        <p>GEORGE integrated various forecasting models using the Sliding Window based structure described above. The methods chosen are tree-based: Random Forest (for details, see [<xref ref-type="bibr" rid="ref9">9</xref>]) and XGBoost (eXtreme Gradient Boosting), see [<xref ref-type="bibr" rid="ref10">10</xref>]. These are two examples of ensemble learning techniques. In particular, Random Forest combines multiple decision tree models to improve general performance. One of the advantages is that this technique has higher accuracy than single decision trees. Additionally, it is interpretable and robust to noise and outliers. The statistical framework of XGBoost, in turn, casts boosting as a numerical optimization problem. The objective is to minimize model loss by adding weak learners using a stochastic gradient descent-like procedure.</p>
        <p>Models operated on historical data, corresponding to predictors of Type 1 and Type 2 for months −X, ..., −1 and 0, to forecast the energy demand for month 0, which changes for each subset of charge points as shown in Figure 2. Then, through a feature selection phase, conducted by correlation analysis, Recursive Feature Elimination with Cross Validation (RFECV) and domain expert support, only the optimal set of predictors was identified to give rise to the best model. Usually, selecting a restricted set of top predictors helps to reduce noise in the model and makes result interpretation more straightforward and effective for business experts.</p>
      </sec>
      <sec id="sec-1-6">
        <title>2.4. Evaluation and Interpretation</title>
        <p>Model evaluation is essential to understand the reliability of predictions against real values.</p>
        <p>In this study, identifying charging points that had a growing trend and whose demand might overload the network was the focus of the analysis. In particular, we were interested in avoiding misclassification of charge points that belonged to the high demand class. Otherwise, this underestimation of the load might lead to a potential risk of breakage in the infrastructure. In this sense, we were primarily interested in reaching high values for the metrics of this class, named H in the following.</p>
        <p>To extract the best combination of model-purpose, GEORGE integrated the stratified cross-validation technique and, in case of limited data, the leave-one-out method. Cross-validation consists of using a subset of data for training the model and another one to validate it, repeating the learning process for each subset [16]. For LOOCV (Leave One Out Cross Validation) the test set consists of one observation.</p>
        <p>For the classification task, chosen for business need, model performance was estimated based on misclassification. Below are the metrics selected to evaluate multi-class classification models on class H [17].</p>
        <p>Precision:</p>
        <disp-formula><tex-math>P_H = \frac{\text{\# of obs. correctly classified in class } H}{\text{total \# of obs. assigned to class } H}</tex-math></disp-formula>
        <p>Recall:</p>
        <disp-formula><tex-math>R_H = \frac{\text{\# of obs. correctly classified in class } H}{\text{total \# of observations belonging to class } H}</tex-math></disp-formula>
        <p>The F1 score for class H is the harmonic mean of precision and recall:</p>
        <disp-formula><tex-math>F1_H = \frac{2 \cdot P_H \cdot R_H}{P_H + R_H}</tex-math></disp-formula>
        <p>The interpretation of model performance allowed us to re-evaluate some previous steps such as feature selection.</p>
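        <p>As a minimal sketch of how the precision, recall and F1 score of a single class can be computed in a multi-class setting, the following plain Python snippet evaluates them for class H. The function name <monospace>class_metrics</monospace> and the example labels (L, M, H) are hypothetical illustrations and are not taken from the GEORGE implementation.</p>
        <preformat>
```python
def class_metrics(y_true, y_pred, target="H"):
    """Return (precision, recall, f1) for one class of a multi-class task."""
    pairs = list(zip(y_true, y_pred))
    # Observations correctly classified in the target class
    correct = sum(1 for t, p in pairs if t == target and p == target)
    # Observations assigned to the target class by the model
    assigned = sum(1 for _, p in pairs if p == target)
    # Observations that really belong to the target class
    belonging = sum(1 for t, _ in pairs if t == target)
    precision = correct / assigned if assigned else 0.0
    recall = correct / belonging if belonging else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: L = low, M = medium, H = high demand class
y_true = ["L", "M", "H", "H", "M", "H", "L", "H"]
y_pred = ["L", "H", "H", "H", "M", "M", "L", "H"]

precision, recall, f1 = class_metrics(y_true, y_pred)
print(precision, recall, f1)
```
        </preformat>
        <p>In this toy example one medium-demand charge point is predicted as H and one high-demand charge point is missed, so both precision and recall for class H are below one. A missed H observation corresponds to the underestimation of load discussed above.</p>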
      </sec>
      <sec id="sec-1-7">
        <p>The trade-off between results and characteristics of models (such as interpretability) determined the choice of the final model to be applied to real time data.</p>
        <p>In the following section we present the preliminary results of the optimal model for a use case in Italy.</p>
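        <p>The Sliding Window structure of Section 2.2 and the percentile-based classes of Section 2.3 can be sketched in plain Python as follows, assuming X = 2 and Y = 1 as in Figure 2. All function names, field names and sample values are hypothetical. Breaking ties by rank order when assigning classes is an assumption of this sketch, not a detail stated in the paper.</p>
        <preformat>
```python
def sliding_window_rows(monthly_kwh, months, x=2, y=1):
    """Build one row per (window, charge point).

    monthly_kwh: dict mapping a charge point id to a {month: kWh} dict.
    months: ordered list of all months in the study period.
    Months without sessions are filled with zeros (Step 2).
    """
    rows = []
    n_windows = len(months) - (x + y) + 1
    for start in range(max(n_windows, 0)):
        window = months[start:start + x]           # history months
        target_month = months[start + x + y - 1]   # time horizon
        for cp_id, series in monthly_kwh.items():
            rows.append({
                "charge_point": cp_id,
                "history": [series.get(m, 0.0) for m in window],
                "target_month": target_month,
                "target_kwh": series.get(target_month, 0.0),
            })
    return rows


def percentile_classes(values, n_classes=3):
    """Assign rank-based classes of equal size (0 = lowest demand)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_classes // len(values)
    return labels


# Hypothetical monthly kWh per charge point
months = ["2022-04", "2022-05", "2022-06", "2022-07"]
monthly_kwh = {
    "cp_a": {"2022-04": 120.0, "2022-05": 150.0, "2022-06": 90.0, "2022-07": 200.0},
    "cp_b": {"2022-05": 30.0, "2022-07": 45.0},
    "cp_c": {"2022-04": 400.0, "2022-05": 420.0, "2022-06": 390.0, "2022-07": 410.0},
}

rows = sliding_window_rows(monthly_kwh, months)
labels = percentile_classes([r["target_kwh"] for r in rows])
for row, label in zip(rows, labels):
    print(row["charge_point"], row["history"], row["target_month"], label)
```
        </preformat>
        <p>With three classes, the highest label (2) plays the role of class H. Rank-based binning keeps the classes balanced, which is the property the percentile approach is meant to provide.</p>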
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Preliminary results</title>
      <p>An accurate manipulation of the huge amount of data available today is essential in order to make informed decisions. In this scenario the Energy Demand Forecaster for the Italian charging infrastructure can assume a relevant role.</p>
      <p>The goal was to create a model, based on temporal aggregation of input data, able to predict the monthly energy demand for a single charge point. At the same time we were interested in understanding which factors played a relevant role in the prediction, and in their interpretation.</p>
      <p>The business value is to support informed decisions to help the network optimization. Moreover, adjusting the load of the charging infrastructure can help in avoiding overloads and breakages.</p>
      <sec id="sec-2-1">
        <title>3.1. Data</title>
        <p>In this section we analyse the results of a case study in Italy for which we had data collected from April to October 2022. Across the territory, the highest percentages were 20% in Piedmont and Lombardy and 10% in Lazio and Emilia-Romagna.</p>
        <p>The dataset contained spatial, temporal and energy data with different granularities: state, region, city, longitude and latitude for the geographical level and year/month/day hour:minutes:seconds for charging sessions for the time level.</p>
        <p>The principal information recorded in the internal data was: charging session duration, geographical localization, and kWh provided during each session. Some charge point characteristics, such as whether it was subject to any restrictions, i.e. was located in a parking station or was not open 24/7, were also included in the dataset.</p>
        <p>On a monthly basis the mean of kWh provided was about 14,000 kWh for a mean of 500 sessions per month.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.2. Data Preprocessing</title>
        <p>To obtain the best results from the predictive model, we had to understand and structure our data precisely. We started with data manipulation steps.</p>
        <p>We exploited cases and limit setting, obtaining more than 3,000 charge points with at least one session and about 10,000 charging sessions. Through the aggregation and transformation steps we structured the dataset to develop the forecasting model. After the evaluation of different widths for the time window X, and with the support of domain experts, we set X = 2, while for the forecast window we set Y = 1. This confirms that energy demand is linked to historical data by a short-term relationship.</p>
        <p>The final dataset was thus composed of predictors for three consecutive months for all the charge points. Two months populated the data history while the kWh of the last one was identified as the target variable.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.3. Predictive analytics and evaluation</title>
        <p>Since different algorithms were integrated into GEORGE, we performed various experiments, leveraging LOOCV, to compare performances.</p>
        <p>We recall that, from a business side, a classification approach that predicted the range of kWh for each charge point was more functional than a punctual value.</p>
        <p>For each classification algorithm, we evaluated the number of classes, the best subset of features, the size of the time window and the impact of hyperparameters. The best model-purpose combination was obtained for the Random Forest with 3 classes, 37 predictors summarized in Figure 3, a 2-month data history and the following hyperparameters:</p>
        <list list-type="simple">
          <list-item><p>n_estimators = 250</p></list-item>
          <list-item><p>max_depth = 109</p></list-item>
          <list-item><p>max_features = 18</p></list-item>
          <list-item><p>min_samples_leaf = 9</p></list-item>
          <list-item><p>min_samples_split = 7</p></list-item>
        </list>
        <p>Here n_estimators is the number of trees in the forest, max_depth is the maximum depth of each tree, max_features is the maximum number of features to consider when splitting a node, min_samples_leaf is the minimum number of samples required in a node to be at a leaf and min_samples_split is the minimum number of observations in any given node in order to split it.</p>
        <p>The model showed high precision, recall and F1 score in the third (or H) class, thus performing better at forecasting high usage.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Discussion</title>
      <p>Through the GEORGE methodology, based on a solid theoretical background and motivated by concrete business needs, we succeeded in bringing tangible benefits to customer and company. In particular, it made possible the development of an interpretable and generalisable estimator for forecasting the energy demand of the charging infrastructure. The deployment of the model then enables the methodology to be applied to everyday scenarios. The real time data stream is processed by GEORGE, which extracts relevant patterns. The results are summarised in a customised dashboard that can monitor the state of the network and guide the operators’ interventions based on informed knowledge. Figure 5 summarizes key steps of the process on the real-time data stream. This can improve not only the management of the charging network, but also the user experience of electric vehicle owners.</p>
      <sec id="sec-3-1">
        <p><sup>1</sup>Gini’s impurity index calculates each feature’s importance as the sum
over the number of splits (across all trees) that include the feature,
proportionally to the number of samples it splits.</p>
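        <p>The footnote above describes impurity-based feature importance in a tree ensemble. As a rough illustration only, the sketch below fits a random forest with the five hyperparameter values reported in this paper and reads its normalised Gini importances. It assumes scikit-learn, whose RandomForestClassifier happens to use the same parameter names; the paper does not name its library. The data are synthetic (20 made-up features, three classes), not the charging dataset, and random_state is added only for reproducibility.</p>
        <preformat><![CDATA[
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the charging dataset (assumption): 20 features so
# that max_features=18 is valid, and three classes as in the paper.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_redundant=2,
    n_classes=3, random_state=0,
)

# Hyperparameter values as reported in the text.
forest = RandomForestClassifier(
    n_estimators=250,
    max_depth=109,
    max_features=18,
    min_samples_leaf=9,
    min_samples_split=7,
    criterion="gini",   # scikit-learn default, stated here for clarity
    random_state=0,     # assumption, only for reproducibility
)
forest.fit(X, y)

# Impurity-based importances: averaged over the trees and normalised to sum to 1.
importances = forest.feature_importances_
top5 = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)[:5]
print("Five most important feature indices:", top5)
```
]]></preformat>
        <p>In scikit-learn each split contributes its impurity decrease weighted by the share of samples reaching the node, which is one common reading of the proportionality mentioned in the footnote.</p>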
      </sec>
      <sec id="sec-3-2">
        <p>For the specific use case, the biggest effort went into the
prediction of the class with the highest values of kWh.
Correct classifications help to prevent potential breakages
due to an overload of the infrastructure.</p>
        <p>Although this is preliminary work, it is promising
because it shows good results for the prediction of the third
class. In addition, thanks to their intuitive interpretation, the
results can easily be shown in an interactive dashboard
that displays the history and the forecast for each
charge point on a navigable map.</p>
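        <p>The per-class evaluation referred to in this paper (precision, recall and F1 for the high-usage class) can be made concrete with a one-vs-rest count of true positives, false positives and false negatives. The following is a minimal sketch, not the authors' evaluation code: the labels L and M, the function name per_class_scores and the toy label lists are assumptions for illustration; only the H class name comes from the text.</p>
        <preformat><![CDATA[
```python
def per_class_scores(y_true, y_pred, label):
    """One-vs-rest precision, recall and F1 for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (made up): true and predicted usage class per charge point.
y_true = ["L", "M", "H", "H", "M", "H", "L", "H"]
y_pred = ["L", "H", "H", "H", "M", "M", "L", "H"]

precision_h, recall_h, f1_h = per_class_scores(y_true, y_pred, "H")
print(precision_h, recall_h, f1_h)
```
]]></preformat>
        <p>Reporting these three numbers for H separately, rather than overall accuracy, matches the stated goal of catching high-usage charge points before an overload occurs.</p>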
        <p>Applying GEORGE to Italy’s charging infrastructure,
we can successfully support business decisions. Among
the various benefits, the most relevant are:
• network monitoring based on continuous
collection of data;
• promotion of underutilized areas to exploit the
infrastructure;
• load monitoring, preventing potential breakages
and the consequent maintenance
costs;
• implementation of prompt actions guided by
predictive analysis;
• support of the growing trend in high-demand areas,
increasing customer satisfaction.</p>
      </sec>
      <sec id="sec-3-3">
        <p>Although GEORGE is a general-purpose methodology,
the feature modeling step cannot be fully automated.
This phase must be modeled according to the final goals,
with specific and constant support from domain experts.
In addition, understanding which features affect the
prediction is essential for making informed decisions,
and it needs conscientious supervision.</p>
        <p>GEORGE works with different types of data, and this
allows future developments of the analysis to explore
aspects of the phenomenon as deeply and specifically as the
business needs. It is possible to analyse the influence of
population behaviors and characteristics. Likewise,
a deep understanding of the impact of weather on
battery performance, and of its consequences for charging
demand, plays an important role. In this sense, we can
leverage GEORGE to address different use cases
optimally, adjusting the blocks of the process based on new
needs.</p>
        <p>As electric mobility evolves,
research progresses, and so future directions of this study
can be explored.</p>
        <p>Possible improvements are:
1. develop an interactive dashboard to monitor the
network and support informed decisions through
the visualization of both historical and forecast
trends;
2. increase data history and quality. All the models
that we applied are data-driven, so more high-quality
data will let the model better understand
the relationship between input and output.
Finer granularity, e.g. Type 2
predictors including information at municipality level,
will help to explain the dependence of energy
demand on socio-economic habits. Considering
other types of external data may help to discover
more hidden patterns: e.g. traffic information and
electric vehicle market growth are likely to
affect the forecast of energy demand. The former may
support the distribution of load across the
infrastructure based on peak hours. The latter
introduces dependencies on past and, possibly, future
sales trends;
3. implement a hierarchical approach that includes
a three-class classification model and a
regression model for each class, so as to also have a point
prediction for each charge point.</p>
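        <p>The hierarchical approach in item 3 can be sketched as a two-stage pipeline: a classifier first assigns a usage class, then the regressor trained for that class produces the point prediction. The example below is only a toy under stated assumptions: a nearest-centroid rule stands in for the three-class classifier, a one-feature least-squares line stands in for each per-class regressor, and the cut points, feature (previous-month kWh), function names and data are all invented for illustration.</p>
        <preformat><![CDATA[
```python
from statistics import mean

LOW_CUT, HIGH_CUT = 20.0, 100.0  # hypothetical kWh cut points between L/M and M/H


def usage_class(kwh):
    """Map a monthly kWh value to one of three usage classes."""
    if kwh < LOW_CUT:
        return "L"
    return "M" if kwh < HIGH_CUT else "H"


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; constant fit if x does not vary."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return y_bar, 0.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def fit_hierarchy(xs, ys):
    """Stage 1: class centroids on x. Stage 2: one regression line per class."""
    labels = [usage_class(y) for y in ys]
    centroids, lines = {}, {}
    for cls in sorted(set(labels)):
        cx = [x for x, lab in zip(xs, labels) if lab == cls]
        cy = [y for y, lab in zip(ys, labels) if lab == cls]
        centroids[cls] = mean(cx)
        lines[cls] = fit_line(cx, cy)
    return centroids, lines


def predict(x, centroids, lines):
    """Classify first, then apply the regressor of the predicted class."""
    cls = min(centroids, key=lambda c: abs(x - centroids[c]))
    intercept, slope = lines[cls]
    return cls, intercept + slope * x


# Toy data (made up): previous-month kWh (x) and next-month kWh (y) per charge point.
prev_kwh = [5, 10, 15, 40, 50, 60, 120, 150, 180]
next_kwh = [6, 9, 14, 45, 52, 58, 130, 155, 170]

centroids, lines = fit_hierarchy(prev_kwh, next_kwh)
predicted_class, point_forecast = predict(140, centroids, lines)
print(predicted_class, round(point_forecast, 1))
```
]]></preformat>
        <p>Keeping one regressor per class means each is fitted only on charge points with similar usage levels; the cost is that a misclassification in the first stage routes the point to the wrong regressor.</p>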
      </sec>
      <sec id="sec-3-4">
        <p>In conclusion, improving the dataset is desirable to achieve predictions over a longer forecast window, from one month to three or six months, to support the optimization of charge point installation plans.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work was partially supported by</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Straka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. V. D.</given-names>
            <surname>Poel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Buzna</surname>
          </string-name>
          ,
          <article-title>Analysis of energy consumption at slow charging infrastructure for electric vehicles</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>53885</fpage>
          -
          <lpage>53901</lpage>
          . doi:10.1109/ACCESS.2021.3071180.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>Agenzia per la coesione territoriale</collab>
          ,
          <source>Agenda 2030 per lo sviluppo sostenibile</source>
          ,
          <year>2015</year>
          . URL: https://www.agenziacoesione.gov.it/comunicazione/agenda-2030-per-lo-sviluppo-sostenibile/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. A.</given-names>
            <surname>European</surname>
          </string-name>
          <string-name>
            <surname>Union</surname>
          </string-name>
          ,
          <article-title>CO2 emission performance standards for cars and vans</article-title>
          ,
          <year>2020</year>
          . URL: https://climate.ec.europa.eu/eu-action/transport-emissions/road-transport-reducing-co2-emissions-vehicles/co2-emission-performance-standards-cars-and-vans_en.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U. N. E.</given-names>
            <surname>Programme</surname>
          </string-name>
          ,
          <article-title>Supporting the global shift to electric mobility</article-title>
          ,
          <year>2021</year>
          . URL: https://www.unep.org/explore-topics/transport/what-we-do/electric-mobility/supporting-global-shift-electric-mobility.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Troilo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Howarth</surname>
          </string-name>
          ,
          <article-title>Better energy efficiency policy with digital tools</article-title>
          ,
          <year>2021</year>
          . URL: https://www.iea.org/articles/better-energy-efficiency-policy-with-digital-tools.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>De Gennaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Paffumi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Martini</surname>
          </string-name>
          ,
          <article-title>Big data for supporting low-carbon road transport policies in Europe: Applications, challenges and opportunities</article-title>
          ,
          <source>Big Data Research</source>
          <volume>6</volume>
          (
          <year>2016</year>
          )
          <fpage>11</fpage>
          -
          <lpage>25</lpage>
          . URL: https://doi.org/10.1016/j.bdr.2016.04.003. doi:10.1016/j.bdr.2016.04.003.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Maase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dilrosun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kooi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Van den Hoed</surname>
          </string-name>
          ,
          <article-title>Performance of electric vehicle charging infrastructure: Development of an assessment platform based on charging data</article-title>
          ,
          <source>World Electric Vehicle Journal</source>
          <volume>9</volume>
          (
          <year>2018</year>
          ). URL: https://www.mdpi.com/2032-6653/9/2/25. doi:10.3390/wevj9020025.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ostermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fabel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ouan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Koo</surname>
          </string-name>
          ,
          <article-title>Forecasting charging point occupancy using supervised learning algorithms</article-title>
          ,
          <source>Energies</source>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.3390/en15093409. doi:10.3390/en15093409.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <source>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</source>
          , Springer, New York, NY,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>XGBoost: A scalable tree boosting system</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , KDD '16, Association for Computing Machinery, New York, NY, USA,
          <year>2016</year>
          , pp.
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          . URL: https://doi.org/10.1145/2939672.2939785. doi:10.1145/2939672.2939785.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>