<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Forecasting Li-ion battery State of Charge using Long-Short-Term-Memory network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Irene Capodicasa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tania Cerquitelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Torino, Corso Duca degli Abruzzi</institution>
          ,
          <addr-line>24, Turin, 10129</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Estimating the state of charge (SOC) of lithium-ion batteries (LIB) has become a highly desirable and critical task, especially as electrified vehicles become more common. However, due to the non-linear behaviour of these batteries, accurately estimating SOC remains a challenge. As a result, traditional theory-based methods are often being replaced by data-driven approaches, thanks to the greater availability of battery data and advances in artificial intelligence. Recurrent neural networks (RNNs), in particular, are promising because they can capture temporal dependencies and predict SOC without a battery model. Long short-term memory (LSTM), a specific type of RNN, can accurately predict SOC values in real time and forecast future SOC values within different time horizons.</p>
      </abstract>
      <kwd-group>
        <kwd>State of Charge</kwd>
        <kwd>Long short term memory</kwd>
        <kwd>battery</kwd>
        <kwd>neural network</kwd>
        <kwd>estimation</kwd>
        <kwd>electric vehicle</kwd>
        <kwd>time horizons</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Lithium-ion batteries (LIBs) have gained immense popularity in various industries, particularly in electric vehicles, and have become increasingly prevalent in recent years. LIBs are highly efficient, delivering a greater amount of energy for the same volume and mass compared to conventional batteries like lead-acid batteries. Accurately estimating the state of charge (SOC) of a battery is crucial for making informed decisions at all stages of its life. SOC estimation also helps in enhancing vehicle performance, safety, and passenger comfort, reducing costs associated with battery over-sizing, and improving overall vehicle efficiency. However, SOC cannot be measured directly and must be estimated.</p>
      <p>The key technologies for estimating the state of charge of lithium-ion batteries for electric vehicles can be divided into three main groups: (1) model-based methods, such as simplified electrochemical models and equivalent circuit models [<xref ref-type="bibr" rid="ref1">1</xref>][<xref ref-type="bibr" rid="ref2">2</xref>][<xref ref-type="bibr" rid="ref3">3</xref>][<xref ref-type="bibr" rid="ref4">4</xref>]; (2) machine learning methods, including neural networks [<xref ref-type="bibr" rid="ref5">5</xref>][<xref ref-type="bibr" rid="ref6">6</xref>][<xref ref-type="bibr" rid="ref7">7</xref>][<xref ref-type="bibr" rid="ref8">8</xref>][<xref ref-type="bibr" rid="ref9">9</xref>][<xref ref-type="bibr" rid="ref10">10</xref>]; and (3) hybrid methods composed of two or more of the previously mentioned algorithms [<xref ref-type="bibr" rid="ref2">2</xref>][<xref ref-type="bibr" rid="ref1">1</xref>]. The most common model-based methods for SOC estimation are Coulomb counting and open-circuit voltage, which require many parameter measurements, usually affected by noise. More sophisticated models have been developed to deal with these uncertainties, including an equivalent circuit model (ECM) that requires extensive battery test models and parameters [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
      <p>In recent times, data-driven techniques have gained significant popularity owing to advancements in artificial intelligence and machine learning, coupled with the wider availability of battery data. The most commonly used algorithms are:
• Artificial Neural Networks (ANN), which have the ability to function under non-linear conditions, and utilize inputs such as battery terminal voltage, discharge or charge current, and temperature. Unfortunately, they require large amounts of training data to derive accurate data-driven models, thus high computational power and large memory storage are needed to perform the learning phase [<xref ref-type="bibr" rid="ref5">5</xref>] efficiently;
• Support Vector Machine (SVM), which can deal with noisy data and incorporate knowledge from other indicators such as energy, power, etc., but, on the other hand, training can be very time-consuming [<xref ref-type="bibr" rid="ref6">6</xref>];
• some other data-driven algorithms, e.g., fuzzy logic and genetic algorithms (see [<xref ref-type="bibr" rid="ref5">5</xref>] for further details).</p>
      <sec id="sec-1-1">
        <p>Hybrid techniques are utilized to enhance the precision and effectiveness of battery models while circumventing the limitations of a single algorithm. The primary disadvantage of these techniques is their reliance on significant memory and computational power to execute complex mathematical calculations.</p>
        <p>Battery modeling is a crucial step in developing a pre[...] limitations in accurately assessing the battery’s aging process and updating the models continuously. Further research is required to enhance the precision of battery modeling. The process of modeling the behavior of batteries and their adaptive control technology involves the utilization of expert system theories and artificial intelligence. With the huge amounts of data generated by energy storage systems, it is natural to utilize machine learning algorithms for state and parameter estimation, although the accuracy of these approaches is low, which may become a problem in the optimal monitoring of the LIB state.</p>
        <p>In this paper, we present a data-driven methodology based on LSTM cells to estimate SOC accurately. The proposed approach maps battery measurement signals such as voltage, current, and temperature to the battery SOC. Specifically, the novel contribution of this paper is twofold:
• the application of the LSTM network to predict SOC at different time horizons. SOC works like a fuel gauge in a car, so an accurate prediction of SOC at a given time horizon can be critical for users to know how long they can drive their car before the battery dies or stops working. On the other hand, from the car manufacturers’ perspective, SOC real-time estimation and prediction at different future horizons can help monitor battery status and plan future maintenance and repair work. In addition, it is crucial when the battery reaches the end of its first life, and manufacturers must decide whether to use it for other purposes or disassemble it.
• With the proposed method, we show how powerful the LSTM is in predicting SOC: the key point is that data collected at different frequencies can lead to different results in predicting future values of SOC. This means that the granularity of the input data used to train an LSTM must be determined with respect to the time horizon over which we want to estimate the battery SOC.</p>
        <p>The remainder of this paper is organized as follows. Section 2 contains a literature review, highlighting the new contribution of this work. Section 3 introduces the data set and describes the data preprocessing and model-building steps. Section 4 presents the experimental results, and Section 5 provides some concluding remarks.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2. Related work</title>
        <p>SOC estimation methods can be divided into the following three categories: the simplified electrochemical models (SEM) [<xref ref-type="bibr" rid="ref1">1</xref>][<xref ref-type="bibr" rid="ref2">2</xref>], the equivalent circuit models (ECM) [<xref ref-type="bibr" rid="ref4">4</xref>][<xref ref-type="bibr" rid="ref3">3</xref>][<xref ref-type="bibr" rid="ref2">2</xref>], and the machine learning models [<xref ref-type="bibr" rid="ref5">5</xref>][<xref ref-type="bibr" rid="ref6">6</xref>], which include neural network models [<xref ref-type="bibr" rid="ref7">7</xref>][<xref ref-type="bibr" rid="ref8">8</xref>][<xref ref-type="bibr" rid="ref9">9</xref>][<xref ref-type="bibr" rid="ref10">10</xref>].</p>
        <p>By mathematically describing the internal processes based on electrochemical mechanisms, SEMs are capable of reflecting the battery characteristics: for example, Coulomb Counting is an ampere-hour (Ah) counting estimation method that integrates the discharging or charging current to determine the remaining charge in the battery; another method is open-circuit voltage (OCV), which employs the battery’s stable electromotive force while in the open-circuit state, and uses the correlation between the OCV and SOC to approximate the SOC value [<xref ref-type="bibr" rid="ref1">1</xref>]. The ECMs are created by combining resistors, capacitors, and voltage sources to generate a circuit network, which results in their high level of accuracy, robustness, and insensitivity to external disruptions [<xref ref-type="bibr" rid="ref4">4</xref>]: the Kalman filter is one such example of an ECM that employs mathematical equations to recursively compute a linear optimal filtering solution for SOC estimation [<xref ref-type="bibr" rid="ref3">3</xref>].</p>
        <p>Machine learning methods for estimating SOC, which include neural network methods, are summarized in [<xref ref-type="bibr" rid="ref5">5</xref>]. The Support Vector Machine (SVM) is one of them: it has an excellent generalization capability compared to neural networks that may have local minimization problems. A regression SVM is applied by [<xref ref-type="bibr" rid="ref6">6</xref>] to predict SOC of a LIB. For estimating SOC, neural network algorithms can generally be divided into recurrent and non-recurrent. Recurrent algorithms have a memory for the past, while non-recurrent algorithms depend on the data input at the current time step. At McMaster University, Carlos Vidal and his team have shown that training a feed-forward neural network is faster than training a recurrent network and that the error is smaller [<xref ref-type="bibr" rid="ref7">7</xref>]. Creating a sufficiently large battery dataset to train a deep neural network (DNN) for SOC estimation can be challenging. To reduce the required test data, a Canadian university [<xref ref-type="bibr" rid="ref9">9</xref>] proposes to use a specific DNN, the LSTM, which can estimate the SOC of different types of lithium-ion batteries. In [<xref ref-type="bibr" rid="ref8">8</xref>], the authors propose an LSTM network to estimate SOC of a Panasonic 18650 battery cell, and show that it provides competitive estimation performance compared to other algorithms reported in the literature. In contrast to unidirectional RNNs, Yang’s team at Beihang University proposed a model using a bidirectional LSTM, and the study demonstrated the network’s ability to comprehend the temporal information present in sequential sensor data obtained from LIBs. This includes variables such as voltage, current, and temperature measurements captured in both forward and backward directions. Additionally, the network effectively summarizes temporal dependencies from past and future contexts [<xref ref-type="bibr" rid="ref10">10</xref>].</p>
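        <p>The Coulomb counting method mentioned in the related work can be illustrated with a minimal sketch. It is not taken from the cited works: the function name, the sign convention (positive current means charging), and the clamping of SOC to the range 0 to 1 are assumptions made here for illustration only.</p>
        <preformat>
```python
def coulomb_count_soc(initial_soc, currents_a, dt_s, nominal_capacity_ah):
    """Return the SOC (fraction of nominal capacity) after each current sample.

    currents_a: current samples in amperes, positive when charging
                (sign convention assumed for this sketch).
    dt_s: sampling interval in seconds.
    """
    soc = initial_soc
    history = []
    for current in currents_a:
        # Integrate the current over one step: A * s -> Ah (divide by 3600),
        # then express the charge as a fraction of the nominal capacity.
        soc += current * dt_s / 3600.0 / nominal_capacity_ah
        # Clamp to the physically meaningful range [0, 1].
        soc = min(1.0, max(0.0, soc))
        history.append(soc)
    return history


# Discharging a 3 Ah cell at 3 A (1C) for 30 minutes, sampled once per second.
trace = coulomb_count_soc(1.0, [-3.0] * 1800, 1.0, 3.0)
print(round(trace[-1], 3))
```
        </preformat>
        <p>Because the estimate is a running integral, any bias in the current measurement accumulates over time, which is one reason the noise sensitivity noted in the introduction matters for this method.</p>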
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology</title>
      <sec id="sec-2-1">
        <p>In this section, we provide a detailed description of the steps we took to obtain a SOC estimate. As shown in Figure 1, the first step is the collection of the battery’s parameters through measurement sensors; the second step is to preprocess and prepare the data to obtain a training set and a test set to be fed into the network; the last step is LSTM training and performance evaluation.</p>
        <sec id="sec-2-1-1">
          <title>3.1. Data collection</title>
          <p>
            Data plays a crucial role in driving innovative advancements in battery development, modeling, and management. For instance, the development of a battery management system (BMS) to regulate battery operations requires data both for its creation and for the training and calibration of the models employed to estimate battery states, such as State of Health (SOH), SOC, and Remaining Useful Life (RUL). The authors in [<xref ref-type="bibr" rid="ref11">11</xref>] presented an overview of publicly available battery datasets: electric vehicle (EV) battery requirements differ from those for laptops, cell phones, stationary energy storage, and other devices. Thus, application-specific data are needed for:
• cycle aging data: typically, input data include in-cycle measurements of current, voltage, and temperature and per-cycle measurements of capacity and internal resistance or impedance;
• drive cycle data: input data are collected by cycling batteries according to the drive schedules;
• chemistry cell modeling: it is mainly based on the short-term responses of current and voltage and focuses on the impedance variance at different battery SOC levels and temperatures;
• calendar aging: data include information from the battery cycler, such as voltage, current, capacity, and energy from periodic characterization tests.
          </p>
          <p>
            Various countries and organizations create driving cycles, which are employed to evaluate vehicles’ performance in terms of factors like fuel consumption, pollutant emissions, and traffic impact. Cycling batteries according to these driving schedules yields a large amount of data, which can then be utilized for SOC estimation algorithms under realistic conditions. The universally recognized driving cycle tables can be classified into European, American, and Asian driving cycles [<xref ref-type="bibr" rid="ref11">11</xref>].
          </p>
          <p>The American (US) driving cycles most used in the literature are the following:
• highway fuel economy test (HWFET) is a chassis dynamometer driving schedule representing highway driving conditions under 60 mph, used to determine the fuel economy of light-duty vehicles over a highway driving cycle;
• federal test procedure (FTP-75) was created by the US EPA (Environmental Protection Agency) to represent a commuting cycle with a part of urban driving, including frequent stops, and a part of highway driving;
• US06 is a supplemental federal test procedure cycle that represents aggressive, high-speed and/or high-acceleration driving behaviour;
• LA92 unified dynamometer driving schedule was developed as an emission inventory improvement tool, and is for Class 3 heavy-duty vehicles (power-to-mass ratio is greater than 34);
• urban dynamometer driving schedule (UDDS) is also known as "the city test" and represents city driving conditions. It is used for light-duty vehicle testing.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>3.2. Data preprocessing</title>
          <p>3.2.1. Data gathering</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <p>In this paper, the dataset used for training, testing and validation of the network was released by the Department of Electrical and Computer Engineering, McMaster University, Hamilton, Ontario, and is freely available online [<xref ref-type="bibr" rid="ref12">12</xref>].</p>
        <p>All tests were performed in a thermal chamber with the cell test equipment shown in Figure 2: the control computer contains the databases with the driving schedules and, after the thermal chamber is set to the desired ambient temperature and the cell is fully charged, the system starts to record the current drive cycle. The 3 Ah LG 18650HG2 battery cell was subjected to four drive cycles (UDDS, HWFET, LA92, and US06) and to eight drive cycles (mix 1-8) consisting of a random mix of UDDS, HWFET, LA92, and US06. Following every test, the battery was charged at a rate of 1C to a voltage of 4.2 V, with a 50 mA cut-off current, after which charging was turned off. It was ensured that the battery temperature remained at or above 22°C throughout this process. The cycler collects the following measurements:</p>
      </sec>
      <sec id="sec-2-3">
        <p>• Time (time in seconds)
• TimeStamp (timestamp in MM/DD/YYYY HH:MM:SS AM format)
• Voltage (measured cell terminal voltage, sense leads welded directly to battery terminal)
• Current (measured current in amps)
• Capacity (measured amp-hours (Ah), with Ah counter, typically reset after each charge, test, or drive cycle)
• Energy (measured watt-hours (Wh), with Wh counter, reset after each charge, test, or drive cycle)
• Battery_Temp_degC (battery case temperature, at the middle of battery, in degrees Celsius)</p>
        <p>The training data set consists of the eight mixed driving cycles and their corresponding charges, while the test set consists of the UDDS, LA92, and US06 driving cycles and their corresponding charges. All the measurements refer to tests performed only in a 25°C thermal chamber, since analyzing how different temperatures can affect the SOC estimation is outside the scope of our work.</p>
        <p>3.2.2. Feature selection: correlation</p>
        <p>The features relating to the battery are voltage, current, capacity, energy, and temperature. The response variable, the state of charge of the battery, is obtained from the capacity (Ah) of the battery calculated by the battery cycler (the detailed procedure is explained in Section 3.2.5).</p>
        <p>To measure how close the relationship between the dataset features and the target variable (SOC) is to a linear function, the Pearson correlation coefficient is used: the higher the absolute value of the correlation coefficient, the stronger the correlation [<xref ref-type="bibr" rid="ref13">13</xref>].</p>
        <p>From the correlation matrix in Figure 3, the strongest correlation is between capacity and energy, and both are strongly correlated with SOC. Unfortunately, as mentioned earlier, capacity was used to evaluate SOC, so both capacity and energy must be excluded as predictors. The variables used as inputs are therefore voltage (V), current (A), and temperature (°C).</p>
        <p>3.2.3. Sampling</p>
        <p>The collected data from battery cyclers may have different frequencies: in our dataset [<xref ref-type="bibr" rid="ref12">12</xref>], data related to driving cycles and mixes have a time step of 0.1 seconds, while data related to the charging phase have slower dynamics and were considered less important, so they were stored at a lower data rate of one sample per minute.</p>
        <p>As mentioned in Section 3.2.1, both the training and test data contain drive cycles and charges, so both must be resampled to obtain the same frequency.</p>
        <p>3.2.4. Normalization</p>
        <p>Since machine learning algorithms generally do not work well with numerical attributes with different scales, the dataset must be rescaled. To ensure that input features are within a bounded range of 0 to 1, a technique called min-max scaling, or normalization, is employed. While there are several other methods available, min-max scaling is preferred specifically for neural networks, as unbounded input features may pose difficulties [<xref ref-type="bibr" rid="ref14">14</xref>].</p>
        <p>3.2.5. Data Labelling</p>
        <p>Prediction algorithms require a priori knowledge about the values to be predicted (i.e., SOC in our research). Unfortunately, SOC cannot be measured directly but can be easily estimated. By definition, SOC is the percentage of charge remaining in the battery and is obtained by dividing its remaining capacity (Ah) by its nominal capacity. Each label (SOC) is associated with the corresponding features for the real-time prediction of SOC. On the other hand, when predicting SOC for different future time horizons, each label is shifted with respect to its initial position by the number of rows needed to reach that time horizon: for example, if we try to predict SOC within 10 minutes and the data is sampled at 1/60 Hz, each label will be shifted by 10 rows.</p>
      </sec>
      <sec id="sec-2-4">
        <title>3.3. Data modelling</title>
        <p>Long Short Term Memory is a type of recurrent neural network. RNNs have an internal unit that can form a cycle to show the state history of the previous input. A general LSTM unit consists of:
• input gate <inline-formula><tex-math>i_k = \sigma(W_{\Psi i}\Psi_k + W_{hi}h_{k-1} + b_i)</tex-math></inline-formula>, which controls which value of the input should be used to modify the memory. The sigmoid function <inline-formula><tex-math>\sigma</tex-math></inline-formula> decides which values to let through, with outputs between 0 and 1;
• memory cell <inline-formula><tex-math>c_k = f_k c_{k-1} + i_k \tanh(W_{\Psi c}\Psi_k + W_{hc}h_{k-1} + b_c)</tex-math></inline-formula>, where the tanh function gives weights to the values which are passed, deciding their level of importance ranging from -1 to 1;
• forget gate <inline-formula><tex-math>f_k = \sigma(W_{\Psi f}\Psi_k + W_{hf}h_{k-1} + b_f)</tex-math></inline-formula>, which regulates the details to be discarded from the block using the sigmoid function <inline-formula><tex-math>\sigma</tex-math></inline-formula>;
• output gate <inline-formula><tex-math>o_k = \sigma(W_{\Psi o}\Psi_k + W_{ho}h_{k-1} + b_o)</tex-math></inline-formula>, which is the result of the input and the memory cell gate. Again here, the sigmoid function can be zero-valued, so it can inhibit the flow of information to the next computational node;
• state of the cell <inline-formula><tex-math>h_k = o_k \tanh(c_k)</tex-math></inline-formula>, where the tanh function gives weights to the values which are passed, deciding their level of importance ranging from -1 to 1.</p>
        <p>Each gate has its set of network weights denoted by <inline-formula><tex-math>W</tex-math></inline-formula>, and a bias <inline-formula><tex-math>b</tex-math></inline-formula> is added at each matrix multiplication to increase the flexibility of the network to the data. <inline-formula><tex-math>\Psi_k</tex-math></inline-formula> denotes the vector of inputs to the network, which are the voltage, current, and temperature of the battery measured at time step <inline-formula><tex-math>k</tex-math></inline-formula>.</p>
        <p>3.3.1. Training</p>
        <p>The training dataset consists of <inline-formula><tex-math>\Psi_k = [V(k), I(k), T(k)]</tex-math></inline-formula> as input and SOC as the response variable to be predicted. Since neural networks require large amounts of data to be trained, the data are collected in smaller fixed-size mini-batches to shorten the training time of the network, and each of these batches is then fed into the LSTM network [<xref ref-type="bibr" rid="ref14">14</xref>]. A forward pass begins when the training data (all batches) are fed into the network, and ends when the SOC estimates are generated at each time step <inline-formula><tex-math>k</tex-math></inline-formula>. Each forward pass is followed by a backward pass where the network weights and biases are updated: this cycle is referred to as an epoch, and is denoted by <inline-formula><tex-math>\epsilon</tex-math></inline-formula>.</p>
        <p>Training a very large neural network can be very slow. Beyond the mini-batches, one way to speed up training is to use a faster optimizer than gradient descent [<xref ref-type="bibr" rid="ref14">14</xref>]. The most commonly used one [<xref ref-type="bibr" rid="ref8">8</xref>][<xref ref-type="bibr" rid="ref9">9</xref>][<xref ref-type="bibr" rid="ref7">7</xref>][<xref ref-type="bibr" rid="ref10">10</xref>] is adaptive moment estimation (Adam): it keeps track of an exponentially decaying average of past gradients and also an exponentially decaying average of past squared gradients of the loss function when weights and biases are updated. Once the network has computed all the hidden states <inline-formula><tex-math>h_k</tex-math></inline-formula> of the last epoch, the estimated SOC is obtained by a fully connected (dense) layer:</p>
        <p><disp-formula><tex-math>SOC_k = W_y h_k + b_y</tex-math></disp-formula></p>
        <p>where <inline-formula><tex-math>W_y</tex-math></inline-formula> is the weights matrix and <inline-formula><tex-math>b_y</tex-math></inline-formula> is the bias corresponding to the last fully connected layer. A loss function is needed to measure the prediction’s accuracy at each time step. Mean absolute error (MAE) is chosen because of its simple structure and easy calculations [<xref ref-type="bibr" rid="ref14">14</xref>]:</p>
        <p><disp-formula><tex-math>MAE = \frac{1}{N} \sum_{k=0}^{N} |SOC_k - SOC_k^{*}|</tex-math></disp-formula></p>
        <p>where <inline-formula><tex-math>N</tex-math></inline-formula> is the length of the sequence and <inline-formula><tex-math>SOC_k</tex-math></inline-formula> and <inline-formula><tex-math>SOC_k^{*}</tex-math></inline-formula> are the estimated and true values of the battery’s state of charge.</p>
        <p>3.3.2. Evaluation</p>
        <p>The validation of the algorithm is done using the test set consisting of the three previously mentioned driving cycles and their respective charges. Only one forward run is required here since all parameters have already been learned during training. The measures used to validate the results obtained are again MAE and the root mean squared error (RMSE) [<xref ref-type="bibr" rid="ref14">14</xref>]:</p>
        <p><disp-formula><tex-math>RMSE = \sqrt{\sum_{k=0}^{N} \frac{1}{N} (SOC_k - SOC_k^{*})^2}.</tex-math></disp-formula></p>
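        <p>The two evaluation measures of Section 3.3.2 can be written out as a short sketch. The function names and the sample SOC values below are illustrative assumptions, not results from the paper; both functions average over the number of samples in the sequence.</p>
        <preformat>
```python
import math


def mae(estimated, true):
    """Mean absolute error between estimated and true SOC sequences."""
    n = len(estimated)
    return sum(abs(e - t) for e, t in zip(estimated, true)) / n


def rmse(estimated, true):
    """Root mean squared error between estimated and true SOC sequences."""
    n = len(estimated)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, true)) / n)


# Made-up SOC values (fractions of nominal capacity), for illustration only.
soc_true = [1.00, 0.90, 0.80, 0.70]
soc_est = [0.98, 0.93, 0.80, 0.66]
print(mae(soc_est, soc_true), rmse(soc_est, soc_true))
```
        </preformat>
        <p>RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a sense of whether the error is spread evenly or dominated by a few time steps.</p>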
        <p>3.3.3. Prediction</p>
      </sec>
      <sec id="sec-2-5">
        <p>SOC prediction is performed again with the test set. We do not show prediction results here because the built-in model is not properly optimized, and this task is outside the scope of our work.</p>
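        <p>For prediction at a future horizon, Section 3.2.5 describes shifting each SOC label by the number of rows needed to reach that horizon. A minimal sketch of that shift is given below; the helper names and the toy data are hypothetical, and dropping the trailing rows that have no future label is an assumption of this sketch.</p>
        <preformat>
```python
def rows_for_horizon(horizon_minutes, sampling_hz):
    """Number of rows separating a sample from its label at the given horizon."""
    return round(horizon_minutes * 60 * sampling_hz)


def shift_labels(features, soc, horizon_rows):
    """Pair each feature row with the SOC value `horizon_rows` rows later."""
    if horizon_rows == 0:
        # Real-time estimation: labels stay aligned with their features.
        return list(features), list(soc)
    # The last `horizon_rows` feature rows have no future label and are dropped.
    return list(features[:-horizon_rows]), list(soc[horizon_rows:])


# Toy data: 15 one-minute rows of [voltage, current, temperature].
features = [[4.0 - 0.01 * k, -1.0, 25.0] for k in range(15)]
soc = [1.0 - 0.01 * k for k in range(15)]

rows = rows_for_horizon(10, 1 / 60)  # 10-minute horizon at 1/60 Hz
x, y = shift_labels(features, soc, rows)
print(rows, len(x), len(y))
```
        </preformat>
        <p>With data sampled at 1 Hz instead, the same 10-minute horizon corresponds to a shift of 600 rows, which is why the sampling rate and the horizon have to be chosen together.</p>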
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Preliminary experimental results</title>
      <sec id="sec-3-1">
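        <p>As mentioned above, the vector of collected inputs is defined as <inline-formula><tex-math>\Psi_k = [V(k), I(k), T(k)]</tex-math></inline-formula>, where <inline-formula><tex-math>V(k)</tex-math></inline-formula>, <inline-formula><tex-math>I(k)</tex-math></inline-formula>, and <inline-formula><tex-math>T(k)</tex-math></inline-formula> are the voltage, current, and temperature measurements of the battery at time step <inline-formula><tex-math>k</tex-math></inline-formula>, respectively. The training set consists of the eight mixes and the corresponding charges, and the test set consists of the UDDS, LA92, and US06 drive cycles and the corresponding charges. After performing all the preprocessing steps explained in Section 3.2, both the training and test sets are split into batches to better feed the network.</p>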
        <p>The model is implemented in Python 3.10.6 with the TensorFlow 2.9.1 package: it is built sequentially, starting with the input layer, then an LSTM layer, and finally a dense layer that computes the output. The parameters of the network, summarized in Figure 4, were kept constant in all experiments in order not to influence the results.</p>
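        <p>To make the LSTM layer followed by a dense output layer more concrete, the sketch below evaluates the gate equations of Section 3.3 for a single time step in plain Python. It is not the TensorFlow model used in the experiments: the parameter layout, the helper names, and the all-zero toy weights are assumptions chosen so that the result is easy to check by hand.</p>
        <preformat>
```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gate(params, name, psi, h_prev, activation):
    """Compute activation(W_psi psi + W_h h_prev + b) for one gate."""
    w_psi, w_h, b = params[name]
    pre = [a + c + d
           for a, c, d in zip(matvec(w_psi, psi), matvec(w_h, h_prev), b)]
    return [activation(p) for p in pre]


def lstm_step(params, psi, h_prev, c_prev):
    """One LSTM time step: returns the new hidden state and memory cell."""
    i = gate(params, "i", psi, h_prev, sigmoid)    # input gate
    f = gate(params, "f", psi, h_prev, sigmoid)    # forget gate
    o = gate(params, "o", psi, h_prev, sigmoid)    # output gate
    g = gate(params, "c", psi, h_prev, math.tanh)  # candidate memory
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def dense(w_out, b_out, h):
    """Fully connected output layer mapping the hidden state to a SOC value."""
    return sum(w * v for w, v in zip(w_out, h)) + b_out


# Toy setup: 3 inputs (V, I, T), 2 hidden units, all weights and biases zero.
zeros = lambda rows, cols: [[0.0] * cols for _ in range(rows)]
params = {name: (zeros(2, 3), zeros(2, 2), [0.0, 0.0]) for name in "ifoc"}

h, c = lstm_step(params, [3.7, -1.0, 25.0], [0.0, 0.0], [1.0, 0.0])
soc_hat = dense([0.0, 0.0], 0.5, h)
print(h, c, soc_hat)
```
        </preformat>
        <p>With zero weights every gate evaluates to 0.5 and the candidate memory to 0, so the memory cell is simply halved at each step; a trained network replaces these constants with input-dependent values.</p>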
        <p>The following results are obtained by considering two different sampling frequencies: 1 Hz (one observation per second) and 1/60 Hz (one observation per minute). Since we consider two different data granularities, the aggregate mean and standard deviation of voltage and current were calculated to avoid information loss when down-sampling the data set: thus, the vector of inputs fed into the network is <inline-formula><tex-math>\Psi_k = [V(k), I(k), T(k), V_{avg}(k), I_{avg}(k), V_{std}(k), I_{std}(k)]</tex-math></inline-formula>.</p>
        <p>4.1. Sampling to 1 Hz</p>
        <p>Recalling the different frequencies of stored data of drive cycles (and mixes) and charges, to obtain training and test sets sampled at 1 Hz we down-sample driving cycles from 10 Hz to 1 Hz and up-sample charges from 1/60 Hz to 1 Hz. The results obtained are presented in Table (a): the training errors with a time horizon of 0, 10 minutes and 20 minutes are quite low; on the other hand, when the time horizon is higher and we predict SOC within the next 30 minutes, the training errors double with respect to the results obtained for real-time estimation (h = 0), and validation errors are quite high. This is why we need to down-sample: a dataset with observations every second is not optimal to predict with reasonable accuracy within a horizon of 30 minutes.</p>
        <p>The results in Table (a) are also confirmed by evaluating the Pearson correlation coefficient between SOC values corresponding to different time intervals (lags):
• Lag of 1 second: 0.9939
• Lag of 10 minutes: 0.9263
• Lag of 20 minutes: 0.7968
• Lag of 30 minutes: 0.5404</p>
        <p>When two values of SOC are half an hour apart, the correlation coefficient drops significantly, confirming that this algorithm with fine-grained data cannot correctly predict SOC for 30 minutes or larger time horizons.</p>
      </sec>
      <sec id="sec-3-2">
        <p>4.2. Sampling to 1/60 Hz</p>
        <p>To obtain training and test sets sampled at 1/60 Hz, driving cycles are down-sampled from 10 Hz to 1/60 Hz. The collected results are shown in Table (b). When predicting real-time SOC (h = 0), both training and validation errors are lower than the ones in Table (a). When the time horizon is 10 minutes, the training error is bigger, but the validation error is lower. Looking at the results with a time horizon of 20 minutes, there are no significant differences between the predictions with the fine-grained and coarse-grained data sets. The same happens when the time horizon considered is 30 minutes.</p>
        <p>Table (a), (b): training errors and test errors for time horizons h = 0 min, 10 min, 20 min, and 30 min.</p>
        <p>Looking at the Pearson correlation coefficient between SOC values corresponding to different time intervals (lags):
• Lag of 1 minute: 0.9965
• Lag of 10 minutes: 0.9692
• Lag of 20 minutes: 0.8991
• Lag of 30 minutes: 0.7938
The coefficients are higher than the ones obtained when the frequency of input data is 1 Hz, but for time horizons of 20 and 30 minutes, the coefficients are too small, confirming that to predict SOC for those time horizons, a more coarse-grained data set is needed.</p>
        <p>The improvements obtained with real-time SOC estimation (h = 0) are due to the increasing information about the past given by the aggregation measures computed. When the time horizon is 20 or 30 minutes, the algorithm overfits when trying to predict with a fine-grained dataset: the training error is relatively low, but the validation error is very high; moreover, when we down-sample, the dataset size reduces a lot, and this aspect can influence the network performance negatively so that, when we validate the network, we cannot see improvements such as the ones obtained with a time horizon of 10 minutes.</p>
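        <p>The down-sampling with aggregate mean and standard deviation of voltage and current, used to build the coarse-grained input vector, can be sketched as follows. The paper does not specify the windowing details, so this sketch assumes non-overlapping windows, takes the last sample of each window as the instantaneous voltage, current, and temperature, and uses the population standard deviation; the function name is hypothetical.</p>
        <preformat>
```python
from statistics import mean, pstdev


def downsample(rows, factor):
    """Aggregate consecutive groups of `factor` rows into one coarse row.

    rows: list of (voltage, current, temperature) tuples at the fine rate.
    Returns rows of [V, I, T, V_mean, I_mean, V_std, I_std], where V, I, T
    are the last sample of each window (an assumption of this sketch).
    """
    out = []
    for start in range(0, len(rows) - factor + 1, factor):
        window = rows[start:start + factor]
        volts = [r[0] for r in window]
        amps = [r[1] for r in window]
        v, i, t = window[-1]
        out.append([v, i, t,
                    mean(volts), mean(amps),
                    pstdev(volts), pstdev(amps)])
    return out


# Two minutes of toy 1 Hz data reduced to 1/60 Hz (one row per minute).
fine = [(4.0 - 0.01 * k, -1.0, 25.0) for k in range(120)]
coarse = downsample(fine, 60)
print(len(coarse), coarse[0])
```
        </preformat>
        <p>Each coarse row keeps a summary of what happened inside its window, which is the extra information about the past that the text credits for the improved real-time estimates; the price is a dataset that is 60 times smaller.</p>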
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Discussion</title>
      <p>In the final analysis, this work shows how the different granularity of the input data can affect the LSTM network’s performance in predicting SOC within different time horizons: the more distant the considered horizon, the more we need to aggregate the data and collect them in a coarse-grained dataset. The results are promising, especially for a time horizon of 10 minutes. A very important aspect that should be considered for improvements of this work and future developments is the reduction of the size of the considered dataset: this can affect the performance of the estimation, since neural network-based algorithms always require a significant amount of data to derive proper models. The more distant the horizon, the larger the period of the collected data needs to be.</p>
      <p>Building on a promising estimation, several business
developments can be defined: the integration of anomaly
detection techniques in the battery management system
(BMS) based on future predictions of SOC; a predictive
maintenance plan for the battery; an appropriate target
for the second life of the battery, to make the most of
its remaining useful life; and the implementation of a user
interface that makes it easier for the owner of an electric
vehicle to monitor the battery parameters.</p>
      <p>Although an LSTM learns its own parameters during training,
the results obtained in this paper, while promising, can
be improved by optimizing the LSTM
parameters, which will be addressed in future work.
Furthermore, regulation is needed to
guarantee the uniformity of electric vehicle battery data
and their accessibility to analysts, so that algorithms can be
trained to improve SOC estimation more easily. Most innovative ideas
have been developed in the laboratory environment, so
further improvements are needed in the area of real-time
vehicle monitoring.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was partially supported by</p>
    </sec>
  </body>
  <back>
  </back>
</article>