=Paper=
{{Paper
|id=Vol-3762/490
|storemode=property
|title=Dictionary Learning for data compression within a Digital Twin Framework
|pdfUrl=https://ceur-ws.org/Vol-3762/490.pdf
|volume=Vol-3762
|authors=Laura Cavalli,Domitilla Brandoni,Margherita Porcelli,Eric Pascolo
|dblpUrl=https://dblp.org/rec/conf/ital-ia/CavalliBPP24
}}
==Dictionary Learning for data compression within a Digital Twin Framework==
Laura Cavalli¹,*, Domitilla Brandoni¹, Margherita Porcelli²,³ and Eric Pascolo¹

¹ CINECA, Via Magnanelli 2, Casalecchio di Reno (BO), 40033, Italy
² Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze, Viale Morgagni 40/44, 50134, Firenze, Italy
³ ISTI-CNR, Via Moruzzi 1, Pisa, Italy. INdAM Research Group GNCS.
* Corresponding author: l.cavalli@cineca.it (L. Cavalli)
Abstract
Digital Twin systems play a crucial role in several contexts, from smart agriculture to predictive maintenance, from healthcare to weather modelling. To be effective, they involve a continuous exchange of massive data between IoT sensors in the real world and a digital system hosted on HPC resources, and vice versa. Nevertheless, the transmitted signals often exhibit high similarity, resulting in a redundant dataset that is very suitable for compression. This paper shows how Dictionary Learning can be used as a preprocessing technique for AI algorithms thanks to its ability to compress large data volumes by up to 80%, with a potential enhancement of performance since it acts as both a denoising and a compression technique. The algorithm operates efficiently on various types of datasets, from images to timeseries, and is well suited for deployment on devices with limited computational resources, such as IoT sensors.
Keywords
Digital Twin, Dictionary Learning, parallel OMP, timeseries compression, image compression, anomaly detection, image recognition
1. Introduction

A digital twin can be simply seen as a system consisting of two entities, a tangible subject of interest and its digital replica, interconnected by a continuous stream of data. In this context, data reflecting the physical entity are acquired through IoT sensors and sent to a dedicated HPC system which constitutes its digital mirror. Within the HPC system, the data undergo AI analysis to simulate the behavior and potential scenarios of the physical entity. The resulting insights are looped back into the physical system, impacting decision-making. Efficient transmission and storage of such large volumes of sensor data are therefore crucial to reduce latency between the two systems and ensure a reliable real-time digital representation, but this is often prohibitively expensive. For this reason, it is necessary to explore compression algorithms that lighten and speed up data transmission while preserving their meaningful information. Among the available state-of-the-art compression tools, we explore Dictionary Learning (DL), a robust sparse matrix factorization approach. Given a matrix of signals 𝑌, DL is able to learn a sparse representation 𝑌 ≈ 𝐷𝑋 expressing each signal as a linear combination of few basis elements, called atoms, which constitute the columns of 𝐷. In this work we will show that DL has various features that make it very suitable for use in data compression and transmission: i) it enables exceptional compression of redundant data due to its distinctive sparse factorization; ii) it is a versatile approach, able to handle diverse data types including images and time series; iii) it can be solved with an algorithm, supplied in this work, with low computational resource demand and independent of specific libraries, making it lightweight and well suited for edge computing.
The literature on DL comprises many applications across various fields, including denoising, inpainting, classification, and compression. Regarding data compression, an interesting online DL approach is proposed in [1], where massive datasets streamed in a preset order are compressed and denoised. Furthermore, the work [2] presents CORAD, a novel DL-based compression algorithm for time series which is able to harness the correlation across multiple related time series to eliminate redundancy, performing a more efficient compression. However, as far as we know, this work is the first to incorporate DL as a compression method within the Digital Twins (DT) domain, using it as a powerful preprocessing technique for both time series and images. Also, we developed an optimized DL algorithm to make it more lightweight and efficient in the DT framework.

This work is structured as follows: Section 2 gives a brief overview of the DL problem and of its solution. Section 3 integrates the DL approach within a DT framework and presents the overall DL4DT workflow, while Section 4 discusses numerical results, conducting a detailed analysis of the algorithm performance across various datasets. Additionally, it introduces several techniques designed to improve the algorithm execution speed. All the code necessary to reproduce the experiments shown in this paper is available at the following link: https://github.com/Eurocc-Italy/DL4DT.
2. Dictionary Learning overview

The aim of DL is to discover an overcomplete set of basis functions (atoms) able to represent a given set of data samples in a sparse manner. Given a matrix of training signals 𝑌 ∈ R^(𝑚×𝑁) (𝑚 ≪ 𝑁), DL seeks to find a dictionary 𝐷 ∈ R^(𝑚×𝑛) (𝑚 ≪ 𝑛) and a sparse matrix 𝑋 ∈ R^(𝑛×𝑁) such that 𝑌 ≈ 𝐷𝑋. The DL problem can be formulated in many equivalent ways, each one promoting a different aspect of the problem, as shown in detail in [3]. In this case we decided to formulate it as a two-variable, non-convex, constrained optimization problem of the form

min_{𝐷,𝑋} ‖𝑌 − 𝐷𝑋‖²_𝐹   s.t.  ‖x𝑖‖_0 ≤ 𝑠, 𝑖 = 1, …, 𝑁,  and  ‖d𝑗‖_2 = 1, 𝑗 = 1, …, 𝑛,    (1)

where the number of atoms 𝑛 and the sparsity level 𝑠 are fixed. Here, ‖·‖_2 and ‖·‖_0 denote the ℓ2 and ℓ0 norm of a vector, respectively, and ‖·‖_𝐹 is the Frobenius norm. Problem (1) is NP-hard and admits multiple global optima; therefore convergence to the global minimum is not guaranteed. In order to solve the DL problem, we follow the usual alternate optimization approach. More precisely, given the signal matrix 𝑌 and an initial dictionary 𝐷, at each iteration first the minimization problem in 𝑋 is solved while 𝐷 is fixed (Sparse Coding step), and then the minimization problem in 𝐷 is solved while keeping 𝑋 (possibly) fixed (Dictionary Update step).

The problem to be solved at the sparse coding step can be formulated as follows:

min_𝑋 ‖𝑌 − 𝐷𝑋‖²_𝐹   s.t.  ‖x𝑖‖_0 ≤ 𝑠, 𝑖 = 1, …, 𝑁,    (2)

which can be decomposed into the solution of 𝑁 problems, i.e. one for each signal:

min_{x𝑖} ‖y𝑖 − 𝐷x𝑖‖²_2   s.t.  ‖x𝑖‖_0 ≤ 𝑠,   𝑖 = 1, …, 𝑁.    (3)
For solving each problem (3), we employed Orthogonal Matching Pursuit (OMP), an iterative greedy algorithm that selects at each step the atom which is best correlated with the residual e := y − 𝐷x. It then produces a new approximation by projecting the signal y onto the dictionary elements that have already been selected (𝐷𝒮). We report in Algorithm 1 a naive version of OMP where the least squares solution x𝒮 is computed from scratch at each step (refer to [4] for more details).

Algorithm 1 OMP (naive approach) [4]
Given y ∈ R^𝑚, the sparsity level 𝑠, the dictionary 𝐷 ∈ R^(𝑚×𝑛) and the stopping tolerance 𝜖 > 0.
Initialize 𝒮 = ∅, e = y.
while |𝒮| < 𝑠 and ‖e‖_2 > 𝜖 do
    𝑘 = argmax_{𝑗∉𝒮} |eᵀd𝑗|
    𝒮 = 𝒮 ∪ {𝑘}
    x𝒮 = (𝐷𝒮ᵀ𝐷𝒮)⁻¹ 𝐷𝒮ᵀ y
    e = y − 𝐷𝒮 x𝒮
end while
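For illustration, Algorithm 1 translates almost line by line into NumPy. The sketch below is a didactic rendering, not the optimized OMP-QR implementation released in the DL4DT repository; the helper name omp_naive and the synthetic test data are ours.

```python
import numpy as np

def omp_naive(y, D, s, eps=1e-6):
    """Naive OMP (Algorithm 1): greedily select at most s atoms of D for y."""
    m, n = D.shape
    support = []                 # S: indices of the selected atoms
    x_S = np.zeros(0)            # least squares coefficients on the support
    e = y.copy()                 # residual e = y - D_S x_S
    while len(support) < s and np.linalg.norm(e) > eps:
        corr = np.abs(D.T @ e)   # correlation of every atom with the residual
        corr[support] = -np.inf  # never re-select an atom
        support.append(int(np.argmax(corr)))
        # x_S = (D_S^T D_S)^{-1} D_S^T y, solved from scratch via lstsq
        x_S, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        e = y - D[:, support] @ x_S
    x = np.zeros(n)
    x[support] = x_S             # embed the dense solution in a sparse vector
    return x

# Tiny check on a synthetic 3-sparse signal:
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)   # unit-norm atoms, as required by (1)
y = 2 * D[:, 3] - D[:, 17] + 0.5 * D[:, 40]
print(np.nonzero(omp_naive(y, D, 3))[0])   # typically recovers [ 3 17 40]
```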
Since at each step the current matrix 𝐷𝒮 is updated by simply appending one column, a more efficient implementation can be obtained by exploiting the least squares solution just computed at the previous step. The most famous approaches make use of the Cholesky decomposition of 𝐷𝒮ᵀ𝐷𝒮 [4, sec. 2.2] or the QR decomposition of 𝐷𝒮 [4, sec. 2.3]. Our computational experience showed that the OMP-QR implementation is faster when applied to DL [5]. Therefore, we implemented our own parallel version of the OMP-QR code to speed up the computational times.

Regarding the Dictionary Update step, the following minimization problem has to be solved:

min_{𝐷,(𝑋)} ‖𝑌 − 𝐷𝑋‖²_𝐹   s.t.  ‖d𝑗‖_2 = 1, 𝑗 = 1, …, 𝑛,    (4)

where the sparsity pattern of 𝑋 is fixed. For this task we followed the K-SVD approach [6].
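For completeness, the rank-1 update at the heart of K-SVD can be sketched in a few lines of NumPy. The function below mirrors the atom-update step described in [6] and is illustrative only; the function name and conventions are ours, with signals stored as columns as in the notation above.

```python
import numpy as np

def ksvd_atom_update(Y, D, X, j):
    """One K-SVD update of atom j; the sparsity pattern of X is preserved."""
    omega = np.nonzero(X[j, :])[0]          # signals that actually use atom j
    if omega.size == 0:
        return D, X                          # unused atom: nothing to update
    # residual without atom j's contribution, restricted to those signals
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, j], X[j, omega])
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                        # best rank-1 fit: new unit-norm atom
    X[j, omega] = S[0] * Vt[0, :]            # matching coefficients
    return D, X
```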
3. Dictionary Learning to reduce latency in Digital Twin

Reducing data latency is one of the main challenges within the DT context. This section outlines the proposed workflow, named DL4DT, which decreases data transmission time by using DL as a compression technique. DL4DT, illustrated in Figure 1, takes place in two stages.

Figure 1: First (top) and next (bottom) runs of DL4DT.

First of all (Fig. 1, top), the data are collected from the physical device, represented as a matrix 𝑌, and then transmitted to the digital counterpart. Here, the entire DL factorization is applied to 𝑌, resulting in the learning of a reliable and robust overcomplete dictionary 𝐷 and the sparse representation 𝑋. The dictionary 𝐷 is both saved on the digital system and transmitted back to be saved also on the physical one. Afterwards, a new smaller dataset of signals 𝑌1 is collected (Fig. 1, bottom). Instead of transferring the complete 𝑌1, we claim that computing its sparse representation 𝑋1 with OMP, using the reference dictionary 𝐷 from stage 1, is sufficient. Transmitting 𝑋1, which is highly sparse, indeed improves transmission time and reduces costs: solving a single Sparse Coding step demands fewer computational resources compared to the full DL factorization, and transferring only 𝑋1 is lighter than sending the entire 𝑌1. Indeed, suppose that 𝑌1 has 𝑁 signals of 𝑚 features each. Instead of passing all the 𝑚 × 𝑁 elements, with our method it is enough to transmit the 𝑠 × 𝑁 non-zero elements of 𝑋1. Notice that in sparse matrices each non-zero element is stored as a triplet (row_index, column_index, non_zero_value), requiring a total storage of 𝑠 × 𝑁 × 3 values. Therefore, the benefit of transferring 𝑋1 is a reduction of 1 − 3𝑠/𝑚 in the number of transmitted values. Moreover, users have the flexibility to specify under which conditions the dictionary 𝐷 has to be updated, in order to obtain more reliable results. For example, a reasonable choice can be updating the dictionary after a fixed period of time, or when the accuracy of the AI algorithm on the compressed dataset starts to decrease too much. We refer to these conditions as user_conditions in the forthcoming Algorithm 2. As we will show, DL4DT is very effective, since DL techniques allow massive compression while preserving the most important features of the dataset. DL4DT is summarized in Algorithm 2.

Algorithm 2 DL4DT: workflow of a DT process with DL techniques.
Collect data on the physical counterpart in matrix 𝑌.
Send 𝑌 to the digital system.
Compute the dictionary 𝐷 and the sparse matrix 𝑋 with a DL factorization of 𝑌 on the digital system.
𝑘 = 0
while True do
    if 𝑘 = 0 then
        Send the dictionary 𝐷 to the physical system and store it.
    else
        Compute 𝑋 using OMP-QR on the physical system.
        Send 𝑋 to the digital system.
    end if
    𝑘 = 𝑘 + 1
    Compute 𝑌̂ = 𝐷𝑋 on the digital system.
    Apply the AI algorithm using 𝑌̂ as dataset.
    if user_conditions then
        break
    end if
end while
4. Numerical Results

In this section, after introducing the datasets, we validate the DL approach as an effective compression tool for addressing DT latency problems. Then, we simulate and analyze the DL4DT workflow presented in Section 3, exploiting the DL ability to build a highly representative dictionary. All experiments were run on Galileo100 [7], an HPC infrastructure owned by CINECA with 528 computing nodes, each with 2 Intel CascadeLake 8260 CPUs (24 cores each, 2.4 GHz), 384 GB of RAM and an NVIDIA Mellanox Infiniband 100GbE network.

4.1. Datasets

We focused on three datasets with various types of data (images or timeseries) and dimensions: MNIST [8], FordA [9], and a fine-grained timeseries from the D.A.V.I.D.E. HPC system [10, 11]. D.A.V.I.D.E. is a supercomputer developed by E4 Computer Engineering [12] and hosted in the past by CINECA, with an integrated monitoring infrastructure called Examon [10]. In this work we focused on a subset of the data collected by Examon: for each of the 45 nodes, 166 metrics were considered, such as core workloads, temperatures, fan speeds and power consumption, collected at 5-minute intervals. In detail, we focused on the 16th node.

4.2. Dictionary Learning compression

To evaluate the effectiveness of our compression, it is essential to compare the information generated by AI models trained on both the original and the compressed datasets. This is crucial within the DT framework, where our primary aim is to extract valuable insights from compressed data.

We considered a CNN tailored for digit recognition [13] on the MNIST dataset, a CNN able to perform anomaly detection suggested in [14] on FordA, and an autoencoder-based model able to automatically detect anomalies in a semi-supervised fashion ([10, 11]) on D.A.V.I.D.E. After training the NNs described above on both the original and the compressed datasets, we compared their performance on the same test set by studying the accuracy, defined as the ratio of the number of correct predictions over the total number of predictions. Figure 2 compares the test accuracy achieved by the NNs trained on the original dataset (green dotted line) and on a DL compression of MNIST (top) and FordA (bottom) with a sparsity level of 𝑠 = 50 and a number of iterations 𝐾 = 20 (orange solid line), across various compression levels. The results obtained with other settings of DL are shown in more detail in [5].
Figure 2: Accuracy of different compression levels with 𝑠 = 50 compared to the accuracy with no compression on MNIST (top) and FordA dataset (bottom).

As expected, the accuracy computed on the compressed datasets is lower than the one computed on the original dataset. Despite not matching the original accuracy exactly, we still achieve extremely good results: with the MNIST dataset we can even reach an accuracy of 97% with a compression of 80%, against an accuracy of 99% with no compression. This is probably due to the redundant nature of the datasets, which makes it possible to achieve high accuracy levels even with high levels of compression. On FordA an overall accuracy of 91% is reached even with high compression levels, against 96% with no compression. Figure 3 shows at the top the test accuracy achieved by the autoencoder trained on the original D.A.V.I.D.E. dataset (green dotted line) and on the dataset compressed with DL with 𝑠 = 5 and 𝐾 = 10 (orange solid line) under different compression settings. The overall accuracy, approximately 86%, is lower than in the previous cases, as expected given the real-world nature of the dataset. However, we notice that the test accuracy reached by training the autoencoder on the compressed training dataset is almost identical to the one obtained with no compression. When dealing with imbalanced datasets, though, it is better to consider the F-score achieved for each class (normal signals and anomalies) rather than the accuracy. The F-score is defined as F-score := 2 · (Precision × Recall)/(Precision + Recall), where Precision and Recall are the ratio of true positives to the total predicted positives and to the actual positives, respectively.
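These per-class scores can be computed, for instance, with scikit-learn; the labels below are invented purely to show the call.

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 0, 0, 1, 1, 0, 1]     # 0 = normal signal, 1 = anomaly
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]     # predictions of the anomaly detector
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1])
print(f1)   # per-class F-score, i.e. 2 * prec * rec / (prec + rec)
```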
Figure 3: Accuracy (top), F-score on normal signals (middle) and on anomalies (bottom) with DL with 𝑠 = 5 compared to the case with no compression on D.A.V.I.D.E. timeseries.

We notice that the F-score reached on normal signals, shown in the middle of Fig. 3, remains almost unaffected by compression: across various DL configurations, the F-score consistently remains close to 98%, as in the original case without compression. This finding aligns with our expectations, as the training set in this example consists only of signals without anomalies. As for the F-score on anomalies, shown at the bottom of Fig. 3, we observe that this value increases when compression is more intense. Examining the details of the Recall and Precision values for these cases (Table 1), we notice that, respectively, the Recall for normal signals and the Precision for anomalies are higher compared to the case without compression.
Table 1
Precision and Recall values for normal signals and anomalies with no compression and 80% DL compression with 𝑠 = 5.

compression | type of signal | Precision | Recall
0%          | normal         | 99.8      | 95.4
80%         | normal         | 99.8      | 96.3
0%          | anomaly        | 79.8      | 99.1
80%         | anomaly        | 84.2      | 99.1

These two values (the Recall of normal signals and the Precision of anomalies) take into account the cases where certain signals are identified as anomalies even though they are not. The higher the value, the more this type of error is avoided. Therefore, it is consistent that DL compression can increase these values, as DL is known to be a valuable denoising tool, leading to improved anomaly detection.
Let us explore some implementation aspects of the code. In our scenario we have to deal with substantial problem dimensions, but we can also benefit from the computational resources of an HPC cluster in the first stage of the workflow presented in Section 3. These resources can be fully employed in the OMP algorithm, which can be parallelized with the Joblib python library [15] following what was mentioned in Section 2. Figure 4 illustrates the speedup achieved by executing OMP-QR both serially and in parallel with an increasing number of processors, where speedup is the ratio of the execution time of the serial code to the execution time of the parallel code performing the same task.
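A minimal sketch of this column-wise parallelism is given below: since each signal is sparse-coded independently, batches of columns can be dispatched to separate Joblib workers. Here scikit-learn's orthogonal_mp stands in for our OMP-QR kernel, and the matrix sizes are illustrative only.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import orthogonal_mp

def parallel_sparse_coding(Y, D, s, n_jobs=16):
    """Sparse-code the columns of Y against D, one batch of columns per worker."""
    chunks = np.array_split(Y, n_jobs, axis=1)
    codes = Parallel(n_jobs=n_jobs)(
        delayed(orthogonal_mp)(D, chunk, n_nonzero_coefs=s) for chunk in chunks
    )
    return np.hstack(codes)              # X, one s-sparse code per column

rng = np.random.default_rng(0)
D = rng.standard_normal((165, 349))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
Y = rng.standard_normal((165, 3074))
print(parallel_sparse_coding(Y, D, s=15).shape)   # -> (349, 3074)
```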
Figure 4: Speed up of the OMP-QR algorithm in serial and with Joblib parallelization. For this type of problem it is not meaningful to increase resources beyond 16 tasks.

The proposed parallelization has a significant impact on the total computational time of the DL algorithm: when the plain DL algorithm is run sequentially on a single CPU, it requires about 20 hours to complete 20 iterations on a matrix of size 784 × 60,000, while the same algorithm implemented with the Joblib parallelized version of OMP-QR using 16 CPUs completes the task in about 5 hours. We have also developed a light C version of the OMP-QR code better suited for running on devices with limited computational resources.
4.3. Dictionary representativity

As already mentioned, the data provided by a DT do not usually show great variability. This section aims to verify whether the dictionary learned in the first stage is robust enough to accurately represent newly collected data. If successful, this would make it possible to run the sparse coding step (OMP-QR) without the need for a dictionary update. In particular, we integrate the study of dictionary representativity into a simulation of the DL4DT workflow on the D.A.V.I.D.E. dataset, keeping track of the original sizes, compression levels, and times.

The goal of the first stage is to learn a reliable and representative dictionary. Thus, we begin by considering the 4432 signals of its training set. In our workflow these data are sent to the digital twin, where we choose to apply the strongest yet most meaningful compression, i.e. a compression of 80% with 𝑠 = 20, 𝑛 = 349 and 10 iterations. From previous studies we know that such a compression can reach an overall F-score of about 97.9% on normal signals and 90.7% on anomalies, taking around 3 minutes. Then the dictionary is stored in the digital twin and sent back to the physical one.

After a fixed time interval a new matrix of signals 𝑌1 is collected on the physical system. We simulate this new matrix of signals by taking the test set relative to the 16th node, since it is completely new to the dictionary and presents anomalies. We then compute its sparse representation matrix 𝑋1 with a single run of OMP-QR with 𝑠 = 15, taking around 3 seconds. The sparse representation matrix is then sent to the digital system, where it is used to reconstruct the signals as 𝑌̂1 = 𝐷𝑋1. To evaluate the information loss due to the data compression, we consider the autoencoder trained in the first run on the compressed training set and check whether it is still able to detect the same anomalies when tested on the compressed test set 𝑌̂1. We obtain extremely good results, achieving an F-score of 97% on normal samples and 89.9% on anomalies. These outcomes are very close to the results obtained without compression, which were respectively 97.9% and 90.7%. The DL setting that we chose is indeed a sensible one: increasing the compression level contributes to smoothing the signals, with beneficial results, yet the representation remains highly accurate with the sparsity level set to 𝑠 = 20. We conducted a similar experiment using random compression instead of DL, retaining only 30% of the samples chosen randomly from the test set, and obtained an F-score equal to 98% on normal samples and 63% on anomalies, which is definitely worse. Thanks to this workflow, instead of transmitting the entire signal matrix 𝑌1 of dimensions 165 × 3074, it is enough to compute and transfer its sparse representation 𝑋1, which requires the storage of 15 × 3074 × 3 elements (the value 15 being the sparsity level used in this run). This results in a memory gain of 73%, requiring only 3 seconds and causing a minimal loss of information.

This process can be iterated multiple times, until the dictionary 𝐷 requires updating to ensure more accurate outcomes. For instance, the dictionary might be refreshed periodically or whenever the performance of the AI algorithm on the compressed dataset begins to significantly decline. The results confirm that the dictionary 𝐷 learned on the training set manages to represent new signals quite effectively. Indeed, the accuracy levels achieved by the signals reconstructed with the old dictionary 𝐷 are good, allowing a significant gain in computational efficiency.
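As a sanity check, the memory gain quoted above follows directly from the 1 − 3𝑠/𝑚 estimate of Section 3:

```python
# Sanity check of the memory gain via the 1 - 3s/m formula of Section 3.
m, N, s = 165, 3074, 15        # dimensions and sparsity of the Y1/X1 example
dense_values = m * N           # values sent when transmitting Y1 densely
triplet_values = 3 * s * N     # COO triplets (row, col, value) of X1
print(f"{1 - triplet_values / dense_values:.0%}")   # -> 73%
```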
ing Algorithms and Applications, Springer
Cham, 2018. doi:https://doi.org/10.1007/
5. Conclusions 978-3-319-78674-2.
[4] B. Sturm, M. Christensen, Comparison of orthogo-
The purpose of this work was to introduce a new effi- nal matching pursuit implementations, EURASIP,
cient and lightweight compression tool within the Digital 2012, pp. 220โ224.
Twins framework that has minimal impact on the accu- [5] L. Cavalli, Analysis and implementation of Dic-
racy of AI models trained on compressed data (DL4DT). tionary Learning techniques in a Digital Twin
The numerical experiments showed that both with time- framwork, Master thesis, University of Bologna,
series and images the algorithm exhibited excellent be- Bologna, Italy, 2023. Available at https://github.
haviour, managing to compress the dataset up to 80% com/Eurocc-Italy/DL4DT.
while preserving key information and therefore keeping [6] M. Aharon, M. Elad, A. Bruckstein, K-svd: An algo-
the accuracy almost unchanged. As shown in Section 4.3, rithm for designing overcomplete dictionaries for
the dictionary learned from training data was able to rep- sparse representation, IEEE Transactions on Signal
resent new signals in an accurate manner in a sparse way. Processing 54 (2006) 4311โ4322. doi:10.1109/TSP.
Moreover, in examples carried out on D.A.V.I.D.E. dataset 2006.881199.
turned out that such an algorithm also enhances data [7] Cineca, Galileo100, 2021. URL: https://www.hpc.
quality, serving as a potential preprocessing tool. Finally, cineca.it/systems/hardware/galileo100/.
due to the low computational cost of our parallel imple- [8] L. Deng, The mnist database of handwritten digit
mentation of the OMP-QR, this approach allowed for images for machine learning research [best of the
on-device data compression, particularly useful with de- web], IEEE Signal Processing Magazine 29 (2012)
vices like IoT sensors, effectively reducing data exchange 141โ142. doi:10.1109/MSP.2012.2211477.
between devices while retaining the most crucial infor- [9] J. Wichard, Classification of ford motor data (2009).
mation. In conclusion, we can state that the DL compres- [10] A. Borghesi, A. Libri, L. Benini, A. Bartolini, On-
sion algorithm effectively reduces the dataset memory line anomaly detection in HPC systems, CoRR
demand, resulting in faster data transmission and reduced abs/1902.08447 (2019). URL: http://arxiv.org/abs/
latency between distinct systems. Such a compression 1902.08447. arXiv:1902.08447.
tool can have significant implications in Industry, where [11] A. Borghesi, A. Bartolini, M. Lombardi, M. Milano,
network infrastructures may not be high-performing but L. Benini, Anomaly detection using autoencoders
a wise and efficient use of digital twin systems is crucial in high performance computing systems, CoRR
for optimizing and managing production. abs/1811.05269 (2018). URL: http://arxiv.org/abs/
1811.05269. arXiv:1811.05269.
Acknowledgments [12] E4 computer engineering., https://www.e4company.
com/en/, 2024.
This work is supported by the EUROCC Italy National [13] F. Chollet, Simple mnist convnet, https://keras.io/
Competence Center. The Competence Center is part examples/vision/mnist_convnet/, 2020.
of EUROCC project funded by the European High- [14] H. Fawaz, Timeseries classification from scratch,
Performance Computing Joint Undertaking (JU) under https://keras.io/examples/timeseries/timeseries_
grant agreement No 101101903. A special thanks to the classification_from_scratch/, 2023.
CINECA HPC department for their technical support. [15] Joblib, https://github.com/joblib/joblib, 2023.
The work of MP is partially supported by INdAM-GNCS.