Vol-3762/490 | Dictionary Learning for data compression within a Digital Twin Framework | https://ceur-ws.org/Vol-3762/490.pdf | dblp: https://dblp.org/rec/conf/ital-ia/CavalliBPP24
Dictionary Learning for data compression within a Digital Twin Framework

Laura Cavalli1,*, Domitilla Brandoni1, Margherita Porcelli2,3 and Eric Pascolo1

1 CINECA, Via Magnanelli 2, Casalecchio di Reno (BO), 40033, Italy
2 Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze, Viale Morgagni 40/44, 50134, Firenze, Italy
3 ISTI–CNR, Via Moruzzi 1, Pisa, Italy. INdAM Research Group GNCS.


Abstract
Digital Twin systems play a crucial role in several contexts, from smart agriculture to predictive maintenance, from healthcare to weather modelling. To be effective, they involve a continuous exchange of massive data between IoT sensors in the real world and a digital system hosted on HPC resources, and vice versa. Nevertheless, the transmitted signals often exhibit high similarity, resulting in a redundant dataset well suited to compression. This paper shows how Dictionary Learning can be used as a preprocessing technique for AI algorithms: it can compress large data volumes by up to 80% and may even enhance performance, acting both as a denoising and a compression technique. The algorithm operates efficiently on various types of datasets, from images to timeseries, and is well suited for deployment on devices with limited computational resources, such as IoT sensors.

Keywords
Digital Twin, Dictionary Learning, parallel OMP, timeseries compression, images compression, anomaly detection, image recognition



1. Introduction

A digital twin can be seen as a system consisting of two entities, a tangible subject of interest and its digital replica, interconnected by a continuous stream of data. In this context, data reflecting the physical entity are acquired through IoT sensors and sent to a dedicated HPC system which constitutes its digital mirror. Within the HPC system, the data undergo AI analysis to simulate the behavior and potential scenarios of the physical entity. The resulting insights are looped back into the physical system, impacting decision-making. Efficient transmission and storage of such large volumes of sensor data are therefore crucial to reduce the latency between the two systems and ensure a reliable real-time digital representation, but this is often prohibitively expensive. For this reason, it is necessary to explore compression algorithms that lighten and speed up data transmission while preserving the meaningful information. Among the available state-of-the-art compression tools, we explore Dictionary Learning (DL), a robust sparse matrix factorization approach. Given a matrix of signals Y, DL is able to learn a sparse representation Y ≈ DX expressing each signal as a linear combination of few basis elements, called atoms, which constitute the columns of D. In this work we will show that DL has various features that make it very suitable for data compression and transmission: i) it enables exceptional compression of redundant data due to its distinctive sparse factorization; ii) it is a versatile approach, able to handle diverse data types, including images and time series; iii) its solution can be computed with an algorithm, supplied in this work, with low computational resource demand and no dependence on specific libraries, making it lightweight and well suited for edge computing.

The literature on DL comprises many applications across various fields, including denoising, inpainting, classification, and compression. Regarding data compression, an interesting online DL approach is proposed in [1], where massive datasets streamed in a preset order are compressed and denoised. Furthermore, [2] presents CORAD, a novel DL-based compression algorithm for time series which is able to harness the correlation across multiple related time series to eliminate redundancy and perform a more efficient compression. However, as far as we know, this work is the first to incorporate DL as a compression method within the Digital Twin (DT) domain, using it as a powerful preprocessing technique for both time series and images. We also developed an optimized DL algorithm to make it more lightweight and efficient within the DT framework.

This work is structured as follows: Section 2 gives a brief overview of the DL problem and of its solution. Section 3 integrates the DL approach within a DT framework and presents the overall DL4DT workflow, while Section 4 discusses numerical results, conducting a detailed analysis of the algorithm performance across various datasets. Additionally, it introduces several techniques designed to improve the algorithm execution speed. All the codes necessary to reproduce the experiments shown in this paper are available at the following link: https://github.com/Eurocc-Italy/DL4DT.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
Email: l.cavalli@cineca.it (L. Cavalli); d.brandoni@cineca.it (D. Brandoni); margherita.porcelli@unifi.it (M. Porcelli); e.pascolo@cineca.it (E. Pascolo)
ORCID: 0000-0002-8157-1459 (D. Brandoni); 0000-0003-0183-1204 (M. Porcelli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Dictionary Learning overview

Algorithm 1 OMP (naive approach) [4]
    Given y ∈ R^m, the sparsity level s, the dictionary D ∈ R^{m×n} and the stopping tolerance ε > 0.
    Initialize S = ∅, e = y.
    while |S| < s and ‖e‖_2 > ε do
        k = argmax_{j ∉ S} |e^T d_j|
        S = S ∪ {k}
        x_S = (D_S^T D_S)^{-1} D_S^T y
        e = y − D_S x_S
    end while
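As a concrete illustration, Algorithm 1 can be sketched in a few lines of NumPy (a naive reference version for clarity, not the optimized parallel OMP-QR implementation provided with this work; function and variable names are illustrative):

```python
import numpy as np

def omp_naive(y, D, s, eps=1e-8):
    """Naive OMP (Algorithm 1): greedily grow the support S and
    re-solve the least squares problem from scratch at each step."""
    n = D.shape[1]
    S = []                      # support: indices of selected atoms
    e = y.astype(float).copy()  # residual e = y - D_S x_S
    x = np.zeros(n)
    x_S = np.zeros(0)
    while len(S) < s and np.linalg.norm(e) > eps:
        corr = np.abs(D.T @ e)           # correlation with the residual
        corr[S] = -np.inf                # restrict the argmax to j not in S
        S.append(int(np.argmax(corr)))   # select the best-correlated atom
        x_S, *_ = np.linalg.lstsq(D[:, S], y, rcond=None)
        e = y - D[:, S] @ x_S            # update the residual
    x[S] = x_S                           # embed x_S into a length-n vector
    return x
```

For a dictionary with unit-norm columns and a signal that is an exact combination of at most s atoms, the residual drops to numerical zero and the support is recovered; in the workflow described in this paper the routine is applied column by column to the signal matrix.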
The aim of DL is to discover an overcomplete set of basis functions (atoms) able to represent a given set of data samples in a sparse manner. Given a matrix of training signals Y ∈ R^{m×N} (m ≪ N), DL seeks to find a dictionary D ∈ R^{m×n} (m ≪ n) and a sparse matrix X ∈ R^{n×N} to represent Y ≈ DX. The DL problem can be formulated in many equivalent ways, each one promoting a different aspect of the problem, as shown in detail in [3]. In this case we decided to formulate it as a two-variable, non-convex, constrained optimization problem of the form

    min_{D,X} ‖Y − DX‖_F^2   s.t.   ‖x_l‖_0 ≤ s,  l = 1, ..., N,
                                    ‖d_j‖_2 = 1,  j = 1, ..., n,        (1)

where the number of atoms n and the sparsity level s are fixed. Here, ‖·‖_2 and ‖·‖_0 denote the ℓ2 and ℓ0 norm of a vector, respectively, and ‖·‖_F is the Frobenius norm.

Problem (1) is NP-hard and admits multiple global optima; therefore convergence to the global minimum is not guaranteed. In order to solve the DL problem, we follow the usual alternating optimization approach. More precisely, given the signal matrix Y and an initial dictionary D, at each iteration first the minimization problem in X is solved while D is fixed (Sparse Coding step) and then the minimization problem in D is solved while keeping X (possibly) fixed (Dictionary Update step).

The problem to be solved at the Sparse Coding step can be formulated as follows:

    min_X ‖Y − DX‖_F^2   s.t.   ‖x_l‖_0 ≤ s,  l = 1, ..., N,        (2)

which can be decomposed into the solution of N problems, i.e. one for each signal:

    min_{x_l} ‖y_l − D x_l‖_2^2   s.t.   ‖x_l‖_0 ≤ s,  l = 1, ..., N.        (3)

For solving each problem (3), we employed Orthogonal Matching Pursuit (OMP), an iterative greedy algorithm that at each step selects the atom which is best correlated with the residual e := y − Dx. Then it produces a new approximation by projecting the signal y onto the dictionary elements that have already been selected (D_S). We report in Algorithm 1 a naive version of OMP where the least squares solution x_S is computed from scratch at each step (refer to [4] for more details).

Since at each step the current matrix D_S is updated by simply appending one column, a more efficient implementation can be obtained by exploiting the least squares solution just computed at the previous step. The most famous approaches make use of the Cholesky decomposition of D_S^T D_S [4, sec. 2.2] or the QR decomposition of D_S [4, sec. 2.3]. Our computational experience showed that the OMP-QR implementation is faster when applied to DL [5]. Therefore, we implemented our parallel version of the OMP-QR code to speed up the computational times.

Regarding the Dictionary Update step, the following minimization problem has to be solved:

    min_{D,(X)} ‖Y − DX‖_F^2   s.t.   ‖d_j‖_2 = 1,  j = 1, ..., n,        (4)

where the sparsity pattern of X is fixed. For this task we followed the K-SVD approach [6].

3. Dictionary Learning to reduce latency in Digital Twin

Reducing data latency is one of the main challenges within the DT context. This section outlines the proposed workflow, named DL4DT, to decrease data transmission time using DL as a compression technique. DL4DT, illustrated in Figure 1, takes place in two stages. First of all (Fig. 1, top), the data are collected from the physical device, represented as a matrix Y, and then transmitted to the digital counterpart. Here, the entire process of DL factorization is applied to Y, resulting in the learning of a reliable and robust overcomplete dictionary D and the sparse representation X. The dictionary D is both saved on the digital system and transmitted back to be saved also on the physical one. Afterwards, a new smaller dataset of signals Y_1 is collected (Fig. 1, bottom). Instead of transferring the complete Y_1, we claim that computing its sparse representation X_1 with OMP using the reference dictionary D from stage 1 is sufficient. Transmitting X_1, which is highly sparse, indeed improves transmission time and reduces costs: solving a single Sparse Coding step demands fewer computational resources compared
Figure 1: First (top) and next (bottom) runs of DL4DT.

Algorithm 2 DL4DT: workflow of a DT process with DL techniques.
    Collect data on the physical counterpart in matrix Y.
    Send Y to the digital system.
    Compute the dictionary D and the sparse matrix X with DL factorization of Y on the digital system.
    i = 0
    while True do
        if i = 0 then
            Send the dictionary D to the physical system and store it.
        else
            Compute X using OMP-QR on the physical system.
            Send X to the digital system.
        end if
        i = i + 1
        Compute Ỹ = DX on the digital system.
        Apply AI algorithm using Ỹ as dataset.
        if user_conditions then
            break
        end if
    end while
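On the physical side, the "Send X to the digital system" step of Algorithm 2 only has to ship the nonzeros of the sparse code as (row_index, column_index, non_zero_value) triplets; a minimal sketch of this packing and of the resulting traffic reduction (helper names are illustrative, not taken from the DL4DT repository):

```python
import numpy as np

def pack_triplets(X):
    """Physical side: collect the nonzeros of the sparse code X (n x N)
    as (row_index, column_index, value) triplets for transmission."""
    rows, cols = np.nonzero(X)
    return rows, cols, X[rows, cols]

def unpack_triplets(rows, cols, vals, shape):
    """Digital side: rebuild the sparse code from the received triplets."""
    X = np.zeros(shape)
    X[rows, cols] = vals
    return X

def reduction(m, s):
    """Traffic reduction of sending s*N*3 triplet values instead of
    the m*N raw entries: 1 - 3s/m."""
    return 1.0 - 3.0 * s / m
```

With MNIST-sized signals (m = 784) and sparsity s = 50, reduction(784, 50) is about 0.81, i.e. roughly the 80% compression quoted in the abstract.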
to full DL, and transferring only X_1 is lighter than sending the entire Y_1. Indeed, suppose that Y_1 has N signals of m features each. Instead of passing all the m × N elements, with our method it is enough to transmit the s × N non-zero elements of X_1. Notice that in sparse matrices each non-zero element is stored as a triplet (row_index, column_index, non_zero_value), requiring a total storage of s × N × 3 values. Therefore, the benefit of transferring X_1 is a reduction of 1 − 3s/m. Moreover, users have the flexibility to specify under which conditions the dictionary D has to be updated, in order to obtain more reliable results. For example, a reasonable choice can be updating the dictionary after a fixed period of time, or when the accuracy of the AI algorithm on the compressed dataset starts to decrease too much. We refer to these conditions as user_conditions in Algorithm 2. As we will prove, DL4DT is very effective, since DL techniques allow massive compression while preserving the most important features of the dataset. DL4DT is summarized in Algorithm 2.

4. Numerical Results

In this section, after introducing the datasets, we validate the DL approach as an effective compression tool for addressing DT latency problems. Then, we simulate and analyze the DL4DT workflow presented in Section 3, exploiting the DL ability to build a highly representative dictionary. All experiments were run on Galileo100 [7], an HPC infrastructure owned by CINECA with 528 computing nodes, each with 2 × CPU Intel CascadeLake 8260 with 24 cores each, 2.4 GHz, 384 GB RAM and NVIDIA Mellanox Infiniband 100GbE network.

4.1. Datasets

We focused on three datasets with various types of data (images or timeseries) and dimensions: MNIST [8], FordA [9], and a fine-grained timeseries on the D.A.V.I.D.E. HPC system [10, 11]. D.A.V.I.D.E. is a supercomputer developed by E4 Computer Engineering [12] and hosted in the past by CINECA, with an integrated monitoring infrastructure called Examon [10]. In this work we focused on a subset of the data collected by Examon: for each of the 45 nodes, 166 metrics were considered, such as core workloads, temperatures, fan speeds, power consumption, etc., collected at 5-minute intervals. In detail, we focused on the 16th node.

4.2. Dictionary Learning compression

To evaluate the effectiveness of our compression, it is essential to compare the information generated by AI models trained on both the original and compressed datasets. This is crucial within the DT framework, where our primary aim is to extract valuable insights from compressed data.

We considered a CNN tailored for digit recognition [13] on the MNIST dataset, a CNN able to perform anomaly detection suggested in [14] on FordA, and an autoencoder-
based model able to automatically detect anomalies in a semi-supervised fashion ([10, 11]) on D.A.V.I.D.E. After training the NNs described above on both original and compressed datasets, we compared their performance on the same test set by studying the accuracy, which is defined as the ratio of the number of correct predictions over the total number of predictions. Figure 2 compares the test accuracy achieved by the NNs trained on the original dataset (green dotted line) and on a DL compression of MNIST (top) and FordA (bottom), with a sparsity level of s = 50 and a number of iterations K = 20 (orange solid line), across various compression levels. The results obtained with other settings of DL are shown in more detail in [5]. As expected, the accuracy computed on the compressed datasets is lower than the one computed on the original dataset. Despite not matching exactly the original accuracy, we still achieve extremely good results: on the MNIST dataset we can even reach an accuracy of 97% with a compression of 80%, against an accuracy of 99% with no compression. This is probably due to the redundant nature of the datasets, which makes it possible to achieve high accuracy levels even with high levels of compression. On FordA an overall accuracy of 91% is reached even with high compression levels, against 96% with no compression.

[Two plots: accuracy vs. % compression; legend: no compression, s = 50.] Figure 2: Accuracy of different compression levels with s = 50 compared to the accuracy with no compression on MNIST (top) and FordA dataset (bottom).

Figure 3 shows at the top the test accuracy achieved by the autoencoder trained on the original D.A.V.I.D.E. dataset (green dotted line) and on the dataset compressed with DL with s = 5 and K = 10 (orange solid line), under different compression settings. The overall accuracy, approximately 86%, is lower than in the previous cases, as expected due to the real-world nature of the dataset. However, we notice that the test accuracy reached by training the autoencoder on the compressed training dataset is almost identical to the one obtained with no compression. When dealing with imbalanced datasets, though, it is better to consider the F-score achieved for each class (normal signals and anomalies) rather than the accuracy. The F-score is defined as F-score := 2 (precision × recall)/(precision + recall), where precision and recall are the ratio of true positives to the total predicted positives and to the actual positives, respectively.

[Three plots: accuracy and F-scores vs. % compression; legend: no compression, s = 5.] Figure 3: Accuracy (top), F-score on normal signals (middle) and on anomalies (bottom) with DL with s = 5 compared to the case with no compression on D.A.V.I.D.E. timeseries.

We notice that the F-score reached on normal signals, shown in the middle of Fig. 3, remains almost unaffected by compression: across various DL configurations, the F-score consistently remains close to 98%, as in the original case without compression. This finding aligns with our expectations, as the training set in this example consists only of signals without anomalies. As for the F-score on anomalies, shown at the bottom of Fig. 3, we observe that this value increases when compression is more intense. Examining the details of the Recall and Precision values for these cases (Table 1), we notice that, respectively, the Recall for normal signals and the Precision for anomalies are higher compared to the case without compression.
Table 1
Precision and Recall values for normal signals and anomalies with no compression and 80% DL compression with s = 5.

    compression   type of signal   Precision   Recall
    0%            normal           99.8        95.4
    80%           normal           99.8        96.3
    0%            anomaly          79.8        99.1
    80%           anomaly          84.2        99.1

These two values (Recall of normal signals and Precision of anomalies) take into account the cases where certain signals are identified as anomalies even though they are not. The higher the value, the more this type of error is avoided. Therefore, it is consistent that DL compression can increase these values, as DL is known to be a valuable denoising tool, leading to improved anomaly detection.

Let us explore some implementation aspects of the code. In our scenario we have to deal with substantial problem dimensions, but we can also benefit from the computational resources of an HPC cluster in the first stage of the workflow presented in Section 3. These resources can be fully employed in the OMP algorithm, which can be parallelized with the Joblib python library [15] following what was mentioned in Section 2. Figure 4 illustrates the speedup achieved by executing OMP-QR both serially and in parallel with an increasing number of processors, where speedup is the ratio of the execution time of the serial code to the execution time of the parallel code performing the same task.

[Plot: speedup vs. number of tasks; legend: ideal, joblib.] Figure 4: Speed up of OMP-QR algorithm in serial and with Joblib parallelization. For this type of problem it is not meaningful to increase resources beyond 16 tasks.

The proposed parallelization has a significant impact on the total computational time of the DL algorithm: when the plain DL algorithm is run sequentially with a single CPU, it requires about 20 hours to complete 20 iterations on a matrix of size 784 × 60,000, while the [...] of the OMP-QR code better suited for running on devices with limited computational resources.

4.3. Dictionary representativity

As already mentioned, the data provided by a DT do not usually show great variability. This section aims to verify whether the dictionary learned in the first stage is robust enough to accurately represent newly collected data. If successful, this would make it possible to run the sparse coding step (OMP-QR) without the need for a dictionary update. In particular, we integrate the study of dictionary representativity into a simulation of the DL4DT workflow on the D.A.V.I.D.E. dataset, keeping track of the original sizes, compression levels, and times.

The goal of the first stage is to learn a reliable and representative dictionary. Thus, we begin by considering the 4432 signals of its training set. In our workflow these data are sent to the digital twin, where we choose to apply the strongest yet most meaningful compression, i.e. a compression of 80% with s = 20, n = 349 and 10 iterations. From previous studies we know that such a compression can reach an overall F-score level of about 97.9% on normal signals and 90.7% on anomalies, taking around 3 minutes. Then the dictionary is stored on the digital twin and sent back to the physical one.

After a fixed time interval a new matrix of signals Y_1 is collected on the physical system. We simulate this new matrix of signals by taking the test set relative to the 16th node, since it is completely new to the dictionary and presents anomalies. We then compute its sparse representation matrix X_1 with a single run of OMP-QR with s = 15, taking around 3 seconds. The sparse representation matrix is then sent to the digital system, where it is used to reconstruct the signals as Ŷ_1 = D X_1. To evaluate the information loss due to the data compression, we consider the autoencoder trained in the first run on the compressed training set and check whether it is still able to detect the same anomalies by testing it on the compressed test set Ŷ_1. We obtain extremely good results, achieving an F-score of 97% on normal samples and 89.9% on anomalies. These outcomes are very close to the results obtained without compression, which were respectively 97.9% and 90.7%. The DL setting that we choose is indeed a sensible choice: increasing the compression level contributes to smoothing the signals with beneficial results, yet the dictionary remains highly representative with the sparsity level set to s = 20. We conduct a similar experiment using random compression instead of DL, retaining only 30% of the samples chosen randomly from the test set, obtaining an F-score equal to 98% on normal samples and 63% on anomalies, which is definitely worse. Thanks to this work-
same algorithm implemented with the Joblib parallelized             flow, instead of transmitting the entire signal matrix ๐‘Œ1
version of OMP-QR using 16 CPUs completes the task in               of dimensions 165 ร— 3074, is enough to compute and
about 5 hours. We have also developed a light C version             transfer its sparse representation ๐‘‹1 which requires the
storage of 20 ร— 3074 ร— 3 elements. This results in mem-        References
ory gain of 73%, requiring only 3 seconds and causing a
minimal loss of information.                                    [1] R. Archibald, H. Tran, A dictionary learning al-
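Each column of the sparse representation is computed independently against the fixed dictionary D, so the coding step parallelizes naturally over signals; this is exactly what the Joblib scheme exploits. The following is a minimal Python sketch of the idea, using scikit-learn's orthogonal_mp as a stand-in for our OMP-QR implementation, and synthetic data (dictionary size as in the experiment, but a smaller sparsity level and signal count, purely for illustration):

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
m, n, cols, s = 165, 349, 32, 8     # signal length, atoms, signals, sparsity

# Fixed dictionary with unit-norm atoms (stands in for the learned D)
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)

# Synthetic signals that are exactly s-sparse in D
X_true = np.zeros((n, cols))
for j in range(cols):
    support = rng.choice(n, size=s, replace=False)
    X_true[support, j] = rng.standard_normal(s)
Y = D @ X_true

def code_one(y):
    # Sparse-code a single signal against the fixed dictionary
    return orthogonal_mp(D, y, n_nonzero_coefs=s)

# Each column of Y is coded independently -> trivially parallel with Joblib
X = np.column_stack(
    Parallel(n_jobs=2)(delayed(code_one)(Y[:, j]) for j in range(cols))
)

# "Digital twin" side: reconstruct the signals from the sparse codes
Y_hat = D @ X
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
```

Only X (at most s coefficients and their indices per signal) travels between the physical and the digital system; the receiver recovers the signals with a single matrix product.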
This process can be iterated multiple times, until the dictionary D requires updating to ensure more accurate outcomes. For instance, the dictionary might be refreshed periodically, or whenever the performance of the AI algorithm on the compressed dataset begins to decline significantly. The results confirm that the dictionary D learned on the training set manages to represent new signals quite effectively. Indeed, the accuracy levels achieved by the signals reconstructed with the old dictionary D are good, allowing a significant gain in computational efficiency.
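The refresh policy just described can be expressed as a small control loop. The sketch below is purely schematic: the fscore_of and learn_dictionary hooks and the 5% decline threshold are hypothetical placeholders, not part of DL4DT.

```python
# Schematic dictionary-refresh loop: re-learn the dictionary only when the
# AI model's quality on reconstructed data drops significantly below its
# baseline. All names and thresholds here are illustrative placeholders.

def run_cycle(fscore_of, learn_dictionary, batches, baseline, drop=0.05):
    """fscore_of(batch) -> score in [0, 1]; learn_dictionary(batch) refreshes D."""
    refreshes = 0
    for batch in batches:
        score = fscore_of(batch)
        if score < (1.0 - drop) * baseline:    # significant decline detected
            learn_dictionary(batch)            # refresh D on recent data
            refreshes += 1
            baseline = fscore_of(batch)        # reset the reference level
    return refreshes

# Toy usage: the third batch degrades enough to trigger one refresh
scores = iter([0.97, 0.96, 0.90, 0.95, 0.94])
n_refreshes = run_cycle(lambda b: next(scores), lambda b: None,
                        batches=range(4), baseline=0.97)
```

A purely periodic refresh, as also mentioned above, corresponds to replacing the score test with a simple counter.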
5. Conclusions

The purpose of this work was to introduce a new efficient and lightweight compression tool (DL4DT) within the Digital Twin framework that has minimal impact on the accuracy of AI models trained on compressed data. The numerical experiments showed that, both with time series and images, the algorithm exhibited excellent behaviour, managing to compress the dataset by up to 80% while preserving key information and therefore keeping the accuracy almost unchanged. As shown in Section 4.3, the dictionary learned from the training data was able to represent new signals accurately and in a sparse way. Moreover, the examples carried out on the D.A.V.I.D.E. dataset showed that the algorithm also enhances data quality, serving as a potential preprocessing tool. Finally, due to the low computational cost of our parallel implementation of OMP-QR, this approach allows for on-device data compression, particularly useful with devices like IoT sensors, effectively reducing data exchange between devices while retaining the most crucial information. In conclusion, we can state that the DL compression algorithm effectively reduces the dataset memory demand, resulting in faster data transmission and reduced latency between distinct systems. Such a compression tool can have significant implications in industry, where network infrastructures may not be high-performing but a wise and efficient use of digital twin systems is crucial for optimizing and managing production.

Acknowledgments

This work is supported by the EUROCC Italy National Competence Center. The Competence Center is part of the EUROCC project funded by the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101101903. Special thanks go to the CINECA HPC department for their technical support. The work of MP is partially supported by INdAM-GNCS.

References

[1] R. Archibald, H. Tran, A dictionary learning algorithm for compression and reconstruction of streaming data in preset order, Discrete and Continuous Dynamical Systems - Series S 15 (2021). doi:10.3934/dcdss.2021102.
[2] A. Khelifati, M. Khayati, P. Cudré-Mauroux, CORAD: Correlation-aware compression of massive time series using sparse dictionary coding, in: 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 2289–2298. doi:10.1109/BigData47090.2019.9005580.
[3] B. Dumitrescu, P. Irofti, Dictionary Learning Algorithms and Applications, Springer Cham, 2018. doi:10.1007/978-3-319-78674-2.
[4] B. Sturm, M. Christensen, Comparison of orthogonal matching pursuit implementations, EURASIP, 2012, pp. 220–224.
[5] L. Cavalli, Analysis and implementation of Dictionary Learning techniques in a Digital Twin framework, Master thesis, University of Bologna, Bologna, Italy, 2023. Available at https://github.com/Eurocc-Italy/DL4DT.
[6] M. Aharon, M. Elad, A. Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing 54 (2006) 4311–4322. doi:10.1109/TSP.2006.881199.
[7] Cineca, Galileo100, 2021. URL: https://www.hpc.cineca.it/systems/hardware/galileo100/.
[8] L. Deng, The MNIST database of handwritten digit images for machine learning research [best of the web], IEEE Signal Processing Magazine 29 (2012) 141–142. doi:10.1109/MSP.2012.2211477.
[9] J. Wichard, Classification of ford motor data (2009).
[10] A. Borghesi, A. Libri, L. Benini, A. Bartolini, Online anomaly detection in HPC systems, CoRR abs/1902.08447 (2019). URL: http://arxiv.org/abs/1902.08447. arXiv:1902.08447.
[11] A. Borghesi, A. Bartolini, M. Lombardi, M. Milano, L. Benini, Anomaly detection using autoencoders in high performance computing systems, CoRR abs/1811.05269 (2018). URL: http://arxiv.org/abs/1811.05269. arXiv:1811.05269.
[12] E4 Computer Engineering, https://www.e4company.com/en/, 2024.
[13] F. Chollet, Simple MNIST convnet, https://keras.io/examples/vision/mnist_convnet/, 2020.
[14] H. Fawaz, Timeseries classification from scratch, https://keras.io/examples/timeseries/timeseries_classification_from_scratch/, 2023.
[15] Joblib, https://github.com/joblib/joblib, 2023.