<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dictionary Learning for data compression within a Digital Twin Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Laura Cavalli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domitilla Brandoni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Margherita Porcelli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Pascolo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CINECA</institution>
          ,
          <addr-line>Via Magnanelli 2, Casalecchio di Reno (BO), 40033</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze</institution>
          ,
          <addr-line>Viale Morgagni 40/44, 50134, Firenze</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ISTI-CNR</institution>
          ,
          <addr-line>Via Moruzzi 1, Pisa</addr-line>
          ,
          <country country="IT">Italy.</country>
          <institution>INdAM Research Group GNCS</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Digital Twin systems play a crucial role in several contexts, from smart agriculture to predictive maintenance, from healthcare to weather modelling. To be effective, they require a continuous exchange of massive data between IoT sensors in the real world and a digital system hosted on HPC infrastructure, and vice versa. Nevertheless, the transmitted signals often exhibit high similarity, resulting in a redundant dataset very suitable for compression. This paper shows how Dictionary Learning can be used as a preprocessing technique for AI algorithms due to its ability to compress large data volumes by up to 80%, with a potential enhancement of performance by acting as both a denoising and a compression technique. The algorithm operates efficiently on various types of datasets, from images to time series, and is well-suited for deployment on devices with limited computational resources, like IoT sensors.</p>
      </abstract>
      <kwd-group>
<kwd>Digital Twin</kwd>
        <kwd>Dictionary Learning</kwd>
        <kwd>parallel OMP</kwd>
<kwd>time series compression</kwd>
        <kwd>image compression</kwd>
        <kwd>anomaly detection</kwd>
        <kwd>image recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
A digital twin can be simply seen as a system consisting of two entities: a tangible subject of interest and its digital replica, interconnected by a continuous stream of data. In this context, data reflecting the physical entity are acquired through IoT sensors and sent to a dedicated HPC system which constitutes its digital mirror. Within the HPC system, the data undergo AI analysis to simulate the behavior and potential scenarios of the physical entity. The resulting insights are looped back into the physical system, impacting decision-making. Efficient transmission and storage of such large volumes of sensor data are therefore crucial to reduce latency between the two systems and ensure a reliable real-time digital representation, but this is often prohibitively expensive. For this reason, it is necessary to explore compression algorithms that lighten and speed up data transmission while preserving the meaningful information. Among the available state-of-the-art compression tools, we explore Dictionary Learning (DL), a robust sparse matrix factorization approach. Given a matrix of signals Y, DL is able to learn a sparse representation Y ≈ DX expressing each signal as a linear combination of a few basis elements, called atoms, which constitute the columns of D. In this work we will show that DL has various features that make it very suitable for use in data compression and transmission: i) it enables exceptional compression of redundant data due to its distinctive sparse factorization; ii) it is a versatile approach, able to handle diverse data types including images and time series; iii) its solution can be computed with an algorithm, supplied in this work, with low computational resource demand and independent of specific libraries, making it lightweight and well-suited for edge computing.</p>
      <p>
        The literature on DL comprises many applications across various fields, including denoising, inpainting, classification, and compression. Regarding data compression, an interesting online DL approach is proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] where massive datasets streamed through in a preset order are compressed and denoised. Furthermore, the work [
        <xref ref-type="bibr" rid="ref3">2</xref>
        ] presents CORAD, a novel DL-based compression algorithm for time series which is able to harness the correlation across multiple related time series to eliminate redundancy, performing a more efficient compression. However, as far as we know, this work is the first to incorporate DL as a compression method within the Digital Twin (DT) domain, using it as a powerful preprocessing technique for both time series and images. Also, we developed an optimized DL algorithm to increase its lightness and efficiency in the DT framework.</p>
      <p>
        This work is structured as follows: Section 2 gives a brief overview of the DL problem and of its solution. Section 3 integrates the DL approach within a DT framework and presents the overall DL4DT workflow, while Section 4 discusses numerical results, conducting a detailed analysis of the algorithm performance across various datasets. Additionally, it introduces several techniques designed to improve the algorithm execution speed. All the code necessary to reproduce the experiments shown in this paper is available at the following link: https://github.com/Eurocc-Italy/DL4DT.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Dictionary Learning overview</title>
      <p>
        The aim of DL is to discover an overcomplete set of basis functions (atoms) able to represent a given set of data samples in a sparse manner. Given a matrix of training signals Y ∈ R^{m×N} (m ≪ N), DL seeks a dictionary D ∈ R^{m×n} (n ≪ N) and a sparse matrix X ∈ R^{n×N} such that Y ≈ DX. The DL problem can be formulated in many equivalent ways, each one promoting a different aspect of the problem, as shown in detail in [
        <xref ref-type="bibr" rid="ref4">3</xref>
        ]. In this case we decided to formulate it as a two-variable, non-convex, constrained optimization problem of the form

    min_{D,X} ‖Y − DX‖_F²
    s.t. ‖x_ℓ‖₀ ≤ s,  ℓ = 1, …, N,
         ‖d_j‖₂ = 1,  j = 1, …, n,      (1)

where the number of atoms n and the sparsity level s are fixed. Here, ‖·‖₂ and ‖·‖₀ denote the ℓ₂ and ℓ₀ norms of a vector, respectively, and ‖·‖_F is the Frobenius norm.</p>
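      <p>As an aside, a factorization of the form (1) can also be computed with off-the-shelf tools. The following is a minimal illustrative sketch (ours, not the DL4DT code) using scikit-learn's DictionaryLearning with arbitrary sizes; note that scikit-learn stores signals as rows, so the roles of rows and columns are transposed with respect to our notation:</p>
      <preformat>
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 64))      # 200 signals with 64 features (signals as rows)

dl = DictionaryLearning(
    n_components=128,                   # number of atoms n (overcomplete: 128 &gt; 64)
    transform_algorithm="omp",          # sparse coding via OMP
    transform_n_nonzero_coefs=5,        # sparsity level s
    max_iter=10,
    random_state=0,
)
X = dl.fit_transform(Y)                 # sparse codes, shape (200, 128)
D = dl.components_                      # dictionary, shape (128, 64), unit-norm atoms
print(np.linalg.norm(Y - X @ D) / np.linalg.norm(Y))   # relative reconstruction error
      </preformat>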
      <p>Problem (1) is NP-hard and admits multiple global optima; therefore convergence to the global minimum is not guaranteed. In order to solve the DL problem, we follow the usual alternating optimization approach. More precisely, given the signal matrix Y and an initial dictionary D, at each iteration first the minimization problem in X is solved while D is fixed (Sparse Coding step), and then the minimization problem in D is solved while keeping X (possibly) fixed (Dictionary Update step).</p>
      <p>The problem to be solved at the Sparse Coding step can be formulated as follows

    min_X ‖Y − DX‖_F²  s.t. ‖x_ℓ‖₀ ≤ s,  ℓ = 1, …, N,      (2)

which can be decomposed into the solution of N problems, i.e. one for each signal:

    min_{x_ℓ} ‖y_ℓ − Dx_ℓ‖₂²  s.t. ‖x_ℓ‖₀ ≤ s.             (3)

For solving each problem (3) we employed Orthogonal Matching Pursuit (OMP), an iterative greedy algorithm that at each step selects the atom best correlated with the residual e := y − Dx, and then produces a new approximation by projecting the signal y onto the dictionary elements that have already been selected (indexed by the set S). We report in Algorithm 1 a naive version of OMP where the least squares solution x_S is computed from scratch at each step (refer to [4] for more details).</p>
      <sec id="sec-2-1">
        <title>Algorithm 1 OMP (naive approach) [4]</title>
        <p>Given y ∈ R^m, the sparsity level s, the dictionary D ∈ R^{m×n} and the stopping tolerance ε &gt; 0
Initialize S = ∅, e = y
while |S| &lt; s and ‖e‖₂ &gt; ε do
    j* = argmax_{j ∉ S} |eᵀd_j|
    S = S ∪ {j*}
    x_S = (D_Sᵀ D_S)⁻¹ D_Sᵀ y
    e = y − D_S x_S
end while</p>
        <p>Since at each step the current matrix D_S is updated by simply appending one column, a more efficient implementation can be obtained by exploiting the least squares solution computed at the previous step. The best-known approaches make use of the Cholesky decomposition of D_Sᵀ D_S [4, sec. 2.2] or of the QR decomposition of D_S [4, sec. 2.3]. Our computational experience showed that the OMP-QR implementation is faster when applied to DL [5]. Therefore, we implemented a parallel version of the OMP-QR code to speed up the computational times.</p>
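        <p>For concreteness, the following is a minimal NumPy sketch (ours, for illustration only) of the naive OMP of Algorithm 1; the function name and the default tolerance are our choices:</p>
        <preformat>
import numpy as np

def omp_naive(y, D, s, eps=1e-6):
    """Sparse-code signal y over dictionary D (unit-norm columns), naive OMP."""
    m, n = D.shape
    support = []                        # S: indices of the selected atoms
    e = y.copy()                        # current residual
    x = np.zeros(n)
    while len(support) &lt; s and np.linalg.norm(e) &gt; eps:
        corr = np.abs(D.T @ e)          # correlation of each atom with the residual
        corr[support] = -np.inf         # exclude atoms already selected
        support.append(int(np.argmax(corr)))
        D_S = D[:, support]
        x_S, *_ = np.linalg.lstsq(D_S, y, rcond=None)   # LS solved from scratch
        e = y - D_S @ x_S               # new residual
    if support:
        x[support] = x_S
    return x
        </preformat>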
        <p>Regarding the Dictionary Update step, the following minimization problem has to be solved

    min_D ‖Y − DX‖_F²  s.t. ‖d_j‖₂ = 1,  j = 1, …, n,      (4)

where the sparsity pattern of X is fixed. For this task we followed the K-SVD approach [6].</p>
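        <p>For reference, a compact sketch (ours) of one K-SVD sweep over the atoms, under the conventions of problem (4); the helper name is our choice:</p>
        <preformat>
import numpy as np

def ksvd_update(Y, D, X):
    """One K-SVD sweep: refresh each atom d_j and the nonzeros of row j of X."""
    for j in range(D.shape[1]):
        omega = np.flatnonzero(X[j, :])     # signals currently using atom j
        if omega.size == 0:
            continue                        # unused atom: leave it untouched
        # residual restricted to those signals, with atom j's contribution removed
        E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, j], X[j, omega])
        U, sv, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]                   # best rank-1 fit gives the new unit-norm atom
        X[j, omega] = sv[0] * Vt[0, :]      # updated nonzero coefficients
    return D, X
        </preformat>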
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dictionary Learning to reduce latency in Digital Twin</title>
      <sec id="sec-3-1">
        <title>Reducing data latency is one of the main challenges</title>
        <p>within the DT context. This section aims to outline
the proposed workflow, named DL4DT, to decrease data
transmission time using DL as a compression technique.
min ‖ − ‖2 s.t. ‖x‖0 ≤ ,  = 1, . . . , . (2) DL4DT, illustrated in Figure 1, takes place in two stages.</p>
        <p>First of all (Fig. 1, top), the data are collected from the physical device, represented as a matrix Y, and then transmitted to the digital counterpart. Here, the entire process of DL factorization is applied to Y, resulting in the learning of a reliable and robust overcomplete dictionary D and of the sparse representation X. The dictionary D is both saved on the digital system and transmitted back to be saved also on the physical one. Afterwards, a new smaller dataset of signals Y1 is collected (Fig. 1, bottom). Instead of transferring the complete Y1, we claim that computing its sparse representation X1 with OMP, using the reference dictionary D from stage 1, is sufficient. Transmitting X1, which is highly sparse, improves transmission time and reduces costs: solving a single Sparse Coding step demands fewer computational resources than a full DL factorization, and transferring only X1 is lighter than sending the entire Y1. Indeed, suppose that Y1 has N1 signals of m features each. Instead of passing all the m × N1 elements, with our method it is enough to transmit the s × N1 non-zero elements of X1. Notice that in sparse matrices each non-zero element is stored as a triplet (row_index, column_index, non_zero_value), requiring a total storage of s × N1 × 3 values. Therefore, transferring X1 yields a reduction of 1 − 3s/m in transmitted data. Moreover, users have the flexibility to specify under which conditions the dictionary D has to be updated, in order to obtain more reliable results. For example, a reasonable choice can be updating the dictionary after a fixed period of time, or when the accuracy of the AI algorithm on the compressed dataset starts to decrease too much. We refer to these conditions as user_conditions in the forthcoming Algorithm 2. As we will show, DL4DT is very effective, since DL techniques allow massive compression while preserving the most important features of the dataset. DL4DT is summarized in Algorithm 2.</p>
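        <p>The storage argument can be made concrete in a few lines (an illustration with hypothetical sizes, assuming SciPy; the measured figures are reported in Section 4.3):</p>
        <preformat>
import numpy as np
from scipy import sparse

m, n, N1, s = 128, 256, 1000, 10        # hypothetical features, atoms, signals, sparsity

rng = np.random.default_rng(0)
X1 = sparse.random(n, N1, density=s / n, format="coo", random_state=rng)

dense_count = m * N1                    # values shipped if the raw Y1 is sent
triplet_count = 3 * X1.nnz              # (row, col, value) per stored non-zero of X1
print(f"reduction: {1 - triplet_count / dense_count:.0%}")   # about 1 - 3*s/m
        </preformat>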
      </sec>
      <sec id="sec-3-2">
        <title>Algorithm 2 DL4DT: workflow of a DT process with DL</title>
        <p>techniques.</p>
        <p>Collect data on the physical counterpart in matrix  .
Send  to the digital system.</p>
        <p>Compute the dictionary  and the sparse matrix 
with DL factorization of  on the digital system.
 = 0
while True do
if  = 0 then</p>
        <p>Send the dictionary  to the physical system
and store it.
else</p>
      </sec>
      <sec id="sec-3-3">
        <title>Compute  using OMP-QR on the physical</title>
        <p>system.</p>
        <p>Send  to the digital system.
end if
 =  + 1
Compute ˜ =  on the digital system.</p>
        <p>Apply AI algorithm using ˜ as dataset.
if user_conditions then</p>
        <p>break
end if
end while
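        <p>In code form, the digital-twin side of this loop might look as follows (a hypothetical sketch: dl_factorize, channel, run_ai and user_conditions are placeholder callables, not the DL4DT API):</p>
        <preformat>
def dl4dt_digital_side(Y, dl_factorize, channel, run_ai, user_conditions):
    D, X_k = dl_factorize(Y)      # stage 1: full DL factorization on the HPC
    channel.send(D)               # k = 0: ship the dictionary to the device
    while True:
        Y_tilde = D @ X_k         # reconstruct the compressed dataset
        run_ai(Y_tilde)           # apply the AI algorithm on the digital twin
        if user_conditions():     # e.g. periodic refresh or accuracy decline
            break
        X_k = channel.receive()   # k &gt; 0: sparse codes computed on the device
        </preformat>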
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Numerical Results</title>
      <p>In this section, after introducing the datasets, we validate the DL approach as an effective compression tool for addressing DT latency problems. Then, we simulate and analyze the DL4DT workflow presented in Section 3, exploiting the DL ability to build a highly representative dictionary. All experiments were run on Galileo100 [7], an HPC infrastructure owned by CINECA with 528 computing nodes, each with 2 × Intel CascadeLake 8260 CPUs (24 cores each, 2.4 GHz), 384 GB of RAM and an NVIDIA Mellanox Infiniband 100GbE network.</p>
        <sec id="sec-3-3-1">
          <title>4.1. Datasets</title>
          <p>
            We focused on three datasets with various types of data (images or time series) and dimensions: MNIST [8], FordA [9], and a fine-grained time series from the D.A.V.I.D.E. HPC system [
            <xref ref-type="bibr" rid="ref2">10, 11</xref>
            ]. D.A.V.I.D.E. is a supercomputer developed by E4 Computer Engineering [12] and hosted in the past by CINECA, with an integrated monitoring infrastructure called Examon [
            <xref ref-type="bibr" rid="ref2">10</xref>
            ]. In this work we focused on a subset of the data collected by Examon: for each of the 45 nodes, 166 metrics were considered (core workloads, temperatures, fan speeds, power consumption, etc.), collected at 5-minute intervals. In detail, we focused on the 16th node.
          </p>
        </sec>
        <sec id="sec-3-3-2">
          <title>4.2. Dictionary Learning compression</title>
          <p>
            4. Numerical Results To evaluate the efectiveness of our compression, it is
essential to compare the information generated by AI
modIn this section, after introducing the datasets, we vali- els trained on both the original and compressed datasets.
date the DL approach as an efective compression tool This is crucial within the DT framework, where our
prifor addressing DT latency problems. Then, we simulate mary aim is to extract valuable insights from compressed
and analyze the DL4DT workflow presented in Section data.
3, exploiting the DL ability to build a highly representa- We considered a CNN tailored for digit recognition
tive dictionary. All experiments were run on Galileo100 [13] on MNIST dataset, a CNN able to perform anomaly
[7], an HPC infrastructure owned by CINECA with 528 detection suggested in [14] on FordA and an
autoencoderbased model able to automatically detect anomalies in a pression settings. The overall accuracy, approximately
semi-supervised fashion ([
            <xref ref-type="bibr" rid="ref2">10, 11</xref>
            ]) on D.A.V.I.D.E. After 86%, is lower than previous cases as expected due to the
training the NNs described above on both original and real-world nature of the dataset. However we notice that
compressed datasets, we compared their performance the test accuracy reached by training the autoencoder on
on the same test set by studying the accuracy, which the compressed training dataset is almost identical to the
is defined as the ratio of the number of correct predic- one obtained with no compression. However, when
dealtions over the total number of predictions. Figure 2 com- ing with imbalanced datasets, it is better to consider the
pares respectively the test accuracy achieved by the NNs F-score value achieved for each class (normal signals and
trained on the original dataset (green dotted line) and anomalies) rather than the accuracy. F-score value is
deon a DL compression of MNIST (top) and FordA (bot- fined as F-score := 2 ×+ , where 
tom) concerning a sparsity level of  = 50 and a number and  are the ratio of true positives to the total
preof iterations  = 20 (orange solid line) across various dicted positives and to the actual positives, respectively.
compression levels. The results obtained with other set- We notice that the F-score reached on normal signals,
tings of DL are shown in more detail in [5]. As expected,
y
c
a
r
u
c
c
a
y
c
a
r
u
c
c
a
100
90
80
70
          </p>
          <p>40
100
90
80
70
50
60
70</p>
          <p>80
% compression
40
50
60</p>
          <p>70
% compression
no compression
the accuracy computed on the compressed datasets is
lower than the one computed on the original dataset.
Despite not matching exactly the original accuracy, we still
achieve extremely good results: with MNIST dataset we shown in the middle of Fig.3, remains almost unafected
can even reach an accuracy of 97% with a compression of by compression: across various DL configurations, the
80% against an accuracy of 99% with no compression, this F-score consistently remains close to 98%, as the original
is probably due to the redundant nature of the datasets, case without compression. This finding aligns with our
which makes it possible to achieve high accuracy lev- expectations, as the training set in this example consists
els even with high levels of compression. On FordA an only of signals without anomalies. As for the F-score of
overall accuracy of 91% is reached even with high com- anomalies, shown at the bottom of Fig.3, we observe that
pression levels against 96% with no compression. Figure 3 this value increases when compression is more intense.
shows at the top the test accuracy achieved by the autoen- Examining the details of the Recall and Precision values
coder trained on the original D.A.V.I.D.E dataset (green for these cases (Table 1), we notice that, respectively, the
dotted line) and on the dataset compressed with DL with Recall for normal signals and the Precision for anomalies
 = 5 and  = 10 (orange solid line) and diferent com- are higher compared to the case without compression.</p>
        <p>Table 1: Precision and Recall (%) on D.A.V.I.D.E. for normal signals and anomalies, without compression (0%) and with 80% DL compression.
Compression  Class    Precision  Recall
0%           normal   99.8       95.4
80%          normal   99.8       96.3
0%           anomaly  79.8       99.1
80%          anomaly  84.2       99.1</p>
        <p>These two values (the Recall of normal signals and the Precision of anomalies) take into account the cases where certain signals are identified as anomalies even though they are not. The higher the value, the more this type of error is avoided. It is therefore consistent that DL compression can increase these values, as DL is known to be a valuable denoising tool, leading to improved anomaly detection.</p>
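        <p>For clarity, per-class figures like those of Table 1 can be reproduced in a few lines (an illustration with toy labels, assuming scikit-learn; here 0 marks normal signals and 1 anomalies):</p>
        <preformat>
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 0, 1, 1, 0, 1, 0]     # toy ground truth
y_pred = [0, 0, 1, 1, 1, 0, 0, 0]     # toy predictions
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1])
print(f"normal:  P={prec[0]:.2f} R={rec[0]:.2f} F={f1[0]:.2f}")
print(f"anomaly: P={prec[1]:.2f} R={rec[1]:.2f} F={f1[1]:.2f}")
        </preformat>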
        <p>Let us now explore some implementation aspects of the code. In our scenario we have to deal with substantial problem dimensions, but we can also benefit from the computational resources of an HPC cluster in the first stage of the workflow presented in Section 3. These resources can be fully employed in the OMP algorithm, which can be parallelized with the Joblib Python library [15], following what was mentioned in Section 2. Figure 4 illustrates the speedup achieved by executing OMP-QR serially and in parallel with an increasing number of processors, where the speedup is the ratio of the execution time of the serial code to the execution time of the parallel code performing the same task.</p>
        <p>[Figure 4: speedup of the OMP-QR algorithm with Joblib parallelization against the ideal speedup, for 2, 4, 8 and 16 tasks. For this type of problem it is not meaningful to increase resources beyond 16 tasks.]</p>
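        <p>A minimal sketch (ours) of this parallelization: the N signals are coded independently, so the per-signal OMP solves are embarrassingly parallel; here we use scikit-learn's OMP routine in place of our OMP-QR kernel:</p>
        <preformat>
import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import orthogonal_mp

def sparse_code_parallel(Y, D, s, n_jobs=16):
    """Code each column of Y over D with OMP, one chunk of signals per worker."""
    chunks = np.array_split(np.arange(Y.shape[1]), n_jobs)
    blocks = Parallel(n_jobs=n_jobs)(
        delayed(orthogonal_mp)(D, Y[:, idx], n_nonzero_coefs=s) for idx in chunks
    )
    return np.hstack(blocks)            # X: one s-sparse column per signal
        </preformat>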
        <p>The proposed parallelization has a significant impact on the total computational time of the DL algorithm: when the plain DL algorithm is run sequentially on a single CPU, it requires about 20 hours to complete 20 iterations on a matrix of size 784 × 60,000, while the same algorithm implemented with the Joblib-parallelized version of OMP-QR using 16 CPUs completes the task in about 5 hours. We have also developed a light C version of the code.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Dictionary representativity</title>
        <p>As already mentioned, the data provided by a DT do not usually show great variability. This section aims to verify whether the dictionary learned in the first stage is robust enough to accurately represent newly collected data. If successful, this would make it possible to run the sparse coding step (OMP-QR) without the need for a dictionary update. In particular, we integrate the study of dictionary representativity into a simulation of the DL4DT workflow on the D.A.V.I.D.E. dataset, keeping track of the original sizes, compression levels, and times.</p>
        <p>The goal of the first stage is to learn a reliable and representative dictionary. Thus, we begin by considering the 4432 signals of its training set. In our workflow these data are sent to the digital twin, where we choose to apply the strongest yet most meaningful compression, i.e. a compression of 80% with s = 20, n = 349 and 10 iterations. From previous studies we know that such a compression can reach an overall F-score of about 97.9% on normal signals and 90.7% on anomalies, taking around 3 minutes. The dictionary is then stored on the digital twin and sent back to the physical one.</p>
        <p>After a fixed time interval, a new matrix of signals Y1 is collected on the physical system. We simulate this new matrix of signals by taking the test set relative to the 16th node, since it is completely new to the dictionary and presents anomalies. We then compute its sparse representation matrix X1 with a single run of OMP-QR with s = 15, taking around 3 seconds. The sparse representation matrix is then sent to the digital system, where it is used to reconstruct the signals as Ŷ1 = DX1. To evaluate the information loss due to the data compression, we consider the autoencoder trained in the first run on the compressed training set and check whether it is still able to detect the same anomalies when tested on the compressed test set Ŷ1. We obtain extremely good results, achieving an F-score of 97% on normal samples and 89.9% on anomalies. These outcomes are very close to the results obtained without compression, which were respectively 97.9% and 90.7%. The DL setting that we chose is indeed a sensible one: increasing the compression level helps to smooth the signals, with beneficial results, yet the dictionary remains highly representative with the sparsity level set to s = 20. We conducted a similar experiment using random compression instead of DL, retaining only 30% of the samples chosen randomly from the test set, and obtained an F-score of 98% on normal samples and 63% on anomalies, which is definitely worse. Thanks to this workflow, instead of transmitting the entire signal matrix Y1 of dimensions 165 × 3074, it is enough to compute and transfer its sparse representation X1, which requires the storage of 15 × 3074 × 3 elements. This results in a memory gain of 73% (consistent with the 1 − 3s/m estimate of Section 3), requiring only 3 seconds and causing a minimal loss of information.</p>
        <p>This process can be iterated multiple times, until the dictionary D requires updating to ensure more accurate outcomes. For instance, the dictionary might be refreshed periodically, or whenever the performance of the AI algorithm on the compressed dataset begins to decline significantly. The results confirm that the dictionary D learned on the training set manages to represent new signals quite effectively. Indeed, the accuracy levels achieved by the signals reconstructed with the old dictionary D are good, allowing a significant gain in computational efficiency.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Archibald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <article-title>A dictionary learning algorithm for compression and reconstruction of streaming data in preset order</article-title>
          ,
          <source>Discrete and Continuous Dynamical Systems - Series S</source>
          <volume>15</volume>
          (
          <year>2021</year>
          ). doi:10.3934/dcdss.2021102.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>doi:10</source>
          .3934/dcdss.2021102.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khelifati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khayati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          ,
          <article-title>Corad: Correlation-aware compression of massive time series using sparse dictionary coding</article-title>
          ,
          <source>in: 2019 IEEE International Conference on Big Data (Big Data)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2289</fpage>
          -
          <lpage>2298</lpage>
          . doi:10.1109/BigData47090.2019.9005580.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Dumitrescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Irofti</surname>
          </string-name>
          ,
          <source>Dictionary Learning Algorithms and Applications</source>
          , Springer Cham,
          <year>2018</year>
          . doi:10.1007/978-3-319-78674-2.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>