DPET: A Data and Parameter Efficient Training Framework for Green AI

Francesco Scala1,∗, Luigi Pontieri1 and Sergio Flesca2
1 Institute of High Performance Computing and Networking (ICAR-CNR), Via P. Bucci, 87036 Rende (CS), Italy
2 Dept. Computer Engineering, Modeling, Electronics, and Systems Engineering (DIMES), University of Calabria, 87036 Rende (CS), Italy

Abstract
The worsening climate crisis calls for immediate action to reduce the environmental impact of energy-intensive technologies, including Artificial Intelligence (AI). Reducing AI’s environmental footprint involves adopting energy-efficient strategies for training Deep Neural Networks (DNNs). One such strategy is Data Pruning (DP), which decreases the number of training instances, thereby lowering total energy consumption. Several DP methods, such as GraNd and Craig, have been introduced to accelerate model training. On the other hand, Active Learning (AL) techniques, originally designed to iteratively select relevant unlabeled data instances to be labeled by human experts, can also be leveraged to train models on smaller, but informative, subsets. However, despite reducing the volume of training data, many DP and AL-based methods involve expensive computations that may significantly limit their potential for energy savings. In this work-in-progress, we propose a framework, named DPET, that efficiently integrates data selection techniques within an AL-like incremental training scheme. Empirical analyses on a benchmark dataset show that the proposed approach offers a better balance between accuracy and energy efficiency in the training of DNN models.

Keywords
Data Pruning, Green-AI, Active Learning, Energy Efficiency, Sustainability

1. Introduction

Recent advancements in Artificial Intelligence (AI) have significantly transformed industries like healthcare, finance, and manufacturing, impacting personal and professional life. However, this rapid expansion has raised concerns regarding increased energy consumption and carbon emissions [1]. Deep Learning models, which require vast amounts of data and computation to train Deep Neural Networks (DNNs), are major contributors to this surge in energy use [2]. The electricity needed for AI model training, largely generated from non-renewable sources like coal and natural gas, contributes to climate change [3]. In response, Green-AI research aims to reduce AI systems’ environmental impact by minimizing energy consumption, utilizing renewable energy, and developing energy-efficient AI hardware. This work particularly addresses the challenge of combining data selection with deep learning methods to lower energy usage while maintaining model accuracy.

Existing solutions  Several approaches have been proposed to tackle the issue of reducing energy consumption in AI. One key method is Data Pruning (DP), which involves extracting a compact subset, or coreset, from a large dataset while preserving its most relevant information [4]. This smaller sample can be used as a more cost-effective substitute for the original dataset in machine learning tasks [5]. However, many DP methods involve heavy computations, which can negate the benefits of reducing the training dataset size. Recent studies show that random sampling schemes often perform as well as, or better than, DP methods [6, 4, 7]. The Repeated Random Sampling (RS2) method [6] builds on this idea by randomly selecting a data subset for each training epoch, aiming to cut training costs.
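To make this per-epoch sampling idea concrete, the following is a minimal PyTorch sketch of an RS2-style training loop that draws a fresh random fraction r of the training set at every epoch. The function name, optimizer settings, and default values are illustrative assumptions, not the implementation released with [6].

```python
# Minimal sketch of RS2-style training: at every epoch a fresh random fraction r
# of the training set is drawn, so each epoch sees only a small, changing subset.
# Hyperparameters and the helper name are illustrative, not the authors' code.
import torch
from torch.utils.data import DataLoader, Subset

def rs2_train(model, dataset, epochs=200, r=0.1, batch_size=128, lr=0.1, device="cpu"):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    subset_size = int(r * len(dataset))
    model.to(device)
    for _ in range(epochs):
        # draw a new random subset for this epoch (without replacement within the epoch)
        idx = torch.randperm(len(dataset))[:subset_size].tolist()
        loader = DataLoader(Subset(dataset, idx), batch_size=batch_size, shuffle=True)
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```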
Despite the potential of DP and random sampling methods, a key limitation is the need to determine the optimal data amount beforehand; misjudgments can lead to wasted time and energy. To address this, Active Learning (AL) approaches [8, 9, 10], originally designed to minimize labeling costs, can be leveraged to reduce training costs by focusing on informative data subsets [11, 12, 13]. However, the repeated retraining required by standard AL schemes can be too energy-intensive, as shown in empirical experiments [11].

Contribution  Given the limitations of current data pruning and sampling methods for efficient Deep Neural Network (DNN) learning, this paper introduces DPET, a framework that combines an RS2-based DNN warm-up with an iterative AL-like scheme that refines the model with informative data selections. Experiments on benchmark datasets demonstrate that DPET significantly reduces the computational and energy costs of training large DNN models without sacrificing accuracy. These findings suggest that DPET holds promise for promoting more sustainable DNN training, particularly in the context of Green-AI initiatives.

2. Proposed approach

Let 𝒟 be a dataset, to be pruned, consisting of pairs (𝑥𝑖, 𝑦𝑖), where each 𝑥𝑖 is a data instance and each 𝑦𝑖 is a one-hot vector representing a class label, with 𝐶 classes in total. The goal of data pruning is to extract a representative subset 𝒟𝑠 whose size is much smaller than that of 𝒟. A DNN model 𝜙𝜃, parameterized by 𝜃, is trained using gradient descent, with the additional benefit that training on 𝒟𝑠 consumes less energy than training on the full dataset 𝒟.

[Figure 1: Overview of the proposed DPET framework.]

The proposed DPET framework, a “work-in-progress” extension of [13], is currently under development, with the dotted blocks in Figure 1 representing ongoing work. It operates in two distinct phases:

• Warm-up: DPET first selects the most suitable pre-trained model, based on the characteristics of the current dataset, from an internal pool or from external repositories (such as those that have recently become available in many application sectors, e.g., natural language processing/understanding and computer vision). Afterward, it applies optimization techniques such as model pruning, cutout regularization, and low-precision parameter quantization to compress the model and enhance its efficiency. Finally, the RS2 algorithm is used to quickly converge on a preliminary model configuration for 𝜙𝜃; this is faster than traditional full-dataset training with SGD. However, RS2’s performance gain slows down over time, so the algorithm switches to an active learning (AL)-based procedure to continue improving model performance efficiently.

• Fine-tune: DPET iteratively selects additional instances from 𝒟, using an instance ranking function 𝑓𝑟𝑎𝑛𝑘 and a dissimilarity measure d (such as the Euclidean distance or the KL divergence) to compute an importance score for each instance. The model is then updated and trained using both the new and the “old” (i.e., previously selected) instances; for the sake of efficiency, the user can require the framework to select only a subset of the old instances, leveraging replay-based mechanisms like those used in Continual Learning [14]. This iterative process allows the algorithm to reduce computation and energy costs (see the sketch right after this list).
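As a concrete illustration of the fine-tune phase, the snippet below sketches one possible AL-like selection round, assuming an entropy-based ranking function as 𝑓𝑟𝑎𝑛𝑘 (one of the variants evaluated in Section 3) and a simple top-k selection policy; for simplicity the dissimilarity term d and the replay-based pruning of old instances are omitted. Helper names, batch sizes, and optimizer settings are assumptions for illustration, not DPET’s actual implementation.

```python
# Illustrative sketch of one DPET-style fine-tune round (hypothetical helpers):
# score the not-yet-selected instances with a ranking function, pick the top-k
# most informative ones, then retrain on new + previously selected data.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def entropy_score(model, pool_loader, device="cpu"):
    """f_rank variant: predictive-entropy score for each pooled instance."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x, _ in pool_loader:
            probs = F.softmax(model(x.to(device)), dim=1)
            scores.append(-(probs * probs.clamp_min(1e-12).log()).sum(dim=1))
    return torch.cat(scores)

def fine_tune_round(model, dataset, selected_idx, k=1000, epochs=10, device="cpu"):
    model.to(device)
    already = set(selected_idx)
    # candidate pool = instances not selected in previous rounds
    pool_idx = [i for i in range(len(dataset)) if i not in already]
    pool_loader = DataLoader(Subset(dataset, pool_idx), batch_size=256, shuffle=False)
    scores = entropy_score(model, pool_loader, device)
    # keep the k highest-scoring (most uncertain) instances
    k = min(k, len(pool_idx))
    new_idx = [pool_idx[i] for i in scores.topk(k).indices.tolist()]
    selected_idx = list(selected_idx) + new_idx
    # retrain for a few epochs on new + "old" instances
    train_loader = DataLoader(Subset(dataset, selected_idx), batch_size=128, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            F.cross_entropy(model(x.to(device)), y.to(device)).backward()
            opt.step()
    return model, selected_idx
```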
This adaptive method is more flexible than traditional data pruning techniques, which require pre-determined reduction levels. Due to the large number of hyperparameters involved, a component named Hyperparameters’ Tuner is being developed to improve DPET’s performance by managing them automatically. In the end, DPET produces a trained model 𝜙 and a coreset. This hybrid approach, combining RS2 with an AL-based fine-tuning procedure, helps balance performance and energy efficiency.

Setting guidelines and implementation choices  AL provides a way to optimize AI model training while reducing energy consumption, in line with Green-AI principles. The idea is to strategically select the most informative data samples from a larger dataset, which can help reduce the computational costs required to reach a target accuracy level. However, the extent of the energy savings depends on several factors:

• Data Sampling Effectiveness: The effectiveness of AL-like data selection strategies is essential to significantly reduce the overall training costs without undermining model quality. The greater the per-step data sampling effectiveness (and, hence, the lower the total number of training instances and AL rounds required), the more substantial the potential energy saving.

• Data Sampling Complexity: AL sampling methods differ in computational cost. Simpler approaches are less resource-intensive, while more complex methods can be costly. If the sampling process is too expensive, it may counteract the overall energy savings.

3. Experimental Evaluation

Test setting and terms of comparison  In the experimental evaluation, we used the widely known CIFAR-10 dataset, containing 60,000 images divided into 10 classes. We compared a partial implementation of DPET (namely, without pre-trained models and data replay mechanisms) against several methods: standard full-dataset training (Standard train), the RS2 algorithm [6], the pure AL approach from [11], and state-of-the-art DP methods such as Glister [15], GraphCut [16], CRAIG [17], and GraNd [18], using the implementations from DeepCore [4]. Each method was evaluated by measuring both the energy consumption (in Wh) and the accuracy of the trained models. Following the time-to-accuracy approach in [6], we set accuracy targets (from 60% to 90%) and measured the energy each method required to reach them (i.e., the energy-to-accuracy), unless the method exhausted its budget of energy or epochs beforehand.
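For concreteness, the sketch below mirrors this energy-to-accuracy protocol: the method under test is trained epoch by epoch and stopped as soon as the target accuracy is reached or the epoch/energy budget is exhausted. The read_energy_wh callback is a hypothetical stand-in for whatever power meter or software tracker is actually used, which is not specified here.

```python
# Sketch of an energy-to-accuracy measurement loop. read_energy_wh() is a
# hypothetical hook returning cumulative energy (Wh) from an external meter.
def energy_to_accuracy(train_one_epoch, evaluate, read_energy_wh,
                       target_acc=0.90, max_epochs=200, energy_budget_wh=1000.0):
    start_wh = read_energy_wh()
    for _ in range(max_epochs):
        train_one_epoch()                      # one epoch of the method under test
        acc = evaluate()                       # accuracy on the test set
        used_wh = read_energy_wh() - start_wh  # cumulative energy so far
        if acc >= target_acc:
            return used_wh                     # energy needed to reach the target
        if used_wh >= energy_budget_wh:
            break                              # energy budget exhausted
    return None                                # target not reached within the budget
```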
Hyperparameter Configuration  For each test, we trained a ResNet18 model [19] using mini-batch Stochastic Gradient Descent (SGD) with a Cross-Entropy loss. We tested DPET with three ranking function variants (𝑓𝑟𝑎𝑛𝑘): Least Confidence, Margin Sampling, and Entropy scores; the approach is flexible and can incorporate other AL techniques. For DPET, in the warm-up phase we ran RS2 with a 30% data reduction per epoch (𝑟 = 0.3) over 20 epochs (bootEpcs = 20). In the fine-tune rounds, 1,000 instances were selected per round, with 10 optimization epochs. Hyperparameters for RS2 and the AL method from [11] were set according to their original papers. RS2 was tested with reduction factors of 20%, 10%, and 5%, with a total budget of 200 epochs.

[Figure 2: Energy-to-accuracy (Wh) of a ResNet18 for DPET compared to RS2, AL, standard training and some DP techniques, targeting 90% accuracy on the full CIFAR-10 dataset. Values are reported every 10 training epochs. The plot (curves for DPET_margin, DPET_entropy and DPET_lc against the competitors) is shown on the left, while the numerical values are reported in the table below.]

| Method / Target   | 60%  | 65%  | 70%  | 75%  | 80%  | 85%  | 90%  |
|-------------------|------|------|------|------|------|------|------|
| Standard train    | 20   | 39   | 59   | 78   | 157  | 626  | 1018 |
| AL (margin) [11]  | 135  | 218  | 288  | 357  | 711  | 1167 | 2421 |
| GraNd [18]        | 838  | 865  | 878  | 902  | 924  | 1042 | 1232 |
| Craig [17]        | 175  | 196  | 231  | 248  | 344  | 442  | 627  |
| Glister [15]      | 108  | 117  | 133  | 172  | 192  | 380  | 599  |
| GraphCut [16]     | 239  | 364  | 396  | 486  | 621  | 762  | 993  |
| RS2 w/o repl 20%  | 32   | 39   | 48   | 65   | 82   | 168  | 237  |
| RS2 w/o repl 10%  | 44   | 59   | 75   | 97   | 108  | 153  | 197  |
| RS2 w/o repl 5%   | 43   | 56   | 62   | 77   | 94   | 121  | -    |
| DPET (margin)     | 19   | 38   | 42   | 47   | 63   | 94   | 196  |
| DPET (entropy)    | 19   | 39   | 41   | 42   | 63   | 99   | 220  |
| DPET (lc)         | 19   | 38   | 41   | 42   | 63   | 98   | 204  |

Test results  The analysis focuses on three key aspects: computational savings, accuracy, and pruning ratio, comparing the performance of DPET with the other techniques. A significant advantage of DPET is its iterative approach to data selection, which eliminates the need to pre-determine the amount of data to prune: it dynamically adds only the data necessary to achieve the target accuracy. This smart data selection allows DPET to reach the desired accuracy more quickly, resulting in computational savings, even if its pruning ratio is lower than that of other methods. As shown in Figure 2, DPET outperforms the other techniques in terms of computational savings for the same target accuracy, highlighting the effectiveness of its iterative data selection strategy. This approach not only speeds up the training process but also enables DPET to reach all the target accuracy levels considered. In contrast, the other analyzed methods (excluding the standard training baseline) fail to meet some of the target accuracy thresholds.

4. Conclusion

Based on our analysis, despite its partial implementation, DPET stands out as an efficient method for training deep neural network (DNN) models on large datasets, providing significant computational savings compared to standard training and active learning (AL) approaches without compromising model accuracy. It consistently outperforms the other data pruning techniques in terms of energy consumption across the various target accuracy levels. This computational efficiency makes DPET particularly suitable for resource-constrained devices and aligns with the goals of Green AI. To enhance the proposed framework, we will finalize the implementation and optimization of the dotted blocks and then evaluate the framework’s definitive performance.

Acknowledgment

This work was partially supported by the PNRR research project FAIR - Future AI Research (PE00000013), Spoke 9 - Green-aware AI, under the NRRP (National Recovery and Resilience Plan) MUR program funded by NextGenerationEU.
References

[1] A. de Vries, The growing energy footprint of artificial intelligence, Joule 7 (2023) 2191–2194. doi:10.1016/j.joule.2023.09.004.
[2] E. Strubell, A. Ganesh, A. McCallum, Energy and policy considerations for deep learning in NLP, in: A. Korhonen, D. R. Traum, L. Màrquez (Eds.), Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, Association for Computational Linguistics, 2019, pp. 3645–3650. doi:10.18653/V1/P19-1355.
[3] S. Flesca, F. Scala, E. Vocaturo, F. Zumpano, On forecasting non-renewable energy production with uncertainty quantification: A case study of the Italian energy market, Expert Systems with Applications 200 (2022) 116936. doi:10.1016/j.eswa.2022.116936.
[4] C. Guo, B. Zhao, Y. Bai, DeepCore: A comprehensive library for coreset selection in deep learning, in: Database and Expert Systems Applications: 33rd International Conference, DEXA 2022, Vienna, Austria, August 22–24, 2022, Proceedings, Part I, Springer-Verlag, Berlin, Heidelberg, 2022, pp. 181–195. doi:10.1007/978-3-031-12423-5_14.
[5] N. Sachdeva, J. J. McAuley, Data distillation: A survey, CoRR abs/2301.04272 (2023). doi:10.48550/ARXIV.2301.04272. arXiv:2301.04272.
[6] P. Okanovic, R. Waleffe, V. Mageirakos, K. Nikolakakis, A. Karbasi, D. Kalogerias, N. M. Gürel, T. Rekatsinas, Repeated random sampling for minimizing the time-to-accuracy of learning, in: The Twelfth International Conference on Learning Representations, 2024.
[7] F. Ayed, S. Hayou, Data pruning and neural scaling laws: fundamental limitations of score-based algorithms, 2023. arXiv:2302.06960.
[8] S. Flesca, D. Mandaglio, F. Scala, A. Tagarelli, A meta-active learning approach exploiting instance importance, Expert Systems with Applications 247 (2024) 123320. doi:10.1016/j.eswa.2024.123320.
[9] S. Flesca, D. Mandaglio, F. Scala, A. Tagarelli, Learning to active learn by gradient variation based on instance importance, in: 2022 26th International Conference on Pattern Recognition (ICPR), 2022, pp. 2224–2230. doi:10.1109/ICPR56361.2022.9956039.
[10] S. Flesca, D. Mandaglio, F. Scala, A. Tagarelli, A meta-active learning approach exploiting instance importance based on learning gradient variation, in: Proceedings of the 31st Symposium of Advanced Database Systems, Galzignano Terme, Italy, July 2nd to 5th, 2023, volume 3478 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 535–544.
[11] D. Park, D. Papailiopoulos, K. Lee, Active learning is a strong baseline for data subset selection, in: Has it Trained Yet? NeurIPS 2022 Workshop, 2022.
[12] F. Scala, S. Flesca, L. Pontieri, Data filtering for a sustainable model training, in: Proceedings of the 32nd Symposium of Advanced Database Systems, Villasimius, Italy, June 23rd to 26th, 2024, volume 3741 of CEUR Workshop Proceedings, CEUR-WS.org, 2024, pp. 205–216.
[13] F. Scala, S. Flesca, L. Pontieri, Play it straight: An intelligent data pruning technique for green-AI, in: D. Pedreschi, A. Monreale, R. Guidotti, R. Pellungrini, F. Naretto (Eds.), Discovery Science, Springer Nature Switzerland, Cham, 2025, pp. 69–85.
[14] L. Wang, X. Zhang, H. Su, J. Zhu, A comprehensive survey of continual learning: theory, method and application, IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
[15] Z. Yang, H. Yang, S. Majumder, J. Cardoso, G. Gallego, Data pruning can do more: A comprehensive data pruning approach for object re-identification, Transactions on Machine Learning Research (2024).
[16] R. Iyer, N. Khargonkar, J. Bilmes, H. Asnani, Submodular combinatorial information measures with applications in machine learning, in: V. Feldman, K. Ligett, S. Sabato (Eds.), Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 722–754.
[17] B. Mirzasoleiman, J. Bilmes, J. Leskovec, Coresets for data-efficient training of machine learning models, in: H. D. III, A. Singh (Eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 6950–6960.
[18] M. Paul, S. Ganguli, G. K. Dziugaite, Deep learning on a data diet: Finding important examples early in training, in: A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, 2021.
[19] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90.