<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>E. García-Martín, C. F. Rodrigues, G. Riley, H. Grahn, Estimation of energy consumption
in machine learning, Journal of Parallel and Distributed Computing</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3292500.3330865</article-id>
      <title-group>
        <article-title>Data Filtering for a Sustainable Model Training</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Scala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Flesca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luigi Pontieri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Engineering, Modeling, Electronics, and Systems Engineering (DIMES)</institution>
          ,
          <institution>University of Calabria</institution>
          ,
          <addr-line>87036 Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of High Performance Computing and Networking (ICAR-CNR)</institution>
          ,
          <addr-line>Via P. Bucci, 87036 Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>134</volume>
      <issue>2019</issue>
      <fpage>23</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>The remarkable capabilities of deep neural networks (DNNs) in addressing intricate problems are accompanied by a notable environmental toll. Training these networks consumes large amounts of energy, owing to the vast volumes of data needed, the sizeable models employed, and the prolonged training durations. In light of the principles of Green-AI, which emphasize reducing the ecological footprint of AI technologies, this is a pressing concern. In response, we introduce DFSMT, an approach tailored to selecting a subset of labeled data for training, thereby aligning with Green-AI objectives. Our methodology leverages Active Learning (AL) techniques, which systematically identify and select batches of the most informative instances of the data for model training. Through an iterative application of diverse AL strategies, we curate a labeled data subset that preserves adequate information to maintain model quality standards. Empirical results underscore the effectiveness of our approach, demonstrating substantial reductions in labeled data requirements without significantly compromising model performance. This achievement carries particular significance in the context of Green-AI, providing a pathway to mitigate the environmental impact of AI training processes.</p>
      </abstract>
      <kwd-group>
        <kwd>Active Learning</kwd>
        <kwd>Green-AI</kwd>
        <kwd>Data Selection</kwd>
        <kwd>Energy Efficiency</kwd>
        <kwd>Sustainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial Intelligence (AI) has undergone significant growth in recent years, bringing about
transformative changes in various industries and offering innovative solutions to intricate
problems. Its impact spans sectors ranging from healthcare and finance to manufacturing
and retail, reshaping both our lifestyles and professional environments. Nevertheless, this
expansive development has introduced challenges, particularly in terms of increased energy
consumption and, consequently, carbon emissions. Moreover, this issue is projected to escalate
significantly, as highlighted in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The training phase of AI models, with its substantial
demands for data and computing power, is a primary contributor to this energy-intensive
process [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Effectively training high-performing AI models requires vast amounts of data
and considerable computing power, which drives up energy consumption.
The carbon emissions linked to AI predominantly stem from the electricity used during the
training phase of these models. Since electricity still originates largely from non-renewable
energy sources, such as coal and natural gas, training AI models contributes significantly to
global warming. Indeed, despite advancements, non-renewable energy sources still dominate
the energy production landscape [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The need to limit the effect on global warming has pushed the research community to work on
Green-AI, whose aim is to reduce the environmental impact of AI by promoting the
development of efficient and sustainable models and algorithms. Green-AI focuses on several
key areas:
• Reducing energy consumption: Developing models and algorithms that require less
energy for training and use;
• Using renewable energy: Powering AI training and use with renewable energy, such as
solar and wind power;
• Developing efficient hardware: Designing hardware specifically for AI that is more
energy efficient;
• Recycling and reuse: Promoting the recycling and reuse of hardware components used
for AI.</p>
      <p>In this paper, we investigate how to reduce energy consumption during the training
phase of AI models. Various methodologies have been introduced to tackle this challenge,
including MdBR [4] for regression on static data, n-gram counting [5] for machine translation,
and the enhanced OPF method by Chouvatut et al. [6], which minimizes training set size for
classifiers with minimal accuracy loss. Furthermore, clustering techniques have been employed
to eliminate irrelevant training samples.</p>
      <p>In this work, we investigate the possibility of leveraging Active Learning (AL) [7, 8, 9, 10, 11,
12] to reduce the volume of data required for training AI models, by meticulously selecting the
most informative data points within the dataset, and consequently reduce the energy demands
of their training phase.</p>
      <p>AL techniques are designed to find the most informative data for model training. They were born out of
the recognition that data labeling is one of the most resource-intensive and time-consuming
processes in AI model training. AL selects data points for labeling, typically by a human expert
annotator, to maximize learning efficiency and minimize the overall data labeling cost. Various
approaches have been defined for this purpose. For instance, Least Confidence Sampling (LCS)
[8] prioritizes items with the lowest confidence for their predicted label, while LAL-IGrad and
its enhancements [10, 11] exploit gradient variation within artificial neural networks to estimate
instance relevance. Additionally, Ash et al. [12] proposed BAIT, a technique for selecting
batches of samples by optimizing a bound on the Maximum Likelihood Estimator (MLE) error
in terms of the Fisher information.</p>
      <p>In this paper we propose DFSMT, a versatile technique that combines various AL
methodologies to actively explore the data space within a pool-based framework, thus identifying the
most informative data for the model. AL techniques iteratively select the most informative
subset of labeled data to achieve acceptable model quality. To retain efficiency, the emphasis is
on computationally lightweight techniques; otherwise, the selection process could become more
resource-intensive than training the neural network itself. Experimental results demonstrate
that the proposed technique can significantly reduce the amount of labeled data required for
training AI models, while preserving high model quality. This outcome holds particular
significance from the perspective of Green-AI, as our technique offers a notable reduction in the
environmental impact of AI. It achieves this by significantly lowering the computational cost
associated with training AI models. Rather than running resource-intensive backpropagation
over the entire dataset, the technique trains on a smaller, optimized subset obtained
by exploiting AL techniques. The resulting reduction in energy and computational power usage
aligns with a more environmentally friendly approach to AI model training.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>In recent years, the field of machine learning has witnessed a growing interest in data
reduction techniques. This interest is motivated by various needs, including the optimization of
computational resources, the reduction of the environmental impact of artificial intelligence
(Green-AI), and the improvement of model generalization. In this context, our work falls within
the research line that aims to reduce the amount of data required for training machine learning
models while maintaining high model quality. Several studies have explored data reduction
approaches in different contexts.</p>
      <p>For example, the MdBR (Multidimensional binned reduction) method [4] focuses on regression
tasks and uses discretization and non-parametric reduction techniques to achieve significant
data reduction (over 99%) while maintaining or even improving model performance. However,
MdBR is limited to static data and cannot handle time series. In the field of machine translation,
Lewis et al. [5] proposed an n-gram counting approach that reduces the size of datasets by up
to 90% without a significant loss of quality (measured by the BLEU score [13]). This method is
scalable to large datasets and offers advantages beyond data reduction, such as faster training
times and smaller model sizes.</p>
      <p>Koggalage et al. [14] proposed a strategy that uses clustering techniques to identify and
remove irrelevant training samples that do not affect the decision boundary. This approach
reduces the training set size without compromising classification accuracy, but it
is specific to SVMs. Chouvatut et al. [6] proposed an improved OPF (Optimum-Path Forest)
method to reduce the training set size for classifiers. This method relies on a
graph-based algorithm and a segmented linear regression approach to achieve a 7-21% reduction
in the training set size while maintaining similar accuracy (with a 0.2-0.5% decrease). In some
cases, the improved OPF even achieves exactly the same accuracy as the original OPF algorithm.</p>
      <p>Yang et al. [15] proposed a method called incremental adaptive deep model (IADM) that
addresses the challenges of training deep models on streaming data with evolving distributions.
It employs an adaptive attention mechanism to adjust model depth and utilizes an
attention-based Fisher information matrix to prevent catastrophic forgetting, enabling efficient and
accurate learning on incremental data.</p>
      <p>Our work differs from previous ones in the following aspects:
• Combination of different active learning (AL) strategies: DFSMT uses a combination
of AL techniques, potentially offering greater flexibility and adaptability compared to
single-strategy approaches;
• Focus on Green-AI: Our work explicitly emphasizes environmental impact reduction as
a key aspect of data reduction, a focus that is uncommon in the current landscape;
• Potentially broader applicability: Our approach aims for broader applicability, not
limited to a specific task or data type.</p>
      <p>These aspects position our research within the current landscape of data reduction techniques
and indicate its potential contributions to Green-AI and other research fields. Our proposal contributes to
this line of research by combining different active learning techniques to identify the most
informative data points iteratively. This approach has the potential to further reduce the amount
of labeled data required for training high-quality AI models, contributing to more efficient and
environmentally friendly AI development.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Approach</title>
      <p>A classification problem consists in associating every instance taken from a predefined domain
<italic>X</italic> with a label selected from a fixed domain of labels ℒ. We assume the presence of a set of
instance-label pairs <italic>D</italic> ⊆ <italic>X</italic> × ℒ, where for each pair ⟨<italic>x</italic>, <italic>y</italic>⟩ ∈ <italic>D</italic>, <italic>x</italic> is an instance in <italic>X</italic> and <italic>y</italic>
is the label associated with <italic>x</italic>. Algorithm 1 shows the general schema of the proposed approach,
named Data Filtering for a Sustainable Model Training (DFSMT), and Algorithm 2 shows how the selection
is performed. DFSMT receives in input the dataset <italic>D</italic>, a neural network model NN, the number
<italic>epochs</italic> of training epochs, the number <italic>steps</italic> of selection steps, the number <italic>n</italic> of relevant
instances at start, the number <italic>p</italic> of relevant instances to select at each step, and a set <italic>AS</italic> of
AL techniques. SelectionAlgorithm receives in input the set <italic>LS</italic> of instances not already selected in
the dataset, the set <italic>AS</italic> of AL techniques, the statistics <italic>stats</italic> those techniques need, and the number <italic>p</italic> of relevant instances to select.</p>
      <p>The DFSMT algorithm starts by selecting <italic>n</italic> instances and placing them
in the training set <italic>TS</italic> for initial training. The model then learns iteratively: at each step, <italic>p</italic>
additional instances are added to <italic>TS</italic> using SelectionAlgorithm, which receives as input <italic>LS</italic>,
<italic>AS</italic>, a set of statistics about the samples needed by the AL techniques (which may differ
depending on the techniques themselves), and <italic>p</italic>. During each iteration, the model is
trained on both the new and the existing instances. Finally, the trained model is returned.</p>
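      <p>The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names dfsmt, selection_algorithm, train and get_stats are hypothetical, model training is hidden behind a callback (so the sketch returns the training set instead of the network), each AL technique is assumed to be a function of an instance and the current statistics, and the initial selection is assumed to run with empty statistics because no trained model exists yet.</p>
      <preformat>
```python
def selection_algorithm(ls, techniques, stats, p):
    """Return the positions in ls of the p highest-scoring instances."""
    # Relevance of an instance = sum of the scores given by every AL technique.
    scores = [sum(tech(inst, stats) for tech in techniques) for inst in ls]
    # Stable sort: instances with equal scores keep their original order.
    order = sorted(range(len(ls)), key=lambda i: scores[i], reverse=True)
    return order[:p]


def dfsmt(dataset, train, get_stats, techniques, epochs, steps, n_start, p):
    """Grow a training set step by step and retrain after every selection.

    train(ts, epochs) and get_stats(ls) are hypothetical callbacks that wrap
    the neural network; they are assumptions made for this sketch.
    """
    ls = list(dataset)  # instances not selected yet
    ts = []             # training set built so far

    def move(stats, k):
        # Move the k most relevant instances from ls to ts.
        picked = set(selection_algorithm(ls, techniques, stats, k))
        ts.extend(inst for i, inst in enumerate(ls) if i in picked)
        ls[:] = [inst for i, inst in enumerate(ls) if i not in picked]

    move({}, n_start)   # assumption: no statistics before the first training
    train(ts, epochs)
    for _ in range(steps):
        move(get_stats(ls), p)
        train(ts, epochs)
    return ts
```
</preformat>
      <p>With a toy technique that scores an instance by its own value, calling dfsmt on the integers 0 to 19 with n_start=4, steps=3 and p=2 would invoke train four times, on training sets of 4, 6, 8 and 10 instances.</p>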
      <sec id="sec-3-1">
        <title>Algorithm 1: DFSMT</title>
        <p>Data: D: dataset, NN: neural network model, epochs: number of epochs, steps: number of steps,
n: number of relevant instances at start, p: number of relevant instances to select at each
step, AS: a set of AL techniques
TS ← SelectionAlgorithm(D, AS, n)
Train NN on TS for epochs epochs
for i = 1 . . . steps do
    stats ← getStats(LS, NN, AS)
    TS ← TS ∪ SelectionAlgorithm(LS, AS, stats, p)
    Train NN on TS for epochs epochs
return NN</p>
        <p>The core of the proposed approach is the SelectionAlgorithm, which is responsible for
selecting the instances to be used for training. This algorithm combines the active learning
techniques present in the set AS. For each instance in LS, the algorithm computes one relevance
score per technique and sums them into a single score. Finally, the p instances with the highest scores are selected
and returned. The more techniques AS contains, the more accurate the selection should be,
but at the expense of energy consumption and computation time.</p>
        <p>Algorithm 2: SelectionAlgorithm</p>
        <p>Data: LS: not selected instances in the dataset, AS: a set of AL techniques, stats: a set of data
statistics necessary for AS, p: number of relevant instances to select.
scores ← []
for x ∈ LS do
    score ← 0
    for h ∈ AS do
        score ← score + h(x, stats)
    append ⟨x, score⟩ to scores
return the p instances in scores with the highest score</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1. Computational reduction</title>
        <p>Active learning (AL) offers a pathway to streamline AI model development while aligning
with the principles of Green-AI. The core concept lies in the strategic selection of the most
informative data samples from a larger labeled dataset. By training on this optimized subset,
AL techniques can reduce the overall computational costs associated with reaching a target
accuracy level. The potential for energy reduction is directly linked to the following factors:
• Energy Cost per Data Point: The hardware used (CPUs, GPUs or TPUs) and the
complexity of the neural network architecture dictate the energy spent on processing
each data point during training. Optimizing algorithms for specific hardware can further
reduce this cost;
• Data Reduction Effectiveness: A core measure of AL effectiveness is its ability to
drastically reduce the training set size while preserving model performance. The greater
the reduction achievable, the higher the potential energy savings;
• AL Complexity: Active learning techniques range in computational overhead. Simpler
methods like uncertainty sampling may have minimal cost, while more sophisticated
approaches can introduce more computation. Indeed, a computationally
intensive AL technique may render the proposed method ineffective, because the selection
process can become more burdensome than the neural network’s training;
• Impact on Training Convergence: The interaction between data reduction and the
model’s convergence behavior cannot be ignored. In some cases, a highly informative
dataset might lead to fewer training iterations, amplifying savings. However, it is also
possible that more iterations might be required to converge, partially offsetting the energy
gains.</p>
        <p>The significance of energy conservation has long been recognized [16, 17, 18], leading to
ongoing advancements in power consumption estimation methodologies. Alongside these
theoretical developments, practical tools for building energy consumption models have
emerged. To calculate energy savings, we employed the following formula,
established in the work of Lannelongue et al. (2020) [19]:
        <disp-formula><label>(1)</label><tex-math>E = t \times (n_c \times P_c \times u_c + n_m \times P_m) \times \mathrm{PUE} \times 0.001</tex-math></disp-formula>
where:
• E: the energy consumption (kWh);
• t: the running time (hours);
• n<sub>c</sub>: the number of cores;
• n<sub>m</sub>: the size of memory available (gigabytes);
• u<sub>c</sub>: the core usage factor (between 0 and 1);
• P<sub>c</sub>: the power draw of a computing core (Watt);
• P<sub>m</sub>: the power draw of the memory (Watt per gigabyte);
• PUE: the efficiency coefficient (Power Usage Effectiveness) of the data centre.</p>
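        <p>As a worked illustration of equation 1, the following Python sketch evaluates the formula for one run. The function name energy_kwh and the argument names are hypothetical, and the numbers in the usage note are made-up inputs chosen for easy arithmetic, not measurements from the experiments.</p>
        <preformat>
```python
def energy_kwh(t_hours, n_cores, core_power_w, core_usage,
               mem_gb, mem_power_w_per_gb, pue):
    """Energy in kWh: running time times power draw, scaled by the PUE.

    The factor 0.001 converts watt-hours to kilowatt-hours.
    """
    core_draw = n_cores * core_power_w * core_usage   # W drawn by the cores
    memory_draw = mem_gb * mem_power_w_per_gb         # W drawn by the memory
    return t_hours * (core_draw + memory_draw) * pue * 0.001
```
</preformat>
        <p>For example, a one-hour run on 4 cores drawing 10 W each at 50% usage, with 8 GB of memory at 0.5 W per gigabyte and a PUE of 1.0, gives 1 × (20 + 4) × 1.0 × 0.001 = 0.024 kWh.</p>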
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Evaluation</title>
      <p>Data. We used the following datasets for the experimental evaluation:
• MNIST [20]: consists of 70000 instances representing 28x28 gray scale images of handwritten digits,
labeled using 10 mutually exclusive classes. The dataset is
organized into 60000 instances as the training set and 10000 instances as the test set;
• Fashion-MNIST [21]: consists of 70000 instances representing 28x28 gray scale
images of fashion articles, labeled using 10 mutually exclusive classes. The
dataset is organized into 60000 instances as the training set (6000 images per class) and 10000 instances as the
test set. Its authors intend Fashion-MNIST to serve as a direct drop-in replacement for
the original MNIST dataset for benchmarking machine learning algorithms. It shares the
same image size and structure of training and testing splits.</p>
      <p>
        Baseline methods. We compared the performance of DFSMT with a classical training approach
that uses all the data available in the dataset. This allowed us to evaluate how much our technique
reduces the amount of data required to achieve performance comparable to classical training,
measured in terms of model accuracy. As AL technique we used LCS because it is
lightweight. However, this does not preclude the use of other techniques or their
combination. More precisely, given an instance x and a classification model M, the LCS method
measures the uncertainty u(x) of M w.r.t. x as
        <disp-formula><tex-math>u(x) = \left(1 - P_M(y^* \mid x)\right) \times \frac{n}{n - 1}</tex-math></disp-formula>
where P<sub>M</sub>(y*|x) denotes the probability that the model M assigns to the label y* for the instance x, y* is the label
for which M yields the maximum probability on x (i.e., y* = arg max<sub>y</sub> P<sub>M</sub>(y|x)), and n is the
cardinality of the set of labels. Note that the uncertainty function ranges over [0, 1], where
1 is the most uncertain score.
      </p>
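      <p>The LCS score can be sketched in a few lines of Python. This is an illustrative sketch only: the function name least_confidence is hypothetical, and the input is assumed to be the list of class probabilities the model predicts for a single instance (summing to 1, with at least two classes).</p>
      <preformat>
```python
def least_confidence(probabilities):
    """Normalized least-confidence uncertainty in [0, 1] for one instance."""
    n = len(probabilities)
    if n in (0, 1):
        raise ValueError("at least two classes are required")
    top = max(probabilities)  # probability of the most likely label
    # The factor n / (n - 1) rescales the score so that a uniform
    # prediction (top = 1 / n) yields the maximum uncertainty of 1.
    return (1.0 - top) * n / (n - 1)
```
</preformat>
      <p>A fully confident prediction such as [1.0, 0.0, 0.0] scores 0, while a uniform prediction over the 10 classes scores 1, so instances with higher scores are the ones selected first.</p>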
      <p>Settings and assessment criteria. To evaluate the effectiveness of DFSMT, we conducted
experiments on the two standard image datasets just described. For each dataset, we used the
following neural networks:
• MNIST: A CNN with two convolutional layers (10 and 20 filters, respectively), followed
by a dropout layer and two fully connected layers (50 and 10 neurons);
• Fashion-MNIST: A CNN that starts with two convolutional layers, each using
3x3 filters for local pattern extraction. Batch normalization speeds up training, and ReLU
activations provide non-linearity. Max pooling reduces dimensionality. Fully connected
layers then interpret the features, with dropout preventing overfitting. The final
layer has 10 outputs, one per class.</p>
      <p>The stochastic gradient descent (SGD) [22] optimization algorithm was used to optimize the
model parameters of the neural network for MNIST, chosen due to its efficiency and reliability
in a variety of machine learning problems. For Fashion-MNIST, however, the Adam [23]
optimization algorithm was selected for its typically faster convergence and adaptability
to complex datasets.</p>
      <p>For MNIST, the negative log-likelihood loss function (nll_loss) was used. This function is
specific to multi-class classification and measures how closely the model predictions align
with the ground truth labels. For Fashion-MNIST, which is also a multi-class classification problem,
the cross-entropy loss function (CrossEntropyLoss) was used. This function
measures the distance between two probability distributions and has been shown to be effective
for classification problems with a high number of classes.</p>
      <p>Classical training uses the entire dataset to train the model in a single phase of
100 training epochs. This approach can be computationally expensive and require significant
training time, especially for large datasets and models. Incremental training, on the other
hand, adopts an iterative approach. Initially, a small subset of the dataset (1000 samples) is used to train the
model. Subsequently, the model is updated over 10 incremental steps, each of which adds
1000 newly selected samples and performs 10 training epochs.
This approach can significantly reduce the training time, energy consumption and the amount
of data required, while maintaining high model accuracy.</p>
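      <p>To make the difference in training workload concrete, the following Python sketch counts how many times a sample is processed under each schedule. It is a rough illustration under stated assumptions: the function names are hypothetical, the initial round is assumed to use the same number of epochs as each incremental step, and the count ignores the cost of the selection process, so it is not an energy estimate.</p>
      <preformat>
```python
def classical_passes(dataset_size, epochs):
    """Sample passes when every epoch visits the whole dataset."""
    return dataset_size * epochs


def incremental_passes(n_start, per_step, steps, epochs_per_round):
    """Sample passes when the training set grows by per_step each round."""
    size = n_start
    total = size * epochs_per_round          # initial training round
    for _ in range(steps):
        size += per_step                     # newly selected samples
        total += size * epochs_per_round     # retrain on old and new samples
    return total


print(classical_passes(60000, 100))            # 6000000
print(incremental_passes(1000, 1000, 10, 10))  # 660000
```
</preformat>
      <p>With the settings above, the incremental schedule ends with 11000 training samples and processes 660000 sample passes, against 6000000 for classical training on the 60000 training instances.</p>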
      <p>We analyzed how the behavior of DFSMT changes when varying the amount of data selected
at each training step with the MNIST dataset. Experiments were carried out on an Intel Core
i5-8259U CPU @ 2.30GHz with 8GB RAM and an Intel Iris Plus Graphics 655 GPU. Table 1
summarizes our analysis and Figure 1 plots it. The table includes the amount of data selected at each step of the process, the final
amount of data used at the end of training, the model’s accuracy, average CPU utilization (note
that values exceeding 100% indicate multi-core usage), processing time in milliseconds, energy
consumption (expressed in kWh) calculated using equation 1, and a metric relating accuracy to
energy efficiency (efficiency ratio), calculated as the ratio of accuracy to energy consumption.</p>
      <p>Then we analyzed the accuracy and loss curves during both classical and incremental training.
This allowed us to monitor the model’s learning in both cases, comparing its evolution with
full and reduced data sets. Accuracy is the primary metric for evaluating a model’s ability
to correctly classify images. The loss measures the model’s error in predicting labels. By
monitoring the loss during training, we can evaluate the model’s ability to learn from the data
and improve its predictions.</p>
      <p>Results. The analysis focuses on three key aspects: computational savings, accuracy and
loss, comparing the performance of DFSMT with classical training on two datasets of varying
complexity: MNIST and Fashion-MNIST. As observed in Table 1, increasing the number of
training instances naturally leads to higher accuracy and energy consumption. Our experiments
aimed to identify the parameters that maximize the ratio of accuracy to energy
consumption. We determined that the “n instances per step” parameter is the primary influencing
factor, with 1000 instances yielding the best results. Consequently, we used this value for
our comparative analysis against classical training. While classical training achieved slightly
higher accuracy (96.58% vs. 94.41%), its energy consumption was significantly greater (0.027
kWh vs. 0.011 kWh). This translates to a superior efficiency ratio for DFSMT: 7550.01
compared to 3642.04 with classical training. Figure 1 shows the differing growth
patterns of accuracy and energy consumption. While accuracy increases logarithmically, energy
consumption follows a different trajectory. This highlights the inherent trade-off between these
two metrics, emphasizing the need to carefully select parameters for the most efficient model
training.</p>
      <p>DFSMT demonstrated remarkable potential on the Fashion-MNIST dataset. It achieved a
significantly higher efficiency ratio (3737.75 vs. 663.19 with classical training) and drastically
reduced energy consumption (0.024 kWh vs. 0.134 kWh) while maintaining comparable accuracy
(89.21% vs. 89.62%). These results were obtained with the same settings used for MNIST.
By comparing the accuracy trends during classical and incremental training, we
observed:
• Classical Training: Accuracy increased gradually with the number of epochs, reaching
a plateau towards the end of training;
• Incremental Training: DFSMT exhibits a faster learning rate (i.e., steeper upward
trajectory) than classical training on MNIST as the number of training examples increases.
On Fashion-MNIST, this difference is less pronounced.</p>
      <p>Our analysis of accuracy and loss validates DFSMT’s ability to reduce energy consumption
in machine learning training. Even with less data, incremental training achieved accuracy
comparable to classical training, demonstrating its potential as a more efficient and sustainable
approach.</p>
      <p>[Figure: training curves for simple (classical) and incremental training; x-axis: epochs (0 to 100).]</p>
      <p>Our analysis reveals that classical training converges to the optimum faster than DFSMT, as
evidenced by both the loss curve and the accuracy trends. While DFSMT’s loss curve is initially
slightly less stable because it sees less training data, it eventually stabilizes as the number of training
instances increases.</p>
      <p>DFSMT stands out for its significant computational savings compared to classical training.
The advantage becomes more pronounced as the number of instances in the dataset grows.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Based on the conducted analysis, DFSMT is an efficient
and well-performing training method for handling large datasets. The algorithm offers
significant computational savings compared to classical training, without notably sacrificing
model accuracy. The computational efficiency of DFSMT makes it a promising solution for
machine learning on resource-constrained devices, and also in the context of Green AI, which
is becoming increasingly important due to the climate crisis. Moreover, its ability to handle
large datasets opens up new possibilities for the use of machine learning models in a variety
of applications, with a positive impact on the efficiency and sustainability of such systems.
Building upon these findings, our future research will
focus on refining DFSMT by leveraging the information supplied by the dataset, such as the labels of the
instances (in contrast to a simple AL setting), and by applying optimizations to the selected data in order
to keep the dataset balanced. These enhancements aim to further improve the performance and
versatility of DFSMT, fostering its broader adoption across diverse domains and reinforcing its
role in advancing both efficiency and sustainability in machine learning practices.</p>
      <p>[Figure: loss curve for incremental training; y-axis: loss (0.00 to 2.50), x-axis: epochs (0 to 100).]</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This work was partly supported by project FAIR - Future AI Research - Spoke 9 (Directorial
Decree no. 1243, August 2nd, 2022; PE 0000013; CUP B53C22003630006), under the NRRP
(National Recovery and Resilience Plan) MUR program (Mission 4, Component 2 Investment
1.3) funded by the European Union – NextGenerationEU.
duction with uncertainty quantification: a case study of the italian energy market, Expert
Systems with Applications 200 (2022). URL: http://www.sciencedirect.com/science/article/
pii/S0957417422003670. doi:http://doi.org/10.1016/j.eswa.2022.116936.
[4] J. Wibbeke, P. Teimourzadeh Baboli, S. Rohjans, Optimal data reduction of training data in
machine learning-based modelling: A multidimensional bin packing approach, Energies
15 (2022). URL: https://www.mdpi.com/1996-1073/15/9/3092. doi:10.3390/en15093092.
[5] W. Lewis, S. Eetemadi, Dramatically reducing training data size through vocabulary
saturation, in: Proceedings of the Eighth Workshop on Statistical Machine Translation,
WMT@ACL 2013, August 8-9, 2013, Sofia, Bulgaria, The Association for Computer
Linguistics, 2013, pp. 281–291. URL: https://aclanthology.org/W13-2235/.
[6] V. Chouvatut, W. Jindaluang, E. Boonchieng, Training set size reduction in large dataset
problems, in: 2015 International Computer Science and Engineering Conference (ICSEC),
2015, pp. 1–5. doi:10.1109/ICSEC.2015.7401435.
[7] B. Settles, Active Learning Literature Survey, Technical Report, University of
Wisconsin</p>
      <p>Madison Department of Computer Sciences, 2009.
[8] B. Settles, M. Craven, An analysis of active learning strategies for sequence labeling tasks,
in: Proc. of the 2008 Conference on Empirical Methods in Natural Language Processing,
2008, pp. 1070–1079.
[9] S. Kee, E. del Castillo, G. Runger, Query-by-committee improvement with diversity and
density in batch active learning, Information Sciences 454-455 (2018) 401–418. URL:
https://www.sciencedirect.com/science/article/pii/S0020025518303700. doi:https://doi.
org/10.1016/j.ins.2018.05.014.
[10] S. Flesca, D. Mandaglio, F. Scala, A. Tagarelli, A meta-active learning approach exploiting
instance importance, Expert Systems with Applications 247 (2024) 123320. URL: https:
//www.sciencedirect.com/science/article/pii/S0957417424001854. doi:https://doi.org/
10.1016/j.eswa.2024.123320.
[11] S. Flesca, D. Mandaglio, F. Scala, A. Tagarelli, Learning to active learn by gradient
variation based on instance importance, in: 2022 26th International Conference on Pattern
Recognition (ICPR), 2022, pp. 2224–2230. doi:10.1109/ICPR56361.2022.9956039.
[12] J. T. Ash, S. Goel, A. Krishnamurthy, S. M. Kakade, Gone fishing: Neural active learning
with fisher embeddings, in: M. Ranzato, A. Beygelzimer, Y. N. Dauphin, P. Liang, J. W.
Vaughan (Eds.), Advances in Neural Information Processing Systems 34: Annual
Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14,
2021, virtual, 2021, pp. 8927–8939. URL:
https://proceedings.neurips.cc/paper/2021/hash/4afe044911ed2c247005912512ace23b-Abstract.html.
[13] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation
of machine translation, in: Proceedings of the 40th Annual Meeting on Association for
Computational Linguistics, Association for Computational Linguistics, 2002, pp. 311–318.
[14] R. Koggalage, S. K. Halgamuge, Reducing the number of training samples for fast
support vector machine classification, 2004. URL:
https://api.semanticscholar.org/CorpusID:6688904.
[15] Y. Yang, D.-W. Zhou, D.-C. Zhan, H. Xiong, Y. Jiang, Adaptive deep models for incremental
learning: Considering capacity scalability and sustainability, in: Proceedings of the 25th
ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>de Vries</surname>
          </string-name>
          ,
          <article-title>The growing energy footprint of artificial intelligence</article-title>
          ,
          <source>Joule</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <fpage>2191</fpage>
          -
          <lpage>2194</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S2542435123003653. doi:10.1016/j.joule.2023.09.004.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Strubell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ganesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <article-title>Energy and policy considerations for deep learning in NLP</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Korhonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Traum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Màrquez</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 57th Conference of the Association for Computational Linguistics</source>
          , ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume
          <volume>1</volume>
          : Long Papers, Association for Computational Linguistics,
          <year>2019</year>
          , pp.
          <fpage>3645</fpage>
          -
          <lpage>3650</lpage>
          . URL: https://doi.org/10.18653/v1/p19-1355. doi:10.18653/V1/P19-1355.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Flesca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Vocaturo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zumpano</surname>
          </string-name>
          ,
          <article-title>On forecasting non-renewable energy pro-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>