<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Less is More: Data Pruning for Faster Adversarial Training</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yize Li</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pu Zhao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xue Lin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bhavya Kailkhura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryan Goldhahn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lawrence Livermore National Laboratory</institution>
          ,
          <addr-line>7000 East Ave, Livermore, CA 94550</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Northeastern University</institution>
          ,
          <addr-line>360 Huntington Ave, Boston, MA 02115</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Deep neural networks (DNNs) are sensitive to adversarial examples, resulting in fragile and unreliable performance in the real world. Although adversarial training (AT) is currently one of the most effective methodologies to robustify DNNs, it is computationally very expensive (e.g., 5 ∼ 10× costlier than standard training). To address this challenge, existing approaches focus on single-step AT, referred to as Fast AT, reducing the overhead of adversarial example generation. Unfortunately, these approaches are known to fail against stronger adversaries. To make AT computationally efficient without compromising robustness, this paper takes a different view of the efficient AT problem. Specifically, we propose to minimize redundancies at the data level by leveraging data pruning. Extensive experiments demonstrate that data pruning based AT can achieve similar or superior robust (and clean) accuracy as its unpruned counterparts while being significantly faster. For instance, the proposed strategies accelerate CIFAR-10 training up to 3.44× and CIFAR-100 training up to 2.02×. Additionally, the data pruning methods can readily be reconciled with existing adversarial acceleration tricks to obtain striking speed-ups of 5.66× and 5.12× on CIFAR-10, and 3.67× and 3.07× on CIFAR-100, with TRADES and MART, respectively.</p>
      </abstract>
      <kwd-group>
<kwd>Adversarial Robustness</kwd>
        <kwd>Adversarial Data Pruning</kwd>
<kwd>Efficient Adversarial Training</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The AAAI-23 Workshop on Artificial Intelligence Safety (SafeAI 2023), Feb 13-14, 2023, Washington, D.C., US.
† Corresponding author.
li.yize@northeastern.edu (Y. Li); p.zhao@northeastern.edu (P. Zhao); xue.lin@northeastern.edu (X. Lin); kailkhura1@llnl.gov (B. Kailkhura); goldhahn1@llnl.gov (R. Goldhahn)</p>
<p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).</p>
      <p>Combining data pruning with Bullet-Train [22], which allocates dynamic
computing cost to categorized training data, further
improves the speed-ups to 5.66× and 3.67× on CIFAR-10
and CIFAR-100, respectively. Our main contributions are
summarized below.</p>
<p>• We explore efficient AT from the lens of data
pruning, where the acceleration is achieved by focusing
only on a representative subset of the data.</p>
      <p>• We propose two data pruning algorithms,
Adv-GRAD-MATCH and Adv-GLISTER, and perform
a comprehensive experimental study. We
demonstrate that our data pruning methods yield
consistent effectiveness across diverse robustness
evaluations, e.g., PGD [13] and AutoAttack [23].</p>
      <p>• Furthermore, combining our efficient AT
framework with the existing Bullet-Train approach [22]
achieves state-of-the-art performance in training
cost.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
<p>Adversarial attacks and defenses. Adversarial attacks [13, 24, 25, 26, 27] refer to detrimental techniques
that inject imperceptible perturbations into the inputs
and mislead the decision-making process of networks. In
this paper, we mainly investigate ℓ_p attacks, where
p ∈ {0, 1, 2, ∞}. Fast Gradient Sign Method (FGSM)
[24] is the cheapest one-shot adversarial attack. Basic
Iterative Method (BIM) [28], Projected Gradient Descent
(PGD) [13], and CW [25] are stronger attacks that are
iterative in nature. Adversarial examples are used for
the assessment of model robustness. AutoAttack [23]
ensembles multiple attack strategies to perform a fair
and reliable evaluation of adversarial robustness.</p>
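<p>As a minimal, self-contained illustration of the one-shot FGSM attack described above, the sketch below perturbs the input of a toy logistic model along the sign of the input gradient. The weights, input, and ε are hypothetical stand-ins, not the models or settings used in this paper.</p>

```python
import numpy as np

# Hedged FGSM sketch on a toy logistic model (all values hypothetical).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # binary cross-entropy for a single example
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, b, x, y, eps):
    # one-shot attack: move eps along the sign of the input gradient
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # d(loss)/dx for the logistic model
    return x + eps * np.sign(grad_x)

w, b = np.array([1.5, -2.0]), 0.1
x, y = np.array([0.3, 0.7]), 1.0
x_adv = fgsm(w, b, x, y, eps=8 / 255)
assert loss(w, b, x_adv, y) > loss(w, b, x, y)  # the attack increases the loss
```

<p>Iterative attacks such as BIM and PGD repeat this step with a projection back onto the allowed perturbation set, which makes them stronger adversaries.</p>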
<p>Various defense methods [29, 30, 31, 32] have been
proposed to tackle the vulnerability of DNNs against
adversarial examples, while most of the approaches are built
over AT, where perturbed inputs are fed to DNNs to learn
from adversarial examples. Projected Gradient Descent
(PGD) based AT is one of the most popular defense
strategies [13], which uses a multi-step adversary. Training
only with adversarial samples can lead to a drop in clean
accuracy [33]. To improve the trade-off between accuracy
and robustness, TRADES [20] and MART [21] compose
the training loss with both a natural error term and a
robustness regularization term. Curriculum Adversarial
Training (CAT) [34] robustifies DNNs by adjusting the PGD
steps from weak attack strength to strong
attack strength, while Friendly Adversarial Training (FAT)
[35] performs early-stopped PGD for adversarial
examples.</p>
      <p>Efficient adversarial training. Despite AT showing empirical robustness
against adversarial examples, the learning overhead is usually
dramatically larger than that of standard training, e.g., 5 ∼ 10× the
computation consumption depending on the number of
steps used in generating adversarial examples. The major
work to achieve training efficiency focuses on how to
reduce the number of attack steps and maintain the
stability of one-step FGSM-based AT. Free AT [<xref ref-type="bibr" rid="ref10">36</xref>] performs
FGSM perturbations and updates model weights on the
same mini-batch simultaneously. Fast AT [14] generates FGSM
attacks with random initialization but still suffers from
‘catastrophic overfitting’. Therefore, gradient alignment
regularization [17], a suitable inner interval (step size) for
the adversarial direction [16], and Fast Bi-level AT
(Fast-BAT) [<xref ref-type="bibr" rid="ref11">37</xref>] are proposed to prevent such failure.</p>
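<p>To make the accuracy-robustness trade-off losses above concrete, here is a hedged numpy sketch of a TRADES-style objective: a natural cross-entropy term plus a β-weighted KL divergence between predictions on clean and perturbed inputs. The logits and β are hypothetical placeholders, and this mirrors only the shape of the loss, not the exact TRADES implementation.</p>

```python
import numpy as np

# TRADES-style objective sketch: natural error + beta * robustness regularizer.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def trades_style_loss(logits_nat, logits_adv, y, beta=6.0):
    p_nat, p_adv = softmax(logits_nat), softmax(logits_adv)
    ce = -np.log(p_nat[y])                      # natural error term
    kl = np.sum(p_nat * np.log(p_nat / p_adv))  # KL(p_nat || p_adv)
    return ce + beta * kl

# disagreement between clean and perturbed predictions raises the loss
loss_clean = trades_style_loss(np.array([2.0, 0.5, -1.0]),
                               np.array([2.0, 0.5, -1.0]), y=0)
loss_shift = trades_style_loss(np.array([2.0, 0.5, -1.0]),
                               np.array([0.5, 2.0, -1.0]), y=0)
assert loss_shift > loss_clean
```

<p>MART differs in the natural term and in how misclassified examples are weighted, but follows the same two-term pattern.</p>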
      <sec id="sec-2-1">
        <title>Data pruning.</title>
<p>Efficient learning through data subset
selection economizes on training resources. Proxy
functions [<xref ref-type="bibr" rid="ref12 ref13">38, 39</xref>] take advantage of the feature representation
from a tiny proxy model to select the most informative
subset for training the larger one. Coreset-based
algorithms [<xref ref-type="bibr" rid="ref14">40</xref>] mine for a small representative subset that
approximates the entire dataset following established
criteria. CRAIG [<xref ref-type="bibr" rid="ref15">41</xref>] selects the training data subset which
approximates the full gradient, and GRAD-MATCH [<xref ref-type="bibr" rid="ref16">42</xref>]
minimizes the gradient matching error. GLISTER [<xref ref-type="bibr" rid="ref17">43</xref>]
prunes the training data by maximizing the log-likelihood
on the validation set.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data Pruning Based Adversarial Training</title>
      <sec id="sec-3-1">
        <title>3.1. Preliminaries</title>
<p>AT [13] aims to solve the min-max optimization problem
as follows:</p>
        <p>min_θ (1/|D|) ∑_{(x,y)∈D} max_{δ∈Δ} ℒ(θ; x + δ, y),  (1)
where θ is the model parameter, (x, y) denote the
sample and label from the training dataset D, δ denotes the
imperceptible adversarial perturbation injected into x
under the norm constraint with the constant strength ε, i.e.,
Δ := {δ : ‖δ‖∞ ≤ ε}, and ℒ is the training loss. During
the adversarial procedure, the optimization first
maximizes the inner approximation for adversarial attacks
and then minimizes the outer training error over the
model parameter θ. A typical adversarial example
generation procedure involves multiple steps for a stronger
adversary, e.g.,
δ_{t+1} = Proj_Δ(δ_t + α · sign(∇_δ ℒ(θ; x + δ_t, y))),  (2)
where the projection follows the ε-ball at step t with step
size α, using the sign of gradients.</p>
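<p>The step-then-project structure of the multi-step adversary can be sketched as follows. The quadratic toy loss stands in for the network loss ℒ(θ; x + δ, y), and ε, α, and the inputs are hypothetical values chosen only so the example is self-contained.</p>

```python
import numpy as np

# PGD-style inner maximization sketch: signed gradient step, then projection
# onto the l_inf eps-ball; the toy quadratic loss is a hypothetical stand-in.

def toy_loss_grad(x_pert, target):
    # gradient of 0.5 * ||x_pert - target||^2 w.r.t. the perturbed input
    return x_pert - target

def pgd(x, target, eps=8 / 255, alpha=2 / 255, steps=10):
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad = toy_loss_grad(x + delta, target)
        delta = delta + alpha * np.sign(grad)   # ascent step on the loss
        delta = np.clip(delta, -eps, eps)       # Proj onto the eps-ball
    return x + delta

x = np.array([0.2, 0.5, 0.8])
target = np.zeros(3)
x_adv = pgd(x, target)
assert np.all(np.abs(x_adv - x) <= 8 / 255 + 1e-12)  # constraint respected
```

<p>With more steps the adversary is stronger but the per-batch cost grows linearly, which is the overhead that data pruning attacks from the other direction.</p>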
        <p>[Figure 1: dynamics of the numbers of outlier, boundary, and robust examples over training epochs on CIFAR-10; panel (a): TRADES.]</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. General Formulation for Adversarial Data Pruning</title>
        <p>Our adversarial data pruning consists of two steps: adversarial subset selection and AT with the subset of data. In the specified epoch, adversarial subset selection first finds a representative subset of data from the entire training dataset. Next, AT is performed with the selected subset. Though the size of the subset stays the same across iterations, the data in the subset is updated in each iteration based on the current status of the model weights. We formulate the AT with the data subset in Eq. (3) and adversarial subset selection in Eq. (4):</p>
        <p>min_θ (1/|S|) ∑_{(x,y)∈S} max_{δ∈Δ} ℒ(θ; x + δ, y),  (3)</p>
        <p>S = argmin_{S⊆D, |S|≤k} F(θ, S),  (4)
where D represents the complete training set and δ represents the perturbation under the ℓ∞ norm constraint Δ. The selected subset S with the size k is obtained by optimizing the function F, which aims to narrow the difference between S and D under specific criteria with model parameters θ. Note that the data selection step is performed periodically to achieve computational savings.</p>
        <p>Recent data subset selection schemes, GRAD-MATCH [<xref ref-type="bibr" rid="ref16">42</xref>] and GLISTER [<xref ref-type="bibr" rid="ref17">43</xref>], have made significant contributions towards efficiently achieving high clean accuracy. We extend these approaches in the context of adversarial robustness. Motivated by GLISTER [<xref ref-type="bibr" rid="ref17">43</xref>], we first consider training on a subset that obtains the optimal adversarial log-likelihood on the validation set V in Eq. (5):</p>
        <p>S = argmin_{S⊆D, |S|≤k} ℒ_V(θ; x_V + δ_V*, y_V),  (5)
where ℒ_V is the negative log-likelihood on the validation set; δ_V* is the adversarial perturbation obtained by maximizing ℒ_V(θ; x_V + δ_V, y_V).</p>
        <p>Another adversarial data pruning approach is inspired by GRAD-MATCH [<xref ref-type="bibr" rid="ref16">42</xref>], which aims to find the data subset whose gradients closely match those of the full training data. During the data selection, the adversarial gradient difference between the weighted subset loss and the entire-dataset loss is minimized. Adv-GRAD-MATCH is formulated as Eq. (6):</p>
        <p>F(w, S) = ‖ ∑_{(x,y)∈S} w ∇_θ ℒ_S(θ; x + δ_S*, y) − ∑_{(x,y)∈D} ∇_θ ℒ_D(θ; x + δ_D*, y) ‖,  (6)
where w is the weight vector associated with each instance in the subset S; ℒ_S and ℒ_D denote the training loss over the subset and the entire dataset; δ_S* and δ_D* are adversarial examples obtained by maximizing ℒ_S(θ; x + δ_S, y) and ℒ_D(θ; x + δ_D, y), respectively.</p>
      </sec>
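<p>To make the gradient-matching idea concrete, the following sketch greedily grows a subset whose mean per-example gradient approximates the full-data mean gradient. It is only illustrative: the actual GRAD-MATCH algorithm uses orthogonal matching pursuit with per-element weights, and the gradients here are random placeholders rather than real adversarial gradients.</p>

```python
import numpy as np

# Greedy sketch of gradient matching: pick k examples whose (unweighted)
# mean gradient best approximates the full-data mean gradient.

def greedy_gradient_match(per_example_grads, k):
    full_mean = per_example_grads.mean(axis=0)
    selected, best_err = [], np.inf
    for _ in range(k):
        best_i = None
        for i in range(len(per_example_grads)):
            if i in selected:
                continue
            cand = selected + [i]
            err = np.linalg.norm(per_example_grads[cand].mean(axis=0) - full_mean)
            if best_i is None or err < best_err:
                best_i, best_err = i, err
        selected.append(best_i)
    return selected, best_err

rng = np.random.default_rng(0)
grads = rng.normal(size=(40, 8))  # hypothetical per-example gradients
subset, err = greedy_gradient_match(grads, k=12)
assert len(set(subset)) == 12
```

<p>In the actual method, the per-example gradients come from the adversarial training loss, and the selection is rerun periodically as the model weights change.</p>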
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experiment Setup</title>
        <p>
To evaluate the efficiency and generality of the proposed
method, we apply adversarial training loss functions from
TRADES [20] or MART [21] on the standard datasets
CIFAR-10 and CIFAR-100 [<xref ref-type="bibr" rid="ref18">44</xref>], trained on ResNet-18 [<xref ref-type="bibr" rid="ref19">45</xref>]. Our
adversarial data pruning methods include
Adv-GRAD-MATCH and Adv-GLISTER with different data portions
(subset sizes) of 30% and 50%, with 100 and 200 epochs, where
the selection interval is 20 (i.e., we perform adversarial subset
selection every 20 epochs of AT). The original training
dataset is divided into a train set (90%) and a
validation set (10%) in Adv-GLISTER. The optimizer is SGD
with momentum 0.9 and weight decay 2e-4 for TRADES
and 3.5e-3 for MART. For Adv-GRAD-MATCH and
Adv-GLISTER, the initial learning rate is 0.01 and 0.02 on
CIFAR-10 and 0.08 and 0.05 on CIFAR-100, respectively.
Besides the original TRADES [20] and MART [21]
methods, we also compare our approach with Bullet-Train
[22]. The PGD attack [13] (PGD-50-10) is adopted for
evaluating robust accuracy, ranging from low magnitude
(ε = 4/255) to high magnitude (ε = 16/255), with 50
iterations and 10 restarts at step size α = 2/255
under the ℓ∞ norm. Moreover, AutoAttack [23] is leveraged
for reliable robustness evaluation. Additionally, our
methods can also be combined with Bullet-Train [22],
and we term these variants Adv-GRAD-MATCH&amp;Bullet and
Adv-GLISTER&amp;Bullet.
        </p>
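<p>The schedule above (adversarial subset selection every 20 epochs, AT on the current subset in between) can be sketched as follows. The random choice is a hypothetical placeholder for the Adv-GRAD-MATCH/Adv-GLISTER selection step, and the per-epoch training call is omitted, so only the scheduling logic is shown.</p>

```python
import numpy as np

# Scheduling sketch: re-select a fixed-size subset every `interval` epochs.
# The random selection is a placeholder for adversarial subset selection.

def run_schedule(n_train, epochs=100, interval=20, fraction=0.3, seed=0):
    rng = np.random.default_rng(seed)
    k = int(fraction * n_train)
    subset, selections, sizes = None, 0, []
    for epoch in range(epochs):
        if epoch % interval == 0:       # periodic subset selection
            subset = rng.choice(n_train, size=k, replace=False)
            selections += 1
        sizes.append(len(subset))       # AT on `subset` would happen here
    return selections, sizes

selections, sizes = run_schedule(n_train=50000, epochs=100, interval=20)
assert selections == 5                  # selections at epochs 0, 20, 40, 60, 80
assert all(s == 15000 for s in sizes)   # 30% of 50,000 used in every epoch
```

<p>Because every epoch only touches the 30% subset, the per-epoch cost drops roughly in proportion, which is where the reported speed-ups come from.</p>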
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Main Results</title>
<p>Table 1 shows the results of our Adv-GLISTER and Adv-GRAD-MATCH for TRADES compared with the original TRADES and Bullet-Train methods. The comparison is in terms of clean and robust accuracy (under two attack methods, PGD attack [13] and AutoAttack [23]) along with the training speed-up. We observe that, compared to the baselines, the training efficiency of our method is improved significantly on CIFAR-10, while a decrease occurs in clean accuracy and in robustness under AutoAttack and PGD attacks for different values of ε. Notably, for ε = 16/255, the robust accuracy is improved from 16.05% (Bullet-Train [22]) to 16.52% and 17.49% with our Adv-GRAD-MATCH and Adv-GLISTER, indicating our defensive capability against powerful attacks. As displayed in Table 1, our Adv-GRAD-MATCH and Adv-GLISTER reduce the training overheads (seconds per epoch) enormously and achieve 3.44× and 3.09× training speed-ups. After combining our approaches with Bullet-Train [22], an even faster acceleration of 5.12× can be reached.</p>
        <p>On CIFAR-100, the validity of our schemes is consistent as well. The reason why both clean and robust accuracy drop might be that our data pruning schemes struggle with the dimensionality and complexity of the dataset. Regardless, our schemes still result in conspicuous computation savings compared with the other baselines.</p>
        <p>To understand the robustness improvements of our schemes, we track the dynamics of the outlier, robust, and boundary sets (similar to [22]) using the PGD-5-1 attack. Without any attack, the outlier examples have already been mistaken by the model, but boundary and robust examples are correctly identified. After adversarial attacks, boundary examples are incorrectly classified while robust examples are still correctly classified. Fig. 1 displays the dynamics of the outlier, boundary, and robust examples on CIFAR-10 for various schemes. During the model training and data selection, the number of robust samples gradually increases and eventually dominates, while the number of outliers and boundary data points decreases over epochs, revealing similar achievements in TRADES-based AT and the data pruning-based methods. In addition, the ultimate portions of the three sets explain the clean accuracy and robustness degradation of our approaches. In detail, the two baselines obtain more robust samples and fewer boundary and outlier examples.</p>
        <p>We further evaluate the performance of adversarial data pruning based on the loss of MART in Table 2. Results are consistent with our findings on TRADES in Table 1.</p>
        <sec id="sec-4-3">
          <title>4.3. Ablation Studies</title>
          <p>Epoch. We first consider the training epoch. Table 3 shows that longer training improves both clean and robust accuracy. Due to the shrinking data size, more epochs are required to enhance data-efficient adversarial learning, in alignment with standard data pruning training. However, 100-epoch training appears to be sufficient for the small dataset.</p>
          <p>Subset Size. We experiment with different subset sizes. Moving from the extremely small subset (10% of the full training set) to a larger subset (70%) in Fig. 2, the observation is that robust accuracy gradually increases towards that of the full dataset. This highlights the benefit of pruning with an optimal subset size. We can see that 30% is an appropriate choice for the CIFAR-10 subset size, after taking the global efficiency into account.</p>
          <p>Number of selection rounds. In Sec. 4.2, our experiments perform adversarial data pruning every 20 epochs (with 9 selections). Here we present the results of data pruning every 40 epochs (with 4 selections).</p>
          <p>[Figure 2: training speed-ups of Adv-GLISTER and Adv-GRAD-MATCH versus TRADES for different subset sizes.]</p>
        </sec>
        <sec id="sec-4-4">
          <title>5. Conclusion and Future Work</title>
          <p>In this paper, we investigated efficient adversarial training from a data-pruning perspective. With comprehensive experiments, we demonstrated that the proposed adversarial data pruning approaches outperform the existing baselines by mitigating substantial computational overhead. These positive results pave a path for future research on accelerating AT by minimizing redundancy at the data level. Our future work will focus on designing more accurate pruning schemes for large-scale datasets.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
<p>This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and was supported by the LLNL-LDRD Program under Project No. 20-SI-005 (LLNL-CONF-842760).</p>
      <p>… in: Advances in Neural Information Processing Systems (NeurIPS), 2020.</p>
      <p>[10] A. Athalye, N. Carlini, D. Wagner, Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples, in: Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.</p>
      <p>[11] E. Wong, Z. Kolter, Provable defenses against adversarial examples via the convex outer adversarial polytope, in: Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.</p>
      <p>[12] H. Salman, J. Li, I. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, G. Yang, Provably robust deep learning via adversarially trained smoothed classifiers, in: Advances in Neural Information Processing Systems (NeurIPS), 2019.</p>
      <p>[13] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep learning models resistant to adversarial attacks, in: International Conference on Learning Representations (ICLR), 2018.</p>
      <p>[14] E. Wong, L. Rice, J. Z. Kolter, Fast is better than free: Revisiting adversarial training, in: International Conference on Learning Representations (ICLR), 2020.</p>
      <p>[15] B. S. Vivek, R. Venkatesh Babu, Single-step adversarial training with dropout scheduling, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.</p>
      <p>[16] H. Kim, W. Lee, J. Lee, Understanding catastrophic overfitting in single-step adversarial training, in: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 35, 2021, pp. 8119–8127.</p>
      <p>[17] M. Andriushchenko, N. Flammarion, Understanding and improving fast adversarial training, in: Advances in Neural Information Processing Systems (NeurIPS), 2020.</p>
      <p>[18] B. R. Bartoldson, B. Kailkhura, D. Blalock, Compute-efficient deep learning: Algorithmic trends and opportunities, arXiv preprint arXiv:2210.06640 (2022).</p>
      <p>[19] M. Kaufmann, Y. Zhao, I. Shumailov, R. Mullins, N. Papernot, Efficient adversarial training with data pruning, in: arXiv, 2022.</p>
      <p>[20] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, M. I. Jordan, Theoretically principled trade-off between robustness and accuracy, in: International Conference on Machine Learning (ICML), 2019.</p>
      <p>[21] Y. Wang, D. Zou, J. Yi, J. Bailey, X. Ma, Q. Gu, Improving adversarial robustness requires revisiting misclassified examples, in: International Conference on Learning Representations (ICLR), 2020.</p>
      <p>[22] W. Hua, Y. Zhang, C. Guo, Z. Zhang, G. E. Suh, Bullettrain: Accelerating robust neural network training via boundary example mining, in: Advances in Neural Information Processing Systems (NeurIPS), 2021.</p>
      <p>[23] F. Croce, M. Hein, Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks, in: International Conference on Machine Learning (ICML), 2020.</p>
      <p>[24] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: arXiv, 2015.</p>
      <p>[25] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: IEEE Symposium on Security and Privacy (S&amp;P), IEEE, 2017.</p>
      <p>[26] F. Croce, M. Hein, Sparse and imperceivable adversarial attacks, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.</p>
      <p>[27] Q. Zhang, X. Li, Y. Chen, J. Song, L. Gao, Y. He, H. Xue, Beyond imagenet attack: Towards crafting adversarial examples for black-box domains, in: International Conference on Learning Representations (ICLR), 2022.</p>
      <p>[28] A. Kurakin, I. Goodfellow, S. Bengio, Adversarial examples in the physical world, 2016. URL: https://arxiv.org/abs/1607.02533. doi:10.48550/ARXIV.1607.02533.</p>
      <p>[29] D. Meng, H. Chen, Magnet: A two-pronged defense against adversarial examples, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017.</p>
      <p>[30] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, J. Zhu, Defense against adversarial attacks using high-level representation guided denoiser, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.</p>
      <p>[31] A. Mustafa, S. Khan, M. Hayat, R. Goecke, J. Shen, L. Shao, Adversarial defense by restricting the hidden space of deep neural networks, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.</p>
      <p>[32] Y. Gong, Y. Yao, Y. Li, Y. Zhang, X. Liu, X. Lin, S. Liu, Reverse engineering of imperceptible adversarial image perturbations, in: International Conference on Learning Representations (ICLR), 2022.</p>
      <p>[33] D. Su, H. Zhang, H. Chen, J. Yi, P.-Y. Chen, Y. Gao, Is robustness the cost of accuracy? – a comprehensive study on the robustness of 18 deep image classification models, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018.</p>
      <p>[34] Q.-Z. Cai, C. Liu, D. Song, Curriculum adversarial training, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, International Joint Conferences on Artificial Intelligence Organization, 2018, pp. 3740–3747. URL: https://doi.org/10.24963/ijcai.2018/520. doi:10.24963/ijcai.2018/520.</p>
      <p>[35] J. Zhang, X. Xu, B. Han, G. Niu, L. Cui, M. Sugiyama,</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xie</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Self-training with noisy student improves imagenet classification</article-title>
          ,
          <source>in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Foret</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kleiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mobahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Neyshabur</surname>
          </string-name>
          ,
          <article-title>Sharpness-aware minimization for efficiently improving generalization</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.-Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , S.-T. Xu,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Object detection with deep learning: A review</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>30</volume>
          (
          <year>2019</year>
          )
          <fpage>3212</fpage>
          -
          <lpage>3232</lpage>
          . doi:10.1109/TNNLS.2018.2876865.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. S. A.</given-names>
            <surname>Zaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Ansari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aslam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kanwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Asghar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A survey of modern deep learning based object detection models</article-title>
          ,
          <source>Digital Signal Processing</source>
          <volume>126</volume>
          (
          <year>2022</year>
          )
          <fpage>103514</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgeaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rutherford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Millican</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Van Den Driessche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Lespiau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Damoc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>De Las Casas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Menick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hennigan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Maggiore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cassirer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Irving</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Osindero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Elsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <article-title>Improving language models by retrieving from trillions of tokens</article-title>
          ,
          <source>in: Proceedings of the 39th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <article-title>Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models</article-title>
          ,
          <source>in: Proceedings of the ACM Workshop on Artificial Intelligence and Security</source>
          , ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Generating adversarial examples with adversarial networks</article-title>
          ,
          <source>in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Brendel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madry</surname>
          </string-name>
          ,
          <article-title>On adaptive attacks to adversarial example defenses</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shafahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Najibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Ghiasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dickerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Studer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <article-title>Adversarial training for free!</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Khanduri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Revisiting and advancing fast adversarial training through the lens of bi-level optimization</article-title>
          ,
          <source>in: International Conference on Machine Learning (ICML)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>C.</given-names>
            <surname>Coleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mussmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mirzasoleiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bailis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <article-title>Selection via proxy: Efficient data selection for deep learning</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2020</year>
          . URL: https://openreview.net/forum?id=HJg2b0VYDr.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kaushal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kothawade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mahadev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Doctor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <article-title>Learning from less data: A unified data subset selection and active learning framework for computer vision</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>D.</given-names>
            <surname>Feldman</surname>
          </string-name>
          , Core-Sets: Updated Survey, Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>44</lpage>
          . URL: https://doi.org/10.1007/978-3-030-29349-9_2. doi:10.1007/978-3-030-29349-9_2.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mirzasoleiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bilmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Coresets for data-efficient training of machine learning models</article-title>
          ,
          <source>in: Proceedings of the 37th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>K.</given-names>
            <surname>Killamsetty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sivasubramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>De</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <article-title>Grad-match: Gradient matching based data subset selection for efficient deep model training</article-title>
          ,
          <source>in: Proceedings of the 38th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>K.</given-names>
            <surname>Killamsetty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sivasubramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <article-title>Glister: Generalization based data subset selection for efficient and robust learning</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)</source>
          , volume
          <volume>35</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>8110</fpage>
          -
          <lpage>8118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Learning multiple layers of features from tiny images</article-title>
          ,
          <source>Master's thesis</source>
          , Department of Computer Science, University of Toronto (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Identity mappings in deep residual networks</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>630</fpage>
          -
          <lpage>645</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>