<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bab: A novel algorithm for training clean model based on poisoned data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chen Chen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haibo Hong</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tao Xiang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mande Xie</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Shao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, Chongqing University (CQU)</institution>
          ,
          <addr-line>400044</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, Zhejiang Gongshang University (ZJSU)</institution>
          ,
          <addr-line>310018</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Information and Electronic Engineering, Zhejiang Gongshang University (ZJSU)</institution>
          ,
          <addr-line>310018</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>Nowadays, machine learning performs very well in the fields of computer vision and natural language processing. However, recent research indicates that machine learning models are extremely vulnerable to various malicious attacks, among which backdoor attacks are favored by attackers because of their easy deployment and high success rate. In fact, the attacker only needs to put a small amount of malicious data into the training dataset so that the model exhibits abnormal behavior under certain circumstances. In this work, we propose the BAB (backdoor against backdoor) algorithm for training a clean model on poisoned data. The BAB algorithm mainly relies on two characteristics of backdoors: 1) multiple backdoors can coexist well in the same model; 2) when there are multiple backdoors in the same model, the strongest backdoor can make the weaker backdoors ineffective. Therefore, we implant our own backdoor in the poisoned dataset and rely on the output behavior of the resulting models to refine a training dataset that contains almost no poisoned data, so as to train a clean model with high accuracy. In the experimental part, we test five current mainstream backdoor poisoning attacks. Our experimental results reveal that the BAB algorithm has a remarkable effect on filtering poisoned data: we succeed in obtaining a clean dataset containing less than 0.1% poisoned data and train a high-precision model with this dataset. Our code is open source at https://gitee.com/dugu1076/bab-algorithm.git.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        At present, neural networks are gradually being applied in various fields such as image classification[
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ] and natural language processing[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. Meanwhile, these ubiquitous deep learning systems induce various security problems, such as evasion attacks[
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ], model stealing attacks[
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], membership inference attacks[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and backdoor attacks[
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]. Malicious attackers can utilize these attacks to steal private information or even cause the system to misjudge in some cases, resulting in immeasurable losses. In this article, we focus on the backdoor poisoning attack. Compared with ordinary data poisoning attacks[
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
        ], the backdoor poisoning attack does not affect the accuracy of the original task, but adds backdoors to the model that are only triggered in specific situations. The conditions for backdoor poisoning are very easy to meet: contaminate part of the training dataset (for example by adding a patch) and modify the labels of the contaminated data, and a simple backdoor attack is done [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The trained model behaves the same as a normal model when it encounters benign input, but when non-benign input (with triggers) is provided, the model behaves abnormally. To make matters worse, as models become deeper, training a high-precision neural network model often requires a large dataset. Many trainers rely on crawlers or third-party purchases to obtain training data, which gives attackers many opportunities to carry out the backdoor poisoning attack. Unfortunately, most existing defense methods are based on anomaly checking of the trained model followed by repair of the anomalous model [
        <xref ref-type="bibr" rid="ref18">18, 19, 20</xref>
        ], or on filtering the anomalous output of the trained model [21], which makes them inapplicable at the stage when the model has not yet been trained. In order to reduce the losses caused by such attacks, we wonder: is it possible to isolate a completely clean dataset from the poisoned dataset and employ it to train a clean model?
      </p>
      <p>Intuitively, this is not a simple task. One reason is the unexplainability of neural networks. The essence of a neural network model is the combination of linear and nonlinear transformations between matrices, and these individual transformations have no practical, specific meaning, which makes it impossible to detect abnormalities directly from the internal parameters of the model. In addition, the constant evolution of the backdoor poisoning attack renders manual verification ineffective. Early backdoor attacks[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] have the obvious disadvantage that the triggers can be spotted by human eyes. However, with the deepening of research, SIG[22], Refool[23], CBA[24] and other attacks have been introduced. The triggers and target labels of such poisoned data are integrated into the dataset in a very plausible way, which makes manual verification impossible.
      </p>
      <p>In this paper, we propose a backdoor against backdoor (BAB) algorithm that is able to filter out a clean dataset and train a clean model without any prior knowledge of the backdoor data distribution in the dataset. We divide the task of training a clean model into two stages. The first stage is the filtering of the clean dataset; in this stage, we take advantage of two inherent characteristics of backdoor attacks to distinguish clean data from poisoned data. The second stage is the standard model training process using the filtered clean dataset. Our main contributions are as follows:</p>
      <p>• We put forward a new perspective on the coexistence of multiple backdoors and exploit the inherent characteristics among multiple backdoors as a basis for filtering poisoned data: multiple backdoors can coexist well in the model, and when there are two backdoor triggers in one input, the more aggressive backdoor can make the weaker one fail;</p>
      <p>• We advance the BAB algorithm to enable training clean models from poisoned data. We discuss the algorithm in detail and display the parametric performance in the experimental section;</p>
      <p>• We apply the BAB algorithm to two standard public datasets, CIFAR-10 and GTSRB, and test it against five mainstream backdoor data poisoning attacks (three dirty label attacks and two clean label attacks). The experimental results are exciting: we successfully obtain a clean dataset with a poisoning rate of less than 0.1% and train a clean model with high accuracy.</p>
      <sec id="sec-1-1">
        <title>In order to verify the performance of our BAB algo</title>
        <p>
          rithm, we select three representative dirty label attacks:
BadNets[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], Blend[25] and CBA[24], and two
representative clean label attacks: SIG[22] and Refool[23] in this
paper.
        </p>
        <sec id="sec-1-1-1">
          <title>2.2. Defense</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section mainly introduces several backdoor attack methods and the existing defenses against the backdoor poisoning attack.</p>
      <sec id="sec-2-1">
        <title>2.1. Backdoor Poisoning Attack</title>
        <p>
          The backdoor poisoning attack mainly relies on
introducing some malicious data into the training dataset,
which is consistent with the normal model training
during the model training phase. Existing backdoor attacks
are mainly divided into two categories: 1) dirty label
attacks 2) clean label attacks. The earliest dirty label
attacks [
          <xref ref-type="bibr" rid="ref17">17, 25, 24</xref>
          ] mainly rely on modifying the label and
adding a trigger, such as a single pixel, a square or a more
complex pattern, but these simple attack methods are
often found by manual inspection. To increase the stealth
of the backdoor attack, the attacker optimizes the trigger
to incorporate it into the clean data in a reasonable form,
such as invisible noise and mixed mode. Unlike dirty
label attacks, clean label attacks aim to optimize labels to
bypass manual verification of labels, that is, to achieve
attack results without modifying labels. Such attacks can
bypass most existing detection schemes due to their weak
aggressiveness.
2.3. NAD
NAD[27] is a proven and efective way to remove
backdoors, and it mainly on a small number of clean dataset
and uses model distillation to fine-tune the attention
mechanism of the teacher model, so that the teacher
model no longer pays attention to the backdoor area, so
as to eliminate the backdoor.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Statement</title>
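        <p>To make the intuition of Fig. 1 concrete, the small sketch below feeds one sample to every model of such a model set and inspects the spread of the predicted classes. The interface and names are assumptions made purely for illustration, not part of the BAB implementation.</p>
        <preformat>
# Hypothetical sketch: inspect how a model set reacts to a single sample.
# A clean sample tends to give scattered predictions, a sample carrying one
# trigger concentrates on one target, and a sample carrying two triggers
# splits between two targets.
from collections import Counter
import torch

@torch.no_grad()
def prediction_profile(models, x):
    # models: iterable of trained classifiers; x: a single CxHxW image tensor.
    preds = [m(x.unsqueeze(0)).argmax(dim=1).item() for m in models]
    # scattered counts: likely clean; one dominant class: one trigger;
    # two dominant classes: two coexisting triggers.
    return Counter(preds)
        </preformat>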
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <p>In this section, we introduce the BAB algorithm in detail. Our algorithm is mainly divided into four steps: data preprocessing, training of the verification models, inference and division of the dataset, and training of the formal model. The overall procedure is described in Algorithm 1.</p>
      <sec id="sec-4-1">
        <title>3.2. Assumption</title>
        <sec id="sec-4-1-1">
          <title>In this section, we will take an example to elicit our</title>
          <p>hypothesis. Here, we take MNIST as the dataset and
BadNets as the poisoning method. Suppose that the
trainer has a poisoned dataset D, where D sufers two
non-conflicting backdoor poisoning attacks ( D = Dclean ∪
Dpoision_1 ∪ Dpoision_2). The trainer draws a random
proportion(such as 50%) of the dataset from D each time
to train a model set M = {0, 1, . . . ,  }. Selects
 ( ∈ D) to input M, when it is a clean data sample
( ∈ Dclean), since training only uses a small amount of
data, the output on model set M should be messy, as
shown in Fig. 1(A). When there is only one backdoor
trigger ( ∈ poision_1 ∪poision_2), the output of the model set
M should all point to the target activated by the trigger, as
shown in Fig. 1(B). When there are two backdoor triggers
( ∈ Dpoision_1 ∩ Dpoision_2), the strength of the backdoor
is not constant due to diferent training data, which also
leads to the situation as shown in Fig. 1(C). The output
of the model set M should be the target activated by the
two triggers.</p>
          <p>In view of the above facts, we speculate that if a certain
proportion of known backdoors are put into a batch of
poisoned data sets and randomly select data to train a
batch of models, when backdoor triggers are added to the
data, the models’ judgment on clean data should all point
to the newly added backdoor class, and for poisoned data,
the output class should not only contain the newly added
backdoor pointing target.</p>
          <p>However, we must consider the following situations.
If the backdoor generated by the poisoner is very weak,
it may cause that even for the poisoned data, models all
point to newly implanted backdoor classes. This results
in the omission of poisoned data. Therefore, we need to
control the strength of the implanted backdoor, insert
a very weak backdoor into the model, but still can be
successfully activated by the trigger. In this way, we can
make the judgment between clean data and poisoned
data.</p>
          <p>Initialization: Target{0, 1, . . . , },
Mver ∈ {Mver_0, Mver_1, . . . , Mver_N},Trigger  ,
Data1{0, 1, . . .} →↦− Original Data
Data2{0, 1, . . .} →↦− Partial Data1 Carrying
Triggers  to attack target  + 1
Data3{0, 1, . . .} →↦− All Data1 Carrying
Triggers  ;
for M in Mver do
for 1...epochs do</p>
          <p>M.forward(2);
loss=ℒ ;
loss.backward();
end
end
for x in Data3 do
for M in Mver do
if M()! =  + 1 then</p>
          <p>Poisioned Data →↦−
else</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Continue;</title>
          <p>end
end
end
Return Clean Dataset
remove</p>
        </sec>
      </sec>
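      <p>For concreteness, the sketch below mirrors Algorithm 1 in plain PyTorch. The helpers make_model and add_trigger, the stamping ratio used inside the function, and all other names are illustrative assumptions rather than the released implementation.</p>
      <preformat>
# Hedged sketch of Algorithm 1 (BAB): train N verification models on random
# subsets of the trigger-augmented data, then keep only samples that every
# model maps to the implanted target class when stamped with our trigger.
import random
import torch
import torch.nn.functional as F

def bab_filter(dataset, make_model, add_trigger, new_target,
               n_models=10, subset_ratio=0.5, epochs=5, lr=0.01, device="cpu"):
    # dataset: list of (image_tensor, label) pairs, possibly poisoned.
    # Step 1: implant our own (weak) backdoor into a small portion of the data.
    stamped = [(add_trigger(x), new_target)
               for x, _ in random.sample(dataset, int(0.1 * len(dataset)))]
    augmented = dataset + stamped

    # Step 2: train N verification models, each on a random subset.
    models = []
    for _ in range(n_models):
        model = make_model().to(device)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        subset = random.sample(augmented, int(subset_ratio * len(augmented)))
        for _ in range(epochs):
            for x, y in subset:
                opt.zero_grad()
                logits = model(x.unsqueeze(0).to(device))
                loss = F.cross_entropy(logits, torch.tensor([y], device=device))
                loss.backward()
                opt.step()
        models.append(model.eval())

    # Step 3: stamp every original sample with our trigger; a sample is kept
    # as clean only if all verification models predict the implanted target.
    clean = []
    with torch.no_grad():
        for x, y in dataset:
            preds = [m(add_trigger(x).unsqueeze(0).to(device)).argmax(1).item()
                     for m in models]
            if all(p == new_target for p in preds):
                clean.append((x, y))
    return clean
      </preformat>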
      <sec id="sec-4-2">
        <title>4.1. Data Preprocessing</title>
        <sec id="sec-4-2-1">
          <title>Firstly, we need to preprocess the data, as shown in Fig. 2.</title>
          <p>We extract a small portion (such as 10%) of the data, add
triggers of arbitrary shape, and modify the model labels
to new classes (preventing the same targets as poisoning
attacks) and shufle the data to generate a new dataset.
After that, we randomly select  small parts (such as 50%)
of the dataset as training data. Besides, we need an entire
dataset plus this trigger for inference and partitioning
the data in reasoning and division of the dataset.</p>
        </sec>
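        <p>A minimal sketch of this preprocessing step is given below, assuming a square corner patch as the arbitrary trigger; the helper names, patch shape and ratios are placeholders, not the paper's exact configuration.</p>
        <preformat>
# Hypothetical preprocessing sketch: stamp a patch trigger on ~10% of the
# samples, relabel them to a brand-new class, shuffle, and keep a fully
# stamped copy of the original data for the later inference step.
import random
import torch

def stamp_patch(img, size=3, value=1.0):
    # img: CxHxW tensor; paint a small square in the bottom-right corner.
    out = img.clone()
    out[:, -size:, -size:] = value
    return out

def preprocess(dataset, num_classes, ratio=0.1):
    new_target = num_classes  # a class index not used by the original task
    picked = random.sample(range(len(dataset)), int(ratio * len(dataset)))
    augmented = list(dataset)
    for i in picked:
        x, _ = dataset[i]
        augmented.append((stamp_patch(x), new_target))
    random.shuffle(augmented)
    stamped_all = [(stamp_patch(x), y) for x, y in dataset]
    return augmented, stamped_all, new_target
        </preformat>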
      </sec>
      <sec id="sec-4-3">
        <title>4.2. Verification Model Training</title>
        <sec id="sec-4-3-1">
          <title>Secondly, we need to train a batch of verification mod</title>
          <p>els, as shown in Fig 3. Put the dataset generated in the
previous step into a batch of simple network models for
a small number of iterations. In order to create a
backdoor that is as weak as possible but can be successfully
activated, we reduce the neuron activation degree gap ( (tri), tri) ensures that the backdoor can be triggered
as much as possible when inputting poisoned data and correctly, and 2(,  tri) minimizes the gap between the
clean data. In addition, We choose one or more layers backdoor data and the clean data in the neural network,
of neurons to suppress their activation, and make cer- so that the backdoor we generate is as weak as
possitain improvements on the original loss function, just like ble. After extensive experiments, we find that fitting
Equation 1. the penultimate layer (the previous layer of the softmax)
works best in the same network layer.</p>
          <p>Backdoor 4.3. Inference</p>
          <p>(1)
where  is the trained model;  and tri are the original The most important step is the division of poisoned data
target and the backdoor attack target, respectively;  and and clean data, as shown in Fig. 4. Through simple
train tri are the activation values of clean data and backdoor ing, we get a batch of simple neural network verification
data, respectively;  is a hyper-parameter used to coordi- models. Then, we feed the verification data sequentially
nate the activation of inhibitory neurons. In Equation 1, into the verification model. When the verification data
passes all verification models, we consider the data to be both adopt the open source code of the original paper.
clean; otherwise, the data is illegally poisoned by others. We have not exploited any data augmentation techniques
There will be some accidental injuries due to the accu- to avoid side efects on attack success rate. In subsequent
racy of the implanted backdoor, but this is inevitable. In experiments, we mainly take the CIFAR-10 dataset as the
subsequent experiments, we find that these accidental test dataset because its data distribution is more uniform.
injuries are acceptable in small amounts. Defense and Training Details We compare our BAB
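        <p>The following sketch shows one way Equation 1 could be realized, assuming a model whose forward pass can also return its penultimate-layer activations; the return_features interface and the weight lam are assumptions made for illustration.</p>
        <preformat>
# Hedged sketch of the weak-backdoor loss of Equation 1.
import torch
import torch.nn.functional as F

def weak_backdoor_loss(model, x_clean, x_trig, y_trig, lam=0.1):
    # Assumed interface: model(x, return_features=True) returns
    # (logits, penultimate_activations).
    logits_trig, act_trig = model(x_trig, return_features=True)
    _, act_clean = model(x_clean, return_features=True)
    # First term: the implanted trigger must still flip the prediction to y_trig.
    ce = F.cross_entropy(logits_trig, y_trig)
    # Second term: keep triggered and clean penultimate activations close,
    # so that the implanted backdoor stays as weak as possible.
    act_gap = F.mse_loss(act_trig, act_clean)
    return ce + lam * act_gap
        </preformat>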
      </sec>
      <sec id="sec-4-4">
        <title>4.3. Inference</title>
        <p>The most important step is the division of poisoned data and clean data, as shown in Fig. 4. Through the simple training above, we obtain a batch of simple neural network verification models. Then, we feed the verification data sequentially into the verification models. When a sample passes all verification models, we consider it to be clean; otherwise, the sample is regarded as poisoned. There will be some accidental injuries (clean samples mistakenly discarded) due to the imperfect accuracy of the implanted backdoor, but this is inevitable. In subsequent experiments, we find that these accidental losses are acceptable in small amounts.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.4. Formal Model Training</title>
        <p>After the above steps, a batch of clean data is obtained, and a clean model can then be obtained by training in a standard way using this clean dataset.</p>
      </sec>
    </sec>
    <sec id="sec-exp">
      <title>5. Experiment</title>
      <sec id="sec-exp-1">
        <title>5.1. Experimental Setup</title>
        <p>All experiments are run on hardware equipped with an RTX 3070 GPU and an i7 10700K CPU.</p>
        <p>
          Attack Configurations. We consider five backdoor attacks in our experiments, including three dirty label attacks, BadNets[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], the Blend attack[25] and the composite backdoor attack (CBA)[24], and two clean label attacks, natural reflection (Refool)[23] and the sinusoidal signal attack (SIG)[22]. We follow the settings suggested by these papers and use the open-sourced code corresponding to their original papers to configure these attack algorithms. All attacks are evaluated on two benchmark datasets, CIFAR-10[28] and GTSRB[29], with a classical model structure, ResNet-18[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. For the backdoor poisoned data, we train the backdoor model for 100 epochs using the Adam optimizer with a learning rate of 0.01. Considering the uneven distribution of the GTSRB dataset, we set the target label of the SIG and Refool poisoning attacks to 1, and the target label of the rest of the poisoning attacks to 0. For SIG and Refool we adopt the open source code of the original papers (https://github.com/bboylyg/NAD and https://github.com/DreamtaleCore/Refool, respectively). We have not exploited any data augmentation techniques, to avoid side effects on the attack success rate. In subsequent experiments, we mainly take the CIFAR-10 dataset as the test dataset because its data distribution is more uniform.
        </p>
        <p>Defense and Training Details. We compare our BAB with a state-of-the-art defense method, Neural Attention Distillation (NAD)[27]. For NAD, we follow the configuration specified in the original paper.</p>
        <p>NAD. We take the open source code (https://github.com/bboylyg/NAD) as a base for extensions. We try to keep the parameters consistent with our experiments, including model architecture, learning rate, number of iterations, etc. In addition, following the recommendations of [27], we set the proportion of clean data owned by NAD to 5% and the number of iterations when acquiring the teacher model to 10. When using the teacher model to clean the student model, we set the number of iterations to 100, and the distillation weights of the low, middle and high layers to 500, 1000 and 1000, respectively.</p>
        <p>BAB. On the CIFAR-10 dataset, we set N=10, the proportion of data carrying the implanted trigger to 0.2, and R=0.5, and use the Adam optimizer to train the verification models for 5 epochs, with a learning rate of 0.01, 5 iterations for each verification model, and 10 as the target class implanted in the verification models. On the GTSRB dataset, we set N=5, the proportion of data carrying the implanted trigger to 0.3, and R=0.5, and use the Adam optimizer to train the verification models for 5 epochs, with a learning rate of 0.01, 5 iterations for each verification model, and 43 as the implanted target class. In the training phase of the formal model, we set the learning rate to 0.001 and the number of iterations to 100 epochs. We have not used any data augmentation techniques, to avoid side effects on the attack success rate.</p>
        <p>Evaluation Metrics. We employ two commonly used performance metrics: the Attack Success Rate (ASR), which is the classification accuracy on the backdoor test set, and the Clean Accuracy (CA), which is the classification accuracy on the clean test set. In addition, we calculate the residual retention rate of clean data (CDR) and the residual rate of poisoned data (PDR).</p>
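        <p>A small sketch of how these four metrics could be computed is shown below; the data structures and evaluation protocol are assumptions made for illustration.</p>
        <preformat>
# Hypothetical helper computing ASR, CA, CDR and PDR as described above.
import torch

@torch.no_grad()
def accuracy(model, samples):
    # samples: list of (image_tensor, label); returns top-1 accuracy.
    correct = sum(model(x.unsqueeze(0)).argmax(1).item() == y for x, y in samples)
    return correct / max(len(samples), 1)

def evaluate(model, clean_testset, backdoor_testset,
             kept_clean, total_clean, kept_poisoned, total_poisoned):
    ca = accuracy(model, clean_testset)            # Clean Accuracy
    asr = accuracy(model, backdoor_testset)        # Attack Success Rate (labels are the attack targets)
    cdr = kept_clean / max(total_clean, 1)         # residual retention rate of clean data
    pdr = kept_poisoned / max(total_poisoned, 1)   # residual rate of poisoned data
    return {"CA": ca, "ASR": asr, "CDR": cdr, "PDR": pdr}
        </preformat>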
      <sec id="sec-4-4">
        <title>5.2. Comparison to Existing Defenses</title>
        <p>addition, we find that when N&gt;10, the poisoned data has
almost been filtered out, the clean data of Refool and
SIG attacks are gradually lost, while the data is relatively
stable in the other three attacks. We believe that it is
because Refool and SIG are clean label attacks that only mix
the trigger pattern (i.e. superimposed sinusoidal signal or
natural reflection) with the background of the poisoned
image, which makes this type of attack relatively weak,
resulting in mass misjudgments. In fact, the BAB
algorithm with the number of models N=10 is suficient to
withstand these five attacks, even when the backdoor
poisoning rate is extremely high, i.e. 70%, or a variety of
backdoor attacks (see Section. 3).</p>
      </sec>
      <sec id="sec-4-5">
        <title>5.3. Number of Verification Models</title>
      </sec>
      <sec id="sec-4-6">
        <title>5.5. Pressure Test</title>
        <p>
          Here, we investigate the efect of the number N of verifi- Here, we test when BAB encounters some extremes. Now
cation models on filtered clean datasets versus residual we know that the BAB algorithm can filter the poisoned
poisoned data on CIFAR-10. Our goal is to keep the clean data well and train a clean model.
dataset as much as possible while filtering the poisoned Therefore, the challenge for the BAB algorithm is
data, so that a clean and more accurate model can be whether the BAB algorithm can still filter out a clean
trained in the formal training phase. We run the BAB al- dataset at a small cost and train a clean model when it
gorithm on N belonging to [
          <xref ref-type="bibr" rid="ref1">1, 20</xref>
          ] and display the amount encounters a large proportion of poisoning or there are
of clean data and the amount of residual poisoned data multiple poisoning attacks. We experiment on 3 attacks,
in Fig. 5. Obviously, it is found that there is a trade-of BadNets, Blend and CBA on CIFAR-10, with poisoning
between amount of clean data and amount of residual rates up to 50%/70%, and show the results in Table 3. In
poisoned data. Specifically, as the number of models N addition, we also test for mixed attacks, and the total
increases, the clean dataset will be lost along with the poisoning rate is as high as 50%/70%, and the results are
poisoned dataset, we find that the loss of clean data sets shown in Table 4. We find that even at 70% poisoning rate,
is mainly due to the suppression of neurons in the im- our BAB algorithm successfully reduces attack success
planted backdoor, since not every implanted backdoor rate (ASR) from 99.67% to 3.23% for BadNets, 93.18% to
can reach a 100% attack success rate, this causes some 4.89% for CBA, and 100% to 5.15% for Refool, respectively.
data to be mistaken for poisoned data and discarded. In For the mixed attacks, BAB also successfully reduces the
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>In this paper, we propose a novel algorithm to train clean models on poisoned data. Firstly, we implant our own backdoor in the dataset under inspection and train multiple verification models, relying on a comparison of the verification models' outputs to divide the clean data from the poisoned data. Secondly, we train a formal model with the partitioned clean dataset. We apply our algorithm to two different datasets, experimenting with five attack modalities. The experimental results indicate that our algorithm is useful and effective. We also analyze and discuss how to choose the parameters reasonably and the robustness of the algorithm. Overall, our work provides a feasible direction for training clean models on poisoned data.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <sec id="sec-6-1">
        <title>This work is partially supported by the National</title>
        <p>Natural Science Foundation of China (NSFC) (Grant
Nos.61602408,61972352,U1709217) and Zhejiang
Provincial Natural Science Foundation of China under Grant
(Nos.LY19F020005, LY18F020009).
[26] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig,
B. Edwards, T. Lee, I. Molloy, B. Srivastava,
Detecting backdoor attacks on deep neural networks by
activation clustering, in: SafeAI@ AAAI, 2019.
[27] Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, X. Ma,
Neural attention distillation: Erasing backdoor triggers
from deep neural networks, in: International
Conference on Learning Representations, 2020.
[28] A. Krizhevsky, G. Hinton, et al., Learning multiple
layers of features from tiny images (2009).
[29] J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel, Man
vs. computer: Benchmarking machine learning
algorithms for trafic sign recognition, Neural
networks 32 (2012) 323–332.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <article-title>Single-label multi-class image classification by deep logistic regression</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>33</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>3486</fpage>
          -
          <lpage>3493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ren,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Brigato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Barz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Denzler</surname>
          </string-name>
          ,
          <article-title>Image classification with small datasets: Overview and benchmark</article-title>
          , IEEE Access (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mridha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Ohi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hamid</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. M. Monowar</surname>
          </string-name>
          ,
          <article-title>A study on the challenges and opportunities of speech recognition for bengali language</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>55</volume>
          (
          <year>2022</year>
          )
          <fpage>3431</fpage>
          -
          <lpage>3455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Romanenko</surname>
          </string-name>
          ,
          <article-title>Robust speech recognition for low-resource languages</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Universität Ulm,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <article-title>Delving into data: Effectively substitute training for black-box attack</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4761</fpage>
          -
          <lpage>4770</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-Z. Xu</surname>
          </string-name>
          ,
          <article-title>Lafeat: Piercing through adversarial defenses with latent features</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>5735</fpage>
          -
          <lpage>5745</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          , L. Chen,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Yong</surname>
          </string-name>
          ,
          <article-title>Simulating unknown target models for query-efficient black-box attacks</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>11835</fpage>
          -
          <lpage>11844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kariyappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prakash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Qureshi</surname>
          </string-name>
          ,
          <article-title>Maze: Data-free model stealing attack using zeroth-order gradient estimation</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>13814</fpage>
          -
          <lpage>13823</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          , L. Sun,
          <article-title>Killing two birds with one stone: Stealing model and inferring attribute from bert-based apis</article-title>
          ,
          <source>arXiv preprint arXiv:2105.10909</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Humbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fritz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Backes</surname>
          </string-name>
          ,
          <article-title>Ml-leaks: Model and data independent membership inference attacks</article-title>
          and defenses on ma- [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Viswanath</surname>
          </string-name>
          ,
          <article-title>chine learning models</article-title>
          , in: Network and
          <string-name>
            <surname>Distributed H. Zheng</surname>
            ,
            <given-names>B. Y.</given-names>
          </string-name>
          <string-name>
            <surname>Zhao</surname>
          </string-name>
          ,
          <source>Neural cleanse: Identifying Systems Security Symposium</source>
          <year>2019</year>
          ,
          <string-name>
            <given-names>Internet</given-names>
            <surname>Society</surname>
          </string-name>
          , and
          <article-title>mitigating backdoor attacks in neural networks, 2019</article-title>
          . in: 2019
          <source>IEEE Symposium on Security and Privacy</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , S. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Aafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. Wang,</surname>
          </string-name>
          (SP), IEEE,
          <year>2019</year>
          , pp.
          <fpage>707</fpage>
          -
          <lpage>723</lpage>
          . X. Zhang,
          <source>Trojaning attack on neural networks [20]</source>
          <string-name>
            <surname>LiYige</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <surname>Anti</surname>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>backdoor learning: Training clean models on poi-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Graph backdoor</article-title>
          , in: soned data,
          <source>Advances in Neural Information Pro30th USENIX Security Symposium (USENIX Secu- cessing Systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>14900</fpage>
          -
          <lpage>14912</lpage>
          . rity 21),
          <year>2021</year>
          , pp.
          <fpage>1523</fpage>
          -
          <lpage>1540</lpage>
          . [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N. G.</given-names>
            <surname>Marchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. I.</given-names>
            <surname>Rubinstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alfeld</surname>
          </string-name>
          , Hard S. Nepal,
          <article-title>Strip: A defence against trojan attacks to forget: Poisoning attacks on certified machine on deep neural networks</article-title>
          ,
          <source>in: Proceedings of the unlearning, in: Proceedings of the AAAI Confer- 35th Annual Computer Security Applications Conence on Artificial Intelligence</source>
          , volume
          <volume>36</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>ference</fpage>
          ,
          <year>2019</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>125</lpage>
          .
          <fpage>7691</fpage>
          -
          <lpage>7700</lpage>
          . [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kallas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tondi</surname>
          </string-name>
          ,
          <article-title>A new backdoor attack</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>M.-H. Van</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Poisoning at- in cnns by training set corruption without label tacks on fair machine learning</article-title>
          , in: International poisoning,
          <source>in: 2019 IEEE International Conference Conference on Database Systems for Advanced Ap- on Image Processing (ICIP)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>105</lpage>
          . plications, Springer,
          <year>2022</year>
          , pp.
          <fpage>370</fpage>
          -
          <lpage>386</lpage>
          . [23]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lu</surname>
          </string-name>
          , Reflection back-
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lao</surname>
          </string-name>
          , Clpa:
          <article-title>Clean-label poisoning avail- door: A natural backdoor attack on deep neural ability attacks using generative adversarial nets networks</article-title>
          , in: European Conference on Computer (
          <year>2022</year>
          ).
          <source>Vision</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>182</fpage>
          -
          <lpage>199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan-Gavitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garg</surname>
          </string-name>
          , Badnets: Identify- [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <surname>X. Zhang,</surname>
          </string-name>
          <article-title>Composite backdoor ing vulnerabilities in the machine learning model attack for deep neural network by mixing existing supply chain</article-title>
          ,
          <source>arXiv preprint arXiv:1708</source>
          .
          <article-title>06733 benign features</article-title>
          ,
          <source>in: Proceedings of the 2020 ACM</source>
          (
          <year>2017</year>
          ). SIGSAC Conference on Computer and Communi-
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>K.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan-Gavitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garg</surname>
          </string-name>
          , Fine-pruning:
          <source>cations Security</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>131</lpage>
          . Defending against backdooring attacks on deep [25]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Targeted neural networks, in: International Symposium backdoor attacks on deep learning systems using on Research in Attacks, Intrusions, and Defenses, data poisoning</article-title>
          ,
          <source>arXiv preprint arXiv:1712</source>
          .05526 Springer,
          <year>2018</year>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>294</lpage>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>