<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Backdoor Attack Detection in Computer Vision by Applying Matrix Factorization on the Weights of Deep Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khondoker Murad Hossain</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Oates</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Maryland Baltimore County</institution>
          ,
          <addr-line>Baltimore, MD, 21250</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The increasing importance of both deep neural networks (DNNs) and the cloud services used to train them means that bad actors have more incentive and opportunity to insert backdoors that alter the behavior of trained models. In this paper, we introduce a novel method for backdoor detection that extracts features from the weights of pre-trained DNNs using independent vector analysis (IVA), followed by a machine learning classifier. In comparison to other detection techniques, this has a number of benefits: it requires no training data, is applicable across domains, operates with a wide range of network architectures, makes no assumptions about the nature of the triggers used to change network behavior, and is highly scalable. We describe the detection pipeline and then demonstrate its results on two computer vision datasets, one for image classification and one for object detection. Our method outperforms the competing algorithms in terms of efficiency and is more accurate, helping to ensure the safe application of deep learning and AI.</p>
      </abstract>
      <kwd-group>
        <kwd>Backdoor detection</kwd>
        <kwd>image classification</kwd>
        <kwd>object detection</kwd>
        <kwd>matrix factorization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Deep neural networks (DNNs) have seen great success in diverse domains, including object detection [<xref ref-type="bibr" rid="ref20">1</xref>], image captioning [2], virtual assistants [3], healthcare [4], fake news detection [5], stock market prediction [6], and self-driving cars [7]. Despite their ubiquitous applications, DNNs are still considered to be black boxes, as their internal representations are opaque and their behavior can be hard to predict. Because of this, DNNs are susceptible to a variety of adversarial attacks.</p>
      <p>Two of the most prominent adversarial attacks are (i) evasion attacks [8, 9], where the adversary modifies data at inference time so that it is misclassified as benign (e.g., spam emails), and (ii) backdoor attacks (aka trojan attacks) [<xref ref-type="bibr" rid="ref3">10</xref>], where the adversary includes poisoned samples in the training data. In the latter case, the adversary has full control over the network's training process and malicious behaviour is deliberately injected into the model. As soon as the backdoored model sees a particular pattern, known as the trigger, at inference time, it misclassifies the sample. These attacks are growing as DNNs need vast amounts of data to train and millions or billions of parameters need to be learned. The computational power needed for this training process is often not available to individuals or even some businesses, leading to outsourcing training to third parties or downloading pre-trained models from open source platforms like GitHub and Hugging Face. As a result, someone with bad intentions can easily introduce a backdoor in the model.</p>
      <p>Backdoor attacks are more stealthy than other attacks, as the backdoored model can have high accuracy on the underlying task, e.g., classification. As DNNs are deployed in critical applications, the consequences of trojaned models can be dire. For example, a model used to detect street signs in a self-driving car may have an embedded trigger (e.g., a yellow sticky note) that causes the model to misclassify stop signs as speed limit signs, leading to accidents. Due to this, the US Defense Advanced Research Projects Agency (DARPA) has introduced the trojans in AI (TrojAI) program, where teams are developing cutting-edge trojan detection pipelines.</p>
      <p>We introduce a novel backdoor detection approach which uses matrix factorization, in the form of independent vector analysis (IVA) [11], together with machine learning (ML) classifiers to detect backdoored models. Though matrix factorization algorithms have been developed to compare the internal representations of neural networks (e.g., Representational Similarity Analysis (RSA) [12], Centered Kernel Alignment (CKA) [<xref ref-type="bibr" rid="ref9">13</xref>], and Singular Vector Canonical Correlation Analysis (SVCCA) [14]), they have mostly been used for pairwise similarity analysis and never applied to the backdoor detection problem. We use IVA to extract features from the weights of each pre-trained DNN model and then feed the features to a ML classifier to decide whether a model is backdoored or clean.</p>
      <p>We can summarize the contributions of our paper as follows:</p>
      <p>• We propose a highly effective backdoor detection pipeline which employs IVA for feature extraction and detects backdoor models from the features using a ML classifier. To the best of our knowledge, no such methods have been published for backdoor detection using IVA. Our approach has better accuracy and efficiency than state of the art (SOTA) backdoor detection methods on both image classification and object detection DNNs.</p>
      <p>• Our method does not need any training samples to detect a backdoored model, whereas other methods use training samples for optimization and then detect backdoors based on the result. In the real world, getting training samples is highly unlikely, as we can obtain only a DNN model, not the data used to train it.</p>
      <p>1 https://pages.nist.gov/trojai/docs/overview.html</p>
      <sec id="sec-1-1">
        <title>2. Related Works</title>
        <p>This section reviews work on both backdoor attacks and defenses against those attacks.</p>
        <sec id="sec-1-1-1">
          <title>2.1. Backdoor Attack</title>
          <p>BadNets was proposed by Gu et al. [<xref ref-type="bibr" rid="ref3">10</xref>], where backdoors are injected into DNNs by poisoning a subset of the training data with triggers (small visual patterns) of arbitrary shapes. The attacker changes the true class label of the triggered samples so that the poisoned source class images are classified as the target class. BadNets performs well (more than 99% attack success rate) on both clean and poisoned data, as the attacker has full control of the training process. Liu et al. proposed another backdoor attack [15] where the attacker does not need access to the training data. Instead, the attacker inserts triggers which instigate maximum response in specific internal neurons of the DNN. This method can achieve a high success rate (&gt; 98%) as the triggers hold a strong relation to those neurons. Backdoor attacks have also been demonstrated in further applications such as reinforcement learning [16] and natural language processing [17].</p>
        </sec>
        <sec id="sec-1-2">
          <title>2.2. Backdoor Defense</title>
          <p>Backdoor detection strategies typically inspect either the model or the data. Neural Cleanse [18] is a model-based detection method that assumes each class label is the backdoor target label and designs an optimization technique to find the smallest trigger that causes the network to misclassify instances as the target label. After that, they use an outlier detection algorithm on the potential triggers and consider the most significant outlier trigger as the real one, where the label associated with that trigger is the backdoored class label. Though this method showed promising results, it is computationally very expensive as the target label is not known at run time.</p>
          <p>Thousands of benign and malicious models are used to train a classifier utilizing Universal Litmus Patterns (ULPs) [19], which have been developed for backdoor detection. Based on the ULP optimization, the classifier makes a prediction about whether a model has a backdoor. The entropy of a perturbed input picture is measured by STRIP [20] to detect backdoors: if the entropy for the anticipated class is lower, the model is deemed to be backdoored, since it violates the input dependence criterion. Sentinet [21] is a data-level inspection method which uses backpropagation to extract the critical regions from the input data.</p>
          <p>ABS [22] is another model-level backdoor detection method that analyzes the behavior of neuron activations. A stimulation method estimates the impact on output activations of changes to hidden neuron activations. The input is likely poisoned if a neuron's activation increases significantly regardless of the model output label. Based on the stimulation results, an optimization method using model reverse engineering is employed to detect backdoor models. ABS shows very promising results in backdoor detection, but it is also computationally heavy when a network has a large number of layers.</p>
          <p>Chen et al. proposed activation clustering (AC) [23] for backdoor detection by analyzing the activations of neural networks. They use a few training samples to obtain the activations of the final fully connected layer of a neural network. Then the activations are segmented by class label and each label is clustered separately. Finally, they implement 2-means clustering followed by ICA for dimensionality reduction. To find the poisoned model they use three distinct post-processing methods.</p>
          <p>All the backdoor detection methods discussed above only deal with CNN models for image classification tasks. Regarding backdoor detection for object detection CNN models, Chan et al. proposed detector cleanse [24], a framework for run-time poisoned image detection for object detectors that relies on the user having just a few clean features (which can come from many datasets).</p>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>3.1. Problem statement</title>
        <p>Consider a DNN model f(·) which performs a classification task over c = 1, ..., C classes using training dataset D. If we poison a portion of D, denoted B ⊂ D, by injecting triggers into training images and changing the source class label to the target label, f(·) is a backdoored model after training. During inference, f(·) performs as expected for clean input samples, but for triggered samples x ∈ B it outputs f(x) = t, where t (t ∈ C) is the target but incorrect class, and can be single or multiple depending on the number of classes we poison. The objective of our pipeline is to detect these backdoor models before deployment.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Method and Pipeline</title>
      <p>[Figure 1: Backdoor detection pipeline. The weight tensors of pre-trained DNNs are made uniform using random projection (RP), features are extracted using IVA, and a ML classifier predicts each DNN's label: clean or backdoored.]</p>
      <sec id="sec-2-1">
        <title>3.2. Backdoor detection pipeline</title>
        <p>In this section, we describe how we extract features from
the weights of the pre-trained DNNs and use the features
for backdoor model prediction.</p>
        <sec id="sec-2-1-1">
          <title>3.2.1. DNN weight tensor preparation</title>
          <p>As all the DNNs, k = 1, ..., K, are already trained, we have the weights of each layer of the networks. But the dimensions of the weights are not uniform; they depend on the type of layer and the network architecture. So, we have used random projection (RP) to obtain uniform-size weight tensors for all the layers, as RP can produce features of uniform size [25] for different DNNs and is very memory efficient [26]. As a result, for each DNN we get a weight tensor W[k] ∈ R^(N×R), where R = 2000, meaning we consider the weights of N layers of the DNN and the RP dimension is 2000.</p>
          <sec id="sec-2-1-1-1">
            <title>3.2.2. Feature extraction and classification</title>
            <p>IVA is an extension of independent component analysis (ICA) to multiple datasets [11] which uses the statistical dependence of latent (independent) sources across datasets by exploiting both second order and higher order statistics. Though it is one of the frequently used algorithms for brain connectivity analysis using fMRI and EEG data [27, 28], this is the first backdoor detection pipeline using IVA.</p>
            <p>Before applying IVA for feature extraction, we get our datasets, X[k] ∈ R^(P×R), using PCA on W[k] for dimensionality reduction with model order P, preserving 90% of the variance in our data. Given K datasets for K DNN models, each consisting of R samples, and each dataset being a linear mixture of P independent sources, IVA decomposes it as</p>
            <p>X[k] = A[k] S[k], 1 ≤ k ≤ K, (1)</p>
            <p>where A[k] denotes the mixing matrix and S[k] the dataset-specific sources. IVA estimates K demixing matrices, D[k], k = 1, ..., K, so that the dataset-specific sources can be estimated as S[k] = D[k] X[k]. Hence, each S[k] contains P sources and we use those P features to classify the DNN models. Finally, we train a classifier algorithm f(·) to predict whether a model is backdoored or clean.</p>
            <p>Algorithm 1: Backdoor Detection using DNN weights.
Input: weights of the pre-trained DNNs, k = 1, ..., K.
Output: Backdoor / Clean DNNs.
1: for k = 1, ..., K do
2:   Get a 1 × R weight tensor for each layer using random projection, for N layers
3:   Append them for n = 1, ..., N and construct W[k] ∈ R^(N×R)
4:   Observation: X[k] ∈ R^(P×R) = PCA(W[k])
5:   Demixing matrix: D[k] = IVA(X[k])
6:   Estimated sources: S[k] ∈ R^(P×R) = D[k] · X[k]
7:   Predicted label: ŷ = f(S[k])</p>
          </sec>
          <sec id="sec-2-1-1-2">
            <title>4. Dataset and Experimental Results</title>
            <sec id="sec-2-1-1-2-1">
              <title>4.1. Dataset</title>
              <p>To evaluate our backdoor detection method, we use CNN models trained on MNIST digits and object detection models provided by the TrojAI program.</p>
            </sec>
            <sec id="sec-2-1-1-2-2">
              <title>4.1.1. Image classification dataset</title>
              <p>We have trained 450 CNN models using the same architecture, shown in Table 1 (50% clean, 50% backdoored), to classify the MNIST data. Clean CNNs are trained using the clean MNIST data. For backdoored model training, we poison all '0's (single class poisoning) by imposing a 4 × 4 pixel white patch on the lower right corner and set the target class to '9', as shown in Figure 2. Clean CNNs exhibit an average accuracy of 99.02%, while backdoored CNNs have an accuracy of 98.85% with a 99.92% attack success rate, indicating a highly effective trigger attack. Moreover, out of the 450 models, we use 400 CNNs for training and 50 for testing, with N = 6, meaning we consider all CNN layers' weights.</p>
              <p>[Figure 2: MNIST CNN dataset: a clean sample and a poisoned sample (source label '0', target label '9').]</p>
            </sec>
          </sec>
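<p>The single-class poisoning described above (stamping a 4 × 4 white patch on the lower-right corner of every '0' and relabeling it as '9') can be sketched in a few lines of numpy; array names are illustrative and pixels are assumed to lie in [0, 1]:</p>

```python
import numpy as np

def poison_batch(images, labels, source=0, target=9, patch=4):
    """Stamp a white patch on the lower-right corner of every source-class
    image and flip its label to the target class (single-class poisoning)."""
    images, labels = images.copy(), labels.copy()
    mask = labels == source
    # Pixels assumed in [0, 1]; use 255 instead for raw uint8 images.
    images[mask, -patch:, -patch:] = 1.0
    labels[mask] = target
    return images, labels

# Toy batch: two 28x28 images, a '0' (gets poisoned) and a '5' (untouched).
imgs = np.zeros((2, 28, 28))
lbls = np.array([0, 5])
p_imgs, p_lbls = poison_batch(imgs, lbls)
```

A backdoored training set is then the clean set with a fraction of its source-class samples replaced by such poisoned copies.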
        </sec>
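<p>The weight-tensor preparation and feature-extraction steps above (random projection, PCA, IVA, then a classifier) can be sketched with scikit-learn. IVA itself is not available there, so per-model FastICA stands in for the joint IVA step (true IVA additionally couples the estimated sources across the K datasets); all names and dimensions are illustrative, with R shrunk from 2000 for brevity:</p>

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import PCA, FastICA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
K, N, R, P = 20, 6, 200, 4  # models, layers, RP dim (2000 in the paper), sources

def model_features(layer_weights):
    # 3.2.1: project each layer's flattened weights into a common R-dim space.
    W = np.vstack([
        GaussianRandomProjection(n_components=R, random_state=0)
        .fit_transform(w.reshape(1, -1))
        for w in layer_weights
    ])                                             # W[k] : N x R
    X = PCA(n_components=P).fit_transform(W.T).T   # X[k] : P x R
    # Stand-in for IVA: estimate P independent sources for this model.
    S = FastICA(n_components=P, random_state=0).fit_transform(X.T).T
    return S.ravel()                               # flattened P x R features

# Toy "pre-trained models": N layers of random weights each.
feats = np.array([model_features([rng.normal(size=(32, 64)) for _ in range(N)])
                  for _ in range(K)])
labels = rng.integers(0, 2, size=K)                # 0 = clean, 1 = backdoored
clf = RandomForestClassifier(random_state=0).fit(feats, labels)
```

With real models, `layer_weights` would come from the saved checkpoints, and the classifier would be trained on models with known clean/backdoored labels.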
        <sec id="sec-2-1-2">
          <title>4.1.2. Object detection dataset</title>
          <p>We have utilized the object detection CNN models of the TrojAI dataset, which contains backdoored and clean models across two network architectures (Fast R-CNN and SSD) trained on the Common Objects in Context (COCO) dataset. We use 144 'Train' models from the repository as our training models and 144 'Test' models for the evaluation of our pipeline, with N = 30, meaning we consider the final 30 layers' weights of the models. Figure 3 shows that there are two types of trigger attacks on the models: evasion and misclassification. Evasion triggers cause either a single box, or all boxes of a class, to be deleted, and misclassification triggers cause either a single box, or all boxes of a specific class, to shift to the target label.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>4.2. Experimental results</title>
        <p>Several performance metrics are reported using different ML classifiers. We also compare our findings with SOTA backdoor detection methods in terms of both performance and efficiency. Regarding the number of PCA components, we use P = 4 and P = 10 for the image classification and object detection datasets, respectively. Moreover, we use the standard equation for binomial proportions to estimate confidence intervals on the empirical accuracies as a robustness metric for the pipelines, i.e., confidence interval = z × √(p × (1 − p)/n), where p is the empirical accuracy, n is the number of models classified as backdoored or clean, and we use z = 1.96 and thus have 95% confidence intervals [29].</p>
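<p>The interval above is the standard normal approximation for a binomial proportion; written out (z = 1.96 for a 95% interval, values below purely illustrative):</p>

```python
import math

def binomial_ci(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval
    for an empirical proportion p measured over n trials."""
    return z * math.sqrt(p * (1.0 - p) / n)

# e.g., 0.90 empirical accuracy over 50 held-out models -> about +/- 0.083
half = binomial_ci(0.90, 50)
```

The reported accuracy would then be quoted as p ± half; note the half-width shrinks as 1/√n, so doubling the test set narrows the interval by about 30%.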
        <sec id="sec-2-2-1">
          <title>4.2.1. Backdoor model classification</title>
          <p>We show the backdoor model detection results in Table 2. Three different ML classifiers (random forest (RF), decision tree (DT), and k-nearest neighbor (kNN)) have been used in the experiments for both the image classification and object detection datasets. As performance metrics, cross entropy loss (CE-Loss) and area under the ROC curve (ROC-AUC) scores are reported, as CE-Loss is the current standard for classification problems and ROC-AUC helps to understand the false positive rate (FPR), which is crucial for backdoor model detection. On both datasets, RF performs better than DT and kNN in terms of CE-Loss and ROC-AUC scores. Our pipeline using RF shows ROC-AUC scores of 0.91 for the image classification and 0.89 for the object detection datasets.</p>
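<p>The two metrics reported above can be computed directly from a classifier's predicted probabilities; a small scikit-learn sketch with illustrative numbers (not the paper's results):</p>

```python
from sklearn.metrics import log_loss, roc_auc_score

# Ground truth (1 = backdoored) and predicted P(backdoored) for 8 held-out models.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.3, 0.2, 0.6, 0.8, 0.7, 0.9, 0.4]

ce = log_loss(y_true, y_prob)        # cross entropy loss, lower is better
auc = roc_auc_score(y_true, y_prob)  # area under the ROC curve, higher is better
```

ROC-AUC is threshold-free (it is the probability a random backdoored model outranks a random clean one), which is why it is informative about the FPR trade-off, while CE-Loss also penalizes over-confident wrong probabilities.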
        </sec>
        <sec id="sec-2-2-2">
          <title>4.2.2. Comparison with other methods</title>
          <p>Image classification</p>
          <p>Our method is evaluated in comparison to four SOTA backdoor detection techniques: NC [18], Universal Litmus Patterns (ULP) [19], Activation Clustering (AC) [23], and ABS [22]. For a fair comparison, we employ the same batch size for the optimization-based approaches, including NC, ABS, and ULP.</p>
          <p>The results are shown in Table 3, where we report the best results of our pipeline, which uses IVA with a RF classifier (IVA-RF). Our method outperforms all the competing methods by a wide margin in terms of both CE-Loss and ROC-AUC score. IVA-RF obtains a ROC-AUC of 0.91, which is higher than the next-best ULP by a margin of 0.06. AC shows the lowest ROC-AUC, as it works well only for certain types of trigger attacks. Moreover, IVA-RF has the tightest confidence interval and a lower CE-Loss, meaning our pipeline is more robust than the competing algorithms.</p>
          <p>Object detection</p>
          <p>The majority of backdoor attack detection techniques for image classification do not work for object detection. In addition, the object detection model's output (a large number of objects) differs from the image classification model's (a predicted class). The only SOTA method we have found to compare our algorithm with is detector cleanse (DC) [24], and the results are shown in Table 4. Similar to image classification, IVA-RF outperforms DC with a higher ROC-AUC and lower CE-Loss.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>4.2.3. Eficiency of the methods</title>
          <p>It's critical that backdoor detection techniques are effective, because they may end up being a standard component of ML operations. Table 5 shows the time in seconds required to make decisions for backdoor detection. Our method tends to be faster than NC, ABS, ULP (image classification), and DC (object detection) by an order of magnitude, due to the fact that our approach is model agnostic and only extracts features from model weights for detection. Although AC's running duration is close to ours, it is noticeably less accurate, as seen in Table 3. Because of this, our approach can achieve an efficiency-accuracy balance that none of the other algorithms can.</p>
          <p>[Table 5: computation time of the methods in seconds, for the image and object detection datasets.]</p>
          <p>As we have applied PCA for dimensionality reduction before IVA, an ablation study was conducted to see the impact of PCA. Figure 4 shows the ROC-AUC scores when we do not use PCA and with different numbers of PCA components. The classifier performance degrades significantly when we do not use PCA, as IVA has to handle the noisy data to extract features. However, we preserved 90% of the variance of the data by using P = 4 and P = 10 components for the image and object datasets, respectively. When we use lower or higher numbers of components the score drops, as we lose information with fewer components and add noisy components with more.</p>
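<p>The 90%-variance criterion for choosing the model order can be expressed directly in scikit-learn, which, given a fractional n_components, keeps the smallest number of components whose cumulative explained-variance ratio reaches that threshold; a toy sketch with a synthetic low-rank stand-in for a weight matrix:</p>

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy stand-in for a weight matrix W[k]: rank-5 signal plus small noise.
W = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50)) \
    + 0.01 * rng.normal(size=(200, 50))

pca = PCA(n_components=0.90)   # keep enough components for 90% of the variance
X = pca.fit_transform(W)       # pca.n_components_ is chosen automatically
```

Fixing the variance ratio rather than the component count is what lets the same pipeline pick P = 4 for one dataset and P = 10 for another.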
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusion</title>
      <p>Ours is the first work of which we are aware that uses matrix factorization on the weights to detect backdoors in deep networks. Moreover, this is the first pipeline which can detect backdoor models for both image classification and object detection networks. It has a number of advantages, including the fact that it needs no re-training or optimization and is much faster than other state-of-the-art backdoor detectors. Future work will include applications to sequence models such as those used in natural language processing, which should be straightforward from an engineering perspective given that our method uses only the pre-trained weights of the networks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [9] eaWvu.atoJsiinaoonnmga,otHtua.scLkvise,hSai.gcLal eiinus,s,tXIEd.EeLeEupotrl,eaRan.rsnLaiucnt,gioPanolsigsooornnitivnhegmhasicniund-
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>IEEE Transactions on Vehicular Technology 69</source>
          (
          <year>2020</year>
          )
          <fpage>4439</fpage>
          -
          <lpage>4449</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan-Gavitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garg</surname>
          </string-name>
          , Badnets: Identify-
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>ing vulnerabilities in the machine learning model</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>supply chain</article-title>
          ,
          <source>arXiv preprint arXiv:1708.06733</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          (
          <year>2017</year>
          ). [11]
          <string-name>
            <surname>M. Anderson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Adali</surname>
            ,
            <given-names>X.-L.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
          </string-name>
          , Joint blind source
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>tions on Signal Processing</source>
          <volume>60</volume>
          (
          <year>2011</year>
          ). [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Morcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raghu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Insights on rep-
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>tion Processing Systems</source>
          <volume>31</volume>
          (
          <year>2018</year>
          ). [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rostamizadeh</surname>
          </string-name>
          , Algo-
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <volume>13</volume>
          (
          <year>2012</year>
          ). [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Raghu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gilmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yosinski</surname>
          </string-name>
          , J. Sohl-Dickstein,
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          (
          <year>2017</year>
          ). [1]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rautaray</surname>
          </string-name>
          , Application [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , S. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Aafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhai</surname>
          </string-name>
          , W. Wang,
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>puter science 132</source>
          (
          <year>2018</year>
          )
          <fpage>1706</fpage>
          -
          <lpage>1717</lpage>
          . (
          <year>2017</year>
          ). [2]
          <string-name>
            <given-names>Q.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          , Image [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kiourti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wardega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          , Trojdrl:
          <fpage>eval</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <article-title>of the IEEE conference on computer vision and learning</article-title>
          ,
          <source>in: 2020</source>
          57th ACM/IEEE Design Automa-
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>4651</fpage>
          -
          <lpage>4659</lpage>
          . tion Conference (DAC), IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>" chitty-chitty-chat bot": Deep learning for</article-title>
          [17]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Backes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Badnl: Backdoor attacks against nlp models with semantic-preserving improvements</article-title>
          ,
          <source>in: Annual Computer Security Applications Conference</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>554</fpage>
          -
          <lpage>569</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <article-title>conversational ai</article-title>
          ,
          <source>in: IJCAI</source>
          , volume
          <volume>18</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Esteva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Robicquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ramsundar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kuleshov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          , et al.,
          <article-title>A guide to deep learning in healthcare</article-title>
          ,
          <source>Nature medicine 25</source>
          (
          <year>2019</year>
          )
          <fpage>24</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Viswanath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Neural cleanse: Identifying and mitigating backdoor attacks in neural networks</article-title>
          ,
          <source>in: 2019 IEEE Symposium on Security and Privacy (SP)</source>
          , IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Monti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Frasca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mannion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Bronstein</surname>
          </string-name>
          ,
          <article-title>Fake news detection on social media using geometric deep learning</article-title>
          , arXiv preprint arXiv:1902.06673 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>X.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <article-title>Deep learning for event-driven stock prediction</article-title>
          ,
          <source>in: Twenty-fourth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Kolouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pirsiavash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          ,
          <article-title>Universal litmus patterns: Revealing backdoor attacks in cnns</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Q.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frtunikj</surname>
          </string-name>
          ,
          <article-title>Deep learning for self-driving cars: Chances and challenges</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nepal</surname>
          </string-name>
          ,
          <article-title>Strip: A defence against trojan attacks on deep neural networks</article-title>
          ,
          <source>in: Proceedings of the 35th Annual Computer Security Applications Conference</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. E.</given-names>
            <surname>Sagduyu</surname>
          </string-name>
          ,
          <article-title>Evasion and causative attacks with adversarial deep learning</article-title>
          ,
          <source>in: MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>243</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Chou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pellegrino</surname>
          </string-name>
          ,
          <article-title>Sentinet: Detecting localized universal attacks against deep learning systems</article-title>
          ,
          <source>in: 2020 IEEE Security and Privacy Workshops (SPW)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>48</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Aafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Abs: Scanning neural networks for back-doors by artificial brain stimulation</article-title>
          ,
          <source>in: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1265</fpage>
          -
          <lpage>1282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Baracaldo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ludwig</surname>
          </string-name>
          , et al.,
          <article-title>Detecting backdoor attacks on deep neural networks by activation clustering</article-title>
          , arXiv preprint arXiv:1811.03728 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Baddet: Backdoor attacks on object detection</article-title>
          ,
          <source>arXiv preprint arXiv:2205.14497</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Ailon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chazelle</surname>
          </string-name>
          ,
          <article-title>The fast johnson-lindenstrauss transform and approximate nearest neighbors</article-title>
          ,
          <source>SIAM Journal on computing 39</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Eftekhari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Babaie-Zadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Moghaddam</surname>
          </string-name>
          ,
          <article-title>Two-dimensional random projection</article-title>
          ,
          <source>Signal Processing 91</source>
          (
          <year>2011</year>
          )
          <fpage>1589</fpage>
          -
          <lpage>1603</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhinge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Calhoun</surname>
          </string-name>
          ,
          <article-title>to sensorimotor task data</article-title>
          ,
          <source>in: 2022 56th Annual Conference on Information Sciences and Systems (CISS)</source>
          , IEEE,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Acar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Calhoun</surname>
          </string-name>
          ,
          <source>Frontiers in neuroscience 16</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <article-title>Data mining: practical machine learning tools and techniques with java implementations</article-title>
          ,
          <source>Acm Sigmod Record</source>
          <volume>31</volume>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>