<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Neurosymbolic Framework for Bias Correction in Convolutional Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Parth Padalkar</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natalia Ślusarz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ekaterina Komendantskaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gopal Gupta</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Heriot-Watt University</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Southampton University</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Texas at Dallas</institution>
          ,
          <addr-line>Richardson</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent efforts in interpreting Convolutional Neural Networks (CNNs) focus on translating the activation of CNN filters into a stratified Answer Set Program (ASP) rule-set. The CNN filters are known to capture high-level image concepts, so each predicate in the rule-set is mapped to the concept that its corresponding filter represents. Hence, the rule-set exemplifies the decision-making process of the CNN w.r.t. the concepts that it learns for any image classification task. These rule-sets help in understanding the biases of CNNs, although correcting the biases remains a challenge. We introduce a neurosymbolic framework called NeSyBiCor for bias correction in a trained CNN. Given symbolic concepts, as ASP constraints, that the CNN is biased towards, we convert the concepts to their corresponding vector representations. Then, the CNN is retrained using our novel semantic similarity loss that pushes the filters away from (or towards) learning the undesired (or desired) concepts. The final ASP rule-set obtained after retraining satisfies the constraints to a high degree, thus showing the revision in the knowledge of the CNN. We demonstrate that our NeSyBiCor framework successfully corrects the biases of CNNs trained with subsets of classes from the Places dataset, while sacrificing minimal accuracy and improving interpretability.</p>
      </abstract>
      <kwd-group>
        <kwd>Neurosymbolic AI</kwd>
        <kwd>CNN</kwd>
        <kwd>Semantic Loss</kwd>
        <kwd>Answer Set Programming</kwd>
        <kwd>XAI</kwd>
        <kwd>Representation Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Deep learning models can be biased based on the training data. One infamous illustration of this
bias is exemplified by the “wolf in the snow” problem [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where convolutional neural networks
(CNNs) erroneously identify a husky as a wolf due to the presence of snow in the background.
This happened because they learnt to associate ‘snow’ with ‘wolf’ based on the training data.
This bias can lead to dire consequences if such models are deployed in sensitive scenarios such as disease
diagnosis ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) and autonomous vehicle operation ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <p>
        Recent works have shown that it is possible to obtain the knowledge of a trained CNN in the
form of a symbolic rule-set, more specifically as a stratified Answer Set Program ([
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]). The
authors proposed NeSyFOLD, a framework where the activation of filters in the final CNN layer
represents predicate truth values in the rule-set, revealing concepts learned by the model and
their relation to the target class. CNN filters, which are n × n matrices, capture image concepts.
Predicates are labeled according to the concepts identified by these filters. The FOLD-SE-M [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
Rule-Based Machine Learning (RBML) algorithm is used to extract the rules from the last-layer
filters.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>We introduce the NeSyBiCor (Neuro-Symbolic Bias Correction) framework to aid in correcting
pre-identified biases of a CNN. The ASP rule-set that NeSyFOLD generates from the
bias-corrected CNN serves to validate the effectiveness of the framework. The pre-identified biases
are presented as semantic constraints based on concepts that should and should not be used
to predict the class of an image. These concepts can be selected by scrutinizing the
rule-set generated by NeSyFOLD. We map the undesirable/desirable semantic concepts to their
corresponding vector representations learnt by the filters. Next, we retrain the CNN with a
novel semantic similarity loss function which is a core component of our framework. The loss
function is designed to minimize the similarity of the representations learnt by the filters with
the undesirable concepts and maximize the similarity to the desirable ones. Once the retraining
is complete, we use the NeSyFOLD framework again to extract the refined rule-set. Hence, our
approach provides a way to refine a given ASP rule-set subject to some semantic constraints.
Figure 1 illustrates the framework.</p>
      <p>Computing Concept Representation Vectors: The first step is to obtain concept representation
vectors for each desired and undesired concept specified in the semantic constraints of each
target class. A CNN filter can be flattened into a vector representing the detected patterns. For
example, in the rule-set (blue box) shown in Fig. 1, the filter associated with the predicate ‘sky1/1’
detects blue sky patterns, while ‘sky2/1’ detects evening sky patterns in desert road images. To
compute the concept representation vector for sky, we calculate the filter representation vectors
for all predicates containing sky in their name that are positively linked to the ‘desert road’ class;
here, ‘sky1/1’ and ‘sky2/1’ are two such predicates. Their filter representation vectors
are computed by averaging the output vectors of the top-10 images these filters activate in the
training set. The final concept representation vector for sky is the mean of the ‘sky1/1’ and
‘sky2/1’ vectors. This process is repeated for every desired and undesired concept.
Calculating the Semantic Similarity Loss: The semantic similarity loss ℒ_sim, for a training
set with N images and a CNN with k filters in the last layer, is defined as:</p>
      <p>ℒ_sim = ∑_{i=1}^{N} ∑_{j=1}^{k} [ α ∑_{b ∈ B} cos_sim(r_ij, r_b) − β ∑_{g ∈ G} cos_sim(r_ij, r_g) ]    (1)
Here, cos_sim calculates the cosine similarity between two vectors. r_ij represents the filter
output from the j-th filter of the i-th image, while r_b and r_g are the concept representation
vectors for undesired concepts b ∈ B and desired concepts g ∈ G, respectively. α and β are
hyperparameters that balance the influence of these terms.</p>
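      <p>To make Eq. (1) concrete, the following is a minimal NumPy sketch of the semantic similarity loss; the function and variable names are ours, and a real implementation would use the training framework's differentiable tensor operations rather than plain NumPy:</p>
      <p>
```python
import numpy as np

def cos_sim(u, v):
    """Cosine similarity between two 1-D vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantic_similarity_loss(filter_outputs, undesired, desired, alpha=1.0, beta=1.0):
    """Eq. (1): summed over N images and k last-layer filters,
    alpha * (similarity to undesired concept vectors, b in B)
    minus beta * (similarity to desired concept vectors, g in G)."""
    loss = 0.0
    for image in filter_outputs:        # i = 1..N images
        for r_ij in image:              # j = 1..k flattened filter outputs
            loss += alpha * sum(cos_sim(r_ij, r_b) for r_b in undesired)
            loss -= beta * sum(cos_sim(r_ij, r_g) for r_g in desired)
    return loss

# A filter output aligned with an undesired concept raises the loss,
# while alignment with a desired concept lowers it.
outputs = [[[1.0, 0.0]]]                # one image, one filter
print(semantic_similarity_loss(outputs, undesired=[[1.0, 0.0]], desired=[[0.0, 1.0]]))  # → 1.0
```
      </p>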
      <p>The loss increases when filter vectors resemble undesired concepts and decreases when they
are closer to desired concepts, similar to the word2vec loss function. As training progresses, the
filters are encouraged to learn desired concepts while avoiding undesired ones. The total loss
is defined as the sum of the cross-entropy loss ℒ_CE and the semantic similarity loss ℒ_sim, i.e., ℒ_total = ℒ_CE + ℒ_sim.
Recalibrating the Concept Representation Vectors: We propose rectifying all the concept
representation vectors for each class after every t epochs during training. We do this by running
the NeSyFOLD framework after every t epochs and obtaining a new rule-set from the partially
retrained CNN. We then aggregate the new concept representation vectors with the old concept
representation vectors by taking their mean.</p>
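      <p>The vector computations described above can be sketched as follows; the helper names are ours (not NeSyFOLD's API), and ranking the top-10 activating images by the norm of the filter output is an assumption made for illustration:</p>
      <p>
```python
import numpy as np

def filter_representation(per_image_outputs, top_n=10):
    """Representation vector of one filter: the mean of its flattened output
    vectors over the top-n training images that activate it most strongly
    (ranked here by output norm -- an illustrative choice)."""
    ranked = sorted(per_image_outputs, key=lambda v: float(np.linalg.norm(v)), reverse=True)
    return np.mean(ranked[:top_n], axis=0)

def concept_representation(filter_vectors):
    """Representation of a concept (e.g. 'sky'): the mean of the representation
    vectors of all filters whose predicates mention it (e.g. sky1/1, sky2/1)."""
    return np.mean(filter_vectors, axis=0)

def recalibrate(old_concept_vec, new_concept_vec):
    """Aggregation performed every t epochs: average the concept vector obtained
    from the freshly extracted rule-set with the previous one."""
    return (np.asarray(old_concept_vec) + np.asarray(new_concept_vec)) / 2.0

# Toy example: one 'sky' concept built from two filters.
sky1 = filter_representation([[2.0, 0.0], [0.0, 1.0], [4.0, 0.0]], top_n=2)  # mean of top-2 -> [3.0, 0.0]
sky2 = np.array([1.0, 1.0])
sky = concept_representation([sky1, sky2])     # -> [2.0, 0.5]
sky = recalibrate(sky, [0.0, 0.0])             # averaged with a new estimate
```
      </p>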
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <p>
        We train a CNN on subsets of the Places [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] dataset and compare the results with the rule-set
obtained using NeSyFOLD before and after the bias correction with our NeSyBiCor framework.
Details of the experiments can be found elsewhere (Padalkar et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Figure 2 shows the
initial and bias-corrected rule-sets for two subsets of the Places dataset. The des subset comprises
the ‘desert road’ and ‘street’ classes, and the defs subset comprises the ‘desert road’, ‘forest road’,
and ‘street’ classes.
      </p>
      <p>The undesired concepts for the ‘desert road’ class are ‘sky’ and ‘building’. In RULE-SET 1,
rule 2 uses the ‘sky1/1’ predicate to determine if the image belongs to the ‘desert road’ class.
In the bias-corrected rule-set, RULE-SET 1*, there is no ‘sky’ based predicate. Moreover, the
only predicate positively linked with the ‘desert road’ class is the ‘ground1_road1/1’ predicate,
which is based on the desired concept ‘ground’ and refers to the corresponding filter learning a
pattern comprising specific types of patches of ‘ground’ and ‘road’. Thus, it is clear that at
the end of the bias correction, few if any of the filters learn representations of the ‘sky’ or
‘building’ concepts, hence correcting the “bias” of the CNN towards them while predicting the
‘desert road’ class.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>Our framework, in addition to correcting the bias of a CNN, also allows the user to fine-tune
the bias based on general concepts according to their specific needs or application. To the
best of our knowledge, this is the first method that performs bias correction by using the learnt
representations of the CNN filters in a targeted manner. We show through our experiments that
the bias-corrected rule-set is highly effective at avoiding the classification of images based on
undesired concepts. It is also more likely to classify the images based on the desired concepts.</p>
      <p>
        Finally, the work we presented here may be used to extend implementations of loss functions
based on Differentiable Logics [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>"why should i trust you?": Explaining the predictions of any classifier</article-title>
          ,
          <year>2016</year>
          . URL: https://arxiv.org/abs/1602.04938. doi:10.48550/ARXIV.1602.04938.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Müezzinoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Baygin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tuncer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Barua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Baygin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tuncer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Palmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Cheong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. R.</given-names>
            <surname>Acharya</surname>
          </string-name>
          ,
          <article-title>Patchresnet: Multiple patch division-based deep feature fusion framework for brain tumor classification using MRI images</article-title>
          ,
          <year>2023</year>
          . URL: https://doi.org/10.1007/s10278-023-00789-x. doi:10.1007/S10278-023-00789-X.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Barea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Bergasa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Romera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>López-Guillén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tradacete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <article-title>Integrating state-of-the-art cnns for multi-sensor 3d vehicle detection in real autonomous driving environments</article-title>
          ,
          <source>in: Proc. ITSC</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1425</fpage>
          -
          <lpage>1431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Padalkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Nesyfold: A framework for interpretable image classification</article-title>
          ,
          <source>in: Proc. AAAI</source>
          , AAAI Press,
          <year>2024</year>
          , pp.
          <fpage>4378</fpage>
          -
          <lpage>4387</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Padalkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Using logic programming and kernel-grouping for improving interpretability of convolutional neural networks</article-title>
          ,
          <source>in: Proc. PADL</source>
          , volume
          <volume>14512</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>134</fpage>
          -
          <lpage>150</lpage>
          . URL: https://doi.org/10.1007/978-3-031-52038-9_9. doi:10.1007/978-3-031-52038-9_9.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>FOLD-SE: an efficient rule-based machine learning algorithm with scalable explainability</article-title>
          ,
          <source>in: Proc. PADL</source>
          , volume
          <volume>14512</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lapedriza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <article-title>Places: A 10 million image database for scene recognition</article-title>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Padalkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ślusarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Komendantskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>A neurosymbolic framework for bias correction in cnns</article-title>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balunovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Drachsler-Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vechev</surname>
          </string-name>
          ,
          <article-title>Dl2: training and querying neural networks with logic</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1931</fpage>
          -
          <lpage>1941</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ślusarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Komendantskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Daggitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stark</surname>
          </string-name>
          ,
          <article-title>Logic of differentiable logics: Towards a uniform semantics of DL</article-title>
          ,
          <source>in: 24th International Conference on Logic for Programming</source>
          ,
          <source>Artificial Intelligence and Reasoning</source>
          ,
          <source>LPAR</source>
          <year>2023</year>
          ,
          <year>2023</year>
          , pp.
          <fpage>473</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Daggitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kokke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Atkey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ślusarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Arnaboldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Komendantskaya</surname>
          </string-name>
          ,
          <article-title>Vehicle: Bridging the embedding gap in the verification of neuro-symbolic programs</article-title>
          ,
          <year>2024</year>
          . URL: https://doi.org/10.48550/arXiv.2401.06379. doi:10.48550/ARXIV.2401.06379. arXiv:2401.06379.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>