<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Journal of Computer Vision 115 (2015) 211-252. URL: https://doi.org/10.1007/s11263</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">2640-3498</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3375627.3375830</article-id>
      <title-group>
        <article-title>Attack logics, not outputs: Towards efficient robustification of deep neural networks by falsifying concept-based properties</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Raik Dankworth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gesina Schwalbe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Lübeck</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>4</volume>
      <fpage>180</fpage>
      <lpage>186</lpage>
      <abstract>
        <p>Deep neural networks (NNs) for computer vision are vulnerable to adversarial attacks, i.e., minuscule malicious changes to inputs may induce unintuitive outputs. One key approach to verify and mitigate such robustness issues is to falsify expected output behavior. This allows, e.g., to locally prove security, or to (re)train NNs on obtained adversarial input examples. Due to the black-box nature of NNs, current attacks only falsify a class of the final output, such as flipping from stop_sign to ¬stop_sign. In this short position paper we generalize this to search for generally illogical behavior, as considered in NN verification: falsify constraints (concept-based properties) involving further human-interpretable concepts, like red ∧ octagonal → stop_sign. For this, an easy implementation of concept-based properties on already trained NNs is proposed using techniques from explainable artificial intelligence. Further, we sketch the theoretical proof that attacks on concept-based properties are expected to have a reduced search space compared to simple class falsification, while arguably being better aligned with intuitive robustness targets. As an outlook to this work in progress we hypothesize that this approach has the potential to efficiently and simultaneously improve logical compliance and robustness.</p>
      </abstract>
      <kwd-group>
        <kwd>Trustworthy AI</kwd>
        <kwd>Neural Network Verification</kwd>
        <kwd>Adversarial Attack</kwd>
        <kwd>Explainable Neural Network</kwd>
        <kwd>Concept-based XAI</kwd>
        <kwd>Computer Vision</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Neural Networks (NNs) excel in processing subsymbolic inputs like images, and are increasingly being considered for use in safety-critical domains [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This makes it crucial to ensure their robust and intuitive generalization, at least around known training cases. One tool to evaluate vulnerability to malicious attacks are Adversarial Attacks (AAs): These craft inputs that induce incorrect or unexpected predictions, using minimal modifications to a correctly handled input x with y = f(x) [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6 ref7 ref8">2, 3, 4, 5, 6, 7, 8</xref>
        ]. However, existing attacks solely focus on altering the model's final output, i.e., falsify ∀x′ ∈ Nbhd(x) : f(x′) = y for some neighborhood Nbhd(x) around x, like an ε-ball. This disregards whether the prediction still conforms to high-level, interpretable properties. Common examples of known properties are sufficient conditions, e.g., red(x) ∧ octagonal(x) =⇒ stop_sign(x) in traffic sign recognition from images; and necessary conditions, like ¬octagonal(x) =⇒ ¬stop_sign(x). More generally, rules involving unary predicates not available from the NN outputs are here called concept-based properties. Rich semantic rules are known to be well suited for runtime plausibility monitoring [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] and respective fixing of NN outputs [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ]. In particular, they do not constrain the local output to be correct, but the underlying general logical reasoning locally around the sample.
      </p>
      <p>One reason why falsification of such informative constraints is not considered for attack generation is that they require outputs for all involved predicates, not only the available final output, like stop_sign. These, however, might need a considerable amount of training data or hyperparameter tuning if added right away during training; or, even worse, not all properties and thus not all required concepts might be known at training time due to specification gaps or later domain transfer.</p>
      <p>7th International Workshop on Artificial Intelligence and Formal Verification, Logic, Automata, and Synthesis (OVERLAY 2025), October 26, 2025, Bologna, Italy. r.dankworth@uni-luebeck.de (R. Dankworth); gesina.schwalbe@uni-luebeck.de (G. Schwalbe). https://isp.uni-luebeck.de/staf/r-dankworth (R. Dankworth); https://isp.uni-luebeck.de/staf/g-schwalbe (G. Schwalbe). ORCID: 0009-0001-5617-2069 (R. Dankworth); 0000-0003-2690-2478 (G. Schwalbe)</p>
      <p>© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        The trick we now use here is that NNs automatically learn to encode task-related concepts in their
intermediate outputs. For example, when trained for stop_sign recognition, the NN may implicitly
learn to identify octagons, red color, and the stop_label. Post-hoc supervised concept-based
explainability methods [
        <xref ref-type="bibr" rid="ref13 ref14 ref15 ref16">13, 14, 15, 16</xref>
        ] can recover this information in a very sample-efficient manner
with minimal additions to the NN structure.
      </p>
      <p>Altogether, we propose and theoretically analyze a general AA goal, the Concept-based Property Attack (ConPAtt), that explicitly targets falsification of symbolic concept-based properties over nonsymbolic inputs. As we will show, our formulation offers a more general way to define both targeted and untargeted attacks. Furthermore, as opposed to classical attacks that purely change the output, our attack on ¬octagonal =⇒ ¬stop_sign can produce an image still classified as stop_sign, in which the octagonal concept is no longer recognized. This newly allows uncovering failure cases with semantically inconsistent yet possibly high-confidence predictions that are invisible to standard attacks. As we show, standard white-box attack techniques can still easily and efficiently be applied, producing meaningful attacks and a more constrained adversarial space compared to traditional AAs.
Contributions. Our main contributions are:
• We introduce ConPAtt, a general XAI-supported adversarial attack goal that targets concept-based properties rather than just NN outputs.
• We prove that ConPAtt generalizes both classical targeted and untargeted AA formulations, but with a same-sized or smaller adversarial space.
• We hypothesize several advantages of ConPAtts for certifying robustness and for adversarial retraining, posing the chance to efficiently improve both semantic consistency and robustness.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Adversarial Attacks. AAs generally search within the vicinity of an input sample x for minimally perturbed variants x̃ = x + δ that have a malicious effect on the NN's output [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The perturbations can be arbitrary (digital AAs, as considered here) [
        <xref ref-type="bibr" rid="ref6 ref7">18, 19, 20, 21, 6, 7</xref>
        ], or further constrained to realistic changes (physical AAs) [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">3, 4, 22, 23, 2, 5</xref>
        ]. However, the minimality often makes the changes invisible or difficult for humans to see. At the methodological level, black-box approaches only require access to NN inputs and outputs [
        <xref ref-type="bibr" rid="ref2 ref4 ref5">23, 2, 4, 24, 25, 5</xref>
        ]. White-box attacks as considered here instead exploit NN model internals, such as the gradient, for a more efficient search [
        <xref ref-type="bibr" rid="ref3 ref6 ref7">18, 19, 21, 3, 6, 7</xref>
        ]. Generally, AAs can be seen as a subfield of NN verification that falsifies a continuity property [26, 27]. Thus, the usual search, reachability analysis, and, most prominently, optimization techniques are applicable to find or disprove adversarial examples [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Regarding types of specifications beyond continuity properties, approaches such as Scenic [28] and VerifAI [29] demonstrate how formal specifications can be used to generate and analyze simulation-based scenarios with symbolic inputs. In contrast, our approach targets AAs on non-symbolic image inputs, which prevents the direct use of such tools but similarly requires formal specifications.
      </p>
      <p>
        Concept-based Explainability. Concept-based explainability generally aims to associate human-interpretable concepts with representations in NN latent space [30, 31, 32]. This includes understanding which concepts are relevant to the decision and to what extent [
        <xref ref-type="bibr" rid="ref16">33, 16</xref>
        ], and how these can be accurately recognized in NNs [
        <xref ref-type="bibr" rid="ref14">14, 34</xref>
        ]. If concept definitions in the form of labeled samples are available at training time, ante-hoc approaches [35, 33, 36, 37, 38] can train individual neurons to activate for the concept. We here instead consider post-hoc approaches: These train a simple model to predict the concept of interest from an NN layer's activation [39]. Unlike single-neuron associations [40, 41] or complex models [42, 43], the linear models considered here [44, 39, 45, 46] pose a good tradeoff between capturing the entanglement of representations [44, 47], interpretability [39], and a favorably simple representation of the concept as a halfspace in the NN's latent space.
      </p>
      <p>XAI and Verification. Prior work has shown that concept-based explanation methods are vulnerable to adversarial attacks. Perturbations can mislead attribution [48] and concept-based tools [49, 50], and adversarial examples significantly alter the internal concept composition of NNs [49], confirming the general fragility of interpretability methods [51]. However, these studies target concepts in isolation, without considering their joint relation to model predictions.</p>
      <p>
        Beyond highlighting vulnerabilities, concept outputs have also been used for verification. Mangal et al. [52] employed vision-language models to check concept-based properties. While expressive, this approach relies on semantic similarity in multimodal embeddings (e.g., CLIP [53]), which can introduce linguistic ambiguity as well as imprecision for similar terms with small visual differences, e.g., circle versus octagon. Moreover, it is restricted to the latent space of a specific layer, although simple visual concepts may predominantly appear earlier and diminish in later layers. Cheng et al. [54] proposed specifications close to the output layer, but without decomposing them into underlying concepts and at the cost of an additional NN. Semantic losses [
        <xref ref-type="bibr" rid="ref12">55, 12</xref>
        ] like logic tensor networks [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] suggest directly training concept-based rules into the network. These techniques, however, are only used for updating the NN, not for verification as done in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and not for AAs. Furthermore, they rely on concepts being direct outputs of the NN. Decoupling the verification even further from the NN's learned representations, and thus exacerbating training efforts, Xie et al. [56] trained completely separate NNs for predicting the concepts. Our work also directly addresses the relationship between concepts and model outputs, a perspective that has received little attention so far [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. However, similar to the verification and testing techniques from [
        <xref ref-type="bibr" rid="ref9">54, 9</xref>
        ], we suggest keeping training and verification efforts low by using faithful explainability techniques to access concept predictions, and we newly apply the setup to AAs.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Background</title>
      <p>Adversarial Attacks. Let x ∈ X be a real image, y ∈ Y its true label, and f : X → Y a NN. An AA seeks an adversarial example x_adv := x + δ ∈ X such that its output is (sufficiently) different from the original, and the perturbation δ is minimal with respect to an objective function d (usually the L1, L2, or L∞ norm on the input for digital attacks). Sufficient difference can be formulated in terms of a y-specific partition of the output set Y into a benign output set Y⁺ ⊂ Y with f(x) ∈ Y⁺, and a malicious one Y⁻ := Y ∖ Y⁺. The search for the minimum perturbation δ then is the optimization problem

argmin_δ d(δ)  s.t.  f(x + δ) ∈ Y⁻ .  (1)</p>
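      <p>The optimization problem above can be sketched with a simple projected sign-gradient search. The linear softmax classifier, step size, and L∞ projection below are illustrative assumptions for a self-contained toy example, not the paper's setup.</p>
      <p>
```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pgd_untargeted(W, b, x, y, eps=1.5, step=0.3, iters=100):
    """Sketch of the optimization problem (untargeted case) for a linear
    softmax classifier: search a small perturbation delta, kept inside an
    eps L-inf ball, such that the true class y is no longer predicted,
    i.e. f(x + delta) lands in the malicious output set Y-."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        p = softmax(W @ (x + delta) + b)
        if p.argmax() != y:          # reached the malicious set Y-
            break
        grad = W.T @ (p[y] * (np.eye(len(p))[y] - p))  # d p_y / d x
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta

# toy 2-class example: x is confidently classified as class 0
W = np.eye(2)
b = np.zeros(2)
x = np.array([2.0, 0.0])
delta = pgd_untargeted(W, b, x, y=0)
assert softmax(W @ (x + delta) + b).argmax() != 0   # prediction flipped
```
</p>
      <p>The sign step and fixed ε-ball mimic common white-box attacks; any other perturbation-minimizing search would fit the same malicious-set formulation.</p>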
      <p>
Adversarial attack strategies for classification are categorized as targeted or untargeted according to their choice of Y⁻: let conf_k : Y → [0, 1] denote the confidence assigned to class k, and t_k ∈ [0, 1] the threshold required to accept class k. In untargeted attacks, the goal is to reduce the confidence of the true class below its threshold, i.e., Y⁻ = {ŷ ∈ Y | conf_y(ŷ) &lt; t_y}. In contrast, targeted attacks aim to raise the confidence of an incorrect class y′ above its threshold, i.e., Y⁻ = {ŷ ∈ Y | conf_y′(ŷ) ≥ t_y′}.
Post-hoc Concept Extraction. Let C be a set of concepts (e.g., C = {red, octagonal}), and assume a possibly small classification dataset D_c = ((x, y_{x,c})) is available per concept c ∈ C. Further denote by f_{i→j} : A_i → A_j the NN part that maps from the i-th to the j-th layer. Through linear post-hoc concept extraction, additional concept outputs are added to the NN by attaching, for each c, a linear classification model g_c : A_{ℒ_c} → B = [0, 1] to the ℒ_c-th hidden layer, as illustrated in Figure 1. Keeping the NN's weights fixed, the weights of g_c are trained on pairs ((f_{0→ℒ_c}(x), y_{x,c})), such that c's concept function f_c = g_c ∘ f_{0→ℒ_c} : X → B correctly predicts the presence of the concept in an input image. Note that g_c being linear conveniently makes any subspace {a ∈ A_{ℒ_c} | g_c(a) &gt; t_c} an affine linear half-space. In the following, we denote by f_C = (f_c)_{c∈C} : X → B^C the complete prediction of all concepts, and by Z = Y × B^C the complete output set after attaching the concept outputs.
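</p>
      <p>The post-hoc linear concept extraction just described can be sketched as follows. The activations, dataset, and training loop are illustrative stand-ins (a numpy logistic regression plays the role of the linear probe g_c); a real setup would record activations of the frozen NN.</p>
      <p>
```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in layer activations f_{0->Lc}(x) of the frozen NN for a small
# concept dataset: 50 concept-positive and 50 concept-negative samples.
# The synthetic clusters only serve to make the sketch runnable.
pos = rng.normal(loc=+1.0, scale=0.5, size=(50, 8))
neg = rng.normal(loc=-1.0, scale=0.5, size=(50, 8))
A = np.vstack([pos, neg])
labels = np.concatenate([np.ones(50), np.zeros(50)])

def train_probe(A, labels, lr=0.5, iters=200):
    """Fit the linear concept model g_c(a) = sigmoid(w.a + b) while the NN
    stays frozen. Its decision region {a | g_c(a) > 0.5} is an affine
    halfspace in the latent space."""
    w, b = np.zeros(A.shape[1]), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(A @ w + b)))
        w -= lr * A.T @ (p - labels) / len(labels)
        b -= lr * (p - labels).mean()
    return w, b

w, b = train_probe(A, labels)
pred = (A @ w + b) > 0.0            # g_c(a) > 0.5  <=>  w.a + b > 0
print("probe accuracy:", (pred == labels.astype(bool)).mean())
```
</p>
      <p>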
(Figure 1 sketch: input layer, hidden layers 1 to 3, and output layer, with a concept output attached to a hidden layer.)
T-Norm Fuzzy Logic. The standard Boolean logical connectives (and ∧, or ∨, not ¬) can only operate on binary truth values in B = {0, 1}. T-norm fuzzy logics extend the connectives to many-valued truth values in B = [0, 1] using a so-called t-norm ∧̃ : [0, 1] × [0, 1] → [0, 1] to replace the ∧. A valid t-norm must be monotonic, commutative and associative, have a neutral element (the 1), and match ∧ on Boolean values. Typical choices for a ∧̃ b are the Product (a · b), Łukasiewicz (max(0, a + b − 1)), and Gödel (min(a, b)) t-norms [57], since these form a generating system for all continuous t-norms. Given a ∧̃, the connectives ¬̃a := 1 − a and ∨̃, =⇒̃ : [0, 1]² → [0, 1] can be derived and maintain desirable properties, giving the resulting t-norm logic.
      </p>
      <p>
        Desirable properties for the use of t-norm logic with NN classification outputs are: (1) The NN typically produces a confidence prediction in [0, 1] instead of a Boolean value, which can be propagated by t-norm fuzzy logic to the confidence of entire logical expressions. (2) The classical piece-wise continuous t-norm connectives are also piece-wise differentiable, like ReLU activations of NNs, so they can directly be used in backpropagation [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
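      <p>The three generating t-norms and the derived connectives can be written down directly. Below is a small plain-Python sketch (function names illustrative) evaluating the running example red ∧ octagonal =⇒ stop_sign on soft confidences.</p>
      <p>
```python
# The three generating t-norms on [0, 1].
def t_product(a, b):     return a * b
def t_lukasiewicz(a, b): return max(0.0, a + b - 1.0)
def t_goedel(a, b):      return min(a, b)

# Derived connectives: not, or (De Morgan dual), and implication.
def f_not(a):                      return 1.0 - a
def f_or(a, b, t=t_product):       return f_not(t(f_not(a), f_not(b)))
def f_implies(a, b, t=t_product):  return f_or(f_not(a), b, t)

# Fuzzy truth of  red AND octagonal => stop_sign  for soft confidences:
red, octagonal, stop_sign = 0.9, 0.8, 0.2
antecedent = t_product(red, octagonal)          # 0.72
print(f_implies(antecedent, stop_sign))         # ~0.424: rule largely breached
```
</p>
      <p>All three t-norms agree with Boolean ∧ on {0, 1}, so the same property evaluates identically on crisp predictions regardless of the chosen t-norm.</p>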
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <p>In this section we first define our new notion of concept-based AAs. Then we show that standard AAs are a special case, and that existing attack techniques can easily be adapted to our new attack.</p>
      <sec id="sec-4-1">
        <title>4.1. Concept-based Property Attacks</title>
        <p>The classical AA types from Section 3 can also be interpreted as special cases of property attacks, where class predictions are treated as logical literals. Using fuzzy logic (see Section 3), we can evaluate logical expressions over outputs using a function solve_P : Y → B that returns the truth value of a property P. A property attack falsifies a given property, i.e., Y⁻ = Y⁻_P = {ŷ ∈ Y | solve_{¬P}(ŷ)}. Untargeted and targeted attacks correspond to the properties P = y and P = ¬y′, respectively.</p>
        <p>This perspective allows adversarial examples to be crafted with higher-order conditions, e.g., enforcing both "dog" (d) and "cat" (c) simultaneously. The corresponding attacked property is its logical negation: P = ¬d ∨ ¬c.</p>
        <p>The point of view of property attacks can also be applied to NNs that are augmented with XAI techniques: the additional concept outputs can be used alongside the original task output of the NN to define property attacks, called Concept-based Property Attacks (ConPAtt). For denoting the properties we propose the following intuitive and convenient implication format generalizing our introductory examples (all logical expressions can be reformulated like this, see Lemma 1). Note that for simplicity we shorten c(x) to c, and (¬)c shorthands a possibly negated c.</p>
        <p>Lemma 1. Each logical expression P with two disjoint literal sets C and Y can be reformulated into a term of conjunctively linked implication terms where antecedents consist only of conjunctively linked, possibly negated literals of C, and consequents consist only of disjunctively linked, possibly negated literals of Y.</p>
        <p>Proof. Each logical expression can be reformulated into the conjunctive normal form P ≡ ⋀_{i&gt;0} (⋁_{c∈C_i⊆C} (¬)c ∨ ⋁_{y∈Y_i⊆Y} (¬)y). Let us introduce two additional variable families A_i, B_i that condense the disjunctive subformulas:

A_i := ¬ ⋁_{c∈C_i⊆C} (¬)c ≡ ⋀_{c∈C_i⊆C} ¬(¬)c ,  (2)
B_i := ⋁_{y∈Y_i⊆Y} (¬)y .  (3)

The subformulas can be replaced by these variables, and the whole logical expression P reformulates to P ≡ ⋀_{i&gt;0} (¬A_i ∨ B_i) ≡ ⋀_{i&gt;0} (A_i =⇒ B_i).</p>
        <p>Definition 1 (Concept-based property). A concept-based property P is a logical expression with two disjoint literal sets C (the concept literals) and Y (the task literals) in the form of conjunctively linked implication terms whose antecedents consist only of conjunctively linked, possibly negated concept literals and whose consequents consist only of disjunctively linked, possibly negated task literals:

P := ⋀_{i&gt;0} (A_i =⇒ B_i) , with A_i := ⋀_{c∈C_i⊆C} (¬)c and B_i := ⋁_{y∈Y_i⊆Y} (¬)y .  (4)</p>
        <p>Definition 2 (Concept-based Property Attack). Let solve_P : Z → B be the function to calculate the truth value of a concept-based property P which evaluates to true at an input x, and d a minimality measure for perturbations δ. A Concept-based Property Attack of P is the search for a d-minimal perturbation δ to an input x into an adversarial example x_adv = x + δ which falsifies P, i.e., lies in the malicious output set</p>
        <p>Z⁻ = Z⁻_P = {z ∈ Z | solve_{¬P}(z)} .</p>
        <p>Intuitively, a ConPAtt adversarial example x_adv to P = (⋀ c =⇒ ⋁ y), like red ∧ octagonal =⇒ stop_sign, causes the NN to predict all c as true and all y as false. This can happen if (1) some c is predicted true even though it should be false (e.g., red predicted true even though the change δ turned the sign gray), and/or (2) some y is predicted negative even though it should be positive (e.g., stop_sign flipped to false).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. ConPAtts as Generalized Adversarial Attacks</title>
        <p>Note that falsifying one implication term is enough to falsify a concept-based property; thus, it is sufficient to consider a single implication P = A =⇒ B for an attack. The set of adversarial example task outputs can be derived from this definition, i.e., Y⁻_P := {y ∈ Y | (y, c) ∈ Z⁻_P}. Furthermore:
Theorem 1. Standard targeted and untargeted AAs are special cases of ConPAtt.</p>
        <p>Proof. First note the two special cases of ConPAtt where only a single task literal is used:
1. Generalized untargeted AAs: A =⇒ y.</p>
        <p>2. Generalized targeted AAs: A =⇒ ¬y′.</p>
        <p>Standard un-/targeted AAs are, respectively, generalized un-/targeted AAs with A ≡ true, i.e., without concept restriction.</p>
        <p>A neat property of ConPAtts is that the search space is generally reduced compared to vanilla AAs:
Theorem 2. The task output spaces of adversarial examples for generalized untargeted/targeted AAs are smaller than or equal to those for standard untargeted/targeted AAs:</p>
        <p>Y⁻_{A =⇒ y} ⊆ Y⁻_y and Y⁻_{A =⇒ ¬y′} ⊆ Y⁻_{¬y′} .</p>
        <p>Proof. Let us first look at generalized untargeted AA properties like A =⇒ y. Each adversarial example must lack the class prediction y but requires the concept predictions A, i.e., it satisfies the property A ∧ ¬y. In contrast to that, standard untargeted AAs only require the misclassification of y, i.e., each adversarial example satisfies ¬y, and they also accept adversarial examples that do not additionally fulfill A. It follows that the valid output space of adversarial examples for generalized untargeted AAs Z⁻_{A =⇒ y} is smaller than or equal to that for standard untargeted AAs Z⁻_y, and likewise for their valid task output spaces, Y⁻_{A =⇒ y} ⊆ Y⁻_y.</p>
        <p>In this argument, it does not matter whether the adversarial examples are expected to produce a misclassification ¬y or a specific task output y′. Hence, the same relation also holds between generalized targeted AAs and standard targeted AAs, i.e., Y⁻_{A =⇒ ¬y′} ⊆ Y⁻_{¬y′}.</p>
        <p>ConPAtt Procedure. ConPAtt can easily be performed with any existing AA approach. The trick is to use the result of the (partially) differentiable fuzzy operation solve_P ∘ (f, f_C) : X → B instead of the output of the NN. This makes ConPAtt a targeted AA with the expected result False, or 0, for the adversarial examples.</p>
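        <p>This procedure can be sketched as a descent on the fuzzy truth value of P. The two sigmoid heads standing in for a concept probe and the task confidence, as well as the finite-difference gradient, are illustrative assumptions to keep the sketch dependency-free, not the paper's models.</p>
        <p>
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative linear "concept" and "task" heads on a 2-d input
# (stand-ins for g_c and the NN's class confidence).
def concept(x):  return sigmoid(x[0])     # e.g. octagonal
def task(x):     return sigmoid(x[1])     # e.g. stop_sign

def property_truth(x):
    """Fuzzy truth of P = (concept => task) under the product t-norm:
    1 - c(x) * (1 - y(x)). A ConPAtt drives this toward 0 (False)."""
    return 1.0 - concept(x) * (1.0 - task(x))

def conpatt(x, step=0.1, iters=300, h=1e-4):
    """Sign-gradient descent on the property truth value; the gradient is
    taken by central finite differences for simplicity."""
    x = x.copy()
    for _ in range(iters):
        g = np.array([(property_truth(x + h * e) - property_truth(x - h * e)) / (2 * h)
                      for e in np.eye(len(x))])
        x -= step * np.sign(g)
    return x

x0 = np.array([2.0, 2.0])        # concept and task both confidently present
x_adv = conpatt(x0)
# the attack keeps the concept active while suppressing the task output
assert property_truth(x_adv) < 0.05 < property_truth(x0)
```
</p>
        <p>Note that, unlike a plain class attack, the descent must keep the concept confidence high while lowering the task confidence, exactly the A ∧ ¬y condition of the falsified implication.</p>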
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Outlook: ConPAtt for Adversarial Training</title>
      <p>In the following we discuss further what practical benefits we expect from this more general formulation
of attack goals, how this could be evaluated, and which challenges are still open.</p>
      <sec id="sec-5-1">
        <title>5.1. Hypothesized Benefits of ConPAtts</title>
        <p>We hypothesize that
• generalized (un-)targeted AAs with at least one concept reduce the search space for adversarial examples not only theoretically but also empirically, and
• the adversarial examples obtained via ConPAtt are particularly efficient for retraining because they are pinpointed adversarial examples with high information content.</p>
        <p>ConPAtts versus Standard AAs: To understand the above claims, one should first have a closer look at the vulnerabilities that can be exploited for a successful ConPAtt against a concept-based property P = (A =⇒ B). Standard AAs capture any case where the final output B is changed, regardless of whether this results in illogical behavior breaking P or not. Thus, standard AAs may primarily focus on turning off causally related early-layer concepts, i.e., falsifying A to falsify B; for example, falsifying red to cause a negative output of stop_sign. This is not sufficient for a ConPAtt on P, for which not only B must become false, but simultaneously A must remain true (cf. Theorem 2). It is therefore not guaranteed that one obtains the same results for ConPAtts against any of the following concept-based properties:
• P_B = (true =⇒ B), which is the standard AA against the output B,
• P_A = (¬A =⇒ false), which is the standard AA against the concept outputs, i.e., the attack flips any concept c in the conjunction A = ⋀ c to false, and
• P = (A =⇒ B), which is a generalized concept-based property attack.</p>
        <p>Whether the obtained adversarial examples are similar depends on whether it is easier to attack concepts, in which case falsifying P_A and P_B should yield similar results, or logics, in which case falsifying P_B and P are expected to yield similar results. Since concepts themselves represent noisy variables with imperfect accuracy, chances are high that attacking concepts generally is easier than attacking logics. Our ConPAtt framework provides the option to test and train on these different rules individually, and hence to distinguish in a more fine-grained manner between simply attacking the concepts or outputs, and truly attacking internal logics.</p>
        <p>
          Benefits of Targeting Logics: One reason for both of the claims lies on the semantic level: Human-defined properties typically encode important knowledge about the task at hand and thus should strengthen both the adherence to the properties and, indirectly, the actual main task of the network. Given that well-generalizing NNs typically adopt this knowledge to a large extent, the cases of logic breaches should be few but meaningful. This would make attacking logics especially beneficial for retraining purposes, similar to adversarial training [
          <xref ref-type="bibr" rid="ref17">17, 58</xref>
          ].
        </p>
        <p>Benefits for Computational Efficiency: ConPAtts also directly benefit from low integration overhead: (1) Preparation only requires cheap post-hoc concept extraction; (2) only very few additional operations (the g_c) are added that need backpropagation/-tracing if gradient-based attack methods are used; and (3) the beneficial formulation of concepts as half-spaces in latent space allows efficient reachability analysis with a substantial reduction in the search space, as illustrated in Figure 2 and sketched in Appendix A. Next steps should empirically test the attack success and the effect of retraining with adversarial examples of this approach.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Future Work: Evaluation and Challenges</title>
        <p>
          Planned Experimental Setting: We suggest evaluating several aspects to ensure a comprehensive assessment. As metrics, we consider both task performance and rule adherence, measured through accuracy and Intersection-over-Union (IoU) for task prediction as well as rule satisfaction. In addition, we track the success of adversarial attacks before retraining, as well as the effectiveness of defences and the accuracy of concepts after retraining. For evaluation, we draw on three established datasets: MNIST [59], GTSRB [60], and ImageNet [61]. The models include self-trained simple architectures for MNIST and GTSRB, as well as a range of widely used ImageNet classifiers: Inception-v3 [62], Inception-v4 [63], Inception-ResNet-v2 [63], ResNet-v2-101 [64], and the ensemble-adversarially trained variants Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens [58]. For baselines, we rely on several state-of-the-art adversarial attack methods, namely SGM [19], VMI-FGSM and VNI-FGSM [21], L2T [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], and BSR [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>The attacked concept-based properties reflect both simple and more complex relations. Examples include that class 1 implies the concept line, that classes 1 and 2 should never be predicted simultaneously (i.e., ¬1 ∨ ¬2), and that the concepts red, octagon, and stop_label together imply stop_sign.</p>
        <p>Challenges and Further Future Work: As explained above, it is expected that ConPAtts do not necessarily yield the same results as standard AAs that attack outputs or concepts. In addition to the above experiments, one could contrastively compare the results of the different attacks for insights into how large the gap truly is. However, a considerable challenge for the experimental evaluation is that retraining procedures may need to be adapted: (Adversarially) retraining with respect to the task output might accidentally destroy the post-hoc attached concept outputs. Countermeasures might be to freeze earlier NN parts up to the concept prediction, or to alternatingly or simultaneously retrain the NN and the concept predictors. Experiments must show how to balance the need for concept labels with concept accuracy during adversarial finetuning.</p>
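        <p>The rule-satisfaction metric mentioned above could, for instance, be computed as the fraction of samples whose predicted outputs satisfy the property. A minimal sketch with hypothetical predictions:</p>
        <p>
```python
# Rule-adherence sketch: fraction of samples on which a property
# P = (antecedent concepts => task label) holds for the model's outputs.
def rule_satisfaction(samples):
    """samples: list of (concept_preds: list[bool], task_pred: bool)."""
    holds = [(not all(cs)) or y for cs, y in samples]
    return sum(holds) / len(holds)

preds = [([True, True], True),    # property satisfied
         ([True, True], False),   # breached: antecedent true, task false
         ([False, True], False)]  # satisfied vacuously
print(rule_satisfaction(preds))   # 2/3
```
</p>
        <p>Tracking this number before and after adversarial finetuning would separate gains in rule adherence from plain accuracy gains.</p>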
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this position paper, we introduce a novel generalized adversarial attack goal: Instead of targeting a change in (respectively, falsification of) the output class, our attacks aim to falsify the compliance of the NN with prior symbolic knowledge on sufficient indicators for an output class. Standard AAs are shown to be a special case of our generalized formulation for concept-based properties. These properties also allow substantially reducing the expected search space of the AA with an increasing number of concepts. Moreover, we argue that concept-based properties provide a more natural and human-aligned target for AAs. This suggests that they might be particularly suited for NN robustification via adversarial model (re)training or runtime monitoring.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported through the junior research group project “chAI” funded by the German
Federal Ministry of Research, Technology and Space (BMFTR), grant no. 01IS24058. The authors are
solely responsible for the content of this publication.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT based on GPT-4o in order to improve the writing style. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Considerations for Reachability-based Search</title>
      <p>Existing reachability-based techniques conduct forward and/or backward passes through the NN to
trace or estimate regions of interest through the NN processing. We here show how the considered
concept-based properties give rise to a particularly efficient formulation of this approach: being
halfspaces in intermediate layers, the (negated) concepts have the potential to easily and substantially
reduce the adversarial space that one needs to keep track of halfway through the network, and they can
also be easily described in later layers, as sketched in Figure 2. In the following, this is illustrated for a
back-propagation approach for a simple generalized untargeted attack on a property (c_1 ∧ · · · ∧ c_n) ⟹ C. Recall
that a valid counterexample falsifying the property (c_1 ∧ · · · ∧ c_n) ⟹ C must fulfil c_1 ∧ · · · ∧ c_n ∧ ¬C.</p>
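      <p>As a minimal, illustrative sketch of this condition (the function name, score representation, and thresholds below are our own assumptions, not part of the formalism): a candidate input is a valid counterexample exactly when every concept score clears its threshold while the class score does not.</p>

```python
# Illustrative sketch (hypothetical names): checking the counterexample
# condition c_1 ∧ · · · ∧ c_n ∧ ¬C on concept scores and a class score.

def is_counterexample(concept_scores, class_score,
                      concept_threshold=0.5, class_threshold=0.5):
    """True iff all concepts are present but the class is not predicted."""
    all_concepts_hold = all(s >= concept_threshold for s in concept_scores)
    class_holds = class_score >= class_threshold
    return all_concepts_hold and not class_holds

# E.g. red and octagonal are detected, yet stop_sign is not predicted,
# so the property (red ∧ octagonal ⟹ stop_sign) is falsified:
print(is_counterexample([0.9, 0.8], 0.1))  # True
```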
      <p>
        Denote by f_{ℒ→ℒ′} : ℒ → ℒ′ the NN part mapping from layer ℒ to layer ℒ′, and by c_i^{(ℒ′→)} = c_i ∘ f_{ℒ′→ℒ_i} : ℒ′ → [0, 1] the function evaluating the presence of concept c_i in its layer ℒ_i for a latent vector
from the earlier layer ℒ′. Denote by ℒ_i the layer which was chosen for the embedding of concept
c_i, and let ℒ_1 be the earliest layer for which ℒ_1 = ℒ_i for some i. Re-indexing the subsequent layers as ℒ_1, …, ℒ_{m−1}, let ℒ_{m−1} be the final
representation layer before the output confidence prediction, lying m − 1 layers after ℒ_1. Let
H_i = { v ∈ ℒ_i | c_i^{(ℒ_i→)}(v) ≥ t_i } be the halfspace of the concept c_i in the concept’s layer ℒ_i, for a concept threshold t_i.
      </p>
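      <p>Concept embeddings of this kind are commonly realized as linear probes on latent vectors, so a membership test for H_i reduces to a thresholded dot product. A small sketch under that assumption (probe weights, latent vector, and threshold are illustrative):</p>

```python
import numpy as np

def in_concept_halfspace(v, w, t):
    """Test v ∈ H = { v | w·v ≥ t } for a linear concept probe with normal w."""
    return bool(np.dot(w, v) >= t)

w_red = np.array([1.0, -0.5, 0.0])  # hypothetical probe for concept "red"
v = np.array([0.8, 0.2, 0.4])       # latent vector in the concept's layer
print(in_concept_halfspace(v, w_red, t=0.5))  # True, since w·v = 0.7 ≥ 0.5
```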
      <p>Now we can reformulate the falsification as a search for a region in latent space:</p>
      <p>Lemma 2. A representation v = f_{→ℒ}(x) ∈ ℒ in layer ℒ of a valid counterexample x ∈ X to the
concept-based property (c_1 ∧ · · · ∧ c_n) ⟹ C must fulfil v ∈ ⋂_{i : ℒ ≤ ℒ_i} f_{ℒ→ℒ_i}⁻¹(H_i).</p>
      <p>While it is costly to determine each f_{ℒ→ℒ_i}⁻¹(H_i) independently, the concept-based property gives rise to a
recursive definition:</p>
      <p>Theorem 3. Recursively define the propagation of halfspace intersections through the NN as
P_{m−1} = ⋂_{i : ℒ_i = ℒ_{m−1}} H_i ,    P_k = f_{ℒ_k→ℒ_{k+1}}⁻¹(P_{k+1}) ∩ ⋂_{i : ℒ_i = ℒ_k} H_i .    (5)
Then for any counterexample x to the above concept-based property it must hold that f_{→ℒ_1}(x) ∈ P_1. P_1 can
be efficiently calculated using a single backward propagation through layers m − 1 to 1.</p>
      <p>Proof. The property inductively follows from the definition, noting that P_1 = ⋂_i f_{ℒ_1→ℒ_i}⁻¹(H_i) and the
constraint of considering ReLU networks.</p>
      <p>In particular, each propagation step only requires obtaining a polytope’s preimage for a single NN
layer operation and applying a cheap intersection of the resulting polytope with halfspaces. This makes
the first part of the search very efficient, promising a speedup compared to a full end-to-end search for
counterexamples directly in the input space.</p>
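      <p>For purely affine layers, the backward step of this recursion is explicit: the preimage of a polytope { v | A v ≤ c } under v = W u + b is again a polytope { u | (A W) u ≤ c − A b }, and intersecting with concept halfspaces merely appends constraint rows. The following sketch assumes affine layers only (it omits the piecewise-linear case handling a ReLU would require) and uses illustrative matrices:</p>

```python
import numpy as np

def preimage_affine(A, c, W, b):
    """Preimage of the polytope { v | A v ≤ c } under the affine layer
    v = W u + b, i.e. { u | (A W) u ≤ c − A b }."""
    return A @ W, c - A @ b

def intersect_halfspaces(A, c, normals, offsets):
    """Intersect { u | A u ≤ c } with halfspaces n_i · u ≤ o_i by
    appending constraint rows."""
    return np.vstack([A, normals]), np.concatenate([c, offsets])

# Hypothetical 2D example: pull the constraint v_0 ≤ 1 back through
# the layer v = diag(2, 1) u, then add a halfspace u_1 ≤ 0.5.
A, c = np.array([[1.0, 0.0]]), np.array([1.0])
W, b = np.array([[2.0, 0.0], [0.0, 1.0]]), np.zeros(2)
A1, c1 = preimage_affine(A, c, W, b)   # constraint becomes 2 u_0 ≤ 1
A2, c2 = intersect_halfspaces(A1, c1,
                              np.array([[0.0, 1.0]]), np.array([0.5]))
```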
      <p>The forward-propagation case is similar. Here, it can additionally be shown that the propagated
region always remains a connected polytope, since neither intersection with halfspaces nor the forward
pass through continuous layer operations changes this property.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rech</surname>
          </string-name>
          ,
          <article-title>Artificial Neural Networks for Space and Safety-Critical Applications: Reliability Issues and Potential Solutions</article-title>
          ,
          <source>IEEE Transactions on Nuclear Science</source>
          <volume>71</volume>
          (
          <year>2024</year>
          )
          <fpage>377</fpage>
          -
          <lpage>404</lpage>
          . URL: https://ieeexplore.ieee.org/abstract/document/10380628. doi:10.1109/TNS.2024.3349956.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Suryanto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Larasati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-H.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-Y.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>DTA: Physical Camouflage Attacks Using Differentiable Transformation Network</article-title>
          ,
          <year>2022</year>
          , pp.
          <fpage>15305</fpage>
          -
          <lpage>15314</lpage>
          . URL: https://openaccess.thecvf.com/content/CVPR2022/html/Suryanto_DTA_Physical_Camouflage_Attacks_Using_Differentiable_Transformation_Network_CVPR_2022_paper.html.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <article-title>Physical-World Optical Adversarial Attacks on 3D Face Recognition</article-title>
          ,
          <year>2023</year>
          , pp.
          <fpage>24699</fpage>
          -
          <lpage>24708</lpage>
          . URL: https://openaccess.thecvf.com/content/CVPR2023/html/Li_Physical-World_Optical_Adversarial_Attacks_on_3D_Face_Recognition_CVPR_2023_paper.html.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tiliwalidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Adversarial Laser Spot: Robust and Covert Physical-World Attack to DNNs</article-title>
          , in:
          <source>Proceedings of The 14th Asian Conference on Machine Learning, PMLR</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>483</fpage>
          -
          <lpage>498</lpage>
          . URL: https://proceedings.mlr.press/v189/hu23b.html, ISSN: 2640-3498.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving</article-title>
          ,
          <year>2024</year>
          , pp.
          <fpage>24452</fpage>
          -
          <lpage>24461</lpage>
          . URL: https://openaccess.thecvf.com/content/CVPR2024/html/Zheng_Physical_3D_Adversarial_Attacks_against_Monocular_Depth_Estimation_in_Autonomous_CVPR_2024_paper.html.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Learning to Transform Dynamically for Better Adversarial Transferability</article-title>
          ,
          <year>2024</year>
          . URL: https://openreview.net/forum?id=k76ngWX9OR.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Boosting Adversarial Transferability by Block Shuffle and Rotation</article-title>
          , in: 2024
          <source>IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>24336</fpage>
          -
          <lpage>24346</lpage>
          . URL: https://ieeexplore.ieee.org/abstract/document/10656871. doi:10.1109/CVPR52733.2024.02297, ISSN: 2575-7075.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ming</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <source>Boosting the Transferability of Adversarial Attack on Vision Transformer with Adaptive Token Tuning</source>
          ,
          <year>2024</year>
          . URL: https://openreview.net/forum?id=sNz7tptCH6.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Schwalbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <article-title>Enabling verification of deep neural networks in perception tasks using fuzzy logic and concept embeddings</article-title>
          ,
          <year>2022</year>
          . doi:10.48550/arXiv.2201.00572. arXiv:2201.00572.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stoian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cuzzolin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          ,
          <article-title>ROAD-R: The Autonomous Driving Dataset with Logical Requirements</article-title>
          , in: IJCLR 2022 Workshops, Vienna, Austria,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ledaguenel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hudelot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khouadjia</surname>
          </string-name>
          ,
          <article-title>Improving Neural-based Classification with Logical Background Knowledge</article-title>
          ,
          <source>in: ECAI 2024 Workshop Proceedings</source>
          , arXiv, Santiago de Compostela, Spain,
          <year>2024</year>
          . arXiv:2402.13019.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Badreddine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spranger</surname>
          </string-name>
          ,
          <article-title>Logic Tensor Networks</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>303</volume>
          (
          <year>2022</year>
          )
          <fpage>103649</fpage>
          . doi:10.1016/j.artint.2021.103649.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <source>Network Dissection: Quantifying Interpretability of Deep Visual Representations</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>6541</fpage>
          -
          <lpage>6549</lpage>
          . URL: https://openaccess.thecvf.com/content_cvpr_2017/html/Bau_Network_Dissection_Quantifying_CVPR_2017_paper.html.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fong</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Vedaldi,</surname>
          </string-name>
          <article-title>Net2Vec: Quantifying and Explaining How Concepts Are Encoded by Filters in Deep Neural Networks</article-title>
          ,
          <year>2018</year>
          , pp.
          <fpage>8730</fpage>
          -
          <lpage>8738</lpage>
          . URL: https://openaccess.thecvf.com/content_cvpr_2018/html/Fong_Net2Vec_Quantifying_and_CVPR_2018_paper.html.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Crabbé</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. van der Schaar</surname>
          </string-name>
          ,
          <article-title>Concept Activation Regions: A Generalized Framework For Concept-Based Explanations</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>2590</fpage>
          -
          <lpage>2607</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2022/hash/11a7f429d75f9f8c6e9c630aeb6524b5-Abstract-Conference.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Oikarinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-W.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <source>CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks</source>
          ,
          <year>2022</year>
          . URL: https://openreview.net/forum?id=iPWiwWHc1V.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Khamaiseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bagagem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Alaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mancino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Alomari</surname>
          </string-name>
          ,
          <article-title>Adversarial Deep Learning: A Survey on Adversarial Attacks and Defense Mechanisms on Image Classification</article-title>
          ,
          <source>IEEE Access</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>102266</fpage>
          -
          <lpage>102291</lpage>
          . doi:10.1109/ACCESS.2022.3208131.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>