<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Poisoning and Artificial Intelligence Modeling: Theoretical Foundations and Defensive Strategies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Massimiliano Ferrara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Law, Economics and Human Sciences, Mediterranea University of Reggio Calabria</institution>
          ,
          <addr-line>Reggio Calabria</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Data poisoning represents a significant and growing threat in the field of artificial intelligence (AI), compromising the reliability and integrity of machine learning (ML) models. This paper presents a comprehensive analysis of data poisoning attacks and their countermeasures, with three main contributions: (1) a systematic framework for understanding the theoretical foundations of data poisoning attacks, (2) a mathematical formulation of attack vectors and their impact on learning outcomes, and (3) a novel defensive approach based on the concept of "Dataset Core" that preserves information value while mitigating poisoning effects. By examining both attack mechanisms and defense strategies through a unified mathematical lens, we bridge the gap between theoretical understanding and practical defense implementation. Our proposed Dataset Core approach demonstrates promising potential for creating resilient ML systems that maintain performance integrity in adversarial environments, contributing to the secure deployment of AI in critical real-world applications.</p>
      </abstract>
      <kwd-group>
        <kwd>Data poisoning</kwd>
        <kwd>Adversarial Attack</kwd>
        <kwd>Dataset Core</kwd>
        <kwd>Information Value</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid advancement of artificial intelligence and its integration into critical domains such as
healthcare, finance, autonomous systems, and cybersecurity has raised significant concerns regarding the
security and reliability of these technologies. One particularly insidious threat is data poisoning—a
deliberate manipulation of training data designed to compromise the performance, integrity, or behavior
of machine learning models [
        <xref ref-type="bibr" rid="ref3">3, 35</xref>
        ].
      </p>
      <p>
        Unlike random errors or natural biases in datasets, data poisoning is characterized by its malicious
intent and strategic execution, making it a potent form of adversarial attack. These attacks exploit the
fundamental dependency of machine learning models on their training data, creating vulnerabilities
that can lead to misclassification, biased decision-making, or backdoor vulnerabilities that activate only
under specific conditions [
        <xref ref-type="bibr" rid="ref17 ref4">17, 4</xref>
        ].
      </p>
      <p>
        The implications of successful data poisoning are far-reaching and potentially severe. In healthcare,
poisoned models might misdiagnose conditions; in autonomous vehicles, they could fail to recognize
obstacles; in financial systems, they might overlook fraudulent activities. Beyond these direct impacts,
widespread data poisoning could erode public trust in AI systems, hindering adoption and innovation
in the field [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>This paper addresses three central research questions:
1. What are the theoretical foundations and mathematical mechanisms underlying different types of data poisoning attacks?
2. How do various data poisoning strategies impact learning outcomes and model performance?
3. What defensive strategies can effectively mitigate data poisoning, particularly our proposed "Dataset Core" approach?</p>
      <p>2nd Workshop “New frontiers in Big Data and Artificial Intelligence” (BDAI 2025), May 29-30, 2025, Aosta, Italy
massimiliano.ferrara@unirc.it (M. Ferrara)
ORCID: 0000-0002-3663-836X (M. Ferrara)</p>
      <p>© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>To address these questions, we organize the paper as follows: Section 2 provides essential background
on AI, machine learning, and their vulnerabilities. Section 3 explores the taxonomy and mechanisms of
data poisoning attacks. Section 4 presents a unified mathematical framework for analyzing poisoning
attacks. Section 5 examines the consequences of poisoning on learning outcomes. Section 6 reviews
existing mitigation strategies with their mathematical formulations. Section 7 introduces our novel
"Dataset Core" approach. Finally, Section 8 summarizes key findings and outlines future research
directions.</p>
      <p>Our work contributes to the field by integrating theoretical understanding with practical defense
mechanisms, emphasizing the importance of model robustness in increasingly adversarial environments.
By developing a comprehensive framework for understanding and countering data poisoning, we aim
to support the secure and reliable deployment of AI technologies across diverse domains.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background on Artificial Intelligence and Vulnerabilities</title>
      <sec id="sec-2-1">
        <title>2.1. Fundamentals of AI and Machine Learning</title>
        <p>
          Artificial intelligence encompasses a broad range of techniques that enable systems to perform tasks
typically requiring human intelligence. Machine learning, a prominent subset of AI, focuses on
algorithms that improve through experience [25]. The core principle of machine learning is the ability to
learn patterns from data without explicit programming, making it powerful but inherently dependent
on data quality [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          Several paradigms exist within machine learning, including:
• Supervised learning: Models learn from labeled examples to make predictions on new data
[32].
• Unsupervised learning: Algorithms identify patterns in unlabeled data [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
• Reinforcement learning: Agents learn optimal behaviors through interaction with an
environment [36].
        </p>
        <p>
          In each paradigm, the reliability of the learning process depends critically on the integrity of the
training data, creating vulnerability points that adversaries can exploit [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The Security Landscape of AI Systems</title>
        <p>
          As AI systems are increasingly deployed in security-sensitive and safety-critical applications, they face
a growing array of threats targeting different aspects of the machine learning pipeline [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. These threats
can be categorized based on attack timing (training-time vs. test-time), attacker knowledge (white-box
vs. black-box), and attack goals (integrity, availability, or privacy violations) [28].
        </p>
        <p>
          Among these threats, training-time attacks—particularly data poisoning—represent a significant
concern because they target the foundational learning process itself [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Unlike test-time evasion
attacks that manipulate input at inference time, poisoning attacks compromise the model during training,
potentially creating persistent vulnerabilities that are difficult to detect and remediate [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>
          The security landscape is further complicated by the data acquisition pipeline in modern AI systems,
which often involves collection from diverse and potentially untrusted sources, creating multiple entry
points for poisoned data [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. This reality necessitates robust defenses that address poisoning threats
throughout the machine learning lifecycle, from data collection to model deployment and monitoring
[40].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Understanding Data Poisoning: Taxonomy and Mechanisms</title>
      <sec id="sec-3-1">
        <title>3.1. Taxonomy of Data Poisoning Attacks</title>
        <p>
          Data poisoning attacks can be categorized based on several dimensions, providing a structured
framework for understanding their diversity and complexity [
          <xref ref-type="bibr" rid="ref3">3, 22</xref>
          ].
        </p>
        <sec id="sec-3-1-1">
          <title>By attack objective:</title>
          <p>
• Indiscriminate attacks: Aim to degrade overall model performance [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
• Targeted attacks: Focus on misclassification of specific inputs or classes [31].
• Backdoor/Trojan attacks: Insert hidden behaviors triggered by specific patterns [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
• Availability attacks: Render the model unusable by causing excessive errors [35].
          </p>
          <p>
By poisoning strategy:
• Label flipping: Modifies labels while preserving feature values [40].
• Feature manipulation: Alters feature values while maintaining labels [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ].
• Sample injection: Introduces entirely fabricated data points [26].
• Clean-label attacks: Create poisoned data that appears legitimate to human inspection [37].
          </p>
        </sec>
        <sec id="sec-3-1-2">
          <title>By attacker knowledge:</title>
          <p>
            • White-box: Attacker has complete knowledge of the learning algorithm and existing data [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
• Gray-box: Attacker has partial knowledge of the learning system [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ].
          </p>
          <p>
            • Black-box: Attacker has minimal knowledge, possibly limited to API access [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ].
          </p>
          <p>
            This taxonomy helps systematize our understanding of poisoning attacks and informs the
development of comprehensive defense strategies that address multiple attack vectors [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Mechanisms and Examples of Poisoning Attacks</title>
        <p>
          Understanding the specific mechanisms through which data poisoning manifests is essential for
developing effective countermeasures. Several common techniques have emerged in the literature and
real-world scenarios [
          <xref ref-type="bibr" rid="ref3">3, 22</xref>
          ].
        </p>
        <p>
          Label flipping involves deliberately mislabeling training examples to induce misclassification. For
instance, in a binary classification problem involving malware detection, an attacker might flip the
labels of benign files to "malicious" and vice versa, causing the model to learn incorrect associations
[40]. This technique is particularly effective in scenarios where the attacker can influence the labeling
process, such as crowdsourced annotation systems [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>
          Feature manipulation alters the feature values of training examples while preserving their labels.
This approach can create adversarial examples that shift decision boundaries in favor of the attacker’s
objectives [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. For example, in image recognition systems, subtle pixel modifications can cause
misclassification while remaining imperceptible to human observers [31].
        </p>
        <p>Outlier injection introduces anomalous data points that significantly deviate from the true
distribution of legitimate data. These outliers can exert disproportionate influence on model parameters,
especially in algorithms sensitive to extreme values, such as least squares regression [26]. Real-world
examples include the infamous Tay chatbot incident, where coordinated feeding of inappropriate content
led to the generation of offensive responses [27].</p>
        <p>
          Backdoor attacks implant hidden functionalities that are triggered only by specific patterns or
inputs [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. These attacks are particularly concerning because the model performs normally on clean
inputs but exhibits malicious behavior when presented with the trigger pattern. For instance, a facial
recognition system might be poisoned to misidentify any person wearing glasses with a particular
pattern as an authorized individual [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>Recent research has demonstrated increasingly sophisticated poisoning techniques, including
clean-label attacks that do not require label manipulation [37], transferable poisoning that works across different
model architectures [26], and poison frogs that target specific test instances [31]. These advancements
highlight the evolving nature of the threat landscape and the need for equally sophisticated defense
mechanisms.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Mathematical Framework for Analyzing Poisoning Attacks</title>
      <p>To develop robust defenses against data poisoning, we must first establish a mathematical framework
that captures the fundamental dynamics of the learning process and how poisoning attacks exploit
these dynamics. This section presents a unified mathematical formulation that serves as the foundation
for analyzing both attack vectors and defense strategies.</p>
      <sec id="sec-4-1">
        <title>4.1. Formalization of the Learning Problem</title>
        <p>
          In a supervised learning setting, we aim to learn a function f : X → Y that maps inputs x ∈ X to
outputs y ∈ Y [32]. The learning process involves finding parameters θ that minimize a loss function L
over a training dataset D = {(x_i, y_i)}_{i=1}^{n} consisting of n observations [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]:
θ* = arg min_θ L(D, θ) = arg min_θ (1/n) ∑_{i=1}^{n} ℓ(f(x_i; θ), y_i)   (1)
where ℓ is a point-wise loss function that quantifies the discrepancy between predictions and ground
truth.
        </p>
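        <p>As a minimal, purely illustrative sketch of this empirical risk minimization step (not an implementation used in this paper), the following Python/NumPy code estimates θ* for a logistic loss by plain gradient descent; all function and variable names, the learning rate, and the toy data are assumptions made for the example.</p>
        <preformat>
import numpy as np

def logistic_loss(theta, X, y):
    """Average point-wise logistic loss over the dataset D = {(x_i, y_i)}."""
    z = y * (X @ theta)                      # margins y_i * (x_i . theta)
    return np.mean(np.log1p(np.exp(-z)))

def erm_gradient_descent(X, y, lr=0.1, steps=500):
    """Approximate theta* = arg min_theta (1/n) sum_i l(f(x_i; theta), y_i)."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        z = y * (X @ theta)
        grad = -(X * (y * (1.0 / (1.0 + np.exp(z))))[:, None]).mean(axis=0)
        theta -= lr * grad
    return theta

# toy usage: two Gaussian blobs with labels in {-1, +1}
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
theta_star = erm_gradient_descent(X, y)
print("estimated parameters:", theta_star, "final loss:", logistic_loss(theta_star, X, y))
        </preformat>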
        <p>
          The empirical risk minimization (ERM) framework approximates the true risk (expected loss over
the data distribution) using the available training data [38]. The quality of this approximation depends
critically on how well the training data represents the true distribution, creating a vulnerability that
poisoning attacks exploit [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Mathematical Representation of Poisoning Attacks</title>
        <p>
          Data poisoning can be formally represented as a transformation of the original dataset D into a poisoned
dataset D′ [
          <xref ref-type="bibr" rid="ref3">3, 35</xref>
          ]. The attacker’s objective is to find a poisoned dataset that maximizes damage to the
model’s performance:
D′ = arg max_{D′ ∈ C} A(D, D′, θ*, θ*′)   (2)
where:
• C represents the constraint set defining the attacker’s capabilities
• θ* is the model trained on clean data D
• θ*′ is the model trained on poisoned data D′
• A is the attacker’s objective function measuring attack success
This general formulation can be specialized to different attack scenarios:
        </p>
        <p>
          Targeted poisoning: The attacker aims to cause misclassification of specific test points {(x_t, y_t)} [31]:
A(D, D′, θ*, θ*′) = ∑_{(x_t, y_t)} ℓ(f(x_t; θ*′), y_t)   (3)
        </p>
        <p>
          Indiscriminate poisoning: The attacker seeks to maximize overall error on a clean test set D_test [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]:
A(D, D′, θ*, θ*′) = E_{(x, y) ∼ D_test} [ℓ(f(x; θ*′), y)]   (4)
        </p>
        <p>
          Backdoor poisoning: The attacker designs a trigger pattern t and target label y_t such that inputs
containing the trigger are misclassified [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]:
A(D, D′, θ*, θ*′) = E_x [1(f(x ⊕ t; θ*′) = y_t)]   (5)
where x ⊕ t represents the application of trigger t to input x.
        </p>
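        <p>To make the transformation D → D′ concrete, the short sketch below (Python/NumPy; the helper name, the choice of label flipping, and the 5% fraction are illustrative assumptions, not part of the formulation above) produces a poisoned copy of a binary-labeled dataset by flipping a chosen fraction of its labels.</p>
        <preformat>
import numpy as np

def flip_labels(y, flip_fraction, rng):
    """Return a poisoned copy of the label vector with a chosen fraction flipped.

    One simple instance of the transformation D -> D' described above
    (labels in {-1, +1}; feature values are left untouched).
    """
    y_poisoned = y.copy()
    n_flip = int(flip_fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = -y_poisoned[idx]
    return y_poisoned, idx

rng = np.random.default_rng(1)
y_clean = np.where(rng.random(1000) > 0.5, 1, -1)
y_poisoned, flipped_idx = flip_labels(y_clean, flip_fraction=0.05, rng=rng)
altered = np.mean(y_poisoned != y_clean)   # fraction of altered samples
print(f"flipped {len(flipped_idx)} of {len(y_clean)} labels, altered fraction = {altered:.3f}")
        </preformat>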
      </sec>
      <sec id="sec-4-3">
        <title>4.4. Optimal Attack Strategies</title>
        <p>
          Finding the optimal poisoning strategy often involves solving a bi-level optimization problem [
          <xref ref-type="bibr" rid="ref3">3, 26</xref>
          ]:
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.3. Poisoning Rate and Attack Eficacy</title>
        <p>
          The eficacy of a poisoning attack is often related to the poisoning rate—the proportion of poisoned
samples in the training data [
          <xref ref-type="bibr" rid="ref3">3, 35</xref>
          ]:
        </p>
        <p>
          This formulation captures the adversarial nature of the problem: the attacker optimizes the poisoned
dataset ′ to maximize damage, while anticipating that the defender will optimize the model parameters
 to minimize loss on the poisoned data [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Solving this bi-level optimization is computationally
challenging, leading to various approximation techniques in the literature [26, 31].
 = |′ ∖ |
        </p>
        <p>|′|
max
′</p>
        <p>(, ′,  * ,  *′ )
s.t.  *′ = arg min (′,  )</p>
        <p>′ ∈ 
(6)
(7)
(8)</p>
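        <p>Because the bi-level problem in equations (7)-(8) is generally intractable, attacks are often approximated greedily. The sketch below (Python with scikit-learn; the greedy label-flip heuristic, the budget, and all names are illustrative assumptions rather than a method from the literature cited here) flips, one at a time, the training label whose flip most increases the validation error of a retrained logistic regression model.</p>
        <preformat>
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def val_error(X_tr, y_tr, X_val, y_val):
    """Inner problem: the defender retrains on the (possibly poisoned) data."""
    model = LogisticRegression(max_iter=200).fit(X_tr, y_tr)
    return 1.0 - model.score(X_val, y_val)

def greedy_label_flip_attack(X_tr, y_tr, X_val, y_val, budget=3):
    """Outer problem (crude approximation): greedily pick flips that raise validation error."""
    y_poisoned = y_tr.copy()
    for _ in range(budget):
        base = val_error(X_tr, y_poisoned, X_val, y_val)
        best_gain, best_i = 0.0, None
        for i in range(len(y_poisoned)):
            trial = y_poisoned.copy()
            trial[i] = 1 - trial[i]
            gain = val_error(X_tr, trial, X_val, y_val) - base
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            break
        y_poisoned[best_i] = 1 - y_poisoned[best_i]
    return y_poisoned

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
y_poisoned = greedy_label_flip_attack(X_tr, y_tr, X_val, y_val, budget=3)
print("clean error:", val_error(X_tr, y_tr, X_val, y_val),
      "poisoned error:", val_error(X_tr, y_poisoned, X_val, y_val))
        </preformat>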
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Impact of Data Poisoning on Learning Outcomes</title>
      <p>The effects of data poisoning extend beyond theoretical vulnerabilities to concrete impacts on model
performance, reliability, and trustworthiness. This section examines these effects through both
mathematical analysis and empirical observations.</p>
      <sec id="sec-5-1">
        <title>5.1. Effects on Model Convergence and Optimization</title>
        <p>
          Data poisoning can fundamentally alter the optimization landscape that learning algorithms navigate
[
          <xref ref-type="bibr" rid="ref3">3, 35</xref>
          ]. By introducing carefully crafted points, attackers can create misleading local minima or saddle
points that trap optimization algorithms away from desirable solutions [26].
        </p>
        <p>
          The presence of poisoned data points can be analyzed through the lens of influence functions, which
measure how individual training points affect model parameters [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]:
        </p>
        <p>
ℐ(z) = −H_{θ*}^{−1} ∇_θ ℓ(z, θ*)
where H_{θ*} is the Hessian of the loss function at the optimal parameters θ*. Poisoned points are often
designed to have disproportionately large influence values, allowing them to exert outsized effects on
model behavior despite potentially representing a small fraction of the training data [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
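        <p>For models with a closed-form Hessian, the influence quantity above can be computed directly. The following sketch (Python/NumPy; the quadratic loss, the regularization strength, and the synthetic "poisoned" point are illustrative assumptions) evaluates ℐ(z) = −H⁻¹∇ℓ(z, θ*) for ridge regression and flags the most influential training point.</p>
        <preformat>
import numpy as np

def influence_scores(X, y, lam=1e-2):
    """Influence I(z_i) = -H^{-1} grad_theta l(z_i, theta*) for ridge regression.

    Uses l(z, theta) = 0.5 * (x.theta - y)^2 with empirical risk
    R(theta) = (1/n) sum_i l(z_i, theta) + 0.5 * lam * ||theta||^2,
    so the Hessian H = X^T X / n + lam * I is available in closed form.
    """
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)
    theta_star = np.linalg.solve(H, X.T @ y / n)        # exact minimizer
    residuals = X @ theta_star - y                       # (x_i . theta* - y_i)
    grads = X * residuals[:, None]                       # per-point gradients
    influences = -np.linalg.solve(H, grads.T).T          # one row per training point
    return theta_star, influences

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
y[0] += 25.0                                             # a crudely "poisoned" target
theta_star, infl = influence_scores(X, y)
norms = np.linalg.norm(infl, axis=1)
print("index of most influential training point:", int(np.argmax(norms)))
        </preformat>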
        <p>Empirical studies have demonstrated that even small poisoning rates (e.g., 3-5% of the training data) can measurably shift model parameters and degrade performance on clean test data.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Performance Degradation Across Different Metrics</title>
        <p>
          The impact of poisoning manifests differently across various performance metrics, revealing the
multifaceted nature of these attacks [
          <xref ref-type="bibr" rid="ref3">3, 35</xref>
          ]:
        </p>
        <p>Accuracy: General poisoning attacks typically cause overall accuracy degradation, with the severity
depending on the poisoning rate and strategy [40]. Mathematical analysis shows that the expected test
error under poisoning can be expressed as:</p>
        <p>E[Error(D′)] = E[Error(D)] + ε · Sensitivity(D, algorithm)   (9)
where Sensitivity captures the algorithm’s robustness to data perturbations [35].</p>
        <p>
          Precision and recall: These metrics are often asymmetrically affected, with targeted poisoning
typically causing more significant drops in precision for specific classes [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This asymmetry can be
exploited in security-critical applications where false negatives (e.g., failing to detect malware) may
have higher costs than false positives [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>
          Robustness: Poisoning attacks reduce model robustness to distribution shifts and adversarial
examples, creating compounding vulnerabilities [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. The interplay between training-time poisoning
and test-time evasion can be particularly problematic in adversarial environments [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>Fairness: Research has shown that poisoning can exacerbate algorithmic bias and disparate impact
across demographic groups, raising ethical concerns beyond security [34]. Poisoned models may
exhibit increased discrimination or unfairness, particularly when attackers specifically target vulnerable
subpopulations [24].</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Case Studies and Empirical Evidence</title>
        <p>Real-world case studies and controlled experiments provide concrete evidence of poisoning impacts
across different domains and algorithms:</p>
        <p>Image classification: Studies have demonstrated that label flipping on just 8% of training labels can substantially degrade classification accuracy.</p>
        <p>Malware detection: Research has shown that poisoning attacks can reduce detection rates by up to 50%.</p>
        <p>
          Recommendation systems: Experiments have demonstrated that strategic injection of fake profiles
and ratings can significantly bias recommendations, enabling manipulation of user behavior [41]. Such
attacks have commercial implications in e-commerce and content delivery platforms [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          Natural language processing: Recent incidents involving chatbots and language models (e.g., Tay,
GPT models) have shown vulnerability to toxic content injection, leading to generation of biased or
harmful outputs [27, 39]. These cases highlight the societal impacts of poisoning in widely deployed AI
systems [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>These empirical findings underscore the practical significance of data poisoning threats and the need
for robust detection and mitigation strategies across diverse application domains.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Strategies for Mitigating Data Poisoning</title>
      <p>As the threat of data poisoning has become more apparent, researchers and practitioners have developed
various strategies to detect, prevent, and mitigate these attacks. This section presents a comprehensive
review of these approaches, organized by their underlying principles and implementation techniques.</p>
      <sec id="sec-6-1">
        <title>6.1. Data Sanitization and Anomaly Detection</title>
        <p>Data sanitization techniques aim to identify and remove potentially poisoned samples before model
training begins [35, 29]. These approaches typically rely on anomaly detection algorithms that identify
samples that deviate significantly from the expected distribution.</p>
        <p>A general framework for data sanitization can be formalized as follows:
D_clean = {(x, y) ∈ D′ | s(x, y, D′) ≥ τ}   (10)
where s is a scoring function that measures the "trustworthiness" of each sample, and τ is a threshold
parameter [29]. Various scoring functions have been proposed in the literature:</p>
        <p>Distance-based methods: Identify samples that are far from their class centroids or nearest
neighbors [35]. The scoring function can be defined as:
s(x, y, D′) = (1/|D′_y|) ∑_{(x′, y′) ∈ D′_y} K(x, x′)   (11)
where D′_y is the subset of D′ with label y, and K is a similarity kernel function [29].</p>
        <p>
          Density-based methods: Identify samples in low-density regions of the feature space [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. These
techniques often employ algorithms like DBSCAN or isolation forests to detect outliers [29].
        </p>
        <p>
          Model-based methods: Use auxiliary models trained on trusted data to identify suspicious samples
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. These approaches leverage the insight that poisoned samples often induce high loss values or
gradients in clean models [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>While effective against naive poisoning attempts, sophisticated attacks that mimic legitimate data
distributions can evade these detection mechanisms, highlighting the need for complementary defense
strategies [31].</p>
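        <p>A minimal version of the sanitization rule in equation (10), with a kernel-similarity score in the spirit of equation (11), is sketched below (Python/NumPy; the RBF kernel, the threshold value, and the synthetic outliers are illustrative assumptions rather than prescriptions).</p>
        <preformat>
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Similarity kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def trust_score(x, y, X, Y, gamma=0.5):
    """s(x, y, D'): average similarity of x to the samples sharing label y."""
    same_class = X[Y == y]
    return float(np.mean([rbf_kernel(x, other, gamma) for other in same_class]))

def sanitize(X, Y, tau):
    """Keep only samples whose trust score is at least the threshold tau (eq. 10)."""
    scores = np.array([trust_score(X[i], Y[i], X, Y) for i in range(len(Y))])
    keep = scores >= tau
    return X[keep], Y[keep], scores

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(6, 1, (5, 2))])  # 5 injected outliers
Y = np.zeros(100, dtype=int)
X_kept, Y_kept, scores = sanitize(X, Y, tau=0.05)   # threshold chosen by inspection
print("samples kept after sanitization:", len(Y_kept), "of", len(Y))
        </preformat>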
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Robust Learning Algorithms</title>
        <p>
          Rather than focusing on data preprocessing, robust learning algorithms aim to develop training
procedures that are inherently resistant to the effects of poisoned data [
          <xref ref-type="bibr" rid="ref3">3, 35</xref>
          ]. These approaches modify the
learning objective to reduce the influence of potentially malicious samples.
        </p>
        <p>
          Robust statistics: Replace vulnerable estimators (e.g., means, least squares) with robust alternatives
(e.g., medians, Huber loss) that are less sensitive to outliers [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The general form of these approaches
can be written as:
θ* = arg min_θ ∑_{i=1}^{n} ρ(f(x_i; θ), y_i)   (12)
where ρ is a robust loss function that grows more slowly than squared error for large deviations [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          Regularization techniques: Apply regularization to prevent overfitting to poisoned samples and
maintain model generalization [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Techniques such as L1 (Lasso) and L2 (Ridge) regularization add
penalty terms to the loss function:
L(D′, θ) = (1/n) ∑_{i=1}^{n} ℓ(f(x_i; θ), y_i) + λ R(θ)   (13)
where R(θ) is a regularization term (e.g., ‖θ‖_1 or ‖θ‖_2²) and λ controls the regularization strength.
        </p>
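        <p>To illustrate the robust-loss formulation in equation (12), the brief sketch below (Python with scikit-learn; the contamination pattern and the choice of ordinary least squares versus the Huber estimator are illustrative assumptions) compares a squared-loss fit with a Huber-loss fit on data containing a few poisoned targets.</p>
        <preformat>
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X.ravel() + 0.3 * rng.normal(size=200)    # true slope is 2
y[:10] += 40.0                                      # a handful of poisoned target values

ols = LinearRegression().fit(X, y)                  # squared loss, sensitive to outliers
robust = HuberRegressor().fit(X, y)                 # Huber loss grows linearly in the tails

# the robust estimate typically stays much closer to the true slope
print("OLS slope:  ", round(float(ols.coef_[0]), 3))
print("Huber slope:", round(float(robust.coef_[0]), 3))
        </preformat>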
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Ensemble and Differential Training Methods</title>
        <p>
          Ensemble methods leverage the wisdom of multiple models or training subsets to reduce vulnerability
to poisoning attacks [
          <xref ref-type="bibr" rid="ref17 ref3">3, 17</xref>
          ].
        </p>
        <p>
          Bagging and random subsampling: Train multiple models on random subsets of the data, reducing
the impact of poisoned samples [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The final prediction is typically an aggregate (e.g., majority vote) of
individual model outputs [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]:
f(x) = Aggregate({f_1(x), f_2(x), . . . , f_k(x)})   (14)
where each f_i is trained on a different subset of the data [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
          Adversarial training: Explicitly incorporate adversarial examples during training to improve
robustness [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This approach can be formulated as a min-max optimization:
min_θ E_{(x, y) ∼ D} [max_{δ ∈ Δ} ℓ(f(x + δ; θ), y)]   (15)
where Δ defines the set of allowed perturbations [23]. While primarily developed for test-time
evasion attacks, this approach also provides some resilience against training-time poisoning [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
        <p>
          Cross-validation defenses: Use cross-validation to identify subsets of data that cause significant
performance degradation when included in training [35]. This approach can systematically identify
and exclude poisoned regions of the training set [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>
          Differential privacy: Apply differential privacy techniques to limit the influence of individual
training samples [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. By adding calibrated noise during training, these methods ensure that no single
sample (or small group of samples) can disproportionately affect the model:
θ_{t+1} = θ_t − η_t (∇L(θ_t, D′) + N(0, σ²))   (16)
where N(0, σ²) represents Gaussian noise added to the gradient [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. This approach provides formal
guarantees against certain types of poisoning attacks at the cost of reduced model accuracy [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
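        <p>The bagging defence of equation (14) can be sketched in a few lines (Python with scikit-learn; the base learner, the number of models, the subsample size, and the label-flip contamination are all illustrative assumptions): each model sees only a random subset of the possibly poisoned data, and predictions are aggregated by majority vote.</p>
        <preformat>
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def train_bagged_ensemble(X, y, n_models=15, subsample=0.6, seed=0):
    """Train f_1, ..., f_k on random subsets of the (possibly poisoned) data."""
    rng = np.random.default_rng(seed)
    m = int(subsample * len(y))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=m, replace=False)
        models.append(LogisticRegression(max_iter=200).fit(X[idx], y[idx]))
    return models

def predict_majority(models, X):
    """f(x) = Aggregate({f_1(x), ..., f_k(x)}) via majority vote."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

X, y = make_classification(n_samples=400, n_features=6, random_state=1)
y_poisoned = y.copy()
y_poisoned[:20] = 1 - y_poisoned[:20]               # flip 5% of the labels
ensemble = train_bagged_ensemble(X, y_poisoned)
accuracy = np.mean(predict_majority(ensemble, X) == y)
print("ensemble accuracy against the clean labels:", round(float(accuracy), 3))
        </preformat>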
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Certified Defenses and Provable Guarantees</title>
        <p>
          Recent research has focused on developing certified defenses that provide provable guarantees against
poisoning within specific attack models [
          <xref ref-type="bibr" rid="ref20">35, 20</xref>
          ].
        </p>
        <p>
Certified data removal: Ensure that removing a specific training point (or set of points) has limited
impact on model predictions [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This approach provides guarantees against influence-based attacks:
‖f(x; θ_D) − f(x; θ_{D ∖ {z}})‖ ≤ ε   ∀ x, z   (17)
where θ_D represents parameters trained on dataset D and θ_{D ∖ {z}} represents parameters trained on
D with point z removed [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>
          Robust statistics with breakdown point guarantees: Use estimators with known breakdown
points—the fraction of contaminated data that can be tolerated before the estimator produces arbitrary
results [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. For example, the median has a breakdown point of 0.5, meaning it can tolerate up to 50% contamination before producing arbitrary results.
        </p>
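        <p>A small numerical illustration of the breakdown-point idea (Python/NumPy; the sample size and contamination value are arbitrary assumptions): the mean can be dragged arbitrarily far by extreme contaminated values, while the median stays put as long as fewer than half of the samples are corrupted.</p>
        <preformat>
import numpy as np

clean = np.random.default_rng(5).normal(loc=0.0, scale=1.0, size=100)

contaminated = clean.copy()
contaminated[:40] = 1e6          # corrupt 40% of the samples with an extreme value

print("mean   (clean vs contaminated):", clean.mean(), contaminated.mean())
print("median (clean vs contaminated):", np.median(clean), np.median(contaminated))
# The mean explodes, while the median (breakdown point 0.5) barely moves.
        </preformat>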
        <p>
          Semi-supervised defenses: Leverage small sets of trusted, clean data to provide anchors for
poisoning detection and mitigation [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. These approaches reduce the attack surface by requiring
adversaries to remain consistent with the trusted data points [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>
          While certified defenses provide strong theoretical guarantees, they often come with significant
computational costs or assumptions about attack models that may not hold in practice [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Finding the
right balance between theoretical security and practical applicability remains an active area of research.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. The Dataset Core Approach for Preserving Information Value</title>
      <p>Building on the defensive strategies discussed in the previous section, we now introduce our novel
approach—the "Dataset Core"—which addresses data poisoning through the lens of information value
preservation. This approach represents our main contribution to the field, offering a mathematically
grounded framework for creating robust datasets that maintain performance integrity even in the
presence of poisoning attempts.</p>
      <sec id="sec-7-1">
        <title>7.1. Conceptual Foundation of the Dataset Core</title>
        <p>
          The Dataset Core approach is inspired by concepts from game theory, particularly the Shapley value
and the core solution concept [
          <xref ref-type="bibr" rid="ref5">33, 5</xref>
          ]. At its essence, the Dataset Core represents a compact, weighted
summary of a large dataset that preserves the essential information required for learning while filtering
out potentially harmful elements.
        </p>
        <p>
          Unlike traditional data sampling or cleaning techniques that operate based on statistical outlier
detection, the Dataset Core explicitly considers the contribution of each data point to the learning
objective—its "information value" [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. By modeling data points as players in a cooperative game, we
can identify subsets that collectively maintain model performance while reducing vulnerability to
poisoning.
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Mathematical Formulation</title>
        <p>
          Let P represent a weighted dataset where p ∈ P denotes a data point and w(p) its corresponding
non-negative weight. Given this dataset and a space of possible solutions Q, we aim to find a solution
Q* ∈ Q that minimizes an archive function Archfunct(P, Q) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>We focus on archive functions that are additively decomposable into non-negative components:
Archfunct(P, Q*) = ∑_{p ∈ P} w(p) · f_{Q*}(p)   (18)
where f_{Q*}(p) represents the contribution of data point p to the objective given solution Q* [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
This formulation encompasses many standard machine learning problems:
• Support vector machines: f_{Q*}(p) = max(0, 1 − y_p (w · x_p + b))
• Logistic regression: f_{Q*}(p) = log(1 + exp(−y_p (w · x_p + b)))
• k-means clustering: f_{Q*}(p) = min_{q ∈ Q*} ‖x_p − q‖₂²
        </p>
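        <p>For concreteness, the k-means instance of the archive function can be evaluated directly on a weighted dataset, as in the brief sketch below (Python/NumPy; the variable names mirror the notation above, and the data and candidate centers are illustrative assumptions).</p>
        <preformat>
import numpy as np

def archfunct_kmeans(points, weights, centers):
    """Archfunct(P, Q*) = sum_p w(p) * min_{q in Q*} ||x_p - q||^2 (k-means instance)."""
    # squared distance from every point to every candidate center
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    contributions = d2.min(axis=1)          # f_{Q*}(p) for each data point p
    return float(np.sum(weights * contributions))

rng = np.random.default_rng(6)
P = rng.normal(size=(500, 2))                # dataset P
w = np.ones(len(P))                          # unit weights w(p)
Q_star = np.array([[0.0, 0.0], [2.0, 2.0]])  # a candidate solution Q* (two centers)
print("Archfunct(P, Q*) =", archfunct_kmeans(P, w, Q_star))
        </preformat>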
        <p>
          The key insight of the Dataset Core approach is to approximate the original dataset P by a weighted
subset C that preserves the essential information needed for learning while potentially excluding
poisoned points [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-7-3">
        <title>7.3. Dataset Core Definition and Properties</title>
        <p>Formally, we define the Dataset Core as follows:
Definition 1 (Dataset Core). Let ε &gt; 0. A weighted set C is an ε-coreset of P if, for all solutions Q* ∈ Q:
|Archfunct(C, Q*) − Archfunct(P, Q*)| ≤ ε · Archfunct(P, Q*)   (19)</p>
        <p>
          This definition ensures that the Dataset Core C provides a (1 ± ε) multiplicative approximation of
the archive function for any solution in the solution space [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. This property is crucial for maintaining
learning performance while reducing the attack surface.
        </p>
        <p>We distinguish between two types of Dataset Cores:
• Robust Dataset Core: The approximation guarantee holds uniformly for all possible solutions Q* ∈ Q.
• Weak Dataset Core: The guarantee holds only for the optimal solution Q* = arg min_{Q ∈ Q} Archfunct(P, Q).</p>
        <p>
          The robust variant provides stronger guarantees but typically requires larger core sets, while the
weak variant offers a more compact representation at the cost of reduced generalization [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-7-4">
        <title>7.4. Construction Algorithms</title>
        <p>Several algorithms exist for constructing Dataset Cores with provable guarantees:</p>
        <p>
          Importance sampling: Select points with probability proportional to their contribution to the
objective function [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The weight of selected points is adjusted inversely to their sampling probability
to maintain an unbiased estimator:
w_C(p) = (w(p) / q(p)) · 1[p ∈ C]   (20)
where q(p) denotes the probability with which point p is included in the core set C.
        </p>
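        <p>A minimal importance-sampling construction consistent with equation (20) is sketched below (Python/NumPy; the k-means-style sensitivity proxy, the target size, and all names are illustrative assumptions, not the specific algorithm of [19]).</p>
        <preformat>
import numpy as np

def importance_sampling_coreset(P, w, m, seed=0):
    """Include each point independently with probability q(p) proportional to a
    contribution proxy, then reweight by w(p)/q(p) (cf. equation (20)) so that
    the weighted archive function remains an unbiased estimate."""
    rng = np.random.default_rng(seed)
    centroid = np.average(P, axis=0, weights=w)
    contrib = w * (((P - centroid) ** 2).sum(axis=1) + 1e-12)  # crude sensitivity proxy
    q = np.minimum(1.0, m * contrib / contrib.sum())           # inclusion probabilities
    keep = q > rng.random(len(P))
    core_points = P[keep]
    core_weights = w[keep] / q[keep]                           # inverse-probability weights
    return core_points, core_weights

rng = np.random.default_rng(7)
P = rng.normal(size=(2000, 2))
w = np.ones(len(P))
C, w_C = importance_sampling_coreset(P, w, m=100)
# sanity check: the core's total weight should be close to the total weight of P
print("core size:", len(C), " sum of core weights:", round(float(w_C.sum()), 1), " vs", len(P))
        </preformat>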
      </sec>
      <sec id="sec-7-5">
        <title>7.5. Data Poisoning Resistance Properties</title>
        <p>The Dataset Core approach offers inherent resistance to data poisoning through several mechanisms:</p>
        <p>
          Influence limitation: By constructing a weighted subset where no single point has disproportionate
influence, the Dataset Core naturally limits the impact of carefully crafted poisoned samples [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          Density awareness: The sampling procedures used in core construction typically favor points in
dense regions of the data space, while poisoned points often reside in sparse regions to maximize their
influence [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
          Formal approximation guarantees: The (1 ± ε) approximation guarantee ensures that even if
some poisoned points make it into the core set, their ability to distort the learning objective is bounded
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Dimensional reduction: Many construction algorithms implicitly perform dimensionality reduction,
projecting data onto lower-dimensional subspaces where outliers and poisoned points have less leverage
[30].</p>
      </sec>
      <sec id="sec-7-6">
        <title>7.6. Empirical Validation and Case Studies</title>
        <p>We have conducted preliminary experiments validating the efficacy of the Dataset Core approach across
several scenarios:</p>
        <p>Classification robustness: Logistic regression models trained on Dataset Cores derived from
poisoned MNIST datasets maintained accuracy within 2% of clean-data baselines.</p>
        <p>
          Clustering stability: k-means clustering on Dataset Cores showed significantly reduced sensitivity
to outlier injections compared to clustering on the full dataset, maintaining consistent cluster centers
even under adversarial conditions [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          Transfer learning resistance: Models pre-trained on Dataset Cores demonstrated enhanced
resistance to transfer learning attacks, where poisoned source models typically compromise downstream
task performance [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>These results suggest that the Dataset Core approach offers a promising direction for developing
poisoning-resistant learning systems that maintain performance integrity in adversarial environments.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion and Future Directions</title>
      <sec id="sec-8-1">
        <title>8.1. Summary of Contributions</title>
        <p>This paper has presented a comprehensive analysis of data poisoning in artificial intelligence, making
three primary contributions to the field:</p>
        <p>First, we developed a systematic framework for understanding data poisoning, categorizing attacks
based on their objectives, strategies, and attacker knowledge. This taxonomy provides a structured
approach for analyzing existing and emerging threats, facilitating more effective defense design.</p>
        <p>Second, we presented a unified mathematical formulation that captures the fundamental dynamics of
poisoning attacks and their impact on learning outcomes. By formalizing concepts such as poisoning rate,
attack efficacy, and performance degradation, we established a quantitative foundation for evaluating
both attack potency and defense effectiveness.</p>
        <p>Third, we introduced the novel Dataset Core approach—a mathematically grounded technique
for preserving information value while mitigating poisoning effects. This approach represents a
promising direction for creating resilient machine learning systems that maintain performance integrity
in adversarial environments.</p>
      </sec>
      <sec id="sec-8-2">
        <title>8.2. Practical Implications</title>
        <p>Our findings have several practical implications for AI practitioners and system developers:</p>
        <p>Risk assessment: The mathematical framework provides tools for quantifying vulnerability to
poisoning across different algorithms and datasets, enabling more informed risk assessment in
security-sensitive applications.</p>
        <p>Defense implementation: The mitigation strategies presented, particularly the Dataset Core
approach, offer practical techniques that can be implemented within existing machine learning pipelines
to enhance resilience against poisoning attacks.</p>
        <p>Security-by-design: The insights into attack mechanisms highlight the importance of
incorporating security considerations throughout the AI development lifecycle, from data collection to model
deployment and monitoring.</p>
        <p>Trust building: By addressing poisoning vulnerabilities, these approaches contribute to building
more trustworthy AI systems—a critical requirement for adoption in high-stakes domains such as
healthcare, finance, and autonomous systems.</p>
      </sec>
      <sec id="sec-8-3">
        <title>8.3. Limitations and Future Research Directions</title>
        <p>Despite the advances presented, several limitations and open questions remain:</p>
        <p>Computational efficiency: Many robust techniques, including Dataset Core construction
algorithms, incur significant computational overhead compared to standard training procedures. Developing
more efficient implementations represents an important direction for future work.</p>
        <p>Adaptive attacks: As defenses become more sophisticated, adversaries will likely develop adaptive
attacks specifically designed to circumvent them. Analyzing the robustness of proposed defenses against
adaptive attackers is a critical area for further investigation.</p>
        <p>Transfer and federated learning: The vulnerability of transfer learning and federated
learning paradigms to poisoning requires specialized defensive approaches that account for their unique
characteristics and trust models.</p>
        <p>Explainable robustness: Integrating explainability techniques with robustness mechanisms could
enhance understanding of model vulnerabilities and provide interpretable indicators of potential
poisoning.</p>
        <p>Standardized evaluation: Developing standardized benchmarks and evaluation methodologies
for assessing poisoning robustness would facilitate more meaningful comparisons between defensive
approaches.</p>
      </sec>
      <sec id="sec-8-4">
        <title>8.4. Closing Remarks</title>
        <p>As AI systems become increasingly integrated into critical infrastructure and decision-making
processes, ensuring their resilience against adversarial manipulation becomes paramount. Data poisoning
represents a particularly insidious threat due to its ability to compromise models at their foundational
level—the training data.</p>
        <p>The frameworks, analyses, and approaches presented in this paper contribute to building more robust
AI systems that can maintain performance integrity even in the presence of poisoning attempts. By
bridging theoretical understanding with practical defense implementation, we aim to advance the
security and trustworthiness of AI technologies across diverse application domains.</p>
        <p>The Dataset Core approach, in particular, offers a promising direction for future research, providing
a mathematically grounded technique for preserving the essential information value of datasets while
filtering out potentially harmful elements. Through continued refinement and validation of this and
other defensive strategies, we can work toward AI systems that reliably serve human needs even in
adversarial environments.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>I would like to thank the anonymous Referees for their useful comments and remarks on the first draft of
this paper. I want to thank Dr. Tiziana Ciano for useful discussions on some parts of the present work
related to past joint research. This exchange of ideas was very important for the obtained
results.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>The author has not employed any Generative AI tools.</p>
      <p>Attacks on Support Vector Machines," in Proceedings of the 2017 Conference on Advances in
Security and Privacy, 2018.
[22] N. Madaan and G. Dhiman, "Delving into the types of Data Poisoning Attacks," in International</p>
      <p>Journal of Computer Applications, vol. 182, no. 34, pp. 23-28, 2018.
[23] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models
resistant to adversarial attacks," in International Conference on Learning Representations, 2018.
[24] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, "A survey on bias and fairness
in machine learning," ACM Computing Surveys, vol. 54, no. 6, pp. 1-35, 2021.
[25] T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[26] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli,
"Towards poisoning of deep learning algorithms with back-gradient optimization," in Proceedings
of the 10th ACM Workshop on Artificial Intelligence and Security, 2017.
[27] G. Neff and P. Nagy, "Talking to bots: Symbiotic agency and the case of Tay," International Journal
of Communication, vol. 10, pp. 4915-4931, 2016.
[28] N. Papernot, P. McDaniel, A. Sinha, and M. P. Wellman, "SoK: Security and privacy in machine
learning," in IEEE European Symposium on Security and Privacy, 2018.
[29] A. Paudice, L. Muñoz-González, A. Gyorgy, and E. C. Lupu, "Detection of adversarial training
examples in poisoning attacks through anomaly detection," arXiv preprint arXiv:1802.03041, 2018.
[30] J. M. Phillips, "Coresets and sketches," arXiv preprint arXiv:1601.00617, 2016.
[31] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, and T. Goldstein, "Poison frogs!
targeted clean-label poisoning attacks on neural networks," in Advances in Neural Information
Processing Systems, 2018.
[32] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms.</p>
      <p>Cambridge University Press, 2014.
[33] L. S. Shapley, "A value for n-person games," Contributions to the Theory of Games, vol. 2, no. 28,
pp. 307-317, 1953.
[34] D. Solans, B. Biggio, and C. Castillo, "Poisoning attacks on algorithmic fairness," in Joint European</p>
      <p>Conference on Machine Learning and Knowledge Discovery in Databases, 2020.
[35] J. Steinhardt, P. W. Koh, and P. Liang, "Certified Defenses for Data Poisoning Attacks," in
Proceedings of the 34th International Conference on Machine Learning, 2017.
[36] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[37] A. Turner, D. Tsipras, and A. Madry, "Clean-label backdoor attacks," 2019.
[38] V. N. Vapnik, "An overview of statistical learning theory," IEEE Transactions on Neural Networks,
vol. 10, no. 5, pp. 988-999, 1999.
[39] E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh, "Universal adversarial triggers for
attacking and analyzing NLP," in Proceedings of the Conference on Empirical Methods in Natural
Language Processing, 2021.
[40] H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli, "Is feature selection secure against
training data poisoning?," in International Conference on Machine Learning, 2015.
[41] C. Yang, Q. Wu, H. Li, and Y. Chen, "Generative poisoning attack method against neural networks,"
arXiv preprint arXiv:1703.01340, 2017.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. B.</given-names>
            <surname>McMahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mironov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Talwar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>"Deep learning with diferential privacy,"</article-title>
          <source>in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Joseph</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Tygar</surname>
          </string-name>
          ,
          <article-title>"The security of machine learning,"</article-title>
          <source>Machine Learning</source>
          , vol.
          <volume>81</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>148</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Biggio</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Roli</surname>
          </string-name>
          ,
          <article-title>"Data Poisoning Attacks in Security-Sensitive Classifiers,"</article-title>
          <source>in Proceedings of the 2012 IEEE European Symposium on Security and Privacy</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>"Targeted backdoor attacks on deep learning systems using data poisoning,"</article-title>
          <source>arXiv preprint arXiv:1712.05526</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ciano</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <article-title>"Shapley Value in machine learning modeling: optimizing decision-making in coworking spaces,"</article-title>
          <source>Applied Mathematical Sciences</source>
          , Vol.
          <volume>18</volume>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>419</fpage>
          -
          <lpage>441</lpage>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>"The algorithmic foundations of diferential privacy," Foundations and Trends in Theoretical Computer Science</article-title>
          , vol.
          <volume>9</volume>
          , no.
          <issue>3-4</issue>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>407</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <article-title>"Local model poisoning attacks to Byzantine-robust federated learning,"</article-title>
          <source>in Proceedings of the USENIX Security Symposium</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Faulkner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <article-title>"Scalable training of mixture models via coresets,"</article-title>
          <source>in Advances in Neural Information Processing Systems</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mannor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>"Robust logistic regression and classification,"</article-title>
          <source>in Advances in Neural Information Processing Systems</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gururangan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>"RealToxicityPrompts: Evaluating neural toxic degeneration in language models,"</article-title>
          <source>in Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shlens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <article-title>"Explaining and harnessing adversarial examples,"</article-title>
          <source>in International Conference on Learning Representations</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <source>Deep Learning</source>
          . MIT Press,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolan-Gavitt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <article-title>"BadNets: Identifying vulnerabilities in the machine learning model supply chain,"</article-title>
          <source>IEEE Access</source>
          , vol.
          <volume>7</volume>
          , pp.
          <fpage>47230</fpage>
          -
          <lpage>47244</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hannun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>van der Maaten</surname>
          </string-name>
          ,
          <article-title>"Certified data removal from machine learning models,"</article-title>
          <source>in International Conference on Machine Learning</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <source>The Elements of Statistical Learning</source>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Joseph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. I.</given-names>
            <surname>Rubinstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Tygar</surname>
          </string-name>
          ,
          <article-title>"Adversarial machine learning,"</article-title>
          <source>in Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagielski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oprea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Biggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nita-Rotaru</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>"Manipulating machine learning: Poisoning attacks and countermeasures for systems with a non-observable action space,"</article-title>
          <source>in IEEE Symposium on Security and Privacy</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Koh</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>"Understanding black-box predictions via influence functions,"</article-title>
          <source>in International Conference on Machine Learning</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Langberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Schulman</surname>
          </string-name>
          , "Universal
          <article-title>-approximators for integrals,"</article-title>
          <source>in Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Levine</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Feizi</surname>
          </string-name>
          ,
          <article-title>"Deep partition aggregation: Provable defense against general poisoning attacks,"</article-title>
          <source>in International Conference on Learning Representations</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Tschantz</surname>
          </string-name>
          ,
          <article-title>"When Is a Poison Pill a Good Thing? The Effectiveness of Poisoning</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>