Epistemic Defenses against Scientific and Empirical Adversarial AI Attacks∗

Nadisha-Marie Aliman¹ and Leon Kester²†
¹ Utrecht University, Utrecht, The Netherlands
² TNO Netherlands, The Hague, The Netherlands
leon.kester@tno.nl

∗ Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
† Contact Author

Abstract

In this paper, we introduce “scientific and empirical adversarial AI attacks” (SEA AI attacks) as an umbrella term for not yet prevalent but technically feasible deliberate malicious acts of specifically crafting AI-generated samples to achieve an epistemic distortion in (applied) science or engineering contexts. In view of possible socio-psycho-technological impacts, it seems responsible to ponder countermeasures from the onset and not in hindsight. In this vein, we consider two illustrative use cases: the example of AI-produced data to mislead security engineering practices and the conceivable prospect of AI-generated contents to manipulate scientific writing processes. Firstly, we contextualize the epistemic challenges that such future SEA AI attacks could pose to society in the light of broader i.a. AI safety, AI ethics and cybersecurity-relevant efforts. Secondly, we set forth a corresponding supportive generic epistemic defense approach. Thirdly, we effect a threat modelling for the two use cases and propose tailor-made defenses based on the foregoing generic deliberations. Strikingly, our transdisciplinary analysis suggests that employing distinct explanation-anchored, trust-disentangled and adversarial strategies is one possible principled complementary epistemic defense against SEA AI attacks – albeit with caveats yielding incentives for future work.

1 Introduction

Progress in the AI field unfolds a wide and growing array of beneficial societal effects with AI permeating more and more crucial application domains. To forestall ethically-relevant ramifications, research from a variety of disciplines tackling pertinent AI safety [Amodei et al., 2016; Bostrom, 2017; Burden and Hernández-Orallo, 2020; Fickinger et al., 2020; Leike et al., 2017], AI ethics and AI governance issues [Floridi et al., 2018; Jobin et al., 2019; ÓhÉigeartaigh et al., 2020; Raji et al., 2020] has gained momentum at an international level. In addition, cybersecurity-oriented frameworks in AI safety [Aliman et al., 2021; Brundage et al., 2018; Pistono and Yampolskiy, 2016] stressed the necessity to not only address unintentional errors, unforeseen repercussions and bugs in the context of ethical AI design but also AI risks linked to intentional malice, i.e. deliberate unethical design, attacks and sabotage by malicious actors. In parallel, the convergence of AI with other technologies increases and diversifies the attack surface available to malevolent actors. For instance, while AI-enhanced cybersecurity opens up novel valuable possibilities for defenders [Zeadally et al., 2020], AI simultaneously provides new affordances for attackers [Ashkenazy and Zini, 2019], from AI-aided social engineering [Seymour and Tully, 2016] to AI-concealed malware [Kirat et al., 2018]. Next to the capacity of AI to extend classical cyberattacks in scope, speed and scale [Kaloudi and Li, 2020], a notable emerging threat is what we denote AI-aided epistemic distortion. The latter represents a form of AI weaponization and is increasingly studied in its currently most salient form, namely AI-aided disinformation [Aliman et al., 2021; Chesney and Citron, 2019; Kaloudi and Li, 2020; Tully and Foster, 2020], which is especially relevant to information warfare [Hartmann and Giles, 2020]. Recently, the weaponization of Generative AI for information operations has been described as “a sincere threat to democracies” [Hartmann and Steup, 2020]. In this paper, we analyze attacks and defenses pertaining to another not yet prevalent but technically feasible and similarly concerning form of AI-aided epistemic distortion with potentially profound societal implications: scientific and empirical adversarial AI attacks (SEA AI attacks).

With SEA AI attacks, we refer to any deliberately malicious AI-aided epistemic distortion which predominantly and directly targets (applied) science and technology assets (as opposed to information operations where a wider societal target is often selected on ideological/political grounds). In short, the expression acts as an umbrella term for malicious actors utilizing or attacking AI at pre- or post-deployment stages with the deliberate adversarial aim to deceive, sabotage, slow down or disrupt (applied) science, engineering or related endeavors. Obviously, SEA AI attacks could be performed in a variety of modalities (see e.g. “deepfake geography” [Zhao et al., 2021] related to vision). However, for illustrative purposes, we base our two exemplary use cases on misuses of language models. The first use case treats SEA AI attacks on security engineering via schemes in which a malicious actor poisons training data resources [Mahlangu et al., 2019] that are vital to data-driven defenses in the cybersecurity ecosystem. Lately, a proof-of-concept for an AI-based data poisoning attack has been implemented in the context of cyber threat intelligence (CTI) [Ranade et al., 2021]. The authors utilized a fine-tuned version of the GPT-2 language model [Radford et al., 2019] and were able to generate fake CTI which was indistinguishable from its legitimate counterpart when presented to cybersecurity experts. The second use case studies conceivable SEA AI attacks on procedures that are essential to scientific writing. Related examples that have been depicted in recent work encompass plagiarism studies with transformers like BERT [Wahle et al., 2021] and with the pre-trained GPT-3 language model [Brown et al., 2020] that “may very well pass peer review” [Dehouche, 2021], but also AI-generated fake reviews (with a fine-tuned version of GPT-2) apt to mislead experienced researchers in a small user study [Tallón-Ballesteros, 2020]. Future malicious actors could deliberately breed a large-scale agenda in the spirit of “fake science news” [Ho et al., 2020] and AI-generated papers that would widely exceed in quality the (later withdrawn) computer-generated research papers [Van Noorden, 2014] published at respected venues. In short, technically already practicable SEA AI attacks could have considerable negative effects if jointly potentiated with regard to scale, scope and speed by malicious actors equipped with sufficient resources. As later exemplified in Subsection 3.1, the security engineering use case could e.g. involve dynamic domino-effects leading to large financial losses and even risks to human lives, while the scientific writing use case seems to moreover reveal a domain-general epistemic problem. The mere existence of the latter also affects the former and could engender serious pitfalls whose generically formulated principled management is compactly treated in the next Section 2.

2 Theoretical Generic Epistemic Defenses

As reflected in the law of requisite variety (LRV) known from cybernetics, “only variety can destroy variety” [Ashby, 1961]. Applied to SEA AI attacks, it signifies that since malicious adversaries are not only exploiting vulnerabilities from a heterogeneous socio-psycho-technological landscape but also especially vulnerabilities of an epistemic nature, suitable defense methods may profit from an epistemic stance. Applying the cybernetic LRV offers a valuable domain-general transdisciplinary tool able to stimulate and invigorate novel tailored defenses in a diversity of harm-related problems, from cybersecurity [Vinnakota, 2013] over AI safety [Aliman, 2020a] to AI ethics [Ashby, 2020]. In short, utilizing insights from epistemology as a complementary basis to frame defense methods against SEA AI attacks seems indispensable.
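The LRV itself is stated qualitatively in [Ashby, 1961]; purely as an aid to intuition, one common information-theoretic reading of it can be sketched as follows (this gloss is ours and not a formula taken from the cited source):

```latex
% A common information-theoretic gloss of the law of requisite variety:
% the residual variety (entropy) of undesired outcomes O that a system can
% still exhibit is bounded from below by the variety of the disturbances D
% minus the variety of the regulator R (here: the defense repertoire).
\[
  H(O) \;\geq\; H(D) - H(R)
\]
% Reading for SEA AI attacks: as attack variety H(D) grows (new modalities,
% new epistemic exploits), only a matching growth of defender variety H(R)
% -- e.g. adding epistemic defenses to socio-psycho-technological ones --
% can keep the variety of harmful outcomes H(O) low.
```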
Past work predominantly analyzed countermeasures of socio-psycho-technological nature to combat the spread of (audio-)visual, audio and textual deepfakes as well as “fake news” more broadly. For instance, the technical detection of AI-generated content [Wahle et al., 2021] has been often thematized and even lately applied to “fake news” in the healthcare domain [Baris and Boukhers, 2021]. Furthermore, in the context of counteracting risks posed by the deployment of sophisticated online bots, it has been suggested that “technical solutions, while important, should be complemented with efforts involving informed policy and international norms to accompany these technological developments” and that “it is essential to foster increased civic literacy of the nature of ones interactions” [Boneh et al., 2019]. Another analysis presented a set of defense measures against the spread of deepfakes [Chesney and Citron, 2019] which contained i.a. legal solutions, administrative agency solutions, coercive and covert responses as well as sanctions (when effectuated by state actors) and speech policies for online platforms. Concerning “fake science news” and their impacts on “credibility and reputation of the science community” [Ho et al., 2020], it has been even postulated by Makri that “science is losing its relevance as a source of truth” and “the new focus on post-truth shows there is now a tangible danger that must be addressed” [Makri, 2017]. Following the author, scientists could equip citizens with sense-making tools without which “emotions and beliefs that pander to false certainties become more credible” [Makri, 2017].

While some of those socio-psycho-technological countermeasures and underlying assumptions are debatable, we complementarily zoom in on different epistemic defenses against SEA AI attacks being directed against scientific and empirical frameworks. Amidst an information ecosystem with quasi-omnipresent terms such as “post-truth” or “fake news” and in light of data-driven research trends embedded within trust-based infrastructures, it seems daunting to face a threat landscape populated by AI-generated artefacts such as: 1) “fake data” and “fake experiments”, 2) “fake research papers” (or “fraudulent academic essay writing” [Brown et al., 2020]) and 3) “fake reviews”. More broadly, it has been stated that deepfakes “seem to undermine our confidence in the original, genuine, authentic nature of what we see and hear” [Floridi, 2018]. Taking the perspective of an empiricism-based epistemology grounded in justification with the aim to obtain truer beliefs via (probabilistic) belief updates given evidence, a recent in-depth analysis found that the existence of deepfake videos confronts society with epistemic threats [Fallis, 2020]. Thereby, it is assumed that “deepfakes reduce the amount of information that videos carry to viewers” [Fallis, 2020], which analogously quantitatively affected the amount of information in text-based news due to earlier “fake news” phenomena. In our view, when applying this stance to audiovisual and textual samples of scientific material but also broadly to the context of security engineering and scientific communication where the deployment of deepfakes for SEA AI attacks could occur in multifarious ways, the consequences seem disastrous. In brief, SEA AI defenses seem relevant to AI safety since an inability to build up resiliency against those attacks may suggest that already present-day AI could (be used to) outmaneuver humans on a large scale – without any “superintelligent” competency. However, empiricist epistemology is not without any alternative. In the following, we thus first mentally enact one alternative epistemic stance (without claiming that it represents the only possible alternative). We present its key generic epistemic suppositions serving as a basis for the next Section 3 where we tailor defenses against SEA AI attacks for the specific use cases.

Firstly, it has been lately propounded that the societal perception of a “post-truth” era is often linked to the implicit assumption that truth can be equated with consensus, which is why it seems recommendable to consider a deflationary account of truth [Bufacchi, 2021] – i.e. where the concept is for instance strictly reserved to scientifically-relevant epistemic contexts. On such a deflationary account of truth disentangled from consensus, it has been argued that even if consensus and trust seem eroded, we neither inhabit a post-truth nor a science-threatening post-falsification age [Aliman and Kester, 2020]. Secondly, we never had a direct access to physical reality which we could have suddenly lost with the advent of “fake news”. In fact, as stated by Karl Popper: “Once we realize that human knowledge is fallible, we realize also that we can never be completely certain that we have not made a mistake” [Popper, 1996]. Thirdly, the epistemic aim in science can neither be truth directly [Frederick, 2020] nor can it be truer beliefs via justifications. The former is not directly experienced and the latter has been shown to be logically invalid by Popper [Popper, 2014]. Science is quintessentially explanatory, i.e. it is based on explanations [Deutsch, 2011] and not merely on data. While the epistemic aim cannot be certainty or justification (and not even “truer explanations” [Frederick, 2020]¹ for lack of direct access to truth), a pragmatic way to view it is that our epistemic aim can be to achieve better explanations [Frederick, 2020]. One can collectively agree on practical updatable criteria which better explanations should fulfill. In short, one does not assess a scientific theory in isolation, but in comparison to rival theories, and one is thereby embedded in a context with other scientists. Fourthly, there are distinct ways to handle falsification and integrate empirical findings in explanation-anchored science. One can e.g. criticize an explanation and pinpoint inconsistencies at a theoretical level. One can attempt to make a theory problematic via falsifying experiments whose results are accepted to seem to conflict with the predictions that the theory entailed [Deutsch, 2016]. Vitally, in the absence of a better rival theory, it holds that “an explanatory theory cannot be refuted by experiment: at most it can be made problematic” [Deutsch, 2016].

¹ That our epistemic aim can be “truer explanations” or explanations that lead us “closer to the truth” has been sometimes confusingly written by Deutsch and Popper respectively, but this type of account requires a semantic refinement [Frederick, 2020].

Against the background of this epistemic bedrock, one can now re-assess the threat landscape of SEA AI attacks. Firstly, one can conclude that AI-generated “fake data” and “fake experiments” could slow down but not terminally disrupt scientific and empirical procedures. In the case of misguiding confirmatory data, it has no epistemic effect since, as opposed to empiricist epistemology, explanation-anchored science does not utilize any scheme of credence updates for a theory and it is clear that “a severely tested but unfalsified theory may be false” [Frederick, 2020]. In the case of misleading data that is accepted to falsify a theory T, one runs the risk to consider mistakenly that T has been made problematic. However, since it is not permissible to drop T in the absence of a rival theory T′ representing a better explanation than T, the adversarial capabilities of the SEA AI attacker are limited. In short, theories cannot be deleted from the collective knowledge via such SEA AI attacks without more ado. Secondly, when contemplating the case of AI-generated “fake research papers”, it seems that they could slow down but not disrupt scientific methodology. Overall, one could state that the danger lies in the uptake of deceptive theories. However, theories are only integrated in explanation-anchored science if they represent better explanations in comparison to alternatives or, in the absence of alternatives, if they explain novel phenomena. In a nutshell, it takes explanations that are simultaneously misguiding and better for such a SEA AI attack to succeed. This is a high bar for imitative language models if meant to be repeatedly and systematically performed² and not merely as a unique event by chance. Further, even in the case a deceptive theory has been integrated in a field, that is always only provisional such that it could be revoked at any suitable moment, e.g. once a better explanation arises and repeated experiments falsify its claims. If in the course of this, an actually better explanation had been mistakenly considered as refuted, it can always be re-integrated once this is noticed. In fact, “a falsified theory may be true” [Frederick, 2020] if the accepted observations believed to have falsified it were wrong. Thirdly, when now considering the final case of AI-generated “fake reviews”, it becomes clear that they could similarly slow down but not terminally disrupt the scientific method. At worst, some existing theories could be unnecessarily problematized and misguiding theories uptaken, but all these epistemic procedures can be repealed retrospectively.

² That there could exist a task which imitative language models are “theoretically incapable of handling” has been often put into question [Sahlgren and Carlsson, 2021]. However, on epistemic grounds elaborated in-depth previously [Aliman, 2020a; Aliman et al., 2021] which might be amenable to experimental falsifiability [Aliman, 2020b], we assume that the task to consciously create and understand novel yet unknown explanatory knowledge [Deutsch, 2011] – which humans are capable of performing if willing to – cannot be learned by AI systems by mere imitation.

In short, explanation-anchored science is resilient (albeit not immune) against SEA AI attacks, but one can humbly face the idea that this is not because scientists can “tease out falsehood from truths” [Ho et al., 2020], but because explanation-anchored science attempts to tease out better from worse explanations while permanently requiring the creation of new ones, whereby the steps made can always be revoked, revised and even actively adversarially counteracted. That entails a sort of epistemic dizziness and one can never trust one’s own observations. Also, human mental constructions are inseparably cognitive-affective and science is not detached from social reality [Barrett, 2017]. In our view, for a systematic management of this epistemic dizziness, one may profit from an adversarial approach that permanently brings to mind that one might be wrong. Last but not least, an important feature discussed is that, with the epistemic aim not being truth (which itself is also not consensus and does not rely on trust to exist) but instead better explanations, none of the mentioned methods are dependent on trust per se – making it a trust-disentangled view. To sum up, we identified 3 key generic features for epistemic defenses against SEA AI attacks:

1. Explanation-anchored instead of data-driven
2. Trust-disentangled instead of trust-dependent
3. Adversarial instead of (self-)compliant
3 Practical Use of Theoretical Defenses

In the following Subsection 3.1, we briefly perform an exemplary threat modelling for the two specific use cases introduced in Section 1; the dimensions used (adversarial goals, knowledge and capabilities) are recapitulated in an illustrative sketch at the end of the subsection. The threat model narratives are naturally non-exhaustive and are selected for illustrative purposes to display plausible downward counterfactuals projecting capabilities to the recent counterfactual past in the spirit of co-creation design fictions in AI safety [Aliman et al., 2021]. In Subsection 3.2, we then derive corresponding tailor-made defenses from the generic characteristics that have been carved out in the last Section 2 while thematizing notable caveats.

3.1 Threat Modelling for Use Cases

Use Case Security Engineering

• Adversarial goals: As briefly mentioned in Section 1, CTI (which is information related to cybersecurity threats and threat actors to support analysts and security systems in the detection and mitigation of cyberattacks) can be polluted via misleading AI-generated samples to fool cyber defense systems at the training stage [Ranade et al., 2021]. Among others, CTI is available as unstructured texts but also as knowledge graphs taking CTI texts as input. A textual data poisoning via AI-produced “fake CTI” represents a form of SEA AI attack that was able to successfully deceive (AI-enhanced) automated cyber defense and even cybersecurity experts who “labeled the majority of the fake CTI samples as true despite their expertise” [Ranade et al., 2021]. It is easily conceivable that malicious actors could specifically tailor such SEA AI attacks in order to subvert cyber defense in the service of subsequent covert, time-efficient, micro-targeted and large-scale cybercrime. For 2021, cybercrime damages are estimated to reach 6 trillion USD [Benz and Chatterjee, 2020; Ozkan et al., 2021], making cybercrime a top international risk with a growing set of affordances which malicious actors do not hesitate to enact. Actors interested in “fake CTI” attacks could be financially motivated cybercriminals or state-related actors. Adversarial goals could e.g. be to acquire private data, CTI poisoning in a cybercrime-as-a-service form, gaining strategic advantages in cyber operations, conducting espionage or even attacking critical infrastructure endangering human lives.

• Adversarial knowledge: Since it is the attacker that fine-tunes the language model generating the “fake CTI” samples for the SEA AI attack, we consider a white-box setting for this system. The attacker does not require knowledge about the internal details of the targeted automated cyber defense, allowing a black-box setting with regard to this system at training time. In case the attacker directly targets human security analysts by exposing them to misleading CTI, the SEA AI attack can be interpreted as a type of adversarial example on human cognition in a black-box setting. However, in such cases “open-source intelligence gathering and social engineering are exemplary tools that the adversary can employ to widen its knowledge of beliefs, preferences and personal traits exhibited by the victim” [Aliman et al., 2021]. Hence, depending on the required sophistication, a type of grey-box setting is achievable.

• Adversarial capabilities: The use of SEA AI attacks could have been useful at multiple stages. CTI text could have been altered in a micro-targeted way offering diverse capacities to a malicious actor: to distract analysts from patching existing vulnerabilities, to gain time for the exploitation of zero-days, to let systems misclassify malign files as benign [Mahlangu et al., 2019] or to covertly take over victim networks. In the light of complex interdependencies, the malicious actor might not even have had a full overview of all repercussions that AI-generated “fake CTI” attacks can engender. Poisoned knowledge graphs could have led to unforeseen domino-effects inducing unknown second-order harm. As a long-term strategy, the malicious actor could have harnessed SEA AI attacks on applied science writing to automate the generation of cybersecurity reports (for them to later serve as CTI inputs) corroborating the robustness of actually unsafe defenses, in order to covertly subvert those or simply to spread confusion.

Use Case Scientific Writing

• Adversarial goals: The emerging issue of (AI-aided) information operations in social media contexts which involve entities related to state actors has gained momentum in the last years [Prier, 2017; Hartmann and Giles, 2020]. A key objective of information operations that has been repeatedly mentioned is the intention to blur what is often termed the line between facts and fictions [Jakubowski, 2019]. Naturally, when logically applying the epistemic stance introduced in the last Section 2, it seems recommendable to avoid such formulations for clarity since they are potentially confusing. Hence, we refer to it simply as epistemic distortion. SEA AI attacks on scientific writing being a form of AI-aided epistemic distortion, they could represent a lucrative opportunity for state actors or politically motivated cybercriminals willing to ratchet up information operations. On a smaller scale, other potential malicious goals could also involve companies with a certain agenda for a product that could be threatened by scientific research. Another option could be advertisers that monetize attention via AI-generated research papers in click-bait schemes.

• Adversarial knowledge: As in the first use case, the language model is available in a white-box setting. Moreover, since this SEA AI attack directly targets human entities, one can again assume a black-box or grey-box scenario depending on the required sophistication of the attack. For instance, since many scientists utilize social media platforms, open source intelligence gathering on related sources can be utilized to tailor contents.

• Adversarial capabilities: In the domain of adversarial machine learning, it has been stressed that for security reasons it is important to also consider adaptive attacks [Carlini et al., 2019], namely reactive attacks that adapt to what the defense did. A malicious actor aware of the discussed explanation-anchored, trust-disentangled and adversarial epistemic defense approach could have exploited a wide SEA AI attack surface in case of no consensus on the utility of this defense. For instance, a polarization between two dichotomously opposed camps in that regard could have offered an ideal breeding ground for divisive information warfare endeavors. For some, the perception of increasing disagreement tendencies may have confirmed post-truth narratives – not for malicious reasons, but because it was genuinely considered. This in turn could have cemented echo chamber effects now fuelled by a divided set of scientists, one part of which considered science to be epistemically defeated. This, combined with post-truth narratives and the societal-level automated disconcertion [Aliman et al., 2021] via the mere existence of AI-generated fakery, could have destabilized a fragile society and incited violence. Massive and rapid large-scale SEA AI attacks in the form of a novel type of scientific astroturfing could have been employed to automatically reinforce the widespread impression of permanently conflicting research results, on-demand and tailored to a scientific topic. The concealed or ambiguous AI-generated samples (be it data, experiments, papers or reviews) would not even need to be overrepresented in respected venues but only made salient via social media platforms, these being one of the main information sources for researchers – a task which could have been automated via social bots influencing trending and sharing patterns. A hinted variant of such SEA AI attacks could have been a flood of confirmatory AI-generated texts that corroborate the robustness of defenses across a large array of security areas in order to exploit any reduced vulnerability awareness. Finally, hyperlinks with attention-driving fake research contribution titles competing with science journalism and redirecting to advertisement pages could have polluted results displayed by search engines.
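Purely as illustrative scaffolding (not part of the cited threat models; the type names, field names and values below are our own shorthand), the three dimensions used above can be captured in a small data structure, instantiated here for the “fake CTI” use case:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SEAThreatModel:
    """Minimal scaffold for the threat-model dimensions of Subsection 3.1."""
    use_case: str
    adversarial_goals: List[str]
    # Assumed access per targeted component: "white-box", "grey-box" or "black-box".
    adversarial_knowledge: Dict[str, str]
    adversarial_capabilities: List[str]

# Illustrative instantiation for the security engineering use case.
fake_cti_attack = SEAThreatModel(
    use_case="security engineering (fake CTI data poisoning)",
    adversarial_goals=[
        "acquire private data",
        "CTI poisoning as cybercrime-as-a-service",
        "strategic advantage in cyber operations / espionage",
    ],
    adversarial_knowledge={
        "generating language model": "white-box",  # attacker fine-tunes it
        "automated cyber defense": "black-box",
        "human security analysts": "grey-box",     # via OSINT / social engineering
    },
    adversarial_capabilities=[
        "micro-targeted alteration of CTI text",
        "poisoning of CTI knowledge graphs",
        "automated generation of misleading cybersecurity reports",
    ],
)
```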
3.2 Practical Defenses and Caveats

As is also the case with other advanced, not yet prevalent but technically already feasible AI-aided information operations [Hartmann and Giles, 2020] and cyberattacks targeting AIs [Hartmann and Steup, 2020], consequences could have ranged from severe financial losses to threats to human lives. Multiple socio-psycho-technological solutions, including the ones reviewed in Section 2 which may be (partially) relevant to SEA AI attack scenarios, have been previously presented. Here, we complementarily focus on the epistemic dimensions one can add to the pool of potential solutions by applying the 3 generic features extracted in Section 2 to both use cases. We also emphasize novel caveats. Concerning the first use case of “fake CTI” SEA AI attacks, the straightforward thought to restrict the use of data from open platforms is not conducive to practicability, not only due to the amount of crucial information that a defense might miss, but also because it does not protect from insider threats [Ranade et al., 2021]. However, common solutions such as the AI-based detection of AI-generated outputs or trust-reliant scoring systems to flag trusted sources do not seem sufficient either without more ado, since the former may fail in the near future if the generator tends to win and the latter is at risk due to impersonation possibilities that AI itself augments and due to the mentioned insider threats. Interestingly, the issue of malicious insider threats is also reflected in the second use case, with scientific writing being open to arbitrary participants.
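For concreteness, the following minimal sketch shows one simple realization of the first of these common solutions, AI-based detection of AI-generated outputs, as a perplexity heuristic over a public language model. The model choice, threshold and library usage are our own illustrative assumptions, and – as argued above – such a detector is not sufficient on its own:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load a small public language model once (illustrative choice).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2; lower values are more 'model-like'."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return float(torch.exp(loss))

def looks_ai_generated(text: str, threshold: float = 25.0) -> bool:
    # Crude heuristic threshold; a motivated attacker can evade it (e.g. via
    # sampling strategies or light human editing), which is precisely why such
    # detectors are insufficient without complementary epistemic defenses.
    return perplexity(text) < threshold
```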
Defense for Security Engineering Use Case and Caveats

1. Explanation-anchored instead of data-driven: An explanation-anchored solution can be formulated from the inside out. Although AI does not understand explanations, it is thinkable that a technically feasible future hybrid active intelligent system³ for automated cyber defense could use knowledge graph inconsistencies [Heyvaert et al., 2019] as signals to calculate when it will epistemically seek clarification from a human analyst, when to actively query differing sources and sensors or when to follow habitual courses of action (a minimal sketch of such a routing policy is given below). But the creativity of human malicious actors cannot be predicted and thus neither the system nor human analysts are able to prophesy over a space of not yet created attacks. Also, as long as the system’s sensors are learning-based AI, it stays an Achilles heel due to the vulnerability to attacks.

2. Trust-disentangled instead of trust-dependent: Such a procedure could seem disadvantageous given the fast reactions required in cyber defense. However, an adversarial explanation-anchored framework is orthogonal to the trust policy used. Trust-disentangled does not necessarily signify zero-trust⁴ at all levels if impracticable.

3. Adversarial instead of (self-)compliant: A permanently rotating in-house adversarial team is required. Activities can include red teaming, penetration testing and the development of (adaptive) attacks, i.a. with AI-generated “fake CTI” text samples. A staggered approach is cogitable in which automated defense processes that happen at fast scales (e.g. requiring rapid access to open source CTI) rely on interim (distributed) trust while all others – especially those involving human deliberation to create novel defenses and attacks – strive for zero-trust information sharing (e.g. via a closed blockchain with a restricted set of authorized participants having read and write rights). In this way, one can create an interconnected 3-layered epistemically motivated security framework: a slow creative human-run adversarial counterfactual layer on top of a slow creative human-run defensive layer steering a very fast hybrid-active-AI-aided automated cyber defense layer. Important caveats are that such a framework: 1) can be resilient but not immune, 2) cannot and should not be entirely automated.

³ Such a system could instantiate technical self-awareness [Aliman, 2020a] (e.g. via active inference [Smith et al., 2021]).
⁴ The zero-trust [Kindervag, 2010] paradigm advanced in cybersecurity in the last decade which assumes “that adversaries are already inside the system, and therefore imposes strict access and authentication requirements” [Collier and Sarkis, 2021] seems highly appropriate in this increasingly complex security landscape.
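The routing idea hinted at in item 1 above can be made concrete with a deliberately simplified sketch; the inconsistency score is assumed to come from rule-driven checks over the CTI knowledge graph in the spirit of [Heyvaert et al., 2019], while the thresholds and action names are hypothetical placeholders rather than an implementation of the cited work:

```python
from enum import Enum, auto

class DefenseAction(Enum):
    FOLLOW_HABITUAL_COURSE = auto()     # fast, fully automated path
    QUERY_DIFFERING_SOURCES = auto()    # actively consult other sources/sensors
    ESCALATE_TO_HUMAN_ANALYST = auto()  # epistemically seek clarification

def route_cti_update(inconsistency_score: float,
                     low: float = 0.2, high: float = 0.6) -> DefenseAction:
    """Map a knowledge-graph inconsistency score in [0, 1] to a course of action.

    Thresholds are illustrative and would have to be tuned; the score itself is
    assumed to be produced by upstream inconsistency-detection rules.
    """
    if inconsistency_score < low:
        return DefenseAction.FOLLOW_HABITUAL_COURSE
    if inconsistency_score < high:
        return DefenseAction.QUERY_DIFFERING_SOURCES
    return DefenseAction.ESCALATE_TO_HUMAN_ANALYST
```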
Defense for Science Writing Use Case and Caveats

1. Explanation-anchored instead of data-driven: A practical challenge for defending against SEA AI attacks may seem to be the need for scientists to agree on pragmatic criteria for “better” explanations (but widely accepted cases are e.g. the preference for “simpler”, “more innovative” and “more interesting” ones). Also, due to automated disconcertion, reviewers could always suspect that a paper was AI-generated (potentially at the detriment of human linguistic statistical outliers). However, this is not a sufficient argument since explanation-anchored science and criticism focus on content and not on source or style.

2. Trust-disentangled instead of trust-dependent: Via trust-disentanglement, a paper generated by a present-day AI would not only be rejected on provenance grounds but due to its merely imitative and non-explanatory content. Though, an important asset is the review process which, if infiltrated by imitative AI-generated content, could slow down explanation-anchored criticism if not thwarted quickly. A zero-trust scheme could mitigate this risk time-efficiently (e.g. via a consortium blockchain for review activities; a minimal sketch of the underlying append-only bookkeeping is given below). Another zero-trust method would be to taxonomically monitor SEA AI attack events at an international level, e.g. via an AI incident database [McGregor, 2020] tailored to these attacks and complemented by adversarial retrospective counterfactual risk analyses [Aliman et al., 2021] and defensive solutions. The monitoring can be AI-aided (or in the future hybrid-active-AI-aided) but human analysts are indispensable for a deep semantic understanding [Aliman et al., 2021]. In short, also here, we suggest an interconnected 3-layered epistemic framework with adversarial, defensive and hybrid-active-AI-aided elements.

3. Adversarial instead of (self-)compliant: As an advanced adversarial strategy, which would also require responsible coordinated vulnerability disclosures [Kranenbarg et al., 2018], one could perform red teaming, penetration tests and (adaptive) attacks employing AI-generated “fake data and experiments”, “fake papers” and “fake reviews” [Tallón-Ballesteros, 2020]. Candidates for a blue team are e.g. reviewers and editors. Concurrently, urgent AI-related plagiarism issues arise [Dehouche, 2021].
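As referenced in item 2 above, the core ingredient a consortium blockchain for review activities would provide is an append-only, integrity-checkable log of review events shared among a restricted set of participants. The following is a deliberately minimal sketch of that ingredient only (the names are our own; real consortium chains add consensus, identity management and access control):

```python
import hashlib
import json
import time

def _digest(record: dict) -> str:
    """Deterministic SHA-256 digest of a record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

class ReviewLog:
    """Append-only, hash-chained log of review events (no consensus layer)."""

    def __init__(self):
        self.entries = []

    def append(self, reviewer_id: str, paper_id: str, verdict: str) -> dict:
        entry = {
            "timestamp": time.time(),
            "reviewer_id": reviewer_id,
            "paper_id": paper_id,
            "verdict": verdict,
            "prev_hash": self.entries[-1]["hash"] if self.entries else "GENESIS",
        }
        entry["hash"] = _digest({k: v for k, v in entry.items() if k != "hash"})
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any retroactive tampering breaks verification."""
        prev = "GENESIS"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev_hash"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True
```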
4 Conclusion and Future Work

For requisite variety, we introduced a complementary generic epistemic defense against not yet prevalent but technically feasible SEA AI attacks. This generic approach foregrounded explanation-anchored, trust-disentangled and adversarial features that we instantiated within two illustrative use cases involving language models: AI-generated samples to fool security engineering practices and AI-crafted contents to distort scientific writing. For both use cases, we compactly worked out a transdisciplinary and pragmatic 3-layered epistemically motivated security framework composed of adversarial, defensive and hybrid-active-AI-aided elements with two major caveats: 1) it can be resilient but not immune, 2) it cannot and should not be entirely automated. In both cases, a proactive exposure to synthetic AI-generated material could foster critical thinking. Vitally, the existence of truth stays a legitimate raison d’être for science. It is only that, in effect, one is not equipped with a direct access to truth, all observations are theory-laden and what one thinks one knows is linked to what is co-created in one’s collective enactment of a world with other entities shaping and shaped by physical reality. Thereby, one can craft explanations to try to improve one’s active grip on a field of affordances, but it stays an eternal mental tightrope walk of creativity. In view of this inescapable epistemic dizziness, the main task of explanation-anchored science is then neither to draw a line between truth and falsity nor between the trusted and the untrusted. Instead, it is to seek to robustly but provisionally separate better from worse explanations. While this steadily renewed societally relevant act does not yield immunity against AI-aided epistemic distortion, it enables resiliency against at-present thinkable SEA AI attacks. To sum up, the epistemic dizziness of conjecturing that one could always be wrong could stimulate intellectual humility, but also unbound(ed) (adversarial) explanatory knowledge co-creation. Future work could study how language AI – which could be exploited for future SEA AI attacks, e.g. instrumental in performing cyber(crime) and information operations – could conversely serve as a transformative tool to augment anthropic creativity and tackle the SEA AI threat itself. For instance, language AI could be used to stimulate human creativity in future AI and security design fictions for new threat models and defenses. In retrospect, AI is already acting as a catalyst since the very defenses humanity now crafts can broaden, deepen and refine the scope of explanations, i.a. also about better explanations – an unceasing but also potentially strengthening safety-relevant quest.
References

[Aliman and Kester, 2020] Nadisha-Marie Aliman and Leon Kester. Facing Immersive “Post-Truth” in AIVR? Philosophies, 5(4):45, 2020.
[Aliman et al., 2021] Nadisha-Marie Aliman, Leon Kester, and Roman Yampolskiy. Transdisciplinary AI Observatory—Retrospective Analyses and Future-Oriented Contradistinctions. Philosophies, 6(1):6, 2021.
[Aliman, 2020a] Nadisha-Marie Aliman. Hybrid Cognitive-Affective Strategies for AI Safety. PhD thesis, Utrecht University, 2020.
[Aliman, 2020b] Nadisha-Marie Aliman. Self-Shielding Worlds. https://nadishamarie.jimdo.com/clipboard/, 2020. Online; accessed 23-November-2020.
[Amodei et al., 2016] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
[Ashby, 1961] W Ross Ashby. An introduction to cybernetics. Chapman & Hall Ltd, 1961.
[Ashby, 2020] Mick Ashby. Ethical regulators and super-ethical systems. Systems, 8(4):53, 2020.
[Ashkenazy and Zini, 2019] Adi Ashkenazy and Shahar Zini. Attacking Machine Learning – The Cylance Case Study. https://skylightcyber.com/2019/07/18/cylance-i-kill-you/Cylance%20-%20Adversarial%20Machine%20Learning%20Case%20Study.pdf, 2019. Skylight; accessed 24-May-2020.
[Baris and Boukhers, 2021] Ipek Baris and Zeyd Boukhers. ECOL: Early Detection of COVID Lies Using Content, Prior Knowledge and Source Information. arXiv preprint arXiv:2101.05499, 2021.
[Barrett, 2017] Lisa Feldman Barrett. Functionalism cannot save the classical view of emotion. Social Cognitive and Affective Neuroscience, 12(1):34–36, 2017.
[Benz and Chatterjee, 2020] Michael Benz and Dave Chatterjee. Calculated risk? A cybersecurity evaluation tool for SMEs. Business Horizons, 63(4):531–540, 2020.
[Boneh et al., 2019] Dan Boneh, Andrew J Grotto, Patrick McDaniel, and Nicolas Papernot. How relevant is the Turing test in the age of sophisbots? IEEE Security & Privacy, 17(6):64–71, 2019.
[Bostrom, 2017] Nick Bostrom. Strategic implications of openness in AI development. Global Policy, 8(2):135–148, 2017.
[Brown et al., 2020] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
[Brundage et al., 2018] Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, et al. The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228, 2018.
[Bufacchi, 2021] Vittorio Bufacchi. Truth, lies and tweets: A consensus theory of post-truth. Philosophy & Social Criticism, 47(3):347–361, 2021.
[Burden and Hernández-Orallo, 2020] John Burden and José Hernández-Orallo. Exploring AI Safety in Degrees: Generality, Capability and Control. In SafeAI@AAAI, pages 36–40, 2020.
[Carlini et al., 2019] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.
[Chesney and Citron, 2019] Bobby Chesney and Danielle Citron. Deep fakes: A looming challenge for privacy, democracy, and national security. Calif. L. Rev., 107:1753, 2019.
[Collier and Sarkis, 2021] Zachary A Collier and Joseph Sarkis. The zero trust supply chain: Managing supply chain risk in the absence of trust. International Journal of Production Research, pages 1–16, 2021.
[Dehouche, 2021] Nassim Dehouche. Plagiarism in the age of massive Generative Pre-trained Transformers (GPT-3). Ethics in Science and Environmental Politics, 21:17–23, 2021.
[Deutsch, 2011] David Deutsch. The beginning of infinity: Explanations that transform the world. Penguin UK, 2011.
[Deutsch, 2016] David Deutsch. The logic of experimental tests, particularly of Everettian quantum theory. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 55:24–33, 2016.
[Fallis, 2020] Don Fallis. The Epistemic Threat of Deepfakes. Philosophy & Technology, pages 1–21, 2020.
[Fickinger et al., 2020] Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, and Stuart Russell. Multi-principal assistance games. arXiv preprint arXiv:2007.09540, 2020.
[Floridi et al., 2018] Luciano Floridi, Josh Cowls, Monica Beltrametti, Raja Chatila, Patrice Chazerand, Virginia Dignum, Christoph Luetge, Robert Madelin, Ugo Pagallo, Francesca Rossi, et al. AI4People—an ethical framework for a good AI society: opportunities, risks, principles, and recommendations. Minds and Machines, 28(4):689–707, 2018.
[Floridi, 2018] Luciano Floridi. Artificial intelligence, deepfakes and a future of ectypes. Philosophy & Technology, 31(3):317–321, 2018.
[Frederick, 2020] Danny Frederick. Against the Philosophical Tide: Essays in Popperian Critical Rationalism. Critias Publishing, 2020.
[Hartmann and Giles, 2020] Kim Hartmann and Keir Giles. The Next Generation of Cyber-Enabled Information Warfare. In 2020 12th International Conference on Cyber Conflict (CyCon), volume 1300, pages 233–250. IEEE, 2020.
[Hartmann and Steup, 2020] Kim Hartmann and Christoph Steup. Hacking the AI – the Next Generation of Hijacked Systems. In 2020 12th International Conference on Cyber Conflict (CyCon), volume 1300, pages 327–349. IEEE, 2020.
[Heyvaert et al., 2019] Pieter Heyvaert, Ben De Meester, Anastasia Dimou, and Ruben Verborgh. Rule-driven inconsistency resolution for knowledge graph generation rules. Semantic Web, 10(6):1071–1086, 2019.
[Ho et al., 2020] Shirley S Ho, Tong Jee Goh, and Yan Wah Leung. Let’s nab fake science news: Predicting scientists’ support for interventions using the influence of presumed media influence model. Journalism, page 1464884920937488, 2020.
[Jakubowski, 2019] G Jakubowski. What’s not to like? Social media as information operations force multiplier. Joint Force Quarterly, 3:8–17, 2019.
[Jobin et al., 2019] Anna Jobin, Marcello Ienca, and Effy Vayena. The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9):389–399, 2019.
[Kaloudi and Li, 2020] Nektaria Kaloudi and Jingyue Li. The AI-based Cyber Threat Landscape: A Survey. ACM Computing Surveys (CSUR), 53(1):1–34, 2020.
[Kindervag, 2010] John Kindervag. Build security into your network’s DNA: The zero trust network architecture. Forrester Research Inc, pages 1–26, 2010.
[Kirat et al., 2018] Dhilung Kirat, Jiyong Jang, and Marc Stoecklin. Deeplocker – concealing targeted attacks with AI locksmithing. Blackhat USA, 2018.
[Kranenbarg et al., 2018] Marleen Weulen Kranenbarg, Thomas J Holt, and Jeroen van der Ham. Don’t shoot the messenger! A criminological and computer science perspective on coordinated vulnerability disclosure. Crime Science, 7(1):1–9, 2018.
[Leike et al., 2017] Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg. AI safety gridworlds. arXiv preprint arXiv:1711.09883, 2017.
[Mahlangu et al., 2019] Thabo Mahlangu, Sinethemba January, Thulani Mashiane, Moses Dlamini, Sipho Ngobeni, Nkqubela Ruxwana, and Sun Tzu. Data Poisoning: Achilles Heel of Cyber Threat Intelligence Systems. In Proceedings of the ICCWS 2019 14th International Conference on Cyber Warfare and Security: ICCWS, 2019.
[Makri, 2017] Anita Makri. Give the public the tools to trust scientists. Nature News, 541(7637):261, 2017.
[McGregor, 2020] Sean McGregor. Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database. arXiv preprint arXiv:2011.08512, 2020.
[ÓhÉigeartaigh et al., 2020] Seán S ÓhÉigeartaigh, Jess Whittlestone, Yang Liu, Yi Zeng, and Zhe Liu. Overcoming barriers to cross-cultural cooperation in AI ethics and governance. Philosophy & Technology, 33(4):571–593, 2020.
[Ozkan et al., 2021] Bilge Yigit Ozkan, Sonny van Lingen, and Marco Spruit. The Cybersecurity Focus Area Maturity (CYSFAM) Model. Journal of Cybersecurity and Privacy, 1(1):119–139, 2021.
[Pistono and Yampolskiy, 2016] Federico Pistono and Roman V Yampolskiy. Unethical Research: How to Create a Malevolent Artificial Intelligence. arXiv e-prints, pages arXiv–1605, 2016.
[Popper, 1996] Karl Popper. In search of a better world: Lectures and essays from thirty years. Psychology Press, 1996.
[Popper, 2014] Karl Popper. Conjectures and refutations: The growth of scientific knowledge. Routledge, 2014.
[Prier, 2017] Jarred Prier. Commanding the trend: Social media as information warfare. Strategic Studies Quarterly, 11(4):50–85, 2017.
[Radford et al., 2019] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[Raji et al., 2020] Inioluwa Deborah Raji, Timnit Gebru, Margaret Mitchell, Joy Buolamwini, Joonseok Lee, and Emily Denton. Saving face: Investigating the ethical concerns of facial recognition auditing. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 145–151, 2020.
[Ranade et al., 2021] Priyanka Ranade, Aritran Piplai, Sudip Mittal, Anupam Joshi, and Tim Finin. Generating Fake Cyber Threat Intelligence Using Transformer-Based Models. arXiv preprint arXiv:2102.04351, 2021.
[Sahlgren and Carlsson, 2021] Magnus Sahlgren and Fredrik Carlsson. The Singleton Fallacy: Why Current Critiques of Language Models Miss the Point. arXiv preprint arXiv:2102.04310, 2021.
[Seymour and Tully, 2016] John Seymour and Philip Tully. Weaponizing data science for social engineering: Automated E2E spear phishing on Twitter. Black Hat USA, 37:1–39, 2016.
[Smith et al., 2021] Ryan Smith, Karl Friston, and Christopher Whyte. A Step-by-Step Tutorial on Active Inference and its Application to Empirical Data. PsyArXiv, 2021.
[Tallón-Ballesteros, 2020] AJ Tallón-Ballesteros. Exploring the Potential of GPT-2 for Generating Fake Reviews of Research Papers. Fuzzy Systems and Data Mining VI: Proceedings of FSDM 2020, 331:390, 2020.
[Tully and Foster, 2020] Philip Tully and Lee Foster. Repurposing Neural Networks to Generate Synthetic Media for Information Operations. https://www.blackhat.com/us-20/briefings/schedule/, 2020. Session at Black Hat USA 2020; accessed 08-August-2020.
[Van Noorden, 2014] Richard Van Noorden. Publishers withdraw more than 120 gibberish papers. Nature News, 2014.
[Vinnakota, 2013] Tirumala Vinnakota. A cybernetics paradigms framework for cyberspace: Key lens to cybersecurity. In 2013 IEEE International Conference on Computational Intelligence and Cybernetics (CYBERNETICSCOM), pages 85–91. IEEE, 2013.
[Wahle et al., 2021] Jan Philip Wahle, Terry Ruas, Norman Meuschke, and Bela Gipp. Are neural language models good plagiarists? A benchmark for neural paraphrase detection. arXiv preprint arXiv:2103.12450, 2021.
[Zeadally et al., 2020] Sherali Zeadally, Erwin Adi, Zubair Baig, and Imran A Khan. Harnessing artificial intelligence capabilities to improve cybersecurity. IEEE Access, 8:23817–23837, 2020.
[Zhao et al., 2021] Bo Zhao, Shaozeng Zhang, Chunxue Xu, Yifan Sun, and Chengbin Deng. Deep fake geography? When geospatial data encounter Artificial Intelligence. Cartography and Geographic Information Science, pages 1–15, 2021.