Epistemic Defenses against Scientific and Empirical Adversarial AI Attacks∗

Nadisha-Marie Aliman¹ and Leon Kester²†
¹ Utrecht University, Utrecht, The Netherlands
² TNO Netherlands, The Hague, The Netherlands
leon.kester@tno.nl

∗ Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
† Contact Author

Abstract

In this paper, we introduce “scientific and empirical adversarial AI attacks” (SEA AI attacks) as an umbrella term for not yet prevalent but technically feasible deliberate malicious acts of specifically crafting AI-generated samples to achieve an epistemic distortion in (applied) science or engineering contexts. In view of possible socio-psycho-technological impacts, it seems responsible to ponder countermeasures from the onset and not in hindsight. In this vein, we consider two illustrative use cases: the example of AI-produced data to mislead security engineering practices and the conceivable prospect of AI-generated contents to manipulate scientific writing processes. Firstly, we contextualize the epistemic challenges that such future SEA AI attacks could pose to society in the light of broader i.a. AI safety, AI ethics and cybersecurity-relevant efforts. Secondly, we set forth a corresponding supportive generic epistemic defense approach. Thirdly, we effect a threat modelling for the two use cases and propose tailor-made defenses based on the foregoing generic deliberations. Strikingly, our transdisciplinary analysis suggests that employing distinct explanation-anchored, trust-disentangled and adversarial strategies is one possible principled complementary epistemic defense against SEA AI attacks – albeit with caveats yielding incentives for future work.

1 Introduction

Progress in the AI field unfolds a wide and growing array of beneficial societal effects with AI permeating more and more crucial application domains. To forestall ethically-relevant ramifications, research from a variety of disciplines tackling pertinent AI safety [Amodei et al., 2016; Bostrom, 2017; Burden and Hernández-Orallo, 2020; Fickinger et al., 2020; Leike et al., 2017], AI ethics and AI governance issues [Floridi et al., 2018; Jobin et al., 2019; ÓhÉigeartaigh et al., 2020; Raji et al., 2020] has gained momentum at an international level. In addition, cybersecurity-oriented frameworks in AI safety [Aliman et al., 2021; Brundage et al., 2018; Pistono and Yampolskiy, 2016] stressed the necessity to not only address unintentional errors, unforeseen repercussions and bugs in the context of ethical AI design but also AI risks linked to intentional malice, i.e. deliberate unethical design, attacks and sabotage by malicious actors. In parallel, the convergence of AI with other technologies increases and diversifies the attack surface available to malevolent actors. For instance, while AI-enhanced cybersecurity opens up novel valuable possibilities for defenders [Zeadally et al., 2020], AI simultaneously provides new affordances for attackers [Ashkenazy and Zini, 2019], from AI-aided social engineering [Seymour and Tully, 2016] to AI-concealed malware [Kirat et al., 2018]. Next to the capacity of AI to extend classical cyberattacks in scope, speed and scale [Kaloudi and Li, 2020], a notable emerging threat is what we denote AI-aided epistemic distortion. The latter represents a form of AI weaponization and is increasingly studied in its currently most salient form, namely AI-aided disinformation [Aliman et al., 2021; Chesney and Citron, 2019; Kaloudi and Li, 2020; Tully and Foster, 2020], which is especially relevant to information warfare [Hartmann and Giles, 2020]. Recently, the weaponization of Generative AI for information operations has been described as “a sincere threat to democracies” [Hartmann and Steup, 2020]. In this paper, we analyze attacks and defenses pertaining to another not yet prevalent but technically feasible and similarly concerning form of AI-aided epistemic distortion with potentially profound societal implications: scientific and empirical adversarial AI attacks (SEA AI attacks).

With SEA AI attacks, we refer to any deliberately malicious AI-aided epistemic distortion which predominantly and directly targets (applied) science and technology assets (as opposed to information operations where a wider societal target is often selected on ideological/political grounds). In short, the expression acts as an umbrella term for malicious actors utilizing or attacking AI at pre- or post-deployment stages with the deliberate adversarial aim to deceive, sabotage, slow down or disrupt (applied) science, engineering or related endeavors. Obviously, SEA AI attacks could be performed in a variety of modalities (see e.g. “deepfake geography” [Zhao et al., 2021] related to vision). However, for illustrative purposes, we base our two exemplary use cases on misuses of language models. The first use case treats SEA AI attacks on security engineering via schemes in which a malicious actor poisons training data resources [Mahlangu et al., 2019] that are vital to data-driven defenses in the cybersecurity ecosystem. Lately, a proof-of-concept for an AI-based data poisoning attack has been implemented in the context of cyber threat intelligence (CTI) [Ranade et al., 2021]. The authors utilized a fine-tuned version of the GPT-2 language model [Radford et al., 2019] and were able to generate fake CTI which was indistinguishable from its legitimate counterpart when presented to cybersecurity experts. The second use case studies conceivable SEA AI attacks on procedures that are essential to scientific writing. Related examples that have been depicted in recent work encompass plagiarism studies with transformers like BERT [Wahle et al., 2021] and with the pre-trained GPT-3 language model [Brown et al., 2020] that “may very well pass peer review” [Dehouche, 2021], but also AI-generated fake reviews (with a fine-tuned version of GPT-2) apt to mislead experienced researchers in a small user study [Tallón-Ballesteros, 2020]. Future malicious actors could deliberately breed a large-scale agenda in the spirit of “fake science news” [Ho et al., 2020] and AI-generated papers that would widely exceed in quality the (later withdrawn) computer-generated research papers [Van Noorden, 2014] published at respected venues. In short, technically already practicable SEA AI attacks could have considerable negative effects if jointly potentiated with regard to scale, scope and speed by malicious actors equipped with sufficient resources. As later exemplified in Subsection 3.1, the security engineering use case could e.g. involve dynamic domino-effects leading to large financial losses and even risks to human lives, while the scientific writing use case seems to moreover reveal a domain-general epistemic problem. The mere existence of the latter also affects the former and could engender serious pitfalls whose generically formulated principled management is compactly treated in the next Section 2.

2 Theoretical Generic Epistemic Defenses

As reflected in the law of requisite variety (LRV) known from cybernetics, “only variety can destroy variety” [Ashby, 1961]. Applied to SEA AI attacks, it signifies that since malicious adversaries are not only exploiting vulnerabilities from a heterogeneous socio-psycho-technological landscape but also especially vulnerabilities of an epistemic nature, suitable defense methods may profit from an epistemic stance. Applying the cybernetic LRV offers a valuable domain-general transdisciplinary tool able to stimulate and invigorate novel tailored defenses in a diversity of harm-related problems, from cybersecurity [Vinnakota, 2013] over AI safety [Aliman, 2020a] to AI ethics [Ashby, 2020]. In short, utilizing insights from epistemology as a complementary basis to frame defense methods against SEA AI attacks seems indispensable.
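The LRV itself is stated qualitatively in [Ashby, 1961]; purely as an aid to intuition, one common information-theoretic reading of it can be sketched as follows (this gloss is ours and not a formula taken from the cited source):

```latex
% A common information-theoretic gloss of the law of requisite variety:
% the residual variety (entropy) of undesired outcomes O that a system can
% still exhibit is bounded from below by the variety of the disturbances D
% minus the variety of the regulator R (here: the defense repertoire).
\[
  H(O) \;\geq\; H(D) - H(R)
\]
% Reading for SEA AI attacks: as attack variety H(D) grows (new modalities,
% new epistemic exploits), only a matching growth of defender variety H(R)
% -- e.g. adding epistemic defenses to socio-psycho-technological ones --
% can keep the variety of harmful outcomes H(O) low.
```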
Past work predominantly analyzed countermeasures of socio-psycho-technological nature to combat the spread of (audio-)visual, audio and textual deepfakes as well as “fake news” more broadly. For instance, the technical detection of AI-generated content [Wahle et al., 2021] has been often thematized and even lately applied to “fake news” in the healthcare domain [Baris and Boukhers, 2021]. Furthermore, in the context of counteracting risks posed by the deployment of sophisticated online bots, it has been suggested that “technical solutions, while important, should be complemented with efforts involving informed policy and international norms to accompany these technological developments” and that “it is essential to foster increased civic literacy of the nature of ones interactions” [Boneh et al., 2019]. Another analysis presented a set of defense measures against the spread of deepfakes [Chesney and Citron, 2019] which contained i.a. legal solutions, administrative agency solutions, coercive and covert responses as well as sanctions (when effectuated by state actors) and speech policies for online platforms. Concerning “fake science news” and their impacts on “credibility and reputation of the science community” [Ho et al., 2020], it has been even postulated by Makri that “science is losing its relevance as a source of truth” and “the new focus on post-truth shows there is now a tangible danger that must be addressed” [Makri, 2017]. Following the author, scientists could equip citizens with sense-making tools without which “emotions and beliefs that pander to false certainties become more credible” [Makri, 2017].

While some of those socio-psycho-technological countermeasures and underlying assumptions are debatable, we complementarily zoom in on different epistemic defenses against SEA AI attacks being directed against scientific and empirical frameworks. Amidst an information ecosystem with quasi-omnipresent terms such as “post-truth” or “fake news” and in light of data-driven research trends embedded within trust-based infrastructures, it seems daunting to face a threat landscape populated by AI-generated artefacts such as: 1) “fake data” and “fake experiments”, 2) “fake research papers” (or “fraudulent academic essay writing” [Brown et al., 2020]) and 3) “fake reviews”. More broadly, it has been stated that deepfakes “seem to undermine our confidence in the original, genuine, authentic nature of what we see and hear” [Floridi, 2018]. Taking the perspective of an empiricism-based epistemology grounded in justification with the aim to obtain truer beliefs via (probabilistic) belief updates given evidence, a recent in-depth analysis found that the existence of deepfake videos confronts society with epistemic threats [Fallis, 2020]. Thereby, it is assumed that “deepfakes reduce the amount of information that videos carry to viewers” [Fallis, 2020], which analogously quantitatively affected the amount of information in text-based news due to earlier “fake news” phenomena. In our view, when applying this stance to audiovisual and textual samples of scientific material but also broadly to the context of security engineering and scientific communication where the deployment of deepfakes for SEA AI attacks could occur in multifarious ways, the consequences seem disastrous. In brief, SEA AI defenses seem relevant to AI safety since an inability to build up resiliency against those attacks may suggest that already present-day AI could (be used to) outmaneuver humans on a large scale – without any “superintelligent” competency. However, empiricist epistemology is not without any alternative. In the following, we thus first mentally enact one alternative epistemic stance (without claiming that it represents the only possible alternative). We present its key generic epistemic suppositions serving as a basis for the next Section 3 where we tailor defenses against SEA AI attacks for the specific use cases.

Firstly, it has been lately propounded that the societal perception of a “post-truth” era is often linked to the implicit assumption that truth can be equated with consensus, which is why it seems recommendable to consider a deflationary account of truth [Bufacchi, 2021] – i.e. where the concept is for instance strictly reserved to scientifically-relevant epistemic contexts. On such a deflationary account of truth disentangled from consensus, it has been argued that even if consensus and trust seem eroded, we neither inhabit a post-truth nor a science-threatening post-falsification age [Aliman and Kester, 2020]. Secondly, we never had a direct access to physical reality which we could have suddenly lost with the advent of “fake news”. In fact, as stated by Karl Popper: “Once we realize that human knowledge is fallible, we realize also that we can never be completely certain that we have not made a mistake” [Popper, 1996]. Thirdly, the epistemic aim in science can neither be truth directly [Frederick, 2020] nor can it be truer beliefs via justifications. The former is not directly experienced and the latter has been shown to be logically invalid by Popper [Popper, 2014]. Science is quintessentially explanatory, i.e. it is based on explanations [Deutsch, 2011] and not merely on data. While the epistemic aim cannot be certainty or justification (and not even “truer explanations” [Frederick, 2020]¹ for lack of direct access to truth), a pragmatic way to view it is that our epistemic aim can be to achieve better explanations [Frederick, 2020]. One can collectively agree on practical updatable criteria which better explanations should fulfill. In short, one does not assess a scientific theory in isolation, but in comparison to rival theories, and one is thereby embedded in a context with other scientists. Fourthly, there are distinct ways to handle falsification and integrate empirical findings in explanation-anchored science. One can e.g. criticize an explanation and pinpoint inconsistencies at a theoretical level. One can attempt to make a theory problematic via falsifying experiments whose results are accepted to seem to conflict with the predictions that the theory entailed [Deutsch, 2016]. Vitally, in the absence of a better rival theory, it holds that “an explanatory theory cannot be refuted by experiment: at most it can be made problematic” [Deutsch, 2016].

¹ That our epistemic aim can be “truer explanations” or explanations that lead us “closer to the truth” has been sometimes confusingly written by Deutsch and Popper respectively, but this type of account requires a semantic refinement [Frederick, 2020].

Against the background of this epistemic bedrock, one can now re-assess the threat landscape of SEA AI attacks. Firstly, one can conclude that AI-generated “fake data” and “fake experiments” could slow down but not terminally disrupt scientific and empirical procedures. In the case of misguiding confirmatory data, it has no epistemic effect since, as opposed to empiricist epistemology, explanation-anchored science does not utilize any scheme of credence updates for a theory and it is clear that “a severely tested but unfalsified theory may be false” [Frederick, 2020]. In the case of misleading data that is accepted to falsify a theory T, one runs the risk to consider mistakenly that T has been made problematic. However, since it is not permissible to drop T in the absence of a rival theory T′ representing a better explanation than T, the adversarial capabilities of the SEA AI attacker are limited. In short, theories cannot be deleted from the collective knowledge via such SEA AI attacks without more ado. Secondly, when contemplating the case of AI-generated “fake research papers”, it seems that they could slow down but not disrupt scientific methodology. Overall, one could state that the danger lies in the uptake of deceptive theories. However, theories are only integrated in explanation-anchored science if they represent better explanations in comparison to alternatives or, in the absence of alternatives, if they explain novel phenomena. In a nutshell, it takes explanations that are simultaneously misguiding and better for such a SEA AI attack to succeed. This is a high bar for imitative language models if meant to be repeatedly and systematically performed² and not merely as a unique event by chance. Further, even in the case a deceptive theory has been integrated in a field, that is always only provisional such that it could be revoked at any suitable moment, e.g. once a better explanation arises and repeated experiments falsify its claims. If in the course of this, an actually better explanation had been mistakenly considered as refuted, it can always be re-integrated once this is noticed. In fact, “a falsified theory may be true” [Frederick, 2020] if the accepted observations believed to have falsified it were wrong. Thirdly, when now considering the final case of AI-generated “fake reviews”, it becomes clear that they could similarly slow down but not terminally disrupt the scientific method. At worst, some existing theories could be unnecessarily problematized and misguiding theories uptaken, but all these epistemic procedures can be repealed retrospectively.

² That there could exist a task which imitative language models are “theoretically incapable of handling” has been often put into question [Sahlgren and Carlsson, 2021]. However, on epistemic grounds elaborated in-depth previously [Aliman, 2020a; Aliman et al., 2021] which might be amenable to experimental falsifiability [Aliman, 2020b], we assume that the task to consciously create and understand novel yet unknown explanatory knowledge [Deutsch, 2011] – which humans are capable of performing if willing to – cannot be learned by AI systems by mere imitation.

In short, explanation-anchored science is resilient (albeit not immune) against SEA AI attacks, but one can humbly face the idea that this is not because scientists can “tease out falsehood from truths” [Ho et al., 2020], but because explanation-anchored science attempts to tease out better from worse explanations while permanently requiring the creation of new ones, whereby the steps made can always be revoked, revised and even actively adversarially counteracted. That entails a sort of epistemic dizziness and one can never trust one’s own observations. Also, human mental constructions are inseparably cognitive-affective and science is not detached from social reality [Barrett, 2017]. In our view, for a systematic management of this epistemic dizziness, one may profit from an adversarial approach that permanently brings to mind that one might be wrong. Last but not least, an important feature discussed is that, with the epistemic aim not being truth (which itself is also not consensus and does not rely on trust to exist) but instead better explanations, none of the mentioned methods are dependent on trust per se – making it a trust-disentangled view. To sum up, we identified 3 key generic features for epistemic defenses against SEA AI attacks:

1. Explanation-anchored instead of data-driven
2. Trust-disentangled instead of trust-dependent
3. Adversarial instead of (self-)compliant
3 Practical Use of Theoretical Defenses

In the following Subsection 3.1, we briefly perform an exemplary threat modelling for the two specific use cases introduced in Section 1; the dimensions used (adversarial goals, knowledge and capabilities) are recapitulated in an illustrative sketch at the end of the subsection. The threat model narratives are naturally non-exhaustive and are selected for illustrative purposes to display plausible downward counterfactuals projecting capabilities to the recent counterfactual past in the spirit of co-creation design fictions in AI safety [Aliman et al., 2021]. In Subsection 3.2, we then derive corresponding tailor-made defenses from the generic characteristics that have been carved out in the last Section 2 while thematizing notable caveats.

3.1 Threat Modelling for Use Cases

Use Case Security Engineering

• Adversarial goals: As briefly mentioned in Section 1, CTI (which is information related to cybersecurity threats and threat actors to support analysts and security systems in the detection and mitigation of cyberattacks) can be polluted via misleading AI-generated samples to fool cyber defense systems at the training stage [Ranade et al., 2021]. Among others, CTI is available as unstructured texts but also as knowledge graphs taking CTI texts as input. A textual data poisoning via AI-produced “fake CTI” represents a form of SEA AI attack that was able to successfully deceive (AI-enhanced) automated cyber defense and even cybersecurity experts who “labeled the majority of the fake CTI samples as true despite their expertise” [Ranade et al., 2021]. It is easily conceivable that malicious actors could specifically tailor such SEA AI attacks in order to subvert cyber defense in the service of subsequent covert, time-efficient, micro-targeted and large-scale cybercrime. For 2021, cybercrime damages are estimated to reach 6 trillion USD [Benz and Chatterjee, 2020; Ozkan et al., 2021], making cybercrime a top international risk with a growing set of affordances which malicious actors do not hesitate to enact. Actors interested in “fake CTI” attacks could be financially motivated cybercriminals or state-related actors. Adversarial goals could e.g. be to acquire private data, CTI poisoning in a cybercrime-as-a-service form, gaining strategic advantages in cyber operations, conducting espionage or even attacking critical infrastructure endangering human lives.

• Adversarial knowledge: Since it is the attacker that fine-tunes the language model generating the “fake CTI” samples for the SEA AI attack, we consider a white-box setting for this system. The attacker does not require knowledge about the internal details of the targeted automated cyber defense, allowing a black-box setting with regard to this system at training time. In case the attacker directly targets human security analysts by exposing them to misleading CTI, the SEA AI attack can be interpreted as a type of adversarial example on human cognition in a black-box setting. However, in such cases “open-source intelligence gathering and social engineering are exemplary tools that the adversary can employ to widen its knowledge of beliefs, preferences and personal traits exhibited by the victim” [Aliman et al., 2021]. Hence, depending on the required sophistication, a type of grey-box setting is achievable.

• Adversarial capabilities: The use of SEA AI attacks could have been useful at multiple stages. CTI text could have been altered in a micro-targeted way offering diverse capacities to a malicious actor: to distract analysts from patching existing vulnerabilities, to gain time for the exploitation of zero-days, to let systems misclassify malign files as benign [Mahlangu et al., 2019] or to covertly take over victim networks. In the light of complex interdependencies, the malicious actor might not even have had a full overview of all repercussions that AI-generated “fake CTI” attacks can engender. Poisoned knowledge graphs could have led to unforeseen domino-effects inducing unknown second-order harm. As a long-term strategy, the malicious actor could have harnessed SEA AI attacks on applied science writing to automate the generation of cybersecurity reports (for them to later serve as CTI inputs) corroborating the robustness of actually unsafe defenses, in order to covertly subvert those or simply to spread confusion.

Use Case Scientific Writing

• Adversarial goals: The emerging issue of (AI-aided) information operations in social media contexts which involve entities related to state actors has gained momentum in the last years [Prier, 2017; Hartmann and Giles, 2020]. A key objective of information operations that has been repeatedly mentioned is the intention to blur what is often termed the line between facts and fictions [Jakubowski, 2019]. Naturally, when logically applying the epistemic stance introduced in the last Section 2, it seems recommendable to avoid such formulations for clarity since they are potentially confusing. Hence, we refer to it simply as epistemic distortion. SEA AI attacks on scientific writing being a form of AI-aided epistemic distortion, they could represent a lucrative opportunity for state actors or politically motivated cybercriminals willing to ratchet up information operations. On a smaller scale, other potential malicious goals could also involve companies with a certain agenda for a product that could be threatened by scientific research. Another option could be advertisers that monetize attention via AI-generated research papers in click-bait schemes.

• Adversarial knowledge: As in the first use case, the language model is available in a white-box setting. Moreover, since this SEA AI attack directly targets human entities, one can again assume a black-box or grey-box scenario depending on the required sophistication of the attack. For instance, since many scientists utilize social media platforms, open source intelligence gathering on related sources can be utilized to tailor contents.

• Adversarial capabilities: In the domain of adversarial machine learning, it has been stressed that for security reasons it is important to also consider adaptive attacks [Carlini et al., 2019], namely reactive attacks that adapt to what the defense did. A malicious actor aware of the discussed explanation-anchored, trust-disentangled and adversarial epistemic defense approach could have exploited a wide SEA AI attack surface in case of no consensus on the utility of this defense. For instance, a polarization between two dichotomously opposed camps in that regard could have offered an ideal breeding ground for divisive information warfare endeavors. For some, the perception of increasing disagreement tendencies may have confirmed post-truth narratives – not for malicious reasons, but because it was genuinely considered. This in turn could have cemented echo chamber effects now fuelled by a divided set of scientists, one part of which considered science to be epistemically defeated. This, combined with post-truth narratives and the societal-level automated disconcertion [Aliman et al., 2021] via the mere existence of AI-generated fakery, could have destabilized a fragile society and incited violence. Massive and rapid large-scale SEA AI attacks in the form of a novel type of scientific astroturfing could have been employed to automatically reinforce the widespread impression of permanently conflicting research results, on-demand and tailored to a scientific topic. The concealed or ambiguous AI-generated samples (be it data, experiments, papers or reviews) would not even need to be overrepresented in respected venues but only made salient via social media platforms, these being one of the main information sources for researchers – a task which could have been automated via social bots influencing trending and sharing patterns. A hinted variant of such SEA AI attacks could have been a flood of confirmatory AI-generated texts that corroborate the robustness of defenses across a large array of security areas in order to exploit any reduced vulnerability awareness. Finally, hyperlinks with attention-driving fake research contribution titles competing with science journalism and redirecting to advertisement pages could have polluted results displayed by search engines.
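Purely as illustrative scaffolding (not part of the cited threat models; the type names, field names and values below are our own shorthand), the three dimensions used above can be captured in a small data structure, instantiated here for the “fake CTI” use case:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SEAThreatModel:
    """Minimal scaffold for the threat-model dimensions of Subsection 3.1."""
    use_case: str
    adversarial_goals: List[str]
    # Assumed access per targeted component: "white-box", "grey-box" or "black-box".
    adversarial_knowledge: Dict[str, str]
    adversarial_capabilities: List[str]

# Illustrative instantiation for the security engineering use case.
fake_cti_attack = SEAThreatModel(
    use_case="security engineering (fake CTI data poisoning)",
    adversarial_goals=[
        "acquire private data",
        "CTI poisoning as cybercrime-as-a-service",
        "strategic advantage in cyber operations / espionage",
    ],
    adversarial_knowledge={
        "generating language model": "white-box",  # attacker fine-tunes it
        "automated cyber defense": "black-box",
        "human security analysts": "grey-box",     # via OSINT / social engineering
    },
    adversarial_capabilities=[
        "micro-targeted alteration of CTI text",
        "poisoning of CTI knowledge graphs",
        "automated generation of misleading cybersecurity reports",
    ],
)
```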
3.2 Practical Defenses and Caveats

As is also the case with other advanced, not yet prevalent but technically already feasible AI-aided information operations [Hartmann and Giles, 2020] and cyberattacks targeting AIs [Hartmann and Steup, 2020], consequences could have ranged from severe financial losses to threats to human lives. Multiple socio-psycho-technological solutions, including the ones reviewed in Section 2 which may be (partially) relevant to SEA AI attack scenarios, have been previously presented. Here, we complementarily focus on the epistemic dimensions one can add to the pool of potential solutions by applying the 3 generic features extracted in Section 2 to both use cases. We also emphasize novel caveats. Concerning the first use case of “fake CTI” SEA AI attacks, the straightforward thought to restrict the use of data from open platforms is not conducive to practicability, not only due to the amount of crucial information that a defense might miss, but also because it does not protect from insider threats [Ranade et al., 2021]. However, common solutions such as the AI-based detection of AI-generated outputs or trust-reliant scoring systems to flag trusted sources do not seem sufficient either without more ado, since the former may fail in the near future if the generator tends to win and the latter is at risk due to impersonation possibilities that AI itself augments and due to the mentioned insider threats. Interestingly, the issue of malicious insider threats is also reflected in the second use case, with scientific writing being open to arbitrary participants.
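For concreteness, the following minimal sketch shows one simple realization of the first of these common solutions, AI-based detection of AI-generated outputs, as a perplexity heuristic over a public language model. The model choice, threshold and library usage are our own illustrative assumptions, and – as argued above – such a detector is not sufficient on its own:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load a small public language model once (illustrative choice).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2; lower values are more 'model-like'."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return float(torch.exp(loss))

def looks_ai_generated(text: str, threshold: float = 25.0) -> bool:
    # Crude heuristic threshold; a motivated attacker can evade it (e.g. via
    # sampling strategies or light human editing), which is precisely why such
    # detectors are insufficient without complementary epistemic defenses.
    return perplexity(text) < threshold
```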
Defense for Security Engineering Use Case and Caveats

1. Explanation-anchored instead of data-driven: An explanation-anchored solution can be formulated from the inside out. Although AI does not understand explanations, it is thinkable that a technically feasible future hybrid active intelligent system³ for automated cyber defense could use knowledge graph inconsistencies [Heyvaert et al., 2019] as signals to calculate when it will epistemically seek clarification from a human analyst, when to actively query differing sources and sensors or when to follow habitual courses of action (a minimal sketch of such a routing policy is given below). But the creativity of human malicious actors cannot be predicted and thus neither the system nor human analysts are able to prophesy over a space of not yet created attacks. Also, as long as the system’s sensors are learning-based AI, it stays an Achilles heel due to the vulnerability to attacks.

2. Trust-disentangled instead of trust-dependent: Such a procedure could seem disadvantageous given the fast reactions required in cyber defense. However, an adversarial explanation-anchored framework is orthogonal to the trust policy used. Trust-disentangled does not necessarily signify zero-trust⁴ at all levels if impracticable.

3. Adversarial instead of (self-)compliant: A permanently rotating in-house adversarial team is required. Activities can include red teaming, penetration testing and the development of (adaptive) attacks, i.a. with AI-generated “fake CTI” text samples. A staggered approach is cogitable in which automated defense processes that happen at fast scales (e.g. requiring rapid access to open source CTI) rely on interim (distributed) trust while all others – especially those involving human deliberation to create novel defenses and attacks – strive for zero-trust information sharing (e.g. via a closed blockchain with a restricted set of authorized participants having read and write rights). In this way, one can create an interconnected 3-layered epistemically motivated security framework: a slow creative human-run adversarial counterfactual layer on top of a slow creative human-run defensive layer steering a very fast hybrid-active-AI-aided automated cyber defense layer. Important caveats are that such a framework: 1) can be resilient but not immune, 2) cannot and should not be entirely automated.

³ Such a system could instantiate technical self-awareness [Aliman, 2020a] (e.g. via active inference [Smith et al., 2021]).
⁴ The zero-trust [Kindervag, 2010] paradigm advanced in cybersecurity in the last decade which assumes “that adversaries are already inside the system, and therefore imposes strict access and authentication requirements” [Collier and Sarkis, 2021] seems highly appropriate in this increasingly complex security landscape.
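The routing idea hinted at in item 1 above can be made concrete with a deliberately simplified sketch; the inconsistency score is assumed to come from rule-driven checks over the CTI knowledge graph in the spirit of [Heyvaert et al., 2019], while the thresholds and action names are hypothetical placeholders rather than an implementation of the cited work:

```python
from enum import Enum, auto

class DefenseAction(Enum):
    FOLLOW_HABITUAL_COURSE = auto()     # fast, fully automated path
    QUERY_DIFFERING_SOURCES = auto()    # actively consult other sources/sensors
    ESCALATE_TO_HUMAN_ANALYST = auto()  # epistemically seek clarification

def route_cti_update(inconsistency_score: float,
                     low: float = 0.2, high: float = 0.6) -> DefenseAction:
    """Map a knowledge-graph inconsistency score in [0, 1] to a course of action.

    Thresholds are illustrative and would have to be tuned; the score itself is
    assumed to be produced by upstream inconsistency-detection rules.
    """
    if inconsistency_score < low:
        return DefenseAction.FOLLOW_HABITUAL_COURSE
    if inconsistency_score < high:
        return DefenseAction.QUERY_DIFFERING_SOURCES
    return DefenseAction.ESCALATE_TO_HUMAN_ANALYST
```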
Defense for Science Writing Use Case and Caveats

1. Explanation-anchored instead of data-driven: A practical challenge for defending against SEA AI attacks may seem to be the need for scientists to agree on pragmatic criteria for “better” explanations (but widely accepted cases are e.g. the preference for “simpler”, “more innovative” and “more interesting” ones). Also, due to automated disconcertion, reviewers could always suspect that a paper was AI-generated (potentially at the detriment of human linguistic statistical outliers). However, this is not a sufficient argument since explanation-anchored science and criticism focus on content and not on source or style.

2. Trust-disentangled instead of trust-dependent: Via trust-disentanglement, a paper generated by a present-day AI would not only be rejected on provenance grounds but due to its merely imitative and non-explanatory content. Though, an important asset is the review process which, if infiltrated by imitative AI-generated content, could slow down explanation-anchored criticism if not thwarted quickly. A zero-trust scheme could mitigate this risk time-efficiently (e.g. via a consortium blockchain for review activities; a minimal sketch of the underlying append-only bookkeeping is given below). Another zero-trust method would be to taxonomically monitor SEA AI attack events at an international level, e.g. via an AI incident database [McGregor, 2020] tailored to these attacks and complemented by adversarial retrospective counterfactual risk analyses [Aliman et al., 2021] and defensive solutions. The monitoring can be AI-aided (or in the future hybrid-active-AI-aided) but human analysts are indispensable for a deep semantic understanding [Aliman et al., 2021]. In short, also here, we suggest an interconnected 3-layered epistemic framework with adversarial, defensive and hybrid-active-AI-aided elements.

3. Adversarial instead of (self-)compliant: As an advanced adversarial strategy, which would also require responsible coordinated vulnerability disclosures [Kranenbarg et al., 2018], one could perform red teaming, penetration tests and (adaptive) attacks employing AI-generated “fake data and experiments”, “fake papers” and “fake reviews” [Tallón-Ballesteros, 2020]. Candidates for a blue team are e.g. reviewers and editors. Concurrently, urgent AI-related plagiarism issues arise [Dehouche, 2021].
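As referenced in item 2 above, the core ingredient a consortium blockchain for review activities would provide is an append-only, integrity-checkable log of review events shared among a restricted set of participants. The following is a deliberately minimal sketch of that ingredient only (the names are our own; real consortium chains add consensus, identity management and access control):

```python
import hashlib
import json
import time

def _digest(record: dict) -> str:
    """Deterministic SHA-256 digest of a record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

class ReviewLog:
    """Append-only, hash-chained log of review events (no consensus layer)."""

    def __init__(self):
        self.entries = []

    def append(self, reviewer_id: str, paper_id: str, verdict: str) -> dict:
        entry = {
            "timestamp": time.time(),
            "reviewer_id": reviewer_id,
            "paper_id": paper_id,
            "verdict": verdict,
            "prev_hash": self.entries[-1]["hash"] if self.entries else "GENESIS",
        }
        entry["hash"] = _digest({k: v for k, v in entry.items() if k != "hash"})
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any retroactive tampering breaks verification."""
        prev = "GENESIS"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev_hash"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True
```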
4 Conclusion and Future Work

For requisite variety, we introduced a complementary generic epistemic defense against not yet prevalent but technically feasible SEA AI attacks. This generic approach foregrounded explanation-anchored, trust-disentangled and adversarial features that we instantiated within two illustrative use cases involving language models: AI-generated samples to fool security engineering practices and AI-crafted contents to distort scientific writing. For both use cases, we compactly worked out a transdisciplinary and pragmatic 3-layered epistemically motivated security framework composed of adversarial, defensive and hybrid-active-AI-aided elements with two major caveats: 1) it can be resilient but not immune, 2) it cannot and should not be entirely automated. In both cases, a proactive exposure to synthetic AI-generated material could foster critical thinking. Vitally, the existence of truth stays a legitimate raison d’être for science. It is only that, in effect, one is not equipped with a direct access to truth, all observations are theory-laden and what one thinks one knows is linked to what is co-created in one’s collective enactment of a world with other entities shaping and shaped by physical reality. Thereby, one can craft explanations to try to improve one’s active grip on a field of affordances, but it stays an eternal mental tightrope walk of creativity. In view of this inescapable epistemic dizziness, the main task of explanation-anchored science is then neither to draw a line between truth and falsity nor between the trusted and the untrusted. Instead, it is to seek to robustly but provisionally separate better from worse explanations. While this steadily renewed societally relevant act does not yield immunity against AI-aided epistemic distortion, it enables resiliency against at-present thinkable SEA AI attacks. To sum up, the epistemic dizziness of conjecturing that one could always be wrong could stimulate intellectual humility, but also unbound(ed) (adversarial) explanatory knowledge co-creation. Future work could study how language AI – which could be exploited for future SEA AI attacks, e.g. instrumental in performing cyber(crime) and information operations – could conversely serve as a transformative tool to augment anthropic creativity and tackle the SEA AI threat itself. For instance, language AI could be used to stimulate human creativity in future AI and security design fictions for new threat models and defenses. In retrospect, AI is already acting as a catalyst since the very defenses humanity now crafts can broaden, deepen and refine the scope of explanations, i.a. also about better explanations – an unceasing but also potentially strengthening safety-relevant quest.
References

[Aliman and Kester, 2020] Nadisha-Marie Aliman and Leon Kester. Facing Immersive “Post-Truth” in AIVR? Philosophies, 5(4):45, 2020.
[Aliman et al., 2021] Nadisha-Marie Aliman, Leon Kester, and Roman Yampolskiy. Transdisciplinary AI Observatory—Retrospective Analyses and Future-Oriented Contradistinctions. Philosophies, 6(1):6, 2021.
[Aliman, 2020a] Nadisha-Marie Aliman. Hybrid Cognitive-Affective Strategies for AI Safety. PhD thesis, Utrecht University, 2020.
[Aliman, 2020b] Nadisha-Marie Aliman. Self-Shielding Worlds. https://nadishamarie.jimdo.com/clipboard/, 2020. Online; accessed 23-November-2020.
[Amodei et al., 2016] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
[Ashby, 1961] W Ross Ashby. An introduction to cybernetics. Chapman & Hall Ltd, 1961.
[Ashby, 2020] Mick Ashby. Ethical regulators and super-ethical systems. Systems, 8(4):53, 2020.
[Ashkenazy and Zini, 2019] Adi Ashkenazy and Shahar Zini. Attacking Machine Learning – The Cylance Case Study. https://skylightcyber.com/2019/07/18/cylance-i-kill-you/Cylance%20-%20Adversarial%20Machine%20Learning%20Case%20Study.pdf, 2019. Skylight; accessed 24-May-2020.
[Baris and Boukhers, 2021] Ipek Baris and Zeyd Boukhers. ECOL: Early Detection of COVID Lies Using Content, Prior Knowledge and Source Information. arXiv preprint arXiv:2101.05499, 2021.
[Barrett, 2017] Lisa Feldman Barrett. Functionalism cannot save the classical view of emotion. Social Cognitive and Affective Neuroscience, 12(1):34–36, 2017.
[Benz and Chatterjee, 2020] Michael Benz and Dave Chatterjee. Calculated risk? A cybersecurity evaluation tool for SMEs. Business Horizons, 63(4):531–540, 2020.
[Boneh et al., 2019] Dan Boneh, Andrew J Grotto, Patrick McDaniel, and Nicolas Papernot. How relevant is the Turing test in the age of sophisbots? IEEE Security & Privacy, 17(6):64–71, 2019.
[Bostrom, 2017] Nick Bostrom. Strategic implications of openness in AI development. Global Policy, 8(2):135–148, 2017.
[Brown et al., 2020] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
[Brundage et al., 2018] Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, et al. The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228, 2018.
[Bufacchi, 2021] Vittorio Bufacchi. Truth, lies and tweets: A consensus theory of post-truth. Philosophy & Social Criticism, 47(3):347–361, 2021.
[Burden and Hernández-Orallo, 2020] John Burden and José Hernández-Orallo. Exploring AI Safety in Degrees: Generality, Capability and Control. In SafeAI@AAAI, pages 36–40, 2020.
[Carlini et al., 2019] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.
[Chesney and Citron, 2019] Bobby Chesney and Danielle Citron. Deep fakes: A looming challenge for privacy, democracy, and national security. Calif. L. Rev., 107:1753, 2019.
[Collier and Sarkis, 2021] Zachary A Collier and Joseph Sarkis. The zero trust supply chain: Managing supply chain risk in the absence of trust. International Journal of Production Research, pages 1–16, 2021.
[Dehouche, 2021] Nassim Dehouche. Plagiarism in the age of massive Generative Pre-trained Transformers (GPT-3). Ethics in Science and Environmental Politics, 21:17–23, 2021.
[Deutsch, 2011] David Deutsch. The beginning of infinity: Explanations that transform the world. Penguin UK, 2011.
[Deutsch, 2016] David Deutsch. The logic of experimental tests, particularly of Everettian quantum theory. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 55:24–33, 2016.
[Fallis, 2020] Don Fallis. The Epistemic Threat of Deepfakes. Philosophy & Technology, pages 1–21, 2020.
[Fickinger et al., 2020] Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, and Stuart Russell. Multi-principal assistance games. arXiv preprint arXiv:2007.09540, 2020.
[Floridi et al., 2018] Luciano Floridi, Josh Cowls, Monica Beltrametti, Raja Chatila, Patrice Chazerand, Virginia Dignum, Christoph Luetge, Robert Madelin, Ugo Pagallo, Francesca Rossi, et al. AI4People—an ethical framework for a good AI society: opportunities, risks, principles, and recommendations. Minds and Machines, 28(4):689–707, 2018.
[Floridi, 2018] Luciano Floridi. Artificial intelligence, deepfakes and a future of ectypes. Philosophy & Technology, 31(3):317–321, 2018.
[Frederick, 2020] Danny Frederick. Against the Philosophical Tide: Essays in Popperian Critical Rationalism. Critias Publishing, 2020.
[Hartmann and Giles, 2020] Kim Hartmann and Keir Giles. The Next Generation of Cyber-Enabled Information Warfare. In 2020 12th International Conference on Cyber Conflict (CyCon), volume 1300, pages 233–250. IEEE, 2020.
[Hartmann and Steup, 2020] Kim Hartmann and Christoph Steup. Hacking the AI – the Next Generation of Hijacked Systems. In 2020 12th International Conference on Cyber Conflict (CyCon), volume 1300, pages 327–349. IEEE, 2020.
[Heyvaert et al., 2019] Pieter Heyvaert, Ben De Meester, Anastasia Dimou, and Ruben Verborgh. Rule-driven inconsistency resolution for knowledge graph generation rules. Semantic Web, 10(6):1071–1086, 2019.
[Ho et al., 2020] Shirley S Ho, Tong Jee Goh, and Yan Wah Leung. Let’s nab fake science news: Predicting scientists’ support for interventions using the influence of presumed media influence model. Journalism, page 1464884920937488, 2020.
[Jakubowski, 2019] G Jakubowski. What’s not to like? Social media as information operations force multiplier. Joint Force Quarterly, 3:8–17, 2019.
[Jobin et al., 2019] Anna Jobin, Marcello Ienca, and Effy Vayena. The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9):389–399, 2019.
[Kaloudi and Li, 2020] Nektaria Kaloudi and Jingyue Li. The AI-based Cyber Threat Landscape: A Survey. ACM Computing Surveys (CSUR), 53(1):1–34, 2020.
[Kindervag, 2010] John Kindervag. Build security into your network’s DNA: The zero trust network architecture. Forrester Research Inc, pages 1–26, 2010.
[Kirat et al., 2018] Dhilung Kirat, Jiyong Jang, and Marc Stoecklin. Deeplocker – concealing targeted attacks with AI locksmithing. Blackhat USA, 2018.
[Kranenbarg et al., 2018] Marleen Weulen Kranenbarg, Thomas J Holt, and Jeroen van der Ham. Don’t shoot the messenger! A criminological and computer science perspective on coordinated vulnerability disclosure. Crime Science, 7(1):1–9, 2018.
[Leike et al., 2017] Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg. AI safety gridworlds. arXiv preprint arXiv:1711.09883, 2017.
[Mahlangu et al., 2019] Thabo Mahlangu, Sinethemba January, Thulani Mashiane, Moses Dlamini, Sipho Ngobeni, Nkqubela Ruxwana, and Sun Tzu. Data Poisoning: Achilles Heel of Cyber Threat Intelligence Systems. In Proceedings of the ICCWS 2019 14th International Conference on Cyber Warfare and Security: ICCWS, 2019.
[Makri, 2017] Anita Makri. Give the public the tools to trust scientists. Nature News, 541(7637):261, 2017.
[McGregor, 2020] Sean McGregor. Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database. arXiv preprint arXiv:2011.08512, 2020.
[ÓhÉigeartaigh et al., 2020] Seán S ÓhÉigeartaigh, Jess Whittlestone, Yang Liu, Yi Zeng, and Zhe Liu. Overcoming barriers to cross-cultural cooperation in AI ethics and governance. Philosophy & Technology, 33(4):571–593, 2020.
[Ozkan et al., 2021] Bilge Yigit Ozkan, Sonny van Lingen, and Marco Spruit. The Cybersecurity Focus Area Maturity (CYSFAM) Model. Journal of Cybersecurity and Privacy, 1(1):119–139, 2021.
[Pistono and Yampolskiy, 2016] Federico Pistono and Roman V Yampolskiy. Unethical Research: How to Create a Malevolent Artificial Intelligence. arXiv e-prints, pages arXiv–1605, 2016.
[Popper, 1996] Karl Popper. In search of a better world: Lectures and essays from thirty years. Psychology Press, 1996.
[Popper, 2014] Karl Popper. Conjectures and refutations: The growth of scientific knowledge. Routledge, 2014.
[Prier, 2017] Jarred Prier. Commanding the trend: Social media as information warfare. Strategic Studies Quarterly, 11(4):50–85, 2017.
[Radford et al., 2019] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[Raji et al., 2020] Inioluwa Deborah Raji, Timnit Gebru, Margaret Mitchell, Joy Buolamwini, Joonseok Lee, and Emily Denton. Saving face: Investigating the ethical concerns of facial recognition auditing. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pages 145–151, 2020.
[Ranade et al., 2021] Priyanka Ranade, Aritran Piplai, Sudip Mittal, Anupam Joshi, and Tim Finin. Generating Fake Cyber Threat Intelligence Using Transformer-Based Models. arXiv preprint arXiv:2102.04351, 2021.
[Sahlgren and Carlsson, 2021] Magnus Sahlgren and Fredrik Carlsson. The Singleton Fallacy: Why Current Critiques of Language Models Miss the Point. arXiv preprint arXiv:2102.04310, 2021.
[Seymour and Tully, 2016] John Seymour and Philip Tully. Weaponizing data science for social engineering: Automated E2E spear phishing on Twitter. Black Hat USA, 37:1–39, 2016.
[Smith et al., 2021] Ryan Smith, Karl Friston, and Christopher Whyte. A Step-by-Step Tutorial on Active Inference and its Application to Empirical Data. PsyArXiv, 2021.
[Tallón-Ballesteros, 2020] AJ Tallón-Ballesteros. Exploring the Potential of GPT-2 for Generating Fake Reviews of Research Papers. Fuzzy Systems and Data Mining VI: Proceedings of FSDM 2020, 331:390, 2020.
[Tully and Foster, 2020] Philip Tully and Lee Foster. Repurposing Neural Networks to Generate Synthetic Media for Information Operations. https://www.blackhat.com/us-20/briefings/schedule/, 2020. Session at Black Hat USA 2020; accessed 08-August-2020.
[Van Noorden, 2014] Richard Van Noorden. Publishers withdraw more than 120 gibberish papers. Nature News, 2014.
[Vinnakota, 2013] Tirumala Vinnakota. A cybernetics paradigms framework for cyberspace: Key lens to cybersecurity. In 2013 IEEE International Conference on Computational Intelligence and Cybernetics (CYBERNETICSCOM), pages 85–91. IEEE, 2013.
[Wahle et al., 2021] Jan Philip Wahle, Terry Ruas, Norman Meuschke, and Bela Gipp. Are neural language models good plagiarists? A benchmark for neural paraphrase detection. arXiv preprint arXiv:2103.12450, 2021.
[Zeadally et al., 2020] Sherali Zeadally, Erwin Adi, Zubair Baig, and Imran A Khan. Harnessing artificial intelligence capabilities to improve cybersecurity. IEEE Access, 8:23817–23837, 2020.
[Zhao et al., 2021] Bo Zhao, Shaozeng Zhang, Chunxue Xu, Yifan Sun, and Chengbin Deng. Deep fake geography? When geospatial data encounter Artificial Intelligence. Cartography and Geographic Information Science, pages 1–15, 2021.