Ghosts in the AI

Emanuele Fulvio Perri†, Elio Grande†

University of Pisa, Largo Bruno Pontecorvo, 3, 56127, Pisa, Italy

Abstract

This work regards the social side of trustworthiness in the context of Large Language Models (LLMs), in two congruent shades. The first section, drawing on a passage from The Science of Logic by G. W. F. Hegel, proposes a qualitative and semantic interpretation of the origin of the so-called "emergent abilities" of LLMs, which are deemed something more complex than a trivial deceit. The second section concerns the trustworthiness and responsibility of LLMs from an ethical and phenomenological perspective, proposing a parallelism between the extended-mind thesis and generative transformers as cognitive extensions. The focus lies on the repercussions of intensive use, which can be summarized in the concepts of cognitive depletion and digital dementia, leading to a debasement of precious human qualities: creativity, attention, interpretive ability. Our suggestion, then – first of all trusting, because we have to trust, the critical sense of human users – is directed towards some kind of ethics of AI to be introduced at the K-12 level. Our aim remains the wished-for design of a peaceful coexistence.

Keywords: Generative AI, emergent abilities, extended mind, hallucinations, cognitive depletion.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy.
† These authors contributed equally. § 2 is written by E. Grande; § 3 is written by E. F. Perri.
emanuele.perri@phd.unipi.it (E. F. Perri); elio.grande@phd.unipi.it (E. Grande)
ORCID: 0009-0001-3906-498X (E. F. Perri); 0009-0008-2896-5900 (E. Grande)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

A deviation occurred at the last mile of the long run towards the approval of the Artificial Intelligence Act, because of an unexpected technological evolution: the so-called Foundation Models, generative artificial intelligence devices made of deep neural networks good enough to elaborate coherent responses to input prompts, handling many typologies of data and in particular processing natural language within diverse conceptual and linguistic domains. The definitive text of the AI Act – see in particular Article 51 and Annex XIII – grounds its criteria of "systemic risk" for general-purpose models, among other things, in the number of parameters of the models, in the quality and dimension of the datasets and, above all, in the compute necessary for training, fixing the plausible risk threshold at 10^25 FLOPs [16]. Some suggestions will follow regarding the origin of the so-called "emergent abilities" of Large Language Models (LLMs) – a topic already treated, in nuce, in [8] – which we develop through some considerations about the extensions of the mind. If there is a character which is a bearer of risk in LLMs, it is their everyday pervasiveness. From Una domanda impossibile ad Artemisia Gentileschi ["An impossible question to Artemisia Gentileschi"] – a Turing test on a sample of more than 1,200 participants of various ages and levels of education, jointly conceived in 2023 by the Departments of Computer Science and of Civilization and Forms of Knowledge of the University of Pisa – it emerged that 31.5% of the participants were fooled by ChatGPT 3.5 when listening, and as many as 43.5% when reading [6], when trying to recognize which written composition had been produced by a human (the cited text will be published in May or June 2024 in the journal «Mondo Digitale»; we thank the authors for their courtesy). The point, however, is not so much whether to give confidence, but rather how and why.
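The 10^25 FLOPs systemic-risk threshold cited from the AI Act can be made concrete with a back-of-the-envelope check. A minimal sketch, assuming the common rule of thumb from the scaling-laws literature that transformer training compute is roughly 6 × parameters × training tokens; the parameter and token counts below are illustrative, not figures from this paper or from the Act:

```python
# Back-of-the-envelope check against the AI Act's 10^25 FLOPs
# systemic-risk threshold, using the common approximation that
# training compute ≈ 6 * parameters * training tokens.
# The model sizes below are illustrative, not official figures.

AI_ACT_THRESHOLD_FLOPS = 1e25

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough transformer training compute: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

models = {
    "small model (1B params, 100B tokens)": training_flops(1e9, 1e11),
    "GPT-3-scale (175B params, 300B tokens)": training_flops(175e9, 300e9),
    "frontier-scale (1T params, 10T tokens)": training_flops(1e12, 1e13),
}

for name, flops in models.items():
    flag = "above" if flops >= AI_ACT_THRESHOLD_FLOPS else "below"
    print(f"{name}: ~{flops:.1e} FLOPs ({flag} the 1e25 threshold)")
```

On this estimate, a GPT-3-scale training run (~3×10^23 FLOPs) sits well below the threshold, which only hypothetical trillion-parameter, ten-trillion-token runs would cross.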
It will not be proposed here a general design model to adequately mitigate the systemic risk produced by LLMs: too hard a task. We will rather go ghost hunting, attempting to get closer to the nature of the deception, hoping to take a small step towards trustworthy modes of utilization of the currently available devices.

2. Ars Artificialiter scribendi

In The Gutenberg Galaxy [14], noting with Umberto Boccioni how we were (and still are, we add here) primitives of a new culture – the organic culture of the electronic age, which would have dulled human consciousness in the period of its first interiorization – Marshall McLuhan recalled that the first name of the typographic printing press was "ars artificialiter scribendi" (p. 187). Were it not for the Latin, it would seem coined yesterday. A way of writing, then, an art, a practical acting in the same domain as manual writing, which nonetheless had the taste of an artifice. An art of the artificial or, better, an art of elaborating a certain kind of data – in this case, alphabetic characters – in an artificial manner.

If the printing press in fact replaced the inkpot – in the corporeal movements of the hand, although not in the intentions – developing LLMs is instead an "ars artificialiter scribendi" whose products appear to take over the alphabet itself, producing dialogical or even, paradoxically, oral writing. It would seem to be, given that we can hardly help ascribing personality, a fine seduction strategy. Simone Natale [15] reminds us of Eliza, the chatbot invented in the Sixties by Joseph Weizenbaum, underlining the dramaturgical design, according to some "script", in the responses of new chatbots, and speaking of a trivial deceit, because it is not perceived as such and is plunged into everyday life. However, it is not just this.

Three technological breakthroughs allowed the birth of LLMs: the representation of the meaning of words through embeddings, an attention mechanism to catch the connections among the words themselves, and the implementation of transformers [3]. So either some mathematics of language does exist, such that LLMs take possession of meaning – which would therefore stop being «structured by fore-having, fore-sight, and fore-conception, […] the upon which of the project in terms of which something becomes intelligible as something» [10] (p. 142) – or what they grasp would be a correlate of language itself on a parallel platform. Nothing, however, would let us think that artificial intelligence presents the fundamental property (that was) of the soul – «a being which in conformity with its kind of being is suited to "come together" with any being whatsoever» [10] (p. 12) – so much as the unpredictable phenomenon of the emergent abilities of LLMs.

Wei et al. [21] define "emergence", with the Nobel laureate Philip Anderson, as qualitative mutations in a system arising from quantitative mutations [2]. Usually, they write, scaling laws allow one to foresee the effects of scale on a system's performance. However, at least with respect to some downstream tasks, putting the LLMs' scale on the x-axis (measured by compute, although the quantity of parameters and the dataset dimension are useful indexes thereof too) and performance on the y-axis, the curve does not grow gradually but undergoes sudden variations once a certain threshold has been passed. «Note» – key point – «that the scale at which an ability is first observed to emerge depends on a number of factors and is not an immutable property of the ability». Under the category of few-shot prompting – that is, tasks apparently learned after a very small number of input instructions in the guise of teachings – come, for example, the ability to reply in a truthful way or to map conceptual domains. Performance measures, according to more than one metric, are reported by Wei et al. for various typologies of LLMs (LaMDA, GPT-3, Gopher, etc.), and the phenomenon of emergence appears multiple times, though not always, with a threshold comprised between 10^22 and 10^25 FLOPs. These are certainly tasks akin to human intellectual capabilities. However, the missing steadiness and univocity, across different architectures, of the threshold to be crossed for an ability to emerge lets us suspect that the emergence of new qualities in the behavior of such models is indeed correlated with quantitative increments of compute, parameters, etc., but not strictly caused by them. There is a semantic threshold beyond which the parts of a collection (the ancient Greeks would have used here the term pân) are subsumed, harmonizing, into a whole (in Greek: olòn) where every branch, every connection finds a proper meaning. A qualitative, or at least not quantitative, threshold, as in the sorites paradox of Eubulides of Miletus: a gap between different dimensions. It might perhaps be useful to reflect, so as to make the point on this logical mechanism, on a passage from The Science of Logic by G. W. F. Hegel (we thank our friend Simone Farinella, PhD in history of philosophy, for his precious advice on the choice of this passage):

«Whenever all the conditions of a fact are completely present, the fact is actually there; the completeness of the conditions is the totality as in the content […]. In the sphere of the conditioned ground, the conditions have the form (that is, the ground or the reflection that stands on its own) outside them, and it is this form that makes them moments of the fact and elicits concrete existence in them» [9] (p. 483).

His aim was to rationalize accidentality (nowadays we could talk about data to correlate) within unique schemes, the "things", to make "real" some things which are merely possible. A dimensional gap, indeed, born of the crossing of a quantitative threshold – the completeness of the conditions, which by themselves remain accidental. The problem of the representativity of data lies just around the corner. Can an extended net of sequences, like for example the hypertext (simplifying, obviously) called "the web", overcome that critical mass and reflect, adequate itself to, a systematic whole, a semantic olòn, a complex of signifiers? We would be tempted to reply positively: the web is our Zeitgeist. It contains analogies, additions in column, sentiments, errors: the patterns recognized by the emergent abilities of LLMs. Supposing we train a model – say, a transformer endowed with 175 billion parameters – on such a net of sequences as its dataset, won't such patterns or sub-patterns emerge? Without, among other things, real learning: the model then just runs in inference mode. However, it was said that the conditions – translated: the correlations among data – have their ground outside themselves. The model just computes. It has only a surrogate intelligence, and even a large number of parameters cannot by itself produce such an improvement in quality. But might it be good enough to mirror the improvement in quality originally lying in the semantics of the data? If so, we could perhaps explain why, to whoever reads on the screen, one string will seem a reply, two a discourse, and a thousand a writer, although the LLM actually speaks alone, according to a hierarchy of the most probable terms.
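The "hierarchy of the most probable terms" according to which an LLM speaks can be illustrated in miniature. A minimal sketch, assuming a toy bigram model in place of a transformer: the tiny corpus, the function names and the greedy decoding strategy are all invented for illustration; real LLMs estimate the conditional probability of a word given a far longer history, with learned representations, but the selection principle is the same:

```python
# Minimal sketch of the "hierarchy of the most probable terms": a toy
# bigram language model that, given a history word h, ranks candidate
# words w by an estimated conditional probability P(w|h). The corpus is
# invented for illustration only.
from collections import Counter, defaultdict

corpus = "the model writes and the model speaks and the reader listens".split()

# Count bigram transitions: how often each word follows each history word.
transitions = defaultdict(Counter)
for h, w in zip(corpus, corpus[1:]):
    transitions[h][w] += 1

def p_next(history: str) -> dict:
    """Estimate P(w | h) for every word w seen after the history word h."""
    counts = transitions[history]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generate(start: str, length: int) -> list:
    """Greedily emit the most probable continuation at each step."""
    out = [start]
    for _ in range(length):
        dist = p_next(out[-1])
        if not dist:
            break
        out.append(max(dist, key=dist.get))
    return out

print(p_next("the"))   # e.g. {'model': 0.66..., 'reader': 0.33...}
print(generate("the", 3))
```

The point of the sketch is that nothing here "understands": a string that reads like a reply is produced by repeatedly taking the top of a probability ranking over observed correlations.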
3. "Somatization" of LLMs: rethinking the ethics of generative AI from a phenomenological perspective

Continuing to use the ethical-philosophical lens to study the implications of the irresponsible use of LLMs (such as GPT-x, LaMDA, LLaMA, Gemini, etc.), it seems interesting, and above all useful, to draw on Andy Clark and David Chalmers' brilliant phenomenological formulation of the concept of the extended mind [4] and on Kim Sterelny's concept of the scaffolded mind [20]. Thinking of responsible LLMs according to the standard framework (transparency, fairness, privacy, etc.), it is appropriate to ask whether a stable social trust in such technologies is not promptly impeded by a misconception of generative artificial intelligence itself.

Clark and Chalmers, in their well-known work The Extended Mind, bring up the example of "Otto's notebook": Otto is a patient with Alzheimer's disease who, to cope with daily mnemonic challenges, relies on a notebook in which he habitually jots down, and from which he retrieves, information of which, due to his disease, he is no longer aware. The "analog" relationship between Otto and his notebook pours into dependence – a blind reliance; Otto's memories are scattered across the pages of his notebook, which is the only available resource for reporting on a past and being aware of the present. The phenomenology of the notebook lies in its being much more than an external resource while retaining its original ontological status: the notebook is a cognitive extension, a ramification of Otto's mind and even a supplement to his memory. Kim Sterelny picks up on Clark and Chalmers by introducing what is a full-fledged fair corrective: the notebook, being physically outside the body, cannot extend cognitive capacities while also guaranteeing the same degree of reliability as the resource it replaces (that is, memory); therefore, its function is rather to support it – to scaffold it [20]. In other words: external (informational, datal, executive, …) resources should not be considered reliable to the same extent as internal resources since, even though external ones collaborate in dense mental associations, they are disembodied and indirectly managed. Certainly, owing to mental plasticity, there are several pros to incorporating external adjuvant resources within the cognitive system – the notebook supplants memory, the cane mitigates claudication, the lens enhances vision, etc. – but the cons, on a risk-benefit scale, are significant: (1) reliance on the external resource is inherently fallacious, since the same degree of integrity as the internal resource cannot be guaranteed; (2) exposure to the risk of sabotage of the external resource is substantial, both in the sense of environmental conditioning and in the (rarer, but not negligible) sense of targeted attacks; (3) in cases of substitution of the internal resource by an external one, an acceleration of the depletion of the already damaged internal system can be expected, causing its ultimate downfall. In this frame the relationship between internal and external environment, and the environmental niche, is designed under the same risky conditions under which sentient beings gain a being-in-the-world [10].

The reflections advanced thus far soon make sense if we reimagine the (progressively obsolescent) concept of human-machine interaction (HMI) from a phenomenological perspective: an environmental niche hinged on the relationship between a digital system (a computer, a model, etc.) and an organic system. LLMs, according to this interpretation, are the external resource – so appealing, so addictive, so affordable – with which we compensate for our most human flaws – executory promptness, memory capacity, mundane transiency – at the risk of self-inflicted depletion. Closely related to this point is the risk of an only apparently reliable AI: the cognitive depletion triggered by a gradual (and not totally voluntary) renunciation of creative and cognitive capacities, which today goes hand in hand with so-called deskilling; we fall into what Manfred Spitzer [18] calls digital dementia: over-reliance on a technology with the potential to replace human capacities can induce a decrease in the cognitive capacities for information processing and creative production (think of imagination), implying symptoms close to those of dementia, which regress only very slowly once the use of that technology is suspended. Spitzer writes in Information technology in education: risks and side effects [19] about neuroplasticity and the use of technology in learning:

«Given what we know about neuroplasticity, i.e., learning and the brain, it is hard to believe that some education practitioners and policy makers still believe that reducing cognitive load is beneficial for the learner. Quite the opposite is the case: The more effort you have to take, the better the learning outcome» (p. 84).

What Spitzer underlines is the value of direct experience, of concrete and hard doing, for a stable imprint of information; full experience, moreover, means taking the needed time – a permission that our postmodern society "of impatience" often does not grant. In short: doing, taking the necessary time, on the one hand; outsourcing everything at once, on the other. The difference between the two approaches is quali-quantitative and lies in the permanence of the result, as well as in the result itself. A similar warning comes from Federico Cabitza, who writes about epistemic sclerosis [7]:

«[...] AI machines, initially conceived to enhance peculiar capacities of men "for the benefit of men" [...], [have ended up] paradoxically producing an opposite effect [...] of disempowerment, according to a dynamic already known to popular wisdom when it is said that "the muscle that is not used atrophies." [...] we have called this danger "epistemic sclerosis," meaning [...] the risk of losing the habit of exploring the unknown and of managing – also in terms of awareness, tolerance and even appreciation – the uncertainty that affects all our evaluations, estimates, predictions» (pp. 80, 85; English translation by the authors).

Cabitza's is not an apologia for slow working, nor is ours meant to be an oracle-like dystopian invective against GAI: it is, rather, about recognizing the implications of LLMs for the future of creativity, information, cultural production, and learning. Cognitive depletion [17] arises not from a balanced coexistence with technology, but from replacement by technology, as Adriano Fabris points out at UCSI on the topic of journalism and AI:

«[...] at best, a deskilling [...], and at worst, prospectively, a replacement of what these can do by what the AI program can do faster and more fully» (§ 2) [5] (English translation by the authors).

Just as the notebook referred to by Clark and Chalmers throws Otto into a relationship of absolute dependence and, virtually, worsens his memory (sparing him the stresses of exertion), LLMs, with their features simulating Gestalt qualities, drag users into a relationship of dependence that affects not only the most time-consuming mechanical activities, but also the most human and light ones (drafting an e-mail, replying to a message, …); what are the long-term effects of a dependence of this extent? At the beginning of this section we referred to the fundamental unacceptability of the external resource when it has the function of a cognitive extension, given three key points; those same three points can be repurposed to contribute to a new framework for responsible and reliable GAI. In the present case, for example, considering a multimodal transformer as an external resource (with the function of a cognitive extension, that is, extended mind), heavy use will necessarily produce adjuvant effects – it will be notebook, it will be cane, it will be lens, … – along with other "castrating" ones: (a) being an external resource, it will not guarantee continuous accessibility; (b) it will be subject to environmental conditioning or manipulation – especially since datasets are generally neither personal nor personally inspectable or customizable (except for sparse instances of RLHF, such as temporary slight changes in model behavior based on user-expressed preferences via A/B testing); (c) it will worsen cognitive capabilities, which are already compromised [13], and there will be instances of outright dependency. It is evident, as the last decades of pocket electronics, phenomenology and philosophy of mind teach (also showing us several cases of so-called adaptive phenotypic plasticity), that whatever technology shows the prerequisites for cognitive extension is, in the long run, pejorative of cognitive abilities and, by extension, of being-in-the-world respecting the physiological alternation between sharing and reserve.

In order to build lasting social trust and ensure a healthy coexistence with generative AI and with whatever other technology will come – this is also the EU's approach: «The European Union's ethical approach to artificial intelligence is intended to prompt ethical-humanistic reflection on global technological progress» [1] (p. 6; translation by the authors) – it is crucial to talk about ethics: while it is necessary to ensure an ethics in AI, it seems more important to work on an ethics of AI. Introducing the teaching of ethics (in general) and of AI ethics as early as K-12 [12] is the only way to lay the groundwork for a truly accountable and reliable GAI. Admittedly, the utterly interdisciplinary nature of such an endeavor is well known by this time; it remains, however, that ethics and law are the only two cartridges left to foster the desired healthy coexistence. Given the "position paper" nature of this contribution, it is worth repeating that the writers' intent is to emphasize the importance of introducing ethics from the earliest years of schooling: at stake is the replacement of human creativity with generative sterility resulting from the statistical prediction of language – P(w|h), in the terms of word embeddings in NLP – able to disrupt not only the field of culture, but also the very criteria of aesthetic-artistic evaluation of written works.

A separate parenthesis should be opened regarding the management of biases in generative AI – a hot topic in the area of responsible AI practices. "GAI bias" means the systematic tendency of a generative model to return outputs skewed toward certain responses; the reasons why this happens can be attributed to the dataset used for training, to implicit assumptions made during the training itself, or to biases inherent in our society and thus reflected in the "answers" given by the system. Bias in transformers is often considered a problem that we still need to solve interdisciplinarily, a problem that undermines the path to "responsible and reliable" GAI. The feeling is that we cannot see the wood for the trees: the problem lies elsewhere, outside the development and usage patterns of AI systems; the biases are in the training data, since they mirror what our society has produced to date. To put it another way: writing a prompt to a chatbot asking for a text à la D.A.F. de Sade and then complaining about a bias toward the degrading representation of women versus that of a violently dominant man is laughable. It would seem right, somewhat, to accept the biases for what they are: reflections of what we have been; then, a GAI is all the more reliably "responsible and trustworthy" when it transparently represents a state of affairs, not when it works at embellishment. The new front in the struggle for transparent AI is demystifying the fight against bias; it has to do with the exercise of moral posture, with confrontation (even unpleasant, so be it), with history and with characterial ideal types [11] – in the Weberian sense of simplified idealization. While the difference between character ideal type, persona (as a unique combination of attributes defining a certain individual), figural restitution and bias is sub sole, it is not as clear (to many AI ethicists, but not only to them) that the goals of transparency and trustworthiness are not pursuable by purging bias: only a generalized sensitivity to the use and consequences of generative systems will be able to avert the big issues on the horizon.

4. Conclusion

This paper has sought to explore the social side of reliability and accountability with respect to the use of large language models, providing a qualitative and semantic reading of the origin of the so-called "emergent abilities" of such generative models. The analysis was supported by parallels between the extended mind and AI-based transformers, winking at a more phenomenological approach to the problem of GenAI misuse. Even if only for a few lines, we went "ghost hunting", motivated to see in the nature of these systems neither more nor less than what they are.

Acknowledgement

"FAIR – Future Artificial Intelligence Research" – Spoke 1 "Human-centered AI", funded by the European Commission under the NextGenerationEU programme, PNRR.
References

[1] A. Alpini, Sull'approccio umano-centrico all'intelligenza artificiale. Riflessioni a margine del "Progetto europeo di orientamenti etici per una IA affidabile", «Comparazione e Diritto Civile», 2019, 2, 1-9.
[2] P. W. Anderson, More is different: Broken symmetry and the nature of the hierarchical structure of science, «Science», 1972, 177(4047), 393-396. http://www.lanais.famaf.unc.edu.ar/cursos/em/Anderson-MoreDifferent-1972.pdf (accessed 21/04/2024).
[3] G. Attardi, Il Bello, il Brutto e il Cattivo dei LLM, «Mondo Digitale», 2023, June, 1-16.
[4] A. Clark, D. Chalmers, The extended mind, «Analysis», 1998, 58(1), 7-19.
[5] A. Fabris, Giornalismo e intelligenza artificiale: la questione etica di cui parla Adriano Fabris, Unione Cattolica della Stampa Italiana, 10/02/2024. https://www.ucsi.it/news/opinioni/14595-giornalismo-e-intelligenza-artificiale-la-questione-etica-di-cui-parla-adriano-fabris.html (accessed 21/04/2024).
[6] A. Fabris, P. Ferragina, I. Horvat, D. Morelli, G. Prencipe, Filosofia interroga Arte, Drammaturgia sfida IA. Due testi, due podcast, per rispondere alla domanda: scrittura umana o artificiale?, «Mondo Digitale», 2024 [forthcoming].
[7] L. Floridi, F. Cabitza, Intelligenza artificiale: L'uso delle nuove macchine, Bompiani, Milano 2021.
[8] E. Grande, LLMs: il surrogato dello Spirito del mondo, «Fondazione Leonardo – Civiltà delle Macchine», 18/01/2024. https://www.civiltadellemacchine.it/it/news-and-stories-detail/-/detail/llms-surrogato-spirito (accessed 19/01/2024).
[9] G. W. F. Hegel, The Science of Logic, transl. G. Di Giovanni, Cambridge University Press, 2010.
[10] M. Heidegger, Being and Time. A translation of Sein und Zeit, transl. J. Stambaugh, State University of New York Press, 1996.
[11] B. Hibou, M. Tozy, Ragionare per idealtipi. Comprendere con Weber lo Stato contemporaneo in Marocco… e altrove, «Cambio. Rivista sulle Trasformazioni Sociali», 2021, 10(20), 65-83.
[12] I. Lee, S. Ali, H. Zhang, D. DiPaola, C. Breazeal, Developing middle school students' AI literacy, in Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, 2021, pp. 191-197.
[13] L. A. Manwell, M. Tadros, T. M. Ciccarelli, R. Eikelboom, Digital dementia in the internet generation: excessive screen time during brain development will increase the risk of Alzheimer's disease and related dementias in adulthood, «Journal of Integrative Neuroscience», 2022, 21(1), 028.
[14] M. McLuhan, La Galassia Gutenberg. Nascita dell'uomo tipografico, transl. S. Rizzo, Armando Editore, Roma 1976.
[15] S. Natale, Macchine ingannevoli. Comunicazione, tecnologia, intelligenza artificiale, transl. D. A. Gewurz, Giulio Einaudi Editore, 2022.
[16] Parlamento Europeo, Emendamenti del Parlamento Europeo alla proposta della Commissione. Regolamento (UE) 2024/… del Parlamento Europeo e del Consiglio del ... che stabilisce regole armonizzate sull'intelligenza artificiale… (legge sull'intelligenza artificiale), (COM(2021)0206 – C9-0146/2021 – 2021/0106(COD)), 06/03/2024. https://www.europarl.europa.eu/doceo/document/A-9-2023-0188-AM-808-808_IT.pdf (accessed 21/04/2024).
[17] E. F. Perri, Generative artificial intelligence and creative-cognitive depletion: an ethical issue. Use and abuse of GAIs and GPTs in the field of culture and education, in IA, educación y medios de comunicación: modelo TRIC, Dykinson S.L., Madrid 2024 (preprint).
[18] M. Spitzer, Demenza digitale. Come la nuova tecnologia ci rende stupidi, Corbaccio, 2013.
[19] M. Spitzer, Information technology in education: Risks and side effects, «Trends in Neuroscience and Education», 3(3-4), 2014, 81-85.
[20] K. Sterelny, Minds: extended or scaffolded?, «Phenomenology and the Cognitive Sciences», 9(4), 2010, 465-481.
[21] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, W. Fedus, Emergent Abilities of Large Language Models, «Transactions on Machine Learning Research», August 2022, arXiv:2206.07682v2 [cs.CL].