<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>M. Bombieri);
ponzetto@uni-mannheim.de (S. P. Ponzetto);
marco.rospocher@univr.it (M. Rospocher)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Do LLMs Authentically Represent Affective Experiences of People with Disabilities on Social Media?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Bombieri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Paolo Ponzetto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Rospocher</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Mannheim</institution>
          ,
          <addr-line>B6, 26, D-68159 Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Verona, Lungadige Porta Vittoria</institution>
          ,
          <addr-line>41, 37129 Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This paper investigates how Large Language Models (LLMs) represent the affective experiences of individuals with disabilities on social media. We simulate posts using LLMs and compare them to authentic user-generated content in English, collected from disability-related subreddits, focusing on sentiment, emotion, and indicators of depression. Our analysis reveals that LLMs tend to produce overly positive and idealized portrayals, often failing to capture the complexity and nuance of disabled individuals' emotional expressions. These misrepresentations underscore broader concerns about the limitations of LLMs in authentically reflecting the lived experiences of marginalized communities.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>Representation</kwd>
        <kwd>Disability</kwd>
        <kwd>Bias</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Recent studies have shown that computational models of language, trained on real-world data, reflect and amplify harmful societal biases, often disproportionately affecting marginalized communities [1, 2, inter alia]. This can lead to psychological harm, unhappiness, and, in some cases, suicide attempts [3]. The increasing use of Large Language Models (LLMs) has exacerbated the risks related to this issue, potentially spreading these representational harms further [4]. In response, researchers have proposed methods to mitigate these biases. For example, recent LLMs have incorporated de-biasing techniques and AI guards (e.g., Inan et al. [5]) that block offensive questions and adjust responses to be non-toxic and positive. However, recent work on studying the depiction of personas from marginalized groups by LLMs indicates that many biases are concealed even in texts containing words with a positive sentiment, which can still offend their sensitivities and lead to pernicious positive portrayals [6]. Moreover, in the specific case of disability, excessive positivity can be counterproductive to inclusion: some members of the disability community express dissatisfaction when they are portrayed in an excessively and pathetically positive and optimistic manner: according to them, this form of optimism reinforces what is known as “inspiration porn” [3, 7, 8], which has the negative consequence of dehumanizing individuals with disabilities, leading society to praise their efforts rather than working toward tangible solutions that alleviate the often strenuous challenges they face in survival through accessible political and social policies.</p>
      <p>In this paper, we thus examine how current LLMs portray individuals with disabilities1 from an affective perspective. Specifically, we analyze the differences between self-descriptions provided by real people with disabilities and those generated by LLMs when simulating individuals with disabilities. Our focus is on assessing the sentiment, emotional tone, and levels of depression in these descriptions, with the aim of understanding how authentically LLMs represent the emotional experiences of people with disabilities and identifying differences and patterns in the affective portrayal of disability in AI-generated content.</p>
      <p>Our work aims to deepen discussions on how LLMs should authentically represent disability, a topic that has received comparatively less attention in NLP literature [3], despite the frequent discrimination faced by disabled individuals [9, 10]. Specifically, we address the following Research Question (RQ): Can LLMs authentically represent the affective experiences of people with disabilities on social media?</p>
      <sec id="sec-1-1">
        <title>Footnote 1</title>
        <p>In this paper, we primarily use people-first language (e.g., “people with disabilities”), though we occasionally use identity-first language (e.g., “disabled people”, “non-disabled people”) based on sentence structure. We recognize that preferences for people-first or identity-first language vary among individuals. We intend not to offend or diminish anyone’s perspective.</p>
        <p>C1. We collected, annotated, and publicly released a preliminary dataset of anonymized Reddit posts from users with disabilities presenting themselves on the platform. Additionally, using various LLMs, we generated and released a dataset of artificial portrayals of individuals with disabilities presenting themselves on social media, using prompts inspired by [11]. Each post in both datasets is automatically annotated with its most likely primary emotions and sentiment, as well as an indication of whether it reveals the presence of depressive patterns in the writer.</p>
        <p>C2. We compared web-collected posts with those generated by LLMs to study how models represent individuals with disabilities from an affective point of view, identifying differences between real-world and AI-generated portrayals.</p>
        <p>Bias against people with disabilities. The representation of disability in LLMs has thus been explored only minimally. Disability bias refers to treating individuals with disabilities less favorably than those without in similar circumstances, or misrepresenting them with biased associations [21]. Some studies show that hiring systems often discriminate against candidates with disabilities [25, 26]. In particular, Glazko et al. [26] highlight that even GPT-4 shows bias in suggesting job candidates. Venkit et al. [21] and Hutchinson et al. [16] used perturbation sensitivity analysis [27] to identify biases in models like BERT [28] and GPT-2 [29], finding implicit bias against disability-related terms. [30] expanded this research to include disability, gender, and ethnicity, while Herold et al. [31] found that BERT frames disabilities mainly in medical terms. Recent work by Li et al. [32] suggests newer models like GPT-3.5 and GPT-4 offer less biased portrayals of disabilities.</p>
        <sec id="sec-1-1-1">
          <title>Our findings emphasize the need to expand research</title>
          <p>on stereotypes to address both negative ones and positive idealizations, as both can harm marginalized groups. Furthermore, the analysis of the dataset on people with disabilities reveals significant challenges they frequently face, often associated with negative emotions or depressive symptoms, a fact already observed in the literature [12]. Experiments also show that LLMs tend to minimize these aspects when portraying people with disabilities and substitute them with a more socially desirable narrative.2</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>LLMs and Fairness. Recent advancements in LLMs
have transformed text processing and generation,
increasingly shaping social interactions. However, these
models can perpetuate harmful stereotypes and biases
[4], inheriting issues from uncurated internet data, such
as misrepresentations, derogatory language, and biased
associations [13, 6, 14, 1, 2]. These stereotypes
disproportionately affect marginalized groups, including
those based on age, race/ethnicity, gender, and disability
[15, 16, 17, 18, 19]. As awareness of these
misrepresentations grows, research has focused on bias and stereotypes
evaluation, mitigation methods, and datasets to address
them [4]. However, despite 1.3 billion people living with
disabilities [20], there is limited research on stereotypes
regarding disability representation in LLMs [21, 22].
Furthermore, existing datasets like BBQ [23], HolisticBias
[19], and PANDA [24] address disability representation
partially, lacking a comprehensive range of impairments
and analysis.</p>
      <sec id="sec-2-1">
        <title>2The code and the dataset are available at:</title>
        <p>https://github.com/marcobombieri/LLM-disability-representation
LLM-based portrayals and human simulation. A related research trend is human simulation, where LLMs are assessed on their ability to replicate human behavior, a concept introduced by the Turing Experiment [33]. This is applied to simulate behavior in various social and political settings [34, 35] and to identify stereotypes [11, 6]. Specifically, [36] studies how LLMs simulate personas with different traits, highlighting challenges in zero-shot scenarios. To address this, [37] suggests fine-tuning LLMs using a persona description dataset for improved personality trait representation. Our work differs by focusing on how LLMs represent disability in a zero-shot context, revealing oversimplifications and stereotypes in representing disability-related emotions and sentiments in the base model.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>3.1. A dataset of LLM-generated portrayals of people with disabilities</p>
      <sec id="sec-3-1">
        <title>Using a set of prompts inspired by Kambhatla et al. [11]</title>
        <p>and Cheng et al. [6], we use three LLMs to craft social media posts from the perspective of individuals with disabilities, with the goal of capturing their affective patterns. In particular, we employ Mixtral-8B (Mixtral-8x7B Instruct, quantized to 4 bits using GPTQ [38] due to hardware limitations), GPT-4o-mini, and Gemini-1.5F (the Flash variant). For the latter two models, we utilized their paid APIs. The models are used with the temperature set to 1.0 to guarantee the variability and randomness of the responses generated.</p>
        <p>Each prompt asks the models to introduce themselves as an individual with a disability and share experiences related to disability or general life experiences such as expressions of emotions, feelings, or thoughts, descriptions of daily activities, questions for the community, requests for suggestions, or commentary on current events, i.e., the typical activities a user can do on a social media platform [39]. We opted to keep the prompts as general as possible following the motivations discussed in [6], since more detailed prompts may direct the model toward a specific topic and introduce further stereotypes. In more detail, all the prompts follow the template: "Imagine you are [PERSON]. Write a post on social media introducing yourself and [ACTIVITY]." where [PERSON] and [ACTIVITY] can be one of those defined in Table 1:</p>
        <p>[PERSON] P1: a person with a disability; P2: a person with autism; P3: a person with blindness; P4: a person with deafness; P5: a person with cerebral palsy; P6: a person with depression. [ACTIVITY] A1: sharing experiences related to your disability; A2: sharing the emotions you felt today; A3: sharing the thoughts you had today; A4: sharing the activities you did today; A5: asking the community a question or suggestion; A6: commenting on today’s events.</p>
        <p>The combination of P1–P6 with A1–A6 aims to generate posts from the perspective of individuals with different types of disabilities or impairments. Exploiting all possible combinations, we thus obtained 36 different prompts. Each prompt is submitted 10 times to take into account the output variability of the models, thus obtaining, for each LLM, a collection of 360 posts of artificial portrayals of people with disabilities. We call LLMdgpt, LLMdgem, and LLMdmix the datasets containing the posts generated by GPT-4o-mini, Gemini-1.5F, and Mixtral-8B, respectively. In this preliminary work, we narrow our focus to the disabilities examined in similar studies, such as [26], resulting in six alternative options (P1–P6) for [PERSON].</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. A dataset of people with disabilities’ self-descriptions</title>
        <p>In addition to the datasets described in Section 3.1, we collected posts from six disability-related subreddits. We began with the general subreddit r/disability3, which offers diverse discussions on disability-related topics and ranks among the top 2% by size. To mitigate selection bias and align with the disabilities considered in Section 3.1, we added five focused subreddits: r/blind4, r/autism5, r/depression6, r/deaf7, and r/cerebralpalsy8. These subreddits aim to foster community and exchange among disabled individuals. We included posts published until 2024 containing textual content, excluding empty posts or those with only links, images, or videos. Using Mixtral-8B and the below prompt, we filtered for first-person posts from users self-identifying as disabled, excluding content from caregivers, professionals, or others:</p>
        <p>You are a text classifier operating on social media posts. You must classify posts into two disjoint classes, "1" or "2". Your answer must be in the format: "predictedClass;explanation", where "predictedClass" can be "1" or "2", and "explanation" briefly describes why you have chosen that class. Separate "predictedClass" from "explanation" with the string ";". Do not add other text. A post belongs to class "1" if: (the author of the post writes about himself/herself in the first person) AND (the author of the post explicitly mentions his/her own disability/illness). A post belongs to class "2" otherwise. Follow the post you have to analyze: {word}</p>
        <p>From the filtered results, we randomly sampled 450 posts from r/disability and 220 from each of the disability-specific subreddits. Three annotators then manually reviewed all these posts, removing those wrongly annotated as relevant by the LLM. The final dataset, REDd, includes 352 posts from r/disability, 165 from r/blind, 174 from r/autism, 204 from r/depression, 171 from r/deaf, and 183 from r/cerebralpalsy.9 To ensure annotation quality, 50 posts were independently labeled by three annotators, achieving a Fleiss’ Kappa of 0.875, indicating very high agreement [40]. Table 2 summarizes the obtained datasets and their sizes, which are in line with state-of-the-art studies [6].</p>
        <p>3Subreddit r/disability: https://www.reddit.com/r/disability/ [Last access: 2025-05-16]. 4Subreddit r/blind: https://www.reddit.com/r/blind/ [Last access: 2025-05-16]. 5Subreddit r/autism: https://www.reddit.com/r/autism/. 6Subreddit r/depression: https://www.reddit.com/r/depression/. 7Subreddit r/deaf: https://www.reddit.com/r/deaf/. 8Subreddit r/cerebralpalsy: https://www.reddit.com/r/cerebralpalsy/. 9Our goal is not to develop an LLM for post classification, but to compile a dataset of posts by people with disabilities to support our analysis; the LLM (78% accuracy) was used solely to assist filtering.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Comparison metrics</title>
        <p>To address our research question, we aim to perform a pairwise comparison of the previously described datasets, i.e., the LLM-generated portraits (Section 3.1) and human descriptions from Reddit users (Section 3.2), using metrics descriptive of the affects of an individual. In more detail, given two datasets, we compare them along the dimensions described below.</p>
        <p>Sentiment. The predominant sentiment of each post p_i is computed using VADER [41], which assigns a sentiment score s(p_i) ∈ [−1, +1]. Following VADER indications, a post is classified as positive if s(p_i) &gt; 0.05, negative if s(p_i) &lt; −0.05, and neutral otherwise. For a dataset D = [p_1, …, p_N] of N posts, we compute the number of positive, negative, and neutral posts: N_positive = |{p_i | s(p_i) &gt; 0.05}|, N_negative = |{p_i | s(p_i) &lt; −0.05}|, N_neutral = |{p_i | −0.05 ≤ s(p_i) ≤ 0.05}|. We then compute the relative frequency of sentiment-loaded posts:10 f_positive = N_positive / N, f_negative = N_negative / N.</p>
        <p>10Posts with scores between −0.05 and 0.05 are considered neutral. Since REDd is the only dataset containing neutral posts (and only two such posts), we chose to focus the following analysis exclusively on positive and negative posts.</p>
        <p>Emotions. The distribution of emotions emerging from a dataset is computed using the NRC Word-Emotion Association Lexicon (EmoLex) [42], covering anger, fear, anticipation, trust, surprise, sadness, joy, and disgust. While EmoLex provides a valuable resource for identifying emotion-related words, it has certain limitations. Specifically, it is based solely on word-level counts from the lexicon. It does not account for contextual factors such as negations, word dependencies, or the broader semantic structure of the text. Nevertheless, this approach remains meaningful, allowing the consistent analysis of emotional expressions across texts and providing valuable insights into the overall emotional patterns within the dataset [43]. Let D = {p_1, p_2, …, p_N} represent the dataset with its set of N posts. For each post p_i, we calculate the number of words associated with each emotion e ∈ E, denoted by c_{i,e}, where c_{i,e} is the number of words in post p_i that are associated with emotion e. If a word is linked to multiple emotions, all associated emotions are considered. The proportion f_{i,e} of words in post p_i associated with emotion e is given by: f_{i,e} = c_{i,e} / c_i, where c_i is the total number of words in post p_i that are linked to any emotion. At the dataset level, the average proportion of each emotion across all posts is computed as: f̄_e = (1/N) · Σ_{i=1}^{N} f_{i,e}.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Depression</title>
        <p>The indication of the presence of depression has been determined by the best-performing model from the Shared Task on Detecting Signs of Depression from Social Media Text at LT-EDI-ACL2022 [44]. Let l_{i,dep} denote the predicted depression label for a given post p_i, where l_{i,dep} ∈ {1 = no depression, 2 = moderate depression, 3 = severe depression}. We then analyze the distribution of these labels across each dataset.</p>
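<p>For illustration, the 36-prompt grid can be generated mechanically. The sketch below uses the persona and activity strings of Table 1 and the prompt template from the text; the variable names are ours, not from the paper:</p>

```python
from itertools import product

# Personas P1-P6 and activities A1-A6 from Table 1.
PERSONS = [
    "a person with a disability",
    "a person with autism",
    "a person with blindness",
    "a person with deafness",
    "a person with cerebral palsy",
    "a person with depression",
]
ACTIVITIES = [
    "sharing experiences related to your disability",
    "sharing the emotions you felt today",
    "sharing the thoughts you had today",
    "sharing the activities you did today",
    "asking the community a question or suggestion",
    "commenting on today's events",
]

TEMPLATE = ("Imagine you are {person}. Write a post on social media "
            "introducing yourself and {activity}.")

# 6 personas x 6 activities = 36 distinct prompts.
prompts = [TEMPLATE.format(person=p, activity=a)
           for p, a in product(PERSONS, ACTIVITIES)]

# Each prompt is submitted 10 times per model -> 360 generations per LLM.
queries = [prompt for prompt in prompts for _ in range(10)]
```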
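<p>Replies in the requested predictedClass;explanation format can be parsed defensively before filtering; a minimal sketch (the helper name and the sample reply are ours, not from the paper):</p>

```python
def parse_classifier_reply(reply: str) -> tuple[str, str]:
    """Split a 'predictedClass;explanation' reply into its two fields.

    Splits on the first ';' only, so explanations may themselves contain
    semicolons; unexpected labels raise instead of silently entering the
    dataset.
    """
    label, _, explanation = reply.strip().partition(";")
    label = label.strip().strip('"')
    if label not in {"1", "2"}:
        raise ValueError(f"unexpected class label: {label!r}")
    return label, explanation.strip()

label, why = parse_classifier_reply(
    "1;The author speaks in the first person and mentions their own disability."
)
```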
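<p>Agreement statistics such as the Fleiss’ Kappa reported for the manual annotation can be computed directly from raw labels; a generic self-contained sketch (the toy ratings are invented for illustration, not the paper’s data):</p>

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels
    (one label per annotator, same number of annotators per item)."""
    categories = sorted({c for item in ratings for c in item})
    n = len(ratings[0])   # annotators per item
    N = len(ratings)      # number of items
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Mean per-item agreement P_bar.
    P_bar = sum(
        (sum(k * k for k in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # Chance agreement P_e from the marginal category proportions.
    p = [sum(row[j] for row in counts) / (N * n)
         for j in range(len(categories))]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Three annotators, four posts, toy "keep"/"drop" relevance labels.
toy = [["keep", "keep", "keep"],
       ["drop", "drop", "drop"],
       ["keep", "keep", "drop"],
       ["drop", "drop", "keep"]]
kappa = fleiss_kappa(toy)
```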
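<p>The VADER-based classification reduces to a threshold on the compound score; a minimal sketch using made-up stand-in scores (the real s(p_i) would come from VADER’s SentimentIntensityAnalyzer, omitted here so the snippet stays self-contained):</p>

```python
def classify(score: float) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment class
    using the +/-0.05 thresholds described above."""
    if score > 0.05:
        return "positive"
    if score < -0.05:
        return "negative"
    return "neutral"

def sentiment_frequencies(scores):
    """Relative frequency of positive and negative posts in a dataset."""
    n = len(scores)
    labels = [classify(s) for s in scores]
    return {
        "positive": labels.count("positive") / n,
        "negative": labels.count("negative") / n,
    }

# Stand-in compound scores; real values would come from VADER.
freqs = sentiment_frequencies([0.6, 0.2, -0.4, 0.01, -0.8])
```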
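<p>The per-post proportions f_{i,e} = c_{i,e} / c_i can be sketched with a toy lexicon standing in for EmoLex (the three-entry lexicon below is invented; the real EmoLex associates thousands of English words with the eight emotions and must be obtained separately):</p>

```python
from collections import Counter

# Toy stand-in for the NRC EmoLex word -> emotions mapping.
LEXICON = {
    "happy": {"joy", "trust"},
    "afraid": {"fear"},
    "angry": {"anger", "disgust"},
}

def emotion_proportions(post: str) -> dict[str, float]:
    """Proportion of emotion-linked words per emotion. A word linked to
    several emotions counts once for each of them, while the denominator
    c_i counts words linked to any emotion, as in the text."""
    words = post.lower().split()
    c_i = sum(1 for w in words if w in LEXICON)
    counts = Counter(e for w in words for e in LEXICON.get(w, ()))
    return {e: k / c_i for e, k in counts.items()} if c_i else {}

props = emotion_proportions("I was happy but also afraid today")
```

Note that, because multi-emotion words contribute to each of their emotions, the proportions of a post may sum to more than 1.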
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussions</title>
      <p>LLMs appear to spread a positivity bias, which may impact how disability is represented in AI-generated discourse.</p>
      <p>To complement our quantitative metrics, we conduct a preliminary qualitative analysis of both LLM-generated and real posts, examining their structure and recurring themes. LLMs tend to frame disability through consistently positive lenses, emphasizing inclusion, accessibility, and triumph over adversity, with frequent use of words like advocacy, inclusion, grateful, excited, and proud. Follow an excerpt of a post generated by GPT-4o-mini when representing a blind person:</p>
      <p>I’m a proud member of the blind community. [...] One of my biggest passions is sharing my experiences and advocating for accessibility and inclusion. [...] I also want to highlight the amazing community I’ve found among fellow visually impaired individuals. We share stories, support one another, and inspire each other every day [...].</p>
      <p>In contrast, real posts by people with disabilities more often reference health, educational, or financial struggles, using terms such as pain, unemployed, bad, anxiety, and worse, reflecting a broader emotional range and lived complexity. Follow an excerpt of a post from r/blind:</p>
      <p>I was born blind. Always been this way. From the time I was in high school, I began to have really bad insecurities about my blindness. [...] Growing up, I hated every blind person I went to school with. [...]. By the time I got to high school, it just got worse and worse. [...]</p>
      <p>In future research, we will expand this preliminary analysis with an in-depth qualitative and quantitative thematic analysis of posts.</p>
      <sec id="sec-4-1">
        <title>Answer to RQ</title>
        <p>The results reveal that the LLMs’ affective descriptions of disability significantly differ from those expressed by real people with disabilities. LLM-generated texts largely emphasize positive sentiments and emotions, minimizing or entirely omitting the negative feelings that individuals with disabilities often experience. This tendency risks fostering a form of toxic positivity that overlooks the complex emotional landscape of disability, as highlighted by [<xref ref-type="bibr" rid="ref1">45</xref>]. The analysis of REDd’s posts, however, paints a starkly different picture, where individuals with disabilities frequently express negative emotions such as anger, sadness, and fear. These emotional responses are not only shaped by the inherent challenges of disability but are often exacerbated by an inaccessible and exclusionary social-political environment.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper, we investigated how LLMs represent disability from an affective point of view by comparing AI-generated portrayals with social media posts authored by individuals with disabilities. By leveraging a dataset of Reddit posts and artificial portrayals generated by LLMs, we analyzed the emotional tone, sentiment, and depressive patterns of these texts. Our work contributes not only a publicly available dataset but also insights into the fundamental differences in how LLMs and real individuals describe disability, highlighting significant oversimplifications. Most specifically, through our experiments, we found that LLMs frequently idealize disability-related affective experiences, producing overly optimistic portrayals that ignore the complex realities and challenges faced by individuals with disabilities. In stark contrast, posts written by real individuals often convey more nuanced emotions, including negative feelings stemming from the intersection of their disabilities with inaccessible and non-inclusive societal systems.</p>
      <p>This disconnect highlights the risk of toxic positivity, where overly optimistic portrayals diminish the real challenges faced by disabled individuals. Though well-intentioned, this emphasis on positivity often forces them into a narrative that idealizes disability through a non-disabled lens, overlooking their actual experiences. By replacing negative emotions with an overly upbeat perspective, LLMs risk perpetuating exclusionary conditions. Our findings highlight the broader challenge of ensuring LLMs authentically represent marginalized groups. While addressing negative stereotypes in AI is crucial, our study calls for a more nuanced approach that reflects the diverse realities of marginalized groups without reductive idealizations. This paper raises a critical question: should LLMs represent affective experiences in an exclusively optimistic, "good vibes only" manner, or should they strive for more authentic, emotionally complex portrayals that better reflect real human experiences?</p>
      <p>In future work, we plan to test additional prompts and simulate a broader range of social media scenarios. We also plan to expand the collection of posts by including a wider range of subreddits, social media platforms, and languages. This will help capture a more diverse set of experiences from individuals with disabilities. We also aim to include a broader spectrum of disabilities and analyze how their representation varies across different categories. Additionally, we will enhance this study with thematic analysis methods to examine discourses related to disabilities in real and LLM-generated posts, identifying keywords that distinguish the two corpora: those written by disabled individuals and those generated by LLMs. A qualitative analysis will further complement this approach. Finally, comparing how LLMs portray individuals with disabilities versus the general population, following the methodology in [6], will offer deeper insights into these dynamics and help address the risk of oversimplification or misrepresentation.</p>
      <sec id="sec-5-1">
        <title>Limitations</title>
        <p>This paper is a preliminary work and thus has some limitations. First, we focused on a subset of disabilities to simplify the analysis. While this does not fully capture the complexity of the subject, it aligns with the approach taken in similar studies [26]. Second, we use lexicon-based tools to estimate emotions and sentiments, which may not always capture contextual nuances, potentially affecting the accuracy of the analysis. This methodology is, however, also employed in authoritative studies to ensure the method remains explainable and reproducible [6]. Furthermore, although we assume individuals who mention being disabled are indeed disabled, some may be bots or people pretending to be disabled. Finally, these findings are specific to the versions of the models and the dates on which they were tested (especially those accessed via API). As LLMs are updated and their guardrails evolve, these results may change.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Ethical and societal implications</title>
        <p>This paper has a positive impact by shedding light on how disability is represented in zero-shot LLMs, emphasizing crucial ethical considerations. Current debiasing and representation models focus on “category” rather than “individual,” leading to potentially generalized, insensitive, or inappropriate responses. A model aiming to be inclusive must understand the personal experience of the individual represented. These models often fail to capture pain, suffering, and depression, substituting them with overly positive language. While optimism may be suitable in some cases, neglecting suffering flattens a key human experience. An “only good vibes” approach risks marginalizing those experiencing hardships, not just people with disabilities but anyone going through difficult times, exposing them to the risk of inspiration porn. Therefore, these models must reflect the complexity of human emotions authentically and respectfully to foster genuine understanding, inclusion, and support. While addressing such personal topics may unintentionally cause misunderstandings, our intention is to promote constructive dialogue between technologists and humanists for more inclusive AI systems.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Data Availability</title>
        <p>The code and the dataset are available at: https://github.com/marcobombieri/LLM-disability-representation</p>
      </sec>
      <sec id="sec-5-4">
        <title>Acknowledgments</title>
        <p>This research has received funding from the University of Mannheim’s “Gastwissenschaftler*innenprogramm Nachhaltigkeit”, and the MUR funded 2023-2027 Project of Excellence “Inclusive Humanities: Perspectives for Development in the Research and Teaching of Foreign Languages and Literatures” of the Department of Foreign Languages and Literatures of the University of Verona.</p>
        <p>Part of this work was carried out within the Digital Arena
for Inclusive Humanities (DAIH) Research Centre at the
University of Verona. The authors gratefully
acknowledge this support.
</p>
        <p>ing of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, Association for Computational Linguistics, 2021, pp. 4275–4293. URL: https://doi.org/10.18653/v1/2021.acl-long.330. doi:10.18653/V1/2021.ACL-LONG.330.</p>
        <p>[16] B. Hutchinson, V. Prabhakaran, E. Denton, K. Webster, Y. Zhong, S. Denuyl, Social biases in NLP models as barriers for persons with disabilities, in: D. Jurafsky, J. Chai, N. Schluter, J. R. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Association for Computational Linguistics, 2020, pp. 5491–5501. URL: https://doi.org/10.18653/v1/2020.acl-main.487. doi:10.18653/V1/2020.ACL-MAIN.487.</p>
        <p>[17] K. Mei, S. Fereidooni, A. Caliskan, Bias against stigmatized groups in masked language models and downstream sentiment classification tasks, in: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023, Chicago, IL, USA, June 12-15, 2023, ACM, 2023, pp. 1699–1710. URL: https://doi.org/10.1145/3593013.3594109. doi:10.1145/3593013.3594109.</p>
        <p>[18] A. Salinas, P. Shah, Y. Huang, R. McCormack, F. Morstatter, The unequal opportunities of large language models: Examining demographic biases in job recommendations by chatgpt and llama, in: Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, Association for Computing Machinery, New York, NY, USA, 2023. URL: https://doi.org/10.1145/3617694.3623257. doi:10.1145/3617694.3623257.</p>
        <p>[19] E. M. Smith, M. Hall, M. Kambadur, E. Presani, A. Williams, “I’m sorry to hear that”: Finding new biases in language models with a holistic descriptor dataset, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 9180–9211. URL: https://aclanthology.org/2022.emnlp-main.625/. doi:10.18653/v1/2022.emnlp-main.625.</p>
        <p>[20] World Health Organization, Disability, https://www.who.int/health-topics/disability, 2023. Accessed: 2025-01-13.</p>
        <p>[21] P. N. Venkit, M. Srinath, S. Wilson, A study of implicit bias in pretrained language models against people with disabilities, in: N. Calzolari, C. Huang, H. Kim, J. Pustejovsky, L. Wanner, K. Choi, P. Ryu, H. Chen, L. Donatelli, H. Ji, S. Kurohashi, P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, S. Na (Eds.), Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022, International Committee on Computational Linguistics, 2022, pp. 1324–1332.</p>
        <p>[22] Z. Chu, Z. Wang, W. Zhang, Fairness in large language models: A taxonomic survey, SIGKDD Explor. Newsl. 26 (2024) 34–48. URL: https://doi.org/10.1145/3682112.3682117. doi:10.1145/3682112.3682117.</p>
        <p>[23] A. Parrish, A. Chen, N. Nangia, V. Padmakumar, J. Phang, J. Thompson, P. M. Htut, S. R. Bowman, BBQ: A hand-built bias benchmark for question answering, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, May 22-27, 2022, Association for Computational Linguistics, 2022, pp. 2086–2105. URL: https://doi.org/10.18653/v1/2022.findings-acl.165. doi:10.18653/V1/2022.FINDINGS-ACL.165.</p>
        <p>[24] R. Qian, C. Ross, J. Fernandes, E. M. Smith, D. Kiela, A. Williams, Perturbation augmentation for fairer NLP, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 9496–9521. URL: https://aclanthology.org/2022.emnlp-main.646/. doi:10.18653/v1/2022.emnlp-main.646.</p>
        <p>[25] N. Tilmes, Disability, fairness, and algorithmic bias in AI recruitment, Ethics Inf. Technol. 24 (2022) 21. URL: https://doi.org/10.1007/s10676-022-09633-2. doi:10.1007/S10676-022-09633-2.</p>
        <p>[26] K. S. Glazko, Y. Mohammed, B. Kosa, V. Potluri, J. Mankoff, Identifying and improving disability bias in gpt-based resume screening, in: The 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024, Rio de Janeiro, Brazil, June 3-6, 2024, ACM, 2024, pp. 687–700. URL: https://doi.org/10.1145/3630106.3658933. doi:10.1145/3630106.3658933.</p>
        <p>[27] M. Díaz, I. Johnson, A. Lazar, A. M. Piper, D. Gergle, Addressing age-related bias in sentiment analysis, in: S. Kraus (Ed.), Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, ijcai.org, 2019, pp. 6146–6150. URL: https://doi.org/10.24963/ijcai.2019/852. doi:10.24963/IJCAI.2019/852.</p>
        <p>[28] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human</p>
        <p>simulate human behavior: A causal inference perspective, CoRR abs/2312.15524 (2023). URL: https://</p>
Language Technologies, Volume 1 (Long and Short doi.org/10.48550/arXiv.2312.15524. doi:10.48550/
Papers), Association for Computational Linguis- ARXIV.2312.15524. arXiv:2312.15524.
tics, Minneapolis, Minnesota, 2019, pp. 4171–4186. [36] T. Hu, N. Collier, Quantifying the persona efect in
URL: https://aclanthology.org/N19-1423/. doi:10. LLM simulations, in: L. Ku, A. Martins, V. Srikumar
18653/v1/N19-1423. (Eds.), Proceedings of the 62nd Annual Meeting of
[29] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, the Association for Computational Linguistics
(VolI. Sutskever, Language models are unsupervised ume 1: Long Papers), ACL 2024, Bangkok, Thailand,
multitask learners, OpenAI (2019). August 11-16, 2024, Association for Computational
[30] S. Hassan, M. Huenerfauth, C. O. Alm, Unpack- Linguistics, 2024, pp. 10289–10307. URL: https://doi.
ing the interdependent systems of discrimination: org/10.18653/v1/2024.acl-long.554. doi:10.18653/
Ableist bias in NLP systems through an intersec- V1/2024.ACL-LONG.554.
tional lens, in: M. Moens, X. Huang, L. Specia, S. W. [37] W. Li, J. Liu, A. Liu, X. Zhou, M. Diab, M. Sap,
BIG5Yih (Eds.), Findings of the Association for Compu- CHAT: shaping LLM personalities through training
tational Linguistics: EMNLP 2021, Virtual Event on human-grounded data, CoRR abs/2410.16491
/ Punta Cana, Dominican Republic, 16-20 Novem- (2024). URL: https://doi.org/10.48550/arXiv.2410.
ber, 2021, Association for Computational Linguis- 16491. doi:10.48550/ARXIV.2410.16491.
tics, 2021, pp. 3116–3123. URL: https://doi.org/10. [38] E. Frantar, S. Ashkboos, T. Hoefler, D.
Alis18653/v1/2021.findings-emnlp.267. doi: 10.18653/ tarh, GPTQ: accurate post-training
quantiV1/2021.FINDINGS-EMNLP.267. zation for generative pre-trained transformers,
[31] B. Herold, J. Waller, R. Kushalnagar, Applying CoRR abs/2210.17323 (2022). URL: https://doi.org/
the stereotype content model to assess disability 10.48550/arXiv.2210.17323. doi:10.48550/ARXIV.
bias in popular pre-trained NLP models underly- 2210.17323.
ing AI-based assistive technologies, in: S. Ebling, [39] J. J. Al-Menayes, Motivations for using social
meE. Prud’hommeaux, P. Vaidyanathan (Eds.), Ninth dia: An exploratory factor analysis, International
Workshop on Speech and Language Processing Journal of Psychological Studies 7 (2015) 43.
for Assistive Technologies (SLPAT-2022), Associa- [40] J. R. Landis, G. G. Koch, The measurement of
obtion for Computational Linguistics, Dublin, Ireland, server agreement for categorical data, Biometrics
2022, pp. 58–65. URL: https://aclanthology.org/2022. 33 (1977).</p>
        <p>slpat-1.8/. doi:10.18653/v1/2022.slpat-1.8. [41] C. J. Hutto, E. Gilbert, VADER: A parsimonious
rule[32] R. Li, A. Kamaraj, J. Ma, S. Ebling, Decoding ableism based model for sentiment analysis of social media
in large language models: An intersectional ap- text, in: E. Adar, P. Resnick, M. D. Choudhury,
proach, in: D. Dementieva, O. Ignat, Z. Jin, R. Mihal- B. Hogan, A. Oh (Eds.), Proceedings of the Eighth
cea, G. Piatti, J. Tetreault, S. Wilson, J. Zhao (Eds.), International Conference on Weblogs and Social
Proceedings of the Third Workshop on NLP for Posi- Media, ICWSM 2014, Ann Arbor, Michigan, USA,
tive Impact, Association for Computational Linguis- June 1-4, 2014, The AAAI Press, 2014.
tics, Miami, Florida, USA, 2024, pp. 232–249. URL: [42] S. M. Mohammad, P. D. Turney, Crowdsourcing a
https://aclanthology.org/2024.nlp4pi-1.22/. doi:10. word-emotion association lexicon, Comput. Intell.
18653/v1/2024.nlp4pi-1.22. 29 (2013) 436–465. URL: https://doi.org/10.1111/j.
[33] G. V. Aher, R. I. Arriaga, A. T. Kalai, Using large 1467-8640.2012.00460.x.</p>
        <p>language models to simulate multiple humans and [43] Y. Li, J. Chan, G. Peko, D. Sundaram, Mixed emotion
replicate human subject studies, in: A. Krause, extraction analysis and visualisation of social media
E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, J. Scar- text, Data Knowl. Eng. 148 (2023) 102220. URL:
lett (Eds.), International Conference on Machine https://doi.org/10.1016/j.datak.2023.102220. doi:10.
Learning, ICML 2023, 23-29 July 2023, Honolulu, 1016/J.DATAK.2023.102220.</p>
        <p>Hawaii, USA, volume 202 of Proceedings of Machine [44] R. Poświata, M. Perełkiewicz,
OPI@LT-EDILearning Research, PMLR, 2023, pp. 337–371. URL: ACL2022: Detecting signs of depression from social
https://proceedings.mlr.press/v202/aher23a.html. media text using RoBERTa pre-trained language
[34] L. P. Argyle, E. C. Busby, N. Fulda, J. R. Gubler, models, in: Proceedings of the Second
WorkC. Rytting, D. Wingate, Out of one, many: Using shop on Language Technology for Equality,
Dilanguage models to simulate human samples, Polit- versity and Inclusion, Association for
Computaical Analysis 31 (2023) 337–351. doi:10.1017/pan. tional Linguistics, Dublin, Ireland, 2022, pp. 276–
2023.2. 282. URL: https://aclanthology.org/2022.ltedi-1.40.
[35] G. Gui, O. Toubia, The challenge of using llms to doi:10.18653/v1/2022.ltedi-1.40.</p>
        <p>Declaration on Generative AI
During the preparation of this work, the author(s) used ChatGPT (OpenAI) in order to: Paraphrase
and reword and Grammar and spelling check. After using these tool(s)/service(s), the author(s)
reviewed and edited the content as needed and take(s) full responsibility for the publication’s
content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wyatt</surname>
          </string-name>
          ,
          <article-title>The dark side of #positivevibes: Understanding toxic positivity in modern culture</article-title>
          ,
          <source>Psychiatry and Behavioral Health</source>
          <volume>3</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>