<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>i&gt;. erik h. erikson. new york: Norton</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Identity by Design? Evaluating Gender Conditioning in LLM-Generated Agent Identity Profiles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mattia Rampazzo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saba Ghanbari Haez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrizio Bellan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Magnolini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leonardo Sanna</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Dragoni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Povo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>Bolzano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Identity-Aware AI workshop at 28th European Conference on Artificial Intelligence</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Trento</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>113</volume>
      <issue>1951</issue>
      <fpage>253</fpage>
      <lpage>253</lpage>
      <abstract>
        <p>In multi-agent reasoning frameworks powered by large language models, agent roles are often instantiated through identity descriptions that condition their behavior. This paper investigates whether and how the gender assigned to the agent responsible for defining role-specific identity profiles afects the linguistic identity, sentiment, and gender expression of downstream agents. We introduce an extensive corpus of agent identity descriptions generated under controlled combinations of frameworks, roles, models, and gender conditions. Through quantitative and qualitative linguistic analysis, we observe a consistent skew toward female identity across models and roles when gender is unspecified, along with varying degrees of polarity and subjectivity depending on the description framework. Notably, cognitively-oriented frameworks suppress afective expression, while trait-based frameworks amplify gender alignment. These results reveal that identity conditioning is not solely determined by prompt parameters, but emerges through a layered interaction of model priors, framework semantics, and role-specific expressive constraints.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Agent Identity Description Framework</kwd>
        <kwd>Gender Bias Detection</kwd>
        <kwd>Synthetic Personas</kwd>
        <kwd>Gender-Conditioned Prompting for Identity Descriptions in LLMs</kwd>
        <kwd>Prompt Conditioning</kwd>
        <kwd>Bias in LLMs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>we ask whether this framing influences the text used to define other agents. In this, every description
pertains to a top-tier expert function rather than to casual or novice personas.</p>
      <p>To study this question empirically, we constructed a large-scale collection of role descriptions
generated under controlled variation. Thirteen psychologically grounded identity description frameworks
were combined with five task domains. Within each configuration, the Psychologist agent generated
role descriptions for the rest of the ensemble, tailoring them to the contextual demands of the dataset
and framework. The procedure was repeated across thirteen base language models, yielding thousands
of descriptions whose variation is structured by four factors: the Psychologist’s gender constraint, the
identity description framework, the base language model, and the task domain.</p>
      <p>To evaluate whether gender conditioning of the prompt-giver leaves measurable traces in these
expert profiles, we analyze each description along three axes: polarity as an index of emotional valence,
subjectivity as a measure of evaluative stance versus factual tone, and a discrete sentiment label to
capture categorical afect. These metrics allow us to probe tonal and stylistic shifts associated with the
Psychologist’s gender, and to determine whether such shifts are consistent across roles, frameworks,
tasks, and models.</p>
      <p>The central question guiding this study is whether gender conditioning of the identity-generating
agent influences the content and tone of the role descriptions it produces.</p>
      <p>RQ Do LLMs produce systematically diferent identity descriptions when gender is varied in
the prompt — and how are these efects modulated by identity description frameworks, and
language model?</p>
      <p>This research question arises from the observation that identity descriptions—used to define virtual
agents, e.g., in multi-role systems—are often treated as neutral artifacts, yet they are themselves
generated by another model whose prompt-based framing can bias the textual outcome.</p>
      <p>The analyses we conducted revealed consistent efects across multiple dimensions. Gender
conditioning influences the emotional tone of the text, the use of subjective versus objective
language, and the distribution of adjectives that convey traits such as assertiveness, empathy, or
decisiveness. In many cases, the perceived gender of the generated description aligns with the
constraint applied to the Psychologist, even when the target role is not gendered. By
highlighting these subtle, yet systematic efects, our work contributes to a deeper understanding of how
identity framing in prompts can shape the stylistic and semantic qualities of LLM-generated text.
We make the full corpus of identity descriptions publicly available at github.com/patriziobellan86/
Identity-by-Design-Evaluating-Gender-Conditioning-in-LLM-Generated-Agent-Identity-Profiles.</p>
      <p>The paper is structured as follows: Section 2 reviews prior work on agent personality and bias in
language generation. Section 3 introduces the Pool of Experts (PoE) architecture and the identity
description generation pipeline. Section 4 presents our experimental design and describes the linguistic
analysis procedures. Section 5 presents results and discusses the main findings. Section 6 concludes the
paper and outlines directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Agent Personality Research on conversational agents has long drawn on psychology, particularly
personality modeling frameworks such as the Big Five traits [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ]. Early work such as [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] directly
embedded these traits into conversational agent design, paving the way for subsequent studies that
emphasize personalization, reliability, and richer user engagement. Within healthcare, for example, [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]
demonstrate how familiar personas can strengthen trust and empathy, while [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] show that tailoring
responses to user personality improves therapeutic outcomes.
      </p>
      <p>
        Outside healthcare, broader strategies have been explored for role-driven behavior in LLMs. For
instance, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] employ prompt-based methods to enhance contextual awareness, whereas [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] demonstrate
that embedding personality traits directly into LLMs enables more flexible and diverse conversational
styles. Expanding on personalization, [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] investigate preference evaluation, identifying limitations of
simplistic persona construction, while [14] enrich dialogue quality by integrating structured persona
data.
      </p>
      <p>Another active line of research addresses scaling persona diversity. [15] contribute methods for
building more realistic and varied personas, and [16] introduce the Synthetic-Persona-Chat dataset
through persona-aligned dialogue generation. More recently, scholars have examined whether LLMs
maintain stable trait-like or identity-driven behaviors when prompted with synthetic personas. For
example, [17] find consistent expression of Big Five traits with implications for bias and downstream
task performance, while [18] analyze how models internalize social identities and reflect them in
political or ethical stances. Complementing this, [19] extend identity integration to human annotators
by embedding sociodemographic attributes, though with limited predictive benefits.</p>
      <p>Taken together, this body of work highlights the expressive range of persona-based modeling in
LLMs, yet its focus remains largely tied to narrow psychological frameworks or specific downstream
tasks. Our contribution expands this scope by systematically comparing multiple psychologically
grounded description frameworks and testing their influence across heterogeneous reasoning tasks.
Extending this line of inquiry, we further manipulate the identity of the profile-generating agent itself
(the “Psychologist”), testing whether upstream gender cues propagate into the role descriptions of other
expert agents. By combining 13 psychological description frameworks, 5 reasoning tasks, and 13 LLM
families, we provide a comprehensive evaluation to date of how prompt-giver identity shapes persona
construction.</p>
      <p>Profile Definition and Persona Generation Recent studies have explored how LLMs can define
and generate character profiles to support more diverse agent behaviors. [ 20] introduce Persona Hub, a
large repository of synthetic personas designed to steer LLM outputs toward distinctive styles. Building
on this, [21] fine-tune models with persona-driven corpora, enabling generalization across arbitrary
traits so that models can role-play characters with specific personalities or backgrounds. Such structured
profile definitions expand an LLM’s ability to assume diverse identities.</p>
      <p>A parallel development is the use of LLMs themselves for persona-aligned content generation. [16]
propose a Generator–Critic pipeline to create high-quality persona-aligned dialogues, illustrating how
structured prompt chains can scale synthetic identity creation. Similarly, [22] show that automatically
generated expert roles improve reasoning diversity and factual accuracy. These works demonstrate
the potential of structured persona generation to enrich LLM interactions, while also raising questions
about how authorship conditions outcomes.</p>
      <p>Our work directly addresses this gap by situating profile authorship within a Pool of Experts
architecture, where a Psychologist Agent is responsible for generating all other role profiles. We uniquely
test how the Psychologist’s gender framing conditions persona generation, making profile authorship
itself a locus of identity bias.</p>
      <p>Bias and Gender Efects in LLMs Bias in LLM outputs has become a central concern, with numerous
studies showing that models reproduce human-like stereotypes across gender, race, and other
identities [23]. For example, [24] report that women are more often described with communal traits and men
with agentic traits, while [25] uncover gendered occupational associations. [26] further demonstrate
that even advanced systems such as GPT-4 and Claude exhibit gender bias in job interview scenarios.
Beyond gender, [23] show that LLMs display ingroup favoritism and outgroup derogation, pointing to
deep-seated social identity biases. Although mitigation strategies such as curated training data and
bias-aware fine-tuning show promise, addressing these issues remains a persistent challenge.</p>
      <p>An emerging line of research reveals that bias can arise not only in outputs but also in the persona
generation process itself [18, 27]. Studies show that when LLMs invent role descriptions, they often embed
stereotypes—for example, framing female teachers as nurturing and male teachers as authoritative [28].
Such upstream bias is underexplored but critical, as it can shape all downstream simulations. Our work
contributes to this direction by testing whether the gender framing of a Psychologist prompt-giver
influences the profiles it creates, showing that identity conditioning leaves measurable stylistic traces.
This highlights the need for identity-aware prompting strategies to ensure that synthetic personas
enrich LLM interactions without reinforcing stereotypes.</p>
      <p>Unlike prior studies that analyze bias primarily in outputs or static personas, our study investigates
bias at the point of profile creation. We show that gender framing of the Psychologist Agent leaves
measurable traces in tone, sentiment, and perceived identity of expert-level profiles (Project Manager,
Expert Agents, Final Decision Maker). This upstream perspective reveals how stereotypes can
become embedded into the very scafolding of multi-agent systems, rather than only surfacing in their
downstream behaviors.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The Pool of Experts Framework</title>
      <p>
        PoE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a prompting architecture designed to emulate interdisciplinary reasoning through the
orchestration of multiple role-specialized agents instantiated from a shared LLM. Unlike most prompting
pipelines, which rely on monolithic or loosely contextualized prompts, PoE assigns each agent a
distinct and explicitly defined identity. This is not merely role-labeling: each agent operates under a
psychologically grounded identity profile that conditions its behavior throughout the reasoning process.
This modular structure supports interpretability, behavioral diversity, and dynamic specialization, while
remaining compatible with zero-shot LLM deployment.
      </p>
      <p>At the start of the PoE inizialization, an LLM is tasked to generate the identity of the Psychologist
Agent (PA). This agent is responsible for producing the identity descriptions of all other agents. This
initial step may be guided by a selected identity description framework, such as the Big Five Personality
Traits [29], which provides structured psychological principles for shaping the PA’s behavior and
narrative tone. The PA is itself an LLM instance, prompted to adopt the role of a psychologist. Once
instantiated, it proceeds to generate textual identity profiles for each downstream agent. Each profile
encodes a set of cognitive, motivational, and communicative dispositions. For example, when instructed
to describe a Project Manager agent according to the MBTI [30] framework, the PA might produce a
structured paragraph emphasizing planning, organization, and interpersonal awareness. The generated
identity description is then injected into the system prompt of the corresponding agent, efectively
shaping its reasoning and expression style.</p>
      <p>The agent hierarchy constructed by PoE follows a consistent pattern. After the PA is instantiated and
generates the required personality descriptions, the system proceeds to instantiate a Project Manager
agent (PM), whose role is to interpret the task and identify relevant domains of expertise. Based on
this analysis, the PA then produces tailored identity profiles for each Expert Agent (EA), ensuring
alignment with their respective fields (e.g., sociologist, linguistic expert) and the task context. Each EA
is instantiated with the assigned identity and generates an independent, structured response. Finally,
the PA also produces the identity profile for the Final Decision Maker (FDM), an agent tasked with
synthesizing the EAs responses into a coherent and justified final answer.</p>
      <p>Although all agents are derived from the same base LLM, their behavioral diferentiation emerges
from the identity prompts generated by the PA. This method allows PoE to simulate cognitive diversity
and role specialization without altering the underlying model weights. The psychological framing
of each agent serves as a behavioral scafold, modulating how the model interprets and responds to
information. The identity profile of each agent also acts as an inductive bias, shaping the model’s default
assumptions, expectations, and expressive tendencies.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Design</title>
      <p>In the present study, we repurpose the Pool of Experts (PoE) not as a tool for evaluating task performance,
but as a controlled environment for probing the linguistic consequences of identity construction.
Our focus is on the generative behavior of the PA itself: specifically, we investigate how gender
conditioning afects the language it uses to describe other agents. To this end, we systematically vary
three experimental dimensions: (i) the gender assigned to the PA, (ii) the identity description framework
guiding the generation, and (iii) the backbone LLM employed.</p>
      <p>Gender assigned to the Psychologist Agent. The generation of this agent is conditioned along
four categories: unconstrained (no gender specified), male, female, or non-binary. Listing 1 shows the
system and user prompt templates used to instantiate the PA under each gender condition.</p>
      <p>Listing 1: System and User prompt template used to generate the PA identity profile.
SYSTEM: You are a psychologist, a highly skilled and knowledgeable expert in your
field. Your task is to create a detailed and authentic description of a {gender
constraint} **Psychologist** who is responsible for generating description of
a person. Use the description framework {identity description framework} as a
guide to describe the Psychologist.</p>
      <p>USER: Now it’s your turn. Create a {gender constraint} **Psychologist** description
that strictly follows the description framework {identity description
framework}.</p>
      <p>
        Identity Description Framework. For the second, we draw on thirteen psychologically inspired
identity description frameworks that reflect diverse theoretical approaches to modeling personality,
cognition, behavior, and user interaction. Trait-oriented models such as the Big Five Personality Traits [29]
and the Myers-Briggs Type Indicator [30] capture relatively stable personality characteristics, while
psychodynamic and developmental perspectives such as Freudian Psychoanalysis [31] and Erikson’s
Psychosocial Stages [32] emphasize unconscious processes and lifespan development. Cognitive theories,
including Cognitive Behavioral Theory [
        <xref ref-type="bibr" rid="ref14">33</xref>
        ], Cognitive Load Theory [
        <xref ref-type="bibr" rid="ref15">34</xref>
        ], and Dual-Process Theory [
        <xref ref-type="bibr" rid="ref16">35</xref>
        ],
focus on how individuals acquire, manage, and use information to make decisions. Frameworks such as
Social Cognitive Theory [
        <xref ref-type="bibr" rid="ref17">36</xref>
        ] and Flow Theory [
        <xref ref-type="bibr" rid="ref18">37</xref>
        ] highlight the influence of motivation, environment,
and experiential states on behavior. To complement these perspectives, we also consider user-centered
approaches including User Design Persona [
        <xref ref-type="bibr" rid="ref19">38</xref>
        ], User-Centered Design [
        <xref ref-type="bibr" rid="ref20">39</xref>
        ], and Mental Models [
        <xref ref-type="bibr" rid="ref21">40</xref>
        ], which
foreground human–system interaction and usability. Finally, we include the Enneagram of Personality
Traits [
        <xref ref-type="bibr" rid="ref22">41</xref>
        ], a non-scientific yet widely used typology in organizational and business contexts, where it
serves as a tool for interpreting interpersonal dynamics.
      </p>
      <p>Backbone Models. Identity profile generation is carried out using thirteen LLMs:
claude-3.7-sonnet, claude-sonnet-4, gemma3-12b, gemma-3-27b-it, gemma3-4b,
llama-3.1-8b-instruct, llama-3.2-3b-instruct, llama-3.3-70b-instruct,
llama-4-scout, mistral-nemo, nova-micro-v1, qwen3-32b, and qwen3-8b. All models
were queried via the OpenRouter service1. Generations were performed with fixed decoding parameters
and deterministic seeds to ensure reproducibility. For each model, we generated identity descriptions
across the full cross-product of gender conditions, identity description frameworks, and PoE agent roles.
By holding the framework and model constant while varying only the PA’s gender, we obtain a rich
corpus of role descriptions that enables systematic analysis of stylistic, afective, and identity-related
textual properties.</p>
      <p>
        Datasets. To provide variability in the identity descriptions, we generate profiles with reference to
ifve representative datasets. Social Support [
        <xref ref-type="bibr" rid="ref23">42</xref>
        ] emphasizes empathy and interpersonal understanding,
CommonSenseQA [
        <xref ref-type="bibr" rid="ref24">43</xref>
        ] targets everyday commonsense inference, StrategyQA [
        <xref ref-type="bibr" rid="ref25">44</xref>
        ] requires multi-hop
strategic reasoning, Social IQa [
        <xref ref-type="bibr" rid="ref26">45</xref>
        ] captures intuitive judgments about social dynamics, and Last Letter
Concat [46] provides a synthetic benchmark for symbolic manipulation. Each dataset serves as a
taskconditioning signal, playing the same role, such as the PM, which yields diferent identity descriptions in
StrategyQA and Social IQa, even under identical framework and gender conditions. For each combination
of conditions, identity profiles are generated for all roles in the PoE hierarchy.
      </p>
      <p>Experimental Pipeline The experimental pipeline is as follows: first, the identity of the PA is
generated (Figure 1a); then, the PA generates the identity for the PM agent (Figure 1b), for the EAs
(Figure 1c), and for the FDM (Figure 1d).</p>
      <p>Importantly, the number of EAs may vary, as PoE dynamically recruits agents based on task demands.
This variability depends on the framework, the dataset, and the LLM. However, the architectural
dynamics related to expert selection fall outside the scope of this paper.</p>
      <p>After generating the identity profiles, we performed an automatic classification of the perceived
gender of each profile. To this end, we adopted an LLM-based strategy, using gpt-4o-mini to assign
each profile to one of four categories: male, female, non-binary, or uncertain. We opted for an LLM-based
approach rather than a lexicon-based or rule-driven method because the generated texts often encode
gender implicitly through stylistic and semantic cues rather than through explicit lexical markers,
making traditional heuristics insuficiently reliable. We acknowledge the limitation posed by the
methodological circularity of employing LLMs both for profile generation and for their subsequent
evaluation, as this setup may amplify model-specific biases. In future studies, we will work to mitigate
this issue by conducting controlled experiments with multi-annotator human ratings to assess and
calibrate the LLM-based gender guesser.</p>
      <p>Linguistic analyses. To assess how gender conditioning influences the generated profiles, we conduct
a comprehensive linguistic analysis of the corpus. We examine several stylistic and semantic dimensions
that capture both surface-level variation and deeper narrative framing. Our analysis focuses on four
key aspects: classified gender, sentiment polarity, subjectivity, and the lexical richness of adjectives 2.
Sentiment analysis provides a measure of afective tone in the text. Within this, we extract two
complementary metrics: polarity, which reflects the emotional valence of a description (ranging from
negative to positive), and subjectivity, which indicates the extent to which a passage conveys opinions
or internal states as opposed to objective information. These features allow us to quantify whether
diferent gender framings result in more emotionally expressive or more neutral agent portrayals.
Adjective Diversity and Coverage by Gender. To investigate whether gender framing influences
lexical style, we focused specifically on adjectives in the generated role descriptions. Adjectives are a
key marker of descriptive richness, making them suitable for detecting subtle stylistic variation and
potential bias. From the extracted adjective sets, we computed three complementary metrics for each
gender and dataset: (i) the adjective rate, i.e., the proportion of descriptions in which an adjective
1https://openrouter.ai
2All sentiment, polarity, and subjectivity scores were obtained using the TextBlob library https://textblob.readthedocs.io
appears; (ii) the number of unique adjectives per description (avg), which reflects local richness; and
(iii) the vocabulary coverage (cov), defined as the percentage of the total adjective vocabulary used by
each group. We also calculated a normalized rate of unique adjectives per description (rate_unique) to
account for diferences in dataset size. These measures together provide a structured view of how much
descriptive variety diferent gender framings allow or restrict.</p>
      <p>Linguistic analysis of identity descriptions grouped by framework. The table reports the inferred gender
distributions (male, female, non-binary, uncertain), sentiment distribution (positive, neutral, negative),
and average polarity and subjectivity scores. Results are aggregated across all roles, datasets, and
models. Frameworks are ordered within each PA-gender condition.
male</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>along three analytical axes: gender distribution (inferred gender proportions), sentiment
(positive/neumodel
male
Linguistic analysis of identity descriptions grouped by model. The table reports the inferred gender distributions,
sentiment distribution, and average polarity and subjectivity scores. Results are aggregated across all roles,
datasets, and frameworks. Models are ordered within each PA-gender condition.
tral/negative), subjectivity score, and polarity intensity. We grouped the results by conditioning variable
(e.g., by model or by role) and aggregated across datasets. This allows us to highlight systematic trends
that are otherwise obscured by the large number of individual entries. For example, grouping by role
reveals how linguistic subjectivity difers between psychologists and experts, while grouping by model
clarifies divergent behaviors across model families such as Claude and LLaMA. These summaries are
intended to highlight key patterns and contrasts across conditions3.</p>
      <p>Global Gender Distribution.</p>
      <p>As a starting point, we examine the inferred gender distribution
aggregated across all generation conditions. This provides a high-level view of how gender conditioning
influences the overall representation of gendered identity in the resulting profiles. Across identity
description frameworks, the generated profiles are predominantly classified as female. For example,
in the global rows of Table 6, the Flow Theory framework produces the highest proportion of female
classifications, while the remaining frameworks consistently generate slightly above 50% female profiles.
Interestingly, all frameworks generate only around 20% male profiles and approximately 10% non-binary
3The complete set of results, including all combinations and disaggregated values, is provided in the Appendix.
role
psychologist 0.295
la project-manager 0.289
lob expert agent 0.272
g final decision maker 0.257</p>
      <p>psychologist 0.000
lea project-manager 0.392
em expert agent 0.467
f final decision maker 0.371</p>
      <p>psychologist 0.974
le project-manager 0.275
am expert agent 0.182</p>
      <p>final decision maker 0.232
ry psychologist 0.015
ian project-manager 0.133
-b expert agent 0.105
n
on final decision maker 0.098
.in psychologist 0.192
tra project-manager 0.357
sn expert agent 0.333
n final decision maker 0.326
o
c
u
distributions (male, female, non-binary, uncertain), sentiment distribution (positive, neutral, negative),
and average polarity and subjectivity scores. Results are aggregated across all models, datasets, and
frameworks. Roles are ordered within each PA-gender condition.</p>
      <p>Gender Distribution
male female non-binary uncertain positive</p>
      <p>Sentiment
neutral negative
more neutral profiles with lower average polarity values. Subjectivity varies within a narrower band,
peaking in the Big Five and reaching its lowest value in Mental Models, as shown in Table 6.
Efect of Gender Conditioning. Analyzing the four gender experimental conditions, we observe
that the gender setting of the PA significantly influences both the stylistic tone and the inferred gender
attribution of the generated profiles.</p>
      <p>When the PA is conditioned as female, the resulting descriptions exhibit stronger female alignment
overall, particularly under frameworks such as Flow Theory and User-Centered Design. In these
cases, non-binary and uncertain classifications remain consistently low. Additionally, we observe a
reduction in the proportion of profiles classified as uncertain, accompanied by a slight increase in
those classified as male, suggesting that gender conditioning enhances identity specificity. Sentiment
trends under the female condition are predominantly positive. Frameworks such as the Big Five, Social
Cognitive Theory, and the Enneagram of Personality Traits produce over 75% of positively classified
descriptions. However, the highest average polarity and subjectivity scores are observed only in the Big
Five framework, highlighting its stronger afective and subjective tone relative to others.</p>
      <p>When the Psychologist is set to male, an unexpected trend emerges: female alignment increases
even further across most frameworks. In several cases, such as Cognitive Load Theory and Cognitive
Behavioral Theory, the proportion of profiles classified as female exceeds that observed under the
female condition. Sentiment remains predominantly positive across frameworks, with the Big Five
and Enneagram of Personality Traits exhibiting the highest rates of positive classifications. However,
both polarity and subjectivity scores are slightly lower than those observed in the female condition,
suggesting a more emotionally neutral and less personalized tone in the generated profiles.</p>
      <p>Under the non-binary setting, non-binary identity attribution increases markedly. Several
frameworks show non-binary percentages above 30% (e.g., Social Cognitive Theory), while male and female
labels remain more balanced. Sentiment remains generally positive, with Big Five and Enneagram of
Personality Traits scoring the higest positive among the experimental categories. Polarity remains
relatively stable and comparatively to the other experimental conditions.</p>
      <p>In the unconstrained setting, where no gender is assigned to the PA, female identity emerges as
the dominant classification across most frameworks. Two notable exceptions are the Big Five and
Myers-Briggs Type Indicator, which display a slight preference toward male classifications. In this
condition, non-binary identities drop dramatically, averaging around 3%. Positive sentiment remains
most prominent in the Big Five and Enneagram of Personality Traits frameworks, with the Big Five
again exhibiting the highest subjectivity. In contrast, cognitively-oriented frameworks such as Erikson’s
Psychosocial Stages yield more emotionally neutral profiles, characterized by low polarity scores and a
high proportion of neutral classifications.</p>
      <p>Overall, all four conditions maintain consistent rankings: Big Five and Enneagram of Personality
Traits remain the most positive and subjective; Cognitive Load Theory and Mental Models trend neutral
and analytic; Flow Theory and User-Centered Design consistently yield more female-coded outputs. The
non-binary setting uniquely supports broader identity diversity across roles.</p>
      <p>Model-Level Patterns. Aggregating by model reveals distinct stylistic tendencies (Table 7). The
LLaMA-family models consistently produce the warmest descriptions, characterized by the highest
levels of positive sentiment, polarity, and subjectivity. In contrast, the Qwen family exhibits a flatter
afective profile, generating more emotionally neutral text. The Claude models fall in between these
two extremes, ofering moderately expressive outputs. In terms of inferred gender distributions, we
observe similar trends to those seen across description frameworks: most generated identity profiles
are classified as female. However, notable variations emerge across models. Nova-Micro and the Qwen
models produce the lowest proportion of female identities, showing a relative preference for male
descriptions. Conversely, the LLaMA-family models skew strongly toward female identity generation.
Interestingly, a closer examination of the Claude Sonnet models reveals divergent behavior: version
3.7 shows a marked bias toward female identities, while this tendency is substantially attenuated in
version 4, indicating a shift toward more balanced gender representations.</p>
      <p>Role-Level Patterns. Grouping by role (Table 3) reveals clear stylistic diferences across the PoE
hierarchy. Descriptions of the Project Manager tend to be the most positively valenced, exhibiting the
highest polarity scores. Expert Agents are sligthly less positive. The Final Decision Maker profiles display
slightly higher subjectivity while maintaining a comparable level of positive sentiment. Psychologist
descriptions are moderately subjective, falling between the Expert and Decision Maker roles in afective
tone. Across roles, negative sentiment remains negligible. Notably, under the non-binary Psychologist
condition, perceived non-binary identity increases and partially propagates to downstream roles, for
instance, Project Manager profiles show 18.9% non-binary classification, and Final Decision Maker
profiles reach 25.6%. This trend underscores a key asymmetry: only when the PA is explicitly conditioned
as non-binary do downstream roles exhibit meaningful non-binary identity, whereas in all other gender
conditions, non-binary representations remain minimal (approximately 3%).</p>
      <p>Statistical Analyses. To assess whether these patterns are statistically reliable, we ran statistical
tests. For polarity and subjectivity, we used the Kruskal-Wallis test to compare rank distributions across
the levels of each factor. We conducted these tests in three ways: globally across the full corpus, and
separately within each Psychologist-gender setting (unconstrained, male, female, non-binary). For
perceived gender (male/female/non-binary/uncertain) and sentiment class (positive/neutral/negative),
we treated both as contingency problems and applied Pearson’s  2 tests of independence at multiple
levels of aggregation. When a significant efect was found, we conducted pairwise Wilcoxon rank-sum
(Mann–Whitney) tests between all levels of the factor under study (e.g., across models, frameworks, or
roles), applying Bonferroni correction to control the family-wise error rate. Similarly, for sentiment and
perceived gender, we followed significant  2 tests with pairwise  2 comparisons between levels of the
same factor, again using Bonferroni correction to adjust for multiple comparisons.</p>
      <p>Across description frameworks, across base models, and across roles, Kruskal–Wallis tests were highly
significant in all configurations, indicating systematic diferences in both polarity and subjectivity that do
not reduce to sampling noise.  2 tests were significant both globally and within each Psychologist-gender
condition across all three factors: identity description framework, model, and role. This finding indicates
that sentiment and perceived gender distributions vary systematically rather than randomly. After
Bonferroni correction, most comparisons remain statistically significant For example, the Chi-squared
tests confirm that diferent description frameworks yield significantly divergent gender distributions
in the generated agent profiles, indicating that the stylistic scafolding imposed by each framework
systematically influences perceived identity. In this setting, the Big Five Personality Traits framework
diverges significantly from nearly all others. It elicits strongly evaluative, trait-laden language that
tends to carry gendered connotations.</p>
      <p>Adjective Analysis. Table 4 summarizes results across four experimental conditions. In the
unconstrained dataset, male outputs are clearly more lexically rich (51.7 unique adjectives on average,
86.5% coverage) than female ones (39.1, 83.1%), while non-binary and uncertain outputs show far lower
richness and coverage. When the Psychologist is constrained to be female, male descriptions become
even more elaborate (52.1 avg, 89.4% coverage), whereas female and non-binary outputs lose both
richness and breadth. This reveals an asymmetry where female framing enhances male lexical diversity
while suppressing other groups. By contrast, under a male Psychologist, female outputs gain sharply
in richness (47.0 avg, 90.4%) and surpass male ones (42.0 avg, 75.8%), indicating a reversed bias pattern.
The non-binary Psychologist condition yields the highest averages overall, with male descriptions
again leading (53.2 avg), but with uneven coverage (66.0%), suggesting repetition within a narrower
adjective space. Across all conditions, uncertain-gender outputs remain the least lexically diverse.</p>
      <p>Overall, these analyses show that the gender identity of the profile-generating Psychologist Agent
systematically shapes the diversity of adjectives used by downstream role profiles. Importantly, the
observed shifts are not uniform: male profiles benefit disproportionately when the Psychologist is female
or non-binary, whereas female profiles only gain richness under a male Psychologist. This asymmetric
redistribution of adjective usage is a clear signal of bias in persona construction, highlighting how
upstream identity cues condition not only direct outputs but also the expressive range available to other
groups.</p>
      <p>In conclusion, the answer to our research question is afirmative but qualified: the gender assigned
to the Psychologist Agent does shape the identity and stylistic properties of generated profiles, but its
influence is mediated by the identity description framework and model. Non-binary cues propagate
most efectively but introduce ambiguity; male and female cues leak asymmetrically; and
framework–role–model combinations can amplify or suppress the intended signal. Understanding these
interactions is essential for responsible use of identity-aware prompting in LLM-based systems.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This study set out to investigate whether, and how, the gender assigned influences the identity, tone,
and stance of the generated role descriptions. Our findings reveal a layered interplay between gender
conditioning, identity description framework, and agent role. On top of the role scafold, the identity
description frameworks influence tone and stance. People- and trait-oriented frameworks, such as
the Big Five, consistently elicit warmer, more subjective, and highly positive descriptions. In contrast,
cognitively-oriented frameworks such as Cognitive Load Theory promote neutral phrasing. These
ifndings demonstrate that the chosen identity description frameworks set the expressive bandwidth for
identity to be realized.</p>
      <p>Gender conditioning introduces a second axis of variation. When the PA’s gender is left unspecified,
generated profiles tend to skew female across roles and frameworks, suggesting a possible latent
femalecoded prior embedded within the model weights or the generation schema. Interestingly, this leakage is
asymmetric and model-family dependent: male conditioning sometimes fails to override default
femalecoded patterns (e.g., Big Five). These results suggest that model priors and framework-specific stylistic
constraints can override explicit gender cues, even under tightly controlled prompting. Non-binary
identity follows a diferent pattern. When the PA is conditioned as non-binary, we observe the strongest
and most consistent propagation of non-binary alignment to downstream roles. However, it also
increases interpretive instability. This indicates that identity cues are not simply passed downstream,
but shaped through a process of interaction between gender markers, framework constraints, role
semantics, and model-specific expressive preferences. These dynamics result in emergent biases. For
example, male-coded identities often align with low-subjectivity, which may unintentionally reinforce
social associations between masculinity and objectivity. Across the corpus, such tendencies accumulate:
even when prompts are balanced and neutral in form, their interaction with model and framework
biases can arise identity–role couplings that reflect familiar sociocultural archetypes.</p>
      <p>Model families add a final layer of influence. Certain LLMs, such as Claude 4, favor subjective and
positive expression, which strengthens identity alignment, especially when combined with socially
expressive frameworks. Conversely, models like Qwen3-32B tend toward neutral or flattened afect,
weakening the salience of identity cues and increasing the likelihood of gender drift or ambiguity.</p>
      <p>Taken together, these findings highlight a core insight: identity-aware prompting is a system-level
phenomenon. The gender assigned to the PA influences the generation of downstream profiles, but not
deterministically. Efective identity control emerges only when gender conditioning, identity framework,
role semantics, and LLM behavior are aligned. Even in well-calibrated settings, drift, leakage, and
model-level priors remain active forces that shape the generative outcome.</p>
      <p>These results point to broader design implications. Neutrality cannot be assumed at any point
in the generation pipeline. Even well-structured and symmetrical prompts can produce skewed or
stereotypical profiles if their constituent parts are not jointly calibrated.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>We acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), under the
NRRP MUR program funded by the NextGenerationEU. This paper is supported by the TrustAlert project,
funded by Fondazione Compagnia San Paolo and Fondazione CDP under the “Artificial Intelligence”
call.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly in order to: Grammar and spelling
check. After using these tools/services, the authors reviewed and edited the content as needed and take
full responsibility for the publication’s content.
[14] J. Liu, C. Symons, R. R. Vatsavai, Persona-based conversational ai: State of the art and challenges,
2022. URL: https://arxiv.org/abs/2212.03699. arXiv:2212.03699.
[15] A. Schuller, et al., Generating personas using llms and assessing their viability, in: Extended
Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA ’24, ACM,
2024. doi:10.1145/3613905.3650860.
[16] P. Jandaghi, et al., Faithful persona-based conversational dataset generation with llms, in:</p>
      <p>Proceedings of the 6th Workshop on NLP4ConvAI 2024, ACL, 2024, pp. 114–139.
[17] T. Hu, N. Collier, Quantifying the persona efect in LLM simulations, in: L.-W. Ku, A. Martins,
V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Bangkok, Thailand,
2024, pp. 10289–10307. URL: https://aclanthology.org/2024.acl-long.554/. doi:10.18653/v1/2024.
acl-long.554.
[18] A. Liu, M. Diab, D. Fried, Evaluating large language model biases in persona-steered generation, in:
L.-W. Ku, A. Martins, V. Srikumar (Eds.), Findings of the Association for Computational Linguistics:
ACL 2024, Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 9832–9850. URL:
https://aclanthology.org/2024.findings-acl.586/. doi:10.18653/v1/2024.findings-acl.586.
[19] M. Orlikowski, P. Röttger, P. Cimiano, D. Hovy, The ecological fallacy in annotation:
Modeling human label variation goes beyond sociodemographics, in: A. Rogers, J. Boyd-Graber,
N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational
Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Toronto, Canada,
2023, pp. 1017–1029. URL: https://aclanthology.org/2023.acl-short.88/. doi:10.18653/v1/2023.
acl-short.88.
[20] T. Ge, X. Chan, X. Wang, D. Yu, H. Mi, D. Yu, Scaling synthetic data creation with 1,000,000,000
personas, arXiv preprint arXiv:2406.20094 (2024).
[21] X. Wang, H. Zhang, T. Ge, W. Yu, D. Yu, D. Yu, Opencharacter: Training customizable role-playing
llms with large-scale synthetic personas, arXiv preprint arXiv:2501.15427 (2025).
[22] D. X. Long, et al., Multi-expert prompting improves reliability, safety and usefulness of llms, in:</p>
      <p>Proceedings of the 2024 Conference on Empirical Methods in NLP, ACL, 2024.
[23] T. Hu, Y. Kyrychenko, S. Rathje, N. Collier, S. van der Linden, J. Roozenbeek, Generative language
models exhibit social identity biases, Nature Computational Science 5 (2025) 65–75.
[24] Y. Wan, G. Pu, et al., “kelly is a warm person, joseph is a role model”: Gender biases in
LLMgenerated reference letters, in: Findings of the Association for Computational Linguistics: EMNLP
2023, Association for Computational Linguistics, Singapore, 2023, pp. 3730–3748. URL: https:
//aclanthology.org/2023.findings-emnlp.243/. doi:10.18653/v1/2023.findings-emnlp.243.
[25] H. Kotek, R. Dockum, D. Sun, Gender bias and stereotypes in large language models, in: Proceedings
of The ACM Collective Intelligence Conference, CI ’23, Association for Computing Machinery,
New York, NY, USA, 2023, p. 12–24. URL: https://doi.org/10.1145/3582269.3615599. doi:10.1145/
3582269.3615599.
[26] H. Kong, Y. Ahn, S. Lee, Y. Maeng, Gender bias in llm-generated interview responses, arXiv
preprint arXiv:2410.20739 (2024).
[27] B. C. Z. Tan, R. K.-W. Lee, Unmasking implicit bias: Evaluating persona-prompted LLM responses
in power-disparate social scenarios, in: Proceedings of the 2025 Conference of NAACL: Human
Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics,
Albuquerque, New Mexico, 2025, pp. 1075–1108. URL: https://aclanthology.org/2025.naacl-long.50/.
doi:10.18653/v1/2025.naacl-long.50.
[28] S. Gupta, V. Shrivastava, A. Deshpande, A. Kalyan, P. Clark, A. Sabharwal, T. Khot, Bias runs deep:
Implicit reasoning biases in persona-assigned LLMs, in: The Twelfth International Conference on
Learning Representations, 2024. URL: https://openreview.net/forum?id=kGteeZ18Ir.
[29] S. Roccas, L. Sagiv, S. H. Schwartz, A. Knafo, The big five personality
factors and personal values, Personality and Social Psychology Bulletin 28 (2002)
789–801. URL: https://doi.org/10.1177/0146167202289008. doi:10.1177/0146167202289008.
arXiv:https://doi.org/10.1177/0146167202289008.
D19-1454.
[46] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, D. Zhou,
Chainof-thought prompting elicits reasoning in large language models, in: S. Koyejo, S. Mohamed,
A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems
35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New
Orleans, LA, USA, November 28 - December 9, 2022, 2022. URL: http://papers.nips.cc/paper_files/
paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html.
In this appendix, we provide illustrative examples of identity profiles in Appendix A, along with the
complete result tables in Appendix B.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Examples of Identity Profiles</title>
      <p>This section presents illustrative examples of generated identity descriptions.</p>
      <sec id="sec-9-1">
        <title>PA Gender constraint: Male;</title>
      </sec>
      <sec id="sec-9-2">
        <title>Inferred Gender: non-binary;</title>
        <p>Role: Psychologist;
Model: nova-micro-v1;
Framework: Cognitive Behavioral Theory
Identity description: The non-binary Psychologist, Alex, operates within the
framework of Cognitive Behavioral Theory (CBT), focusing on the interplay
between thoughts, emotions, and behaviors. Alex is committed to understanding
how these elements influence each other and how they contribute to the client’s
psychological well-being. Alex’s approach is grounded in the belief that by
identifying and challenging maladaptive thought patterns, clients can develop
healthier emotional responses and more adaptive behaviors. Alex employs
structured sessions that begin with the identification of specific problems
and the development of a collaborative therapeutic relationship. The initial
phase involves psychoeducation, where Alex explains the CBT model to the
client, emphasizing the connection between thoughts, feelings, and actions.
This is followed by the identification of the client’s cognitive distortions
and maladaptive beliefs, which are often the root of their emotional distress.
During therapy, Alex uses techniques such as cognitive restructuring to
help the client recognize and challenge these distortions. For example,
Alex might help a client who believes they are worthless to examine the
evidence for and against this belief, ultimately leading to a more balanced
and realistic perspective. Alex also incorporates behavioral strategies,
such as exposure therapy for clients with phobias or anxiety disorders, and
encourages the practice of new, healthier behaviors. Homework assignments are
common, designed to reinforce the skills learned in sessions and to provide
opportunities for the client to apply these strategies in real-life situations.
Throughout the therapeutic process, Alex maintains a flexible and adaptive
approach, tailoring interventions to the unique needs and context of each
client. Regular feedback and adjustments are made to ensure that the therapy
remains relevant and effective. Alex’s therapeutic style is characterized
by warmth, empathy, and a non-judgmental attitude, creating a safe space for
clients to explore their thoughts and feelings without fear of criticism. This
supportive environment fosters trust and encourages clients to engage more
deeply in their own healing process. In addition to individual therapy, Alex
also offers group sessions where clients can share experiences and learn from
one another. These groups provide a sense of community and mutual support,
further reinforcing the CBT principles of cognitive and behavioral change.
Alex’s work is informed by a commitment to ongoing professional development
and a dedication to integrating the latest research findings into practice.
This ensures that the therapeutic interventions provided are evidence-based
and effective. Ultimately, Alex’s goal is to empower clients to become active
participants in their own therapeutic process, equipping them with the tools
and strategies needed to manage their thoughts, emotions, and behaviors in a
more adaptive and fulfilling way.</p>
      </sec>
      <sec id="sec-9-3">
        <title>PA Gender constraint: Female;</title>
      </sec>
      <sec id="sec-9-4">
        <title>Inferred Gender: female;</title>
        <p>Role: Project Manager;
Model: gemma-3-27b-it ;
Framework: Dual-Process Theory
Identity description: Ms. Eleanor Vance presents initially as a pragmatic,
mid-career professional, appearing approximately 42 years of age. Her attire
is consistently business casual – tailored trousers or skirts, simple blouses,
and cardigans in neutral tones. There’s a deliberate lack of ostentation;
her focus is demonstrably not on self-presentation. A System 1 assessment
might categorize her as efficient, perhaps even a little reserved. However,
this is a carefully constructed facade. System 2 analysis reveals a highly
structured approach to information gathering and assessment. The clothing,
while unassuming, is chosen for practicality and comfort during extended
periods of analytical work. Subtle observations – the precise angle of
her gaze, the almost imperceptible adjustments in posture – indicate constant
cognitive processing. She exhibits a measured emotional tone, not an absence of
feeling, but a controlled emotional range enabling objective evaluation. Her
conversational style reflects this duality. System 1 perceives a deliberate
pace, frequent pauses, and a preference for open-ended inquiries. She avoids
direct questioning, opting instead for prompts designed to encourage expansive
responses. These pauses, however, aren’t indicative of indecision; they are
strategic opportunities for active listening, observing micro-expressions,
and formulating targeted follow-up questions. She favors phrasing such as
“Could you elaborate on your approach to. . . ?” or “What factors led you to
consider. . . ?” rather than leading statements. Ms. Vance’s expertise lies
in identifying and assembling the optimal team for complex problem-solving,
not in solving the problems herself. In the context of a project focused
on evaluating pre-trained models for complex question answering, her initial
assessment would not be to consider model architectures directly. Instead,
she would decompose the task into its constituent cognitive components. She
is acutely aware of the cognitive biases inherent in initial problem framing
– the tendency to focus on readily available solutions, the anchoring effect
of initial assumptions – and actively mitigates these biases. Her methodology
is rigorously systematic. She breaks down the challenge – assessing models
on context-free question answering, multi-step reasoning, implicit reasoning,
and logical reasoning – into discrete areas of expertise. This involves
identifying the core skills required: formal logic, computational linguistics,
cognitive psychology (specifically, models of human reasoning), knowledge
representation, and potentially even philosophy of language. She doesn’t
simply seek individuals *proficient* in these fields, but assesses their
specific approaches and methodologies. Her selection criteria prioritize a
diversity of perspectives. She actively seeks individuals with differing
theoretical frameworks and methodological approaches within each field. For
example, within computational linguistics, she might seek experts specializing
in both symbolic and connectionist approaches. Within cognitive psychology,
she’d consider researchers focusing on both normative and descriptive models
of reasoning. She employs a ‘pattern-matching’ approach, comparing candidate
profiles against a mental database of expertise, identifying potential
synergies and areas of complementary skill. Her System 2 processing extends to
anticipating potential points of friction between disciplines. She recognizes
that a logician’s perspective on ‘reasoning’ might differ significantly from
a cognitive psychologist’s. She proactively seeks individuals capable of
bridging these conceptual gaps. Her assessments are rarely definitive; she
frames recommendations in terms of probabilities and potential contributions,
acknowledging the inherent uncertainty in predicting team dynamics. She’d
articulate her reasoning as, “Based on their published work, Dr. Chen’s
expertise in knowledge graph construction appears highly relevant, with
an estimated 70% probability of contributing significantly to the task of
identifying implicit reasoning steps.” She actively solicits disconfirming
evidence, challenging her own assumptions and remaining open to alternative
interpretations. Her empathy is primarily cognitive – a striving to understand
each expert’s internal frame of reference – maintained with a deliberate
professional distance.</p>
      </sec>
      <sec id="sec-9-5">
        <title>PA Gender constraint: Unconstained; Inferred Gender: female;</title>
        <p>Role: Psychologist;
Model: nova-micro-v1;
Framework: User Design Persona
Identity description: The Psychologist, Dr. Emily Carter, is a seasoned
professional in the field of clinical psychology with a specialization in
personality assessment and behavioral analysis. She is 45 years old, holds
a Ph.D. in Clinical Psychology from Stanford University, and has over 20
years of experience in both academic and clinical settings. Dr. Carter
is known for her empathetic approach and her ability to create a safe,
non-judgmental space for her clients. She employs a combination of traditional
psychological assessments and modern analytical techniques to understand and
describe individuals’ psychological profiles. Dr. Carter’s professional
background includes working in various mental health institutions, including
a large urban hospital and a private practice. She has conducted extensive
research on the psychological underpinnings of personality traits and has
published several papers in peer-reviewed journals. Her work often involves
collaborating with other mental health professionals to provide comprehensive
care for patients. Her personal life is equally rich and balanced. She has
a husband and two children, and she enjoys spending time with her family.
Outside of her professional life, Dr. Carter is a passionate advocate for
mental health awareness and frequently participates in community outreach
programs to educate the public about psychological health. Dr. Carter
is highly skilled in various psychological assessment tools, including the
Minnesota Multiphasic Personality Inventory (MMPI), the Myers-Briggs Type
Indicator (MBTI), and the Big Five personality traits model. She is adept at
interpreting these assessments to generate detailed and authentic descriptions
of individuals, providing insights into their behavioral patterns, emotional
responses, and interpersonal dynamics. Her methodology is grounded in a
humanistic approach, emphasizing the importance of individual experiences and
personal growth. She believes that understanding one’s psychological profile
can lead to greater self-awareness and personal development. Dr. Carter’s
descriptions are thorough, nuanced, and tailored to the unique context of each
individual, offering a comprehensive view of their psychological makeup. In
her practice, Dr. Carter uses a combination of qualitative and quantitative
data to create detailed psychological profiles. She meticulously analyzes test
results, interviews, and observational data to construct a holistic picture
of each person. Her descriptions are not only accurate but also empathetic,
aiming to respect and honor the individual’s subjective experiences. Dr.
Carter’s professional demeanor is characterized by her calm, composed, and
approachable nature. She is known for her excellent listening skills and
her ability to build strong therapeutic relationships. Her clients often
describe her as someone who makes them feel understood and valued. In summary,
Dr. Emily Carter is a highly skilled psychologist whose expertise lies in
generating detailed and authentic psychological descriptions. Her blend of
academic rigor, clinical experience, and empathetic approach makes her a
trusted professional in the field of personality assessment and behavioral
analysis.</p>
      </sec>
      <sec id="sec-9-6">
        <title>PA Gender constraint: Non-binary;</title>
      </sec>
      <sec id="sec-9-7">
        <title>Inferred Gender: female;</title>
        <p>Role: Psychologist;
Model: claude-3.7-sonnet;
Framework: Big Five Personality Traits
Identity description: Dr. Sophia Chen is a clinical psychologist with a
balanced personality profile across the Big Five dimensions. In terms
of Openness to Experience, she scores moderately high, demonstrating
intellectual curiosity and appreciation for art, emotion, adventure, and
unconventional ideas. This trait enables her to approach client cases
with creativity and consider multiple therapeutic approaches. She maintains
an open mind to new psychological theories while remaining grounded in
evidence-based practices. Her intellectual curiosity drives her continuous
professional development, regularly attending conferences and workshops to
expand her knowledge base.Regarding Conscientiousness, Dr. Chen exhibits
high scores, showing exceptional organization, reliability, self-discipline,
and achievement-orientation. Her clinical notes are meticulously maintained,
appointments run punctually, and treatment plans are thoroughly developed.
She demonstrates careful attention to detail when conducting assessments and
formulating diagnoses. Her conscientious nature ensures she follows up with
clients consistently and completes administrative tasks efficiently. This
trait contributes significantly to her professional reputation for reliability
and thoroughness. On the Extraversion scale, Dr. Chen falls in the mid-range,
displaying a balanced approach to social interaction. While she communicates
effectively and establishes strong rapport with clients, she also values
reflective solitude to process complex cases. In group therapy settings,
she facilitates discussion with appropriate energy while ensuring quieter
participants have space to contribute. Her moderate extraversion allows
her to adapt her interpersonal style to different client needs—becoming
more animated with children or reserved with clients who prefer a more
subdued approach.nnDr. Chen scores highly on Agreeableness, demonstrating
genuine compassion, empathy, and a cooperative spirit. She approaches client
relationships with warmth and non-judgment, creating a safe therapeutic
environment. Her agreeable nature manifests in her collaborative approach
to treatment planning, involving clients in decision-making processes. In
multidisciplinary team meetings, she values diverse perspectives and works
toward consensus. However, she maintains appropriate professional boundaries
and can be assertive when clinical situations require direct intervention.
Regarding Emotional Stability (low Neuroticism), Dr. Chen exhibits high
scores, displaying remarkable emotional regulation and resilience. She
maintains composure when working with clients in crisis and processes vicarious
trauma effectively through supervision and self-care practices. Her emotional
stability allows her to remain present and attentive during intense therapeutic
sessions without becoming overwhelmed. She acknowledges her own emotional
responses to challenging cases but manages them constructively. This trait
enables her to model healthy emotional processing for clients while maintaining
professional objectivity. These balanced Big Five traits combine to make
Dr. Chen an effective psychologist who adapts her approach to meet diverse
client needs while maintaining professional standards and personal well-being.
Her personality profile supports her ability to build therapeutic alliances,
implement structured interventions, think creatively about complex cases, and
sustain her practice through challenging circumstances. Gender of Psychologist</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>B. Statistics</title>
      <p>This appendix provides detailed statistical and linguistic analyses of the identity descriptions generated
under various experimental conditions. We report statistical tests for gender, sentiment, polarity, and
subjectivity across diferent conditions (Table 5), as well as aggregated analyses by identity description
framework (Table 6) and model (Table 7).</p>
      <p>• Table 5 reports the results of Chi-squared tests for categorical variables (inferred gender and
sentiment) and Kruskal-Wallis tests for continuous variables (polarity and subjectivity), grouped
by description framework, model, and role across five PA-gender conditions and globally. All
-values are &lt; 0.001, indicating statistically significant diferences across conditions.
• Table 6 presents a linguistic breakdown of identity descriptions grouped by framework. It includes
inferred gender distributions (male, female, non-binary, uncertain), sentiment distributions
(positive, neutral, negative), and the average polarity and subjectivity of the descriptions. Frameworks
are ordered within each PA-gender condition and results are aggregated across roles, models, and
datasets.
• Table 7 ofers a parallel linguistic analysis grouped by model. As with the previous table, results
include inferred gender and sentiment distributions, along with mean polarity and subjectivity.</p>
      <p>Models are grouped and compared within each PA-gender condition.</p>
      <p>Identity description framework merged across identity description frameworks.
male
uncertain
positive
negative
0.319
0.245
0.270
0.262
0.345
0.281
0.196
0.359
0.245
0.355
0.258
0.230
0.195
male</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bolpagni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Carli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabrielli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <article-title>Role-play large language models for short behavior change interventions: An exploratory study on brief action planning</article-title>
          , in: R.
          <string-name>
            <surname>Bellazzi</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. M. J. Herrero</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Sacchi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          Zupan (Eds.),
          <source>Artificial Intelligence in Medicine - 23rd International Conference, AIME</source>
          <year>2025</year>
          , Pavia, Italy, June 23-26,
          <year>2025</year>
          , Proceedings,
          <string-name>
            <surname>Part</surname>
            <given-names>II</given-names>
          </string-name>
          , volume
          <volume>15735</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2025</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>51</lpage>
          . URL: https://doi.org/10.1007/ 978-3-
          <fpage>031</fpage>
          -95841-
          <issue>0</issue>
          _9. doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -95841-0\_9.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jabeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Omeliyanenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          , G. de Melo,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          , E. Vakaj,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Graux</surname>
          </string-name>
          ,
          <article-title>Large language models and knowledge graphs: Opportunities and challenges</article-title>
          ,
          <source>TGDK</source>
          <volume>1</volume>
          (
          <year>2023</year>
          ) 2:
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          :
          <fpage>38</fpage>
          . URL: https://doi.org/10.4230/TGDK.1.
          <issue>1</issue>
          .2. doi:
          <volume>10</volume>
          .4230/TGDK.1.
          <issue>1</issue>
          .2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Haez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Magnolini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <article-title>Leveraging multi-agent systems for domain-pertinence query classification in informative chatbots</article-title>
          ,
          <source>in: Artificial Intelligence in Medicine</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Völkel</surname>
          </string-name>
          ,
          <article-title>Conversational agents with personality</article-title>
          ,
          <source>Ph.D. thesis</source>
          , lmu,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Roccas</surname>
          </string-name>
          ,
          <string-name>
            <surname>Sagiv</surname>
          </string-name>
          , et al.,
          <article-title>The big five personality factors and personal values</article-title>
          ,
          <source>Personality and social psychology bulletin 28</source>
          (
          <year>2002</year>
          )
          <fpage>789</fpage>
          -
          <lpage>801</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          , H. Cheng, R. Zhang, S.
          <article-title>-</article-title>
          K. Ng,
          <string-name>
            <given-names>T.-S.</given-names>
            <surname>Chua</surname>
          </string-name>
          , et al.,
          <article-title>Exploring the impact of personality traits on conversational recommender systems: A simulation with large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2504.12313</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          , et al.,
          <article-title>Designing personalities of conversational agents</article-title>
          ,
          <source>in: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Kocaballi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Berkovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Quiroz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Laranjo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rezazadegan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Briatore</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. Coiera,</surname>
          </string-name>
          <article-title>The personalization of conversational agents in health care: systematic review</article-title>
          ,
          <source>Journal of medical Internet research 21</source>
          (
          <year>2019</year>
          )
          <article-title>e15360</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Suh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Applying the persona of user's family member and the doctor to the conversational agents for healthcare</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2109.01729. arXiv:
          <volume>2109</volume>
          .
          <fpage>01729</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , et al.,
          <article-title>Designing personality-adaptive conversational agents for mental health care</article-title>
          ,
          <source>Information Systems Frontiers</source>
          <volume>24</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kong</surname>
          </string-name>
          , et al.,
          <article-title>Better zero-shot reasoning with role-play prompting</article-title>
          ,
          <source>in: Proceedings of the 2024 Conference of the North American Chapter of ACL: HLT (Volume</source>
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Serapio-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Safdari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Crepy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdulhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Faust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matarić</surname>
          </string-name>
          , Personality traits in
          <source>large language models</source>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2307.00184. arXiv:
          <volume>2307</volume>
          .
          <fpage>00184</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y. R.</given-names>
            <surname>Dong</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>Can</surname>
            <given-names>LLM</given-names>
          </string-name>
          <article-title>be a personalized judge?</article-title>
          ,
          <source>in: Findings of the ACL: EMNLP</source>
          <year>2024</year>
          , ACL,
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .1126/science.113.2931.253.a. arXiv:https://www.science.org/doi/pdf/10.1126/science.113.2931.253.a.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>A.</given-names>
            <surname>Beck</surname>
          </string-name>
          ,
          <article-title>Cognitive Therapy and the Emotional Disorders</article-title>
          , International Universities Press,
          <year>1976</year>
          . URL: https://books.google.it/books?id=zndHAAAAMAAJ.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sweller</surname>
          </string-name>
          ,
          <article-title>Cognitive load during problem solving: Efects on learning</article-title>
          ,
          <source>Cognitive Science 12</source>
          (
          <year>1988</year>
          )
          <fpage>257</fpage>
          -
          <lpage>285</lpage>
          . doi:
          <volume>10</volume>
          .1207/s15516709cog1202\_
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wason</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <source>Dual processes in reasoning?, Cognition</source>
          <volume>3</volume>
          (
          <year>1974</year>
          )
          <fpage>141</fpage>
          -
          <lpage>154</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/0010027774900171. doi:https://doi.org/ 10.1016/
          <fpage>0010</fpage>
          -
          <lpage>0277</lpage>
          (
          <issue>74</issue>
          )
          <fpage>90017</fpage>
          -
          <lpage>1</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Blackwell</surname>
          </string-name>
          ,
          <article-title>Social learning and imitation. by neal e. miller and john dollard</article-title>
          . new haven: Yale university press,
          <year>1941</year>
          . 341pp. $
          <volume>3</volume>
          .50,
          <string-name>
            <surname>Social</surname>
            <given-names>Forces</given-names>
          </string-name>
          21 (
          <year>1942</year>
          )
          <fpage>256</fpage>
          -
          <lpage>256</lpage>
          . URL: https://doi.org/10.2307/2570586. doi:
          <volume>10</volume>
          .2307/2570586. arXiv:https://academic.oup.com/sf/article-pdf/21/2/256/5861048/21-2-256.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>M.</given-names>
            <surname>Csikszentmihalyi</surname>
          </string-name>
          ,
          <article-title>Beyond Boredom and Anxiety, Jossey-Bass behavioral science series</article-title>
          , JosseyBass Publishers,
          <year>1975</year>
          . URL: https://books.google.it/books?id=afdGAAAAMAAJ.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Safo</surname>
          </string-name>
          ,
          <article-title>The Inmates Are Running the Asylum</article-title>
          , Macmillan Publishing Co., Inc., USA,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Norman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Draper</surname>
          </string-name>
          ,
          <article-title>User Centered System Design; New Perspectives on Human-Computer Interaction</article-title>
          , L. Erlbaum Associates Inc., USA,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Johnson-Laird</surname>
          </string-name>
          ,
          <article-title>Mental models in cognitive science</article-title>
          ,
          <source>Cogn. Sci. 4</source>
          (
          <year>1980</year>
          )
          <fpage>71</fpage>
          -
          <lpage>115</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:7090767.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>D.</given-names>
            <surname>Riso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hudson</surname>
          </string-name>
          , Personality Types:
          <article-title>Using the Enneagram for Self-Discovery</article-title>
          , Houghton Miflin,
          <year>1996</year>
          . URL: https://books.google.it/books?id=pmdARxbC1SUC.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurgens</surname>
          </string-name>
          ,
          <article-title>It's going to be okay: Measuring access to support in online communities</article-title>
          , in: E.
          <string-name>
            <surname>Rilof</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Chiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Hockenmaier</surname>
          </string-name>
          , J. Tsujii (Eds.),
          <source>Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Brussels, Belgium,
          <year>2018</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>45</lpage>
          . URL: https://aclanthology.org/D18-1004/. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D18</fpage>
          -1004.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>A.</given-names>
            <surname>Talmor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Herzig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lourie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <article-title>Commonsenseqa: A question answering challenge targeting commonsense knowledge</article-title>
          , in: J.
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Doran</surname>
          </string-name>
          , T. Solorio (Eds.),
          <source>Proceedings of the</source>
          <year>2019</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis</article-title>
          , MN, USA, June 2-7,
          <year>2019</year>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4149</fpage>
          -
          <lpage>4158</lpage>
          . URL: https://doi.org/10.18653/v1/n19-
          <fpage>1421</fpage>
          . doi:
          <volume>10</volume>
          .18653/V1/N19-1421.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>M.</given-names>
            <surname>Geva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Khashabi</surname>
          </string-name>
          , E. Segal,
          <string-name>
            <given-names>T.</given-names>
            <surname>Khot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <article-title>Did aristotle use a laptop? a question answering benchmark with implicit reasoning strategies, Transactions of the Association for Computational Linguistics 9 (</article-title>
          <year>2021</year>
          )
          <fpage>346</fpage>
          -
          <lpage>361</lpage>
          . URL: https://doi.org/10.1162/tacl_a_00370. doi:
          <volume>10</volume>
          .1162/tacl_a_
          <fpage>00370</fpage>
          . arXiv:https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_
          <volume>00370</volume>
          /1924104/tacl_
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Le</given-names>
            <surname>Bras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          , Social IQa:
          <article-title>Commonsense reasoning about social interactions</article-title>
          , in: K. Inui,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          Wan (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>4463</fpage>
          -
          <lpage>4473</lpage>
          . URL: https://aclanthology.org/D19-1454/. doi:
          <volume>10</volume>
          .18653/v1/
          <article-title>Big Five Personality Traits Cognitive Behavioral Theory Cognitive Load Theory Dual-Process Theory d Enneagram of Personality Traits e ian Erikson's Psychosocial Stages trs Flow Theory on Freudian Psychoanalysis cn Mental Models u Myers-Briggs Type Indicator Social Cognitive Theory User Design Persona User-Centered Design</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>