Designing a Language-Model-Based Chatbot that
                         Considers User’s Personality Profile and Emotions To
                         Support Caregivers of People With Dementia⋆
                         Yeganeh Nasiri1 , Nancy Fulda2
                         1
                             Brigham Young University
                         1
                             Brigham Young University


                                       Abstract
                                       Chatbots driven by Artificial Intelligence (AI) systems are gaining widespread traction in industry, research, and
                                       education; however, many chatbot architectures operate only in the generalized case, without a personalized
                                       understanding of the specific user and contextual situation involved. This becomes particularly problematic in the
                                       domain of emotional support, which requires both understanding emotions, and the ability to properly respond
                                       to those emotions by considering the needs of the user. This work presents a conversational agent that uses a
                                       probabilistic model to localize the user’s personality type on the popular Myers-Briggs Type Indicator (MBTI)
                                       self-report inventory and create customized responses for different personalities. Results from the personality
                                       classifier are injected into an associated Knowledge Graph and are considered during text generation in order to
                                       create more personalized responses, and emotion detection is used to identify and react to the user’s current
                                       emotional state. We apply this model in a hypothetical scenario supporting caregivers of people with dementia,
                                       and augment a response generator trained on a custom dataset of scraped conversations among such caregivers
                                       with a dynamic knowledge graph that stores user information extracted from the conversation. We explore the
                                       efficacy of this system in a user study with N=24 participants and show that the MBTI personality classification
                                       and emotion modules were both noticeable to users and improved the user’s sense that the AI system was getting
                                       to know them as a person. Long-term, we hope this research will help create chatbots that provide emotional
                                       support for persons in socially isolated situations, including caregivers of people with dementia.

                                       Keywords
                                       Chatbot, Personality classifier, Knowledge graphs, Large language models


                         1. Introduction
                         Caring for a loved one with dementia creates many challenges for families and caregivers. People
                         with dementia struggle with memory problems and have difficulties with planning, thinking, and
                         even communicating. Family members caring for individuals with dementia at home often describe
                         the experience as ‘enduring stress and frustration’ [1]. As a result, caregivers are put in a vulnerable
                         situation and often need emotional support or assistance with their questions and tasks. Caregivers of
                         people with dementia face more depression, emotional distress, and physical strain than caregivers of
                         older adults with only physical disabilities, and frequently require more medical care than the dementia
                         patients themselves. One of the main problems for these caregivers is that they can not regularly leave
                         the house or their loved one with dementia, which can adversely affect their social life and activities.
                         Sometimes they cannot express their frustration to anyone because of fear of being judged. Taking all
                         these factors into account, a 2021 update from the CDC asserts that increased mortality risks from social
                         isolation and loneliness are comparable to those caused by smoking, obesity, and physical inactivity [2].
                            In this work, we attempt to amend this situation by building a conversational AI system with the type
                         of social and emotional awareness that would help these caregivers. This chatbot includes a knowledge
                         graph paired with three response generators trained on a dataset of information about taking care
                         of these patients. Furthermore, this chatbot is emotionally intelligent and uses an emotion classifier
                         to detect users’ current emotions by analyzing their input text. The chatbot adapts its responses by
                         classifying users’ personalities based on their conversations, to consider their preferences. Choosing the

                          The First international OpenKG Workshop: Large Knowledge-Enhanced Models, August 03, 2024, Jeju Island, South Korea
                          $ y.n.81191@gmail.com (Y. Nasiri); nfulda@cs.byu.edu (N. Fulda)
                                       © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
right response is important because it can influence the feelings, thoughts, and behavior of the users. It
can help them overcome negative feelings such as sadness and anger by receiving empathetic responses
from the chatbot which intrigues them to talk about the problems that triggered these feelings, and in
case they already feel calm or positive, help them maintain that state. The proper response can make the
user feel engaged in the conversation and interested in talking openly about their moods and thoughts.
To help the chatbot remember facts from the conversations, it is equipped with the ability to extract
facts about the user in real-time during the conversation. These facts are saved to the knowledge graph
and can be re-used later during future conversations.
   This work is based on the premise that psychological care is not a one-size-fits-all phenomenon, and
that customization and personalization are essential in order to create a positive caregiver experience.
In addition, we use language model-based response generators to generate instant and non-repetitive
responses. We discuss existing literature around similar concepts. We then present the core contributions
of this work which are twofold: Firstly, we introduce a personality classifier that is able to identify the
user’s personality according to the Myers-Briggs Type Indicator [3] and use the identified personality to
influence the chatbot responses via embedded knowledge graph triples. Secondly, we explore the impact
on the user experience of deploying this personality classifier module along with a complementary
emotion recognition module.


2. Related works
Conversational Agents For People In Distress. Conversational agents have attracted the attention
of the natural language processing community due to their unique capabilities and availability. In
many recent works, researchers attempted to use chatbots for the specific task of helping people in
distress. [4] attempted to identify the key components of existing internet-based interventions designed
to support family caregivers of people with dementia. Their results indicated a positive response for the
use of internet-based interventions by caregivers. In another similar work, [5] have developed a care
guide system that provides individual care guides based on a knowledge model of caring for people
with dementia. Other works explored and developed conversational agents intending to help both
patients and caregivers, such as [6] and [7]. However, none of these works consider the users’ emotions
or personalities during the response generation process.
   Emotionally Intelligent Chatbots. One of the fundamental challenges in conversational AI is
producing a chatbot that is able to detect and react to emotions properly. Numerous researchers have
shown that empathetic systems can play key roles in contributing to a better user experience [8], [9],
[10]), but the application of these principles in a conversational AI framework is nontrivial. Emotional
affection and social belonging are fundamental needs for human beings [11]. Therefore, building social
chatbots to address these emotional needs is of great value to the society [12]. Previous works such
as [13] proposed an artificial intelligence-based cognitive model for emotion awareness in chatbots.
The proposed model can extract emotions from conversations, detect emotion transitions over time,
predict real-time emotions and intelligently profile human participants based on their distinct emotional
characteristics. In another similar approach, [14] aimed to understand the possibilities for users to
engage in personal relationships with chatbots via emotionally intelligent algorithms.
   Although these works accomplished their goals, their approaches lack the ability to save and recall
facts from the conversations. This can result in incoherency during longer conversations. We address
this issue by incorporating knowledge graph into the conversational system. Factual information is
extracted in real-time during each conversation and is saved for re-use within the knowledge graph. As
shown in Figure 1, this greatly improved the experience of users interacting with the system.
   Knowledge-Graph-Based Chatbots With the rapid progress of the semantic web, a huge amount
of structured data has become available on the web in the form of knowledge graphs. Knowledge
graphs represent a network of real-world entities and illustrate the relationship between them [15].
Using knowledge graphs in chatbots can help increase the coherency of the responses and adds the
ability to save and remember facts as the conversation continues. Recent studies such as [15] developed
conversational systems based on knowledge graphs to propose a machine learning approach based
on intent classification and natural language understanding to understand user intents and feelings.
In another work, [16] used knowledge graphs to establish relationships between stressors, speaker
expectations, responses, feedback, and effective states to identify responses that could have the best
impact on those under distress. In this work, we combine semantic knowledge, user personality profiling,
and emotional intelligence within a single dedicated arhitecture. This employs all the potentials of the
previous works and covers their weaknesses.


3. Methodology
For this work, we assume that considering both the user’s personality and emotions at the same time can
enhance the quality of the human-chatbot interaction. Based on this assumption, we equipped a chatbot
with features that can help caregivers of people with dementia, including language models fine-tuned
on specialized datasets and a knowledge graph that was able to store and retrieve personality-specific
information. To test our assumption, we first created a chatbot with certain baseline features that can
help caregivers. We then compared the baseline with our enhanced model by conducting a human
evaluation.

3.1. Baseline
For the baseline model of this project, we used BYU-EVE, an open-domain dialogue architecture
developed in BYU’s Dragn Lab [17], [18]. We used three different transformer-based language models:
DialoGPT [19], GPT-3 [20], and AI21 [21]. Using language models as response generators allows us to
generate more natural responses. However, this approach also has risks and limitations, such as lack of
coherency and the possibility of generating toxic or inappropriate text. These would definitely need to
be addressed in a production-ready system. Nevertheless, the neural generators function adequately
as a baseline to determine whether the implemented personality and emotion modules improved the
user experience. Our enhanced model uses a subset of BYU-EVE’s response evaluators to select the
highest-ranked response among the text generations from our three neural response generators. All
three response generators were trained on a dataset of information about taking care of patients with
dementia. This dataset was scraped from Reddit conversations among caregivers [22].

3.2. First Contribution: Personality Classifier
One of the main problems with the current chatbots is that they create new responses without con-
sidering the user’s personality. Personality is defined as “the characteristics of a person that uniquely
influence their cognitions, motivations, and behaviors in different situations.” [23]
   Studies have shown that people communicate better with those who have personality characteristics
that are similar to their own [25]. Accordingly, we designed a personality classifier that enables our
chatbot to gain information about the user by asking questions about their personality type and then
classifying them into one of the 16 personality types from the Myers-Briggs (MBTI) model [3]. The
MBTI is a four-factor model that allows people to describe themselves by four letters (e.g., ENTJ or
ISFP) that represent their particular type. The scale yields eight scores (one for each type) that can
be considered on four typological opposites. This technique contains 4 pairs of personality scales,
Introvert(I) vs Extroverts(E), Sensing(S) vs intuition(N), Thinking (T) vs Feeling (F), Judging (J) vs
Perceiving (P). (Fig.1). The four scales of the Myers-Briggs Type Indicator (MBTI) are scored by
computing a continuous preference score indicating the net preference for the two poles of each scale.
The chatbot has a list of questions Q for each set of personality scales. It constantly analyzes the user’s
input and the state of the conversation to determine if the user’s latest input is semantically similar
to one set of these questions. We calculate this semantic similarity using Sentence-BERT (SBERT)
[26]), a variation on the BERT network [27] designed to generate embeddings which facilitate semantic
Figure 1: The Myers-Briggs Type Indicator 4x4 Grid Structure [24]


comparisons. We find the cosine similarity between the SBERT embeddings of I (user’s input) and every
q ∈ Q using:

                                         arg max cos(𝑆𝐵(𝐼), 𝑆𝐵(𝑄))                                       (1)
                                              𝑞∈𝑄

    Where cos represents the cosine similarity function and 𝑆𝐵 is the application of the SBERT embedding
model.
    If the output of the cosine similarity is greater than a specific threshold (we used the value of 0.7 in
this work) it can be inferred that the user’s input is semantically similar to one set of questions. In this
case, the chatbot randomly selects a question from that set and appends it to the text generated by the
response generator for that round. The goal of calculating the cosine similarity is to ask questions at
the proper points of the chat, where they are connected to the conversation flow. These questions are
taken from the original Myers-Briggs test [3] and are set up as yes/no questions allowing the user to
reply positively or negatively. If the user’s response is ambiguous and unclassifiable, then none of the
personality scales get any score. The chatbot waits for the user’s response before determining whether
it was positive or negative using a positive/negative classifier. Based on that output, one characteristic in
the pair related to the category that the most recent question was drawn will have an increase in score.
(Fig.2). This process needs time so the chatbot can ask all of the questions and gather the information it
needs to match the user’s personality. By knowing this information about the user, offering a more
appealing response would be easier. Once the chatbot has successfully classified the user’s personality,
it adds a list of information related to that personality type to the user’s knowledge graph. For example,
if the user was classified as an ESTP, knowledge graph nodes would be added indicating that the user is
friendly, enjoys interacting with people, is action-oriented, and a risk taker. The chatbot constantly
monitors users’ inputs and compares them to the knowledge in the knowledge graph to generate new
responses. Therefore, by adding the list of information about the personality type to the knowledge
graph we can influence future responses in order to make them more personalized for the user.

3.3. Second Contribution: Emotion Detection
While there is a strong focus on building applications to assess health, there is scientific evidence
that making such applications empathetic plays a significant role in their acceptance and success and
improves user experience [12]. To address this issue, the chatbot uses an emotion classifier to classify
users’ feelings at the moment by analyzing their inputs. One of the response generators (AI21) can
show empathy by mimicking the user’s emotions. This feature has not been added to other response
        Figure 2: Personality classifier flowchart. The chatbot waits for the user’s input. It checks a flag value to see
        whether a question has been asked in the previous round or not. If no question has been asked (flag value = 0) it
        checks the semantic similarity of the input with all the questions in the list of questions by calculating the cosine
        similarity. If the input is semantically similar to one question (like the example on the left), the model picks a
        question and sends it to the response generator, and sets a flag value to 1 to remember that in the next turn, it must
        listen for the user’s response to the question. In the response generator, the model adds the question to the end of
        the generated response. In the following turn, the model knows it should wait for a response from the user because
        the flag value has been set to 1. the model classifies the user’s response as positive or negative, and based on that
        scores the personality characteristic of the user. The model continues this cycle until it found all four characteristics
        in the user’s personality. The next step would be adding a list of information about the user’s personality type to
        the knowledge graph.


generators to avoid overwhelming users with repeating emotions. For the task of emotion classification,
we used the emotion classifier from Hugging face [28], [29]. It allows classifying the text into one of
the following 6 emotions: Joy, love, surprise, Sadness, Anger, and Fear.

3.4. Knowledge Graphs and Fact Extraction
The chatbot also has the ability to extract important facts about the user during the conversation
and save them in the format of a knowledge graph to reuse them later during other conversations. A
knowledge graph, also known as a semantic network, is a knowledge base that uses a graph-structured
data model or topology to integrate data [30]. To extract and save information, we used the technique
offered in [31]. In this method, information is extracted from the conversations using the Stanford
Open Information Extraction (Open IE) model [32], which continuously extracts relevant facts and
entities from the conversation. These facts are typically in the form of head, relation, tail triples (e.g.,
"Alice - Likes - Books"). These extracted facts are used to build a knowledge graph. Each node in the
graph represents an entity (e.g., "Alice", "Books"), and each edge represents a relationship between
these entities (e.g., "Likes"). The graph is dynamically updated as the conversation is processed. The
knowledge graph functions as an external memory. When the model needs to generate a response, it
queries the knowledge graph to retrieve relevant information, enabling it to create more personalized
responses. The model finds this relevant information by using the technique from 3.2. The selected
information is then fed to the language model before generating the response.
   Additionally, after classifying the user’s personality, the chatbot enhances the knowledge graph with
Figure 3: Some sample pre-defined characteristics of the personality type ENTP in the Myers Briggs personality model (left),
and demonstration of a simple knowledge graph made by user’s information (right). When the model identifies a user as an
ENTP, it automatically incorporates these predefined characteristics into the user’s knowledge graph. This enhancement
allows for more personalized and contextually relevant responses.


information related to that personality type. For instance, if the user is classified as an ESTP, nodes would
be added to the knowledge graph to indicate that the user is friendly, enjoys interacting with people, is
action-oriented, and is a risk-taker. (Fig.3) By continuously monitoring user inputs and comparing them
to the knowledge graph, the chatbot can generate tailored responses. Thus, incorporating personality-
related information into the knowledge graph ensures that future interactions are more personalized
and engaging for the user.

3.5. Datasets
Publicly available emotional dialogue datasets such as EmpatheticDialogues [33], EmotionLines [34]
and EmoContext [35], mostly consist of daily conversations created in an artificial setting or curated
from movie/TV subtitles. Real counseling conversation datasets used to conduct research such as [36]
and [37] are often not publicly available due to ethical reasons. Therefore we created a new dataset
from scraping Reddit data, which contains dialogues among caregivers of people with dementia. We
chose Reddit to collect this data because it is publicly accessible and the conversations on that platform
are real talks between people who experienced taking care of people with dementia, therefore their
questions and concerns are common among people in that situation. We used the Pushshift API [38]
to gather this data from two related subreddits: Dementia and Alzheimer’s. We cleaned the data by
dropping unrelated responses (such as advertisements) and cleaning the data from a list of offensive
words. Among the three response generators we used for this model, only DialoGPT is trained on the
full dataset. We used a smaller version of this dataset for performing few-shot learning on GPT3-based
and AI2-based response generators which used larger language models.


4. Evaluation
In order to better understand our model’s performance, we conducted a human evaluation. The chatbot
evaluation session took place on five consecutive days at Brigham Young University, with a total of 24
participants. The participants were recruited from students of Brigham Young University (male and
female, mostly college students) through posting flyers. The participants spent some time chatting with
our model (which consists of personality classification, knowledge graph, and emotion detection), and
Figure 4: Comparing two conversations between a participant and our model (left) and the baseline (right). The sentence
highlighted in red shows our model trying to identify the user’s personality by asking a question. The green highlight shows
when our model tries to use its knowledge about the user.


the baseline model which is designed very similar to our model without any features as described in
Section 3.1. Due to the time limitation, We opted to slightly change the personality classifier feature.
Hence, instead of waiting for the proper time during the conversation to ask a personality question, we
set the model to ask a question every 5 turns if it did not encounter enough semantic similarity. This
modification accelerates the process of getting to know the user, but on the downside, it makes the
conversation less coherent and the transition between the conversations less smooth.
    We recorded the conversations between the user and the chatbots without recording any personal
information about the user for further analysis. After this experience, the participants were asked to
fill out a survey to compare our model with the baseline and measure our model’s improvements. In
the survey, we asked questions like which version made the participants feel better after the conversa-
tion to compare the effect on emotions, which version made more human-like responses to measure
coherency, and which version got to know the user more and generated responses that better suited
their personalities to see if the personality classifier feature is doing its job. The participants could vote
for our model if they found it better, the baseline or "no difference". We calculated the t-test results for
these comparisons which can be seen in table 1, in addition to the percentage of participants who voted
for the enhanced model.
    For scoring the models, if the participant vote for the baseline model, the baseline model gets +1 score
and the enhanced model loses -1 score, and vice versa. If the participant votes for "no difference", which
means they believe both versions are equal in performance, then both models get 0 scores. Although the
majority of participants voted for the enhanced model for all questions, the t-value would be different
based on the number of votes for either the baseline or "no difference" option. In the last row in table 1,
the reason that the enhanced model got more votes from the participants, and still the t-value is so low
is because in scoring the models, we got more negative votes compared to other questions. We presume
the reason for this could be because of the changes we made to accelerate the personality classifying
process during the evaluation.
    We compared the lengths of the conversations in baseline and in our system and noticed a 12.25 %
       Question                                                        Our model      Baseline    T value
       Which version made you feel like it is getting to know you      70%              25%       3.136
       better?
       which version generated responses that better suited your       70%              30%       3.108
       personality?
       Which version generated more human-like responses?              67%              23%       3.278
       Which version helped you feel more positive emotions during     62%              25%       2.968
       the conversation?
       Which version do you like to use more?                          66%              34%        2.398

   Table 1
   Comparing the baseline and our model by calculating the T value. In the t-test, the null hypothesis is
   that the baseline and our model are the same in performance and there are little or no improvements,
   and the alternate hypothesis is that our model made improvements. We considered the T-critical value
   as 2.92 and if the t-value is greater than the t-critical value we can say that the alternate hypothesis is
   correct. Otherwise, the null hypothesis is correct. Bolded scores mean the alternate hypothesis is correct
   and our model made improvements. Our model’s score column shows the percentage of participants
   who voted for the enhanced model. Users were given the choice to vote for our enhanced model, the
   baseline model, or neither.


increase in the conversation length for the conversations with our model. This increase can show that
our model acted more successfully in engaging the participants in a conversation.
   Comparing the emotion transition during the conversations revealed some unexpected results.
Although we were expecting to see more positive emotions during the conversations with our model,
we had an increase in negative emotions (sadness and anger) 2. We have three possible interpretations.
First, the influence of the enhanced model may be subtle and, since the number of participants and
conversations were limited, was possibly not enough to show its influence over five days. Second, we
provided emotionally appropriate responses and added sympathy by mimicking the user’s emotions. For
this reason, whenever the user says something with negative emotion the chatbot preferably repeats that
emotion as well. This technique for adding empathy increases the number of negative emotions. Third,
a manual inspection of the participants’ conversations with our enhanced model and the baseline, we
noticed the users had more tendency to talk about their problems with the enhanced model. Naturally,
talking about these subjects makes them feel more negative emotions. We consider this as a positive
step forward because one of our main goals is to make this chatbot help caregivers talk about their
problems and emotions. Further studies are needed to get a deeper understanding of the long-term
impact of our model on improving users emotions. Two conversations between the participants and the
enhanced model and the baseline can be found in Fig 4.
   Despite having more negative emotions during the conversations, the participants reported that they
experienced more positive emotions after chatting with the enhanced model. One likely interpretation
is that, by drawing the user into a conversation about negative emotions, the chatbot provides a form
of catharsis, allowing the user to release their negative emotions by talking about them. Studies have
shown that simply talking about our problems and sharing our negative emotions with someone we
trust can be profoundly healing—reducing stress, strengthening our immune system, and reducing
physical and emotional distress [39].


5. Ethical Impacts
The deployment of conversational AI systems, including chatbots, for support of vulnerable populations
is fraught with ethical peril. We note in particular the well-justified concerns surrounding language
model bias [40], [41], [42], dataset imbalance [43], [44], and task alignment for large-scale language
models to specific user preferences [45], [46]. While recent innovations such as constitutional language
models [47] are helping to mitigate such concerns, we are far from having failsafe technologies in this
                                   Emotion    Baseline   Our pipeline
                                     Joy        74%          72%
                                    Love        3.6%         1.2%
                                   Surprise     0.4%         0.8%
                                     Fear       7.4%         6.9%
                                   Sadness       9%          9.4%
                                    Anger        5%          8.5%

Table 2
Comparing the emotion counts during the conversations with the baseline and the enhanced model. Bolded
results mean this emotion occurred more frequently. Although negative emotions slightly increased in the
enhanced model, by analyzing the conversations we noticed that the reason behind this increase is that our
model is more empathetic. It mimics users’ emotions and makes them feel more engaged in the conversation,
and feel more comfortable talking about their emotions and problems.


regard. In light of such factors, we emphatically assert that our research is intended to explore one
small factor (i.e. personality classification based on the MBTI self-report inventory) of a much larger
problem, and should not be viewed as an end solution in and of itself. Any attempt to leverage our
methods in a broader conversational AI context should include careful oversight from both medical
professionals and expert practitioners in large-scale language models, with a careful eye toward the
human impacts of such systems.
   Regarding our core contribution of personality classification within a contextually and emotionally
aware text generation system, we note that any attempt to classify users into subcategories includes
inherent risks such as stereotyping, pigeonholing, and reductionism. We feel that the use of a long-
established and well understood classification method (in this case, the MBTI system) mitigates many
of these risks, but care should still be taken in applying any conclusions made by such systems. In
particular, it is recommended that any system leveraged for user-specific personality classification be
open-ended and responsive to new developments in the user’s personality and preferences.


6. Conclusion
In this paper, we implemented a novel personality classification approach based on the Myers-Briggs
self-report inventory and examined the impact of this innovation on user responses to neural text
generations paired with targeted knowledge graph extractions. Over the course of 5 days, with N=24
participants, we found that our enhanced model created significant improvements in participants’ sense
that the chatbot was getting to know them, as well as generating more positive emotions as reported by
users. Interestingly, these self-reported positive emotions are correlated with an increased amount of
negative sentiment in the chat transcripts, which we attribute to a sense of catharsis due to the user’s
increased willingness to disclose negative emotions to the chatbot.
   Future work in this line of research should include a more detailed analysis of chatbot behavior and
user responses, as well as an extension of the user study to include a demographic group that is closer
to the long-term target population of dementia caregivers. In addition, we wish to improve the quality
of our generated emotion-informed responses by using emotional style transfer techniques.


References
 [1] H. K. Butcher, P. A. Holkup, K. C. Buckwalter, The experience of caring for a family member with
     alzheimer’s disease, Western journal of nursing research 23 (2001) 33–55.
 [2] Centers for Disease Control and Prevention, Loneliness and social isolation linked to serious
     health conditions, Alzheimer’s Disease and Healthy Aging (2020).
 [3] C. Coulacoglou, D. H. Saklofske, Psychometrics and psychological assessment: Principles and
     applications, Academic Press, 2017.
 [4] J. Hopwood, N. Walker, L. McDonagh, G. Rait, K. Walters, S. Iliffe, J. Ross, N. Davies, et al., Internet-
     based interventions aimed at supporting family caregivers of people with dementia: systematic
     review, Journal of medical Internet research 20 (2018) e9548.
 [5] G. Kim, H. Jeon, S. Park, Y. S. Choi, Y. Lim, Care guide system for caregivers of people with
     dementia, in: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine &
     Biology Society (EMBC), IEEE, 2020, pp. 5753–5756.
 [6] T. Le Xin, A. Arshad, Z. A. bin Abdul Salam, Alzbot-mobile app chatbot for alzheimer’s patient to
     be active with their minds, in: 2021 14th International Conference on Developments in eSystems
     Engineering (DeSE), IEEE, 2021, pp. 124–129.
 [7] S. Valtolina, M. Marchionna, Design of a chatbot to assist the elderly, in: International Symposium
     on End User Development, Springer, 2021, pp. 153–168.
 [8] K. Liu, R. W. Picard, Embedded empathy in continuous, interactive health assessment, in: CHI
     Workshop on HCI Challenges in Health Assessment, volume 1, 2005, p. 3.
 [9] A. Ghandeharioun, D. McDuff, M. Czerwinski, K. Rowan, Emma: an emotion-aware wellbeing
     chatbot, in: 2019 8th International Conference on Affective Computing and Intelligent Interaction
     (ACII), IEEE, 2019, pp. 1–7.
[10] D. Lee, K.-J. Oh, H.-J. Choi, The chatbot feels you-a counseling service using emotional response
     generation, in: 2017 IEEE international conference on big data and smart computing (BigComp),
     IEEE, 2017, pp. 437–440.
[11] S. McLeod, Maslow’s hierarchy of needs, Simply psychology 1 (2007).
[12] H.-Y. Shum, X.-d. He, D. Li, From eliza to xiaoice: challenges and opportunities with social chatbots,
     Frontiers of Information Technology & Electronic Engineering 19 (2018) 10–26.
[13] A. Adikari, D. De Silva, D. Alahakoon, X. Yu, A cognitive model for emotion awareness in industrial
     chatbots, in: 2019 IEEE 17th international conference on industrial informatics (INDIN), volume 1,
     IEEE, 2019, pp. 183–186.
[14] M. Portela, C. Granell-Canut, A new friend in our smartphone? observing interactions with
     chatbots in the search of emotional engagement, in: Proceedings of the XVIII International
     Conference on Human Computer Interaction, 2017, pp. 1–7.
[15] A. Ait-Mlouk, L. Jiang, Kbot: a knowledge graph based chatbot for natural language understanding
     over linked data, IEEE Access 8 (2020) 149220–149230.
[16] P. P. Anuradha Welivita, Heal: A knowledge graph for distress management conversations, In
     press (2022).
[17] N. Fulda, T. Etchart, W. Myers, D. Ricks, Z. Brown, J. Szendre, B. Murdoch, A. Carr, D. Wingate,
     Byu-eve: Mixed initiative dialog via structured knowledge graph traversal and conversational
     scaffolding, Proceedings of the 2018 Amazon Alexa Prize (2018).
[18] N. Fulda, C. Gundry, Conversational ai as improvisational co-creation-a dialogic perspective, ICCC
     (2022).
[19] Y. Zhang, S. Sun, M. Galley, Y.-C. Chen, C. Brockett, X. Gao, J. Gao, J. Liu, B. Dolan, Dialogpt:
     Large-scale generative pre-training for conversational response generation, arXiv preprint
     arXiv:1911.00536 (2019).
[20] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam,
     G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in neural information
     processing systems 33 (2020) 1877–1901.
[21] A. Labs, When Machines Become Thought Partners ai21 labs, 2020. URL: http://www.ai21.com.
[22] Reddit, Dementia, 2020. URL: https://www.reddit.com/r/dementia/.
[23] M. F. McTear, Z. Callejas, D. Griol, The conversational interface, volume 6, Springer, 2016.
[24] S. Ontoum, J. H. Chan, Personality type based on myers-briggs type indicator with text posting
     style by using traditional and deep learning, arXiv preprint arXiv:2201.08717 (2022).
[25] D. Byrne, Interpersonal attraction and attitude similarity., The journal of abnormal and social
     psychology 62 (1961) 713.
[26] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, arXiv
     preprint arXiv:1908.10084 (2019).
[27] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers
     for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[28] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Fun-
     towicz, et al., Huggingface’s transformers: State-of-the-art natural language processing, arXiv
     preprint arXiv:1910.03771 (2019).
[29] HuggingFace, t5-base-finetuned-emotion, 2021. URL: https://huggingface.co/mrm8488/
     t5-base-finetuned-emotion.
[30] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. d. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo,
     R. Navigli, S. Neumaier, et al., Knowledge graphs, ACM Computing Surveys (CSUR) 54 (2021)
     1–37.
[31] B. R. Andrus, Y. Nasiri, S. Cui, B. Cullen, N. Fulda, Enhanced story comprehension for large
     language models through dynamic document-based knowledge graphs, in: Proceedings of the
     AAAI Conference on Artificial Intelligence, volume 36, 2022, pp. 10436–10444.
[32] G. Angeli, M. J. J. Premkumar, C. D. Manning, Leveraging linguistic structure for open domain
     information extraction, in: Proceedings of the 53rd Annual Meeting of the Association for Com-
     putational Linguistics and the 7th International Joint Conference on Natural Language Processing
     (Volume 1: Long Papers), 2015, pp. 344–354.
[33] H. Rashkin, E. M. Smith, M. Li, Y.-L. Boureau, Towards empathetic open-domain conversation
     models: A new benchmark and dataset, arXiv preprint arXiv:1811.00207 (2018).
[34] S.-Y. Chen, C.-C. Hsu, C.-C. Kuo, L.-W. Ku, et al., Emotionlines: An emotion corpus of multi-party
     conversations, arXiv preprint arXiv:1802.08379 (2018).
[35] A. Chatterjee, U. Gupta, M. K. Chinnakotla, R. Srikanth, M. Galley, P. Agrawal, Understanding
     emotions in text using deep learning and big data, Computers in Human Behavior 93 (2019)
     309–317.
[36] T. Althoff, K. Clark, J. Leskovec, Large-scale analysis of counseling conversations: An application of
     natural language processing to mental health, Transactions of the Association for Computational
     Linguistics 4 (2016) 463–476.
[37] J. Zhang, C. Danescu-Niculescu-Mizil, Balancing objectives in counseling conversations: Advanc-
     ing forwards or looking backwards, arXiv preprint arXiv:2005.04245 (2020).
[38] J. Baumgartner, S. Zannettou, B. Keegan, M. Squire, J. Blackburn, The pushshift reddit dataset, in:
     Proceedings of the international AAAI conference on web and social media, volume 14, 2020, pp.
     830–839.
[39] J. W. Pennebaker, J. K. Kiecolt-Glaser, R. Glaser, Disclosure of traumas and immune function:
     health implications for psychotherapy., Journal of consulting and clinical psychology 56 (1988)
     239.
[40] M. Nadeem, A. Bethke, S. Reddy, StereoSet: Measuring stereotypical bias in pretrained language
     models, in: Proceedings of the 59th Annual Meeting of the Association for Computational
     Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume
     1: Long Papers), Association for Computational Linguistics, Online, 2021, pp. 5356–5371. URL:
     https://aclanthology.org/2021.acl-long.416. doi:10.18653/v1/2021.acl-long.416.
[41] A. Abid, M. Farooqi, J. Zou, Persistent anti-muslim bias in large language models, in: Proceedings
     of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 2021, pp. 298–306.
[42] P. P. Liang, C. Wu, L.-P. Morency, R. Salakhutdinov, Towards understanding and mitigating social
     biases in language models, in: International Conference on Machine Learning, PMLR, 2021, pp.
     6565–6576.
[43] L. Gao, S. Biderman, S. Black, L. Golding, T. Hoppe, C. Foster, J. Phang, H. He, A. Thite,
     N. Nabeshima, et al., The pile: An 800gb dataset of diverse text for language modeling, arXiv
     preprint arXiv:2101.00027 (2020).
[44] E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, On the dangers of stochastic parrots:
     Can language models be too big???, in: Proceedings of the 2021 ACM conference on fairness,
     accountability, and transparency, 2021, pp. 610–623.
[45] E. Kasneci, K. Seßler, S. Küchemann, M. Bannert, D. Dementieva, F. Fischer, U. Gasser, G. Groh,
     S. Günnemann, E. Hüllermeier, et al., Chatgpt for good? on opportunities and challenges of large
     language models for education, Learning and Individual Differences 103 (2023) 102274.
[46] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama,
     A. Ray, et al., Training language models to follow instructions with human feedback, Advances in
     Neural Information Processing Systems 35 (2022) 27730–27744.
[47] Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirho-
     seini, C. McKinnon, et al., Constitutional ai: Harmlessness from ai feedback, arXiv preprint
     arXiv:2212.08073 (2022).