<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IUI Workshops’19, Los Angeles, USA, March 2019</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>PRIMER: An Emotionally Aware Virtual Agent</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carla Gordon</string-name>
          <email>cgordon@ict.usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Klassen</string-name>
          <email>e.klassen@cablelabs.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anton Leuski</string-name>
          <email>leuski@ict.usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edward Fast</string-name>
          <email>fast@ict.usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grace Benn</string-name>
          <email>benn@ict.usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matt Liewer</string-name>
          <email>Liewer@ict.usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arno Hartholt</string-name>
          <email>hartholt@ict.usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Traum</string-name>
          <email>traum@ict.usc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CableLabs</institution>
          ,
          <addr-line>Boulder, Colorado</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Initiative Dialogue</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>USC Institute for Creative Technologies</institution>
          ,
          <addr-line>Los Angeles</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>PRIMER is a proof-of-concept system designed to show the potential of immersive dialogue agents and virtual environments that adapt and respond to both direct verbal input and indirect emotional input. The system has two novel interfaces: (1) for the user, an immersive VR environment and an animated virtual agent both of which adapt and react to the user's direct input as well as the user's perceived emotional state, and (2) for an observer, an interface that helps track the perceived emotional state of the user, with visualizations to provide insight into the system's decision making process. While the basic system architecture can be adapted for many potential real world applications, the initial version of this system was designed to assist clinical social workers in helping children cope with bullying. The virtual agent produces verbal and non-verbal behaviors guided by a plan for the counseling session, based on in-depth discussions with experienced counselors, but is also reactive to both initiatives that the user takes, e.g. asking their own questions, and the user's perceived emotional state.</p>
      </abstract>
      <kwd-group>
        <kwd>Virtual Reality</kwd>
        <kwd>Virtual Agents</kwd>
        <kwd>Spoken Dialogue Systems</kwd>
        <kwd>Mixed-Initiative Dialogue</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information Interfaces and Presentation (e.g. HCI) → Miscellaneous; • Multimedia Information Systems → Artificial, augmented, and virtual realities; • Natural Language Processing → Discourse.</p>
      <p>IUI Workshops’19, March 20, 2019, Los Angeles, USA. © 2019 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>The issue of bullying has been the focus of national media attention
in recent years. Although this is not a new issue, the problem has
been compounded by the rise of social media technology. Today
the problem is at an all-time high, with one out of five public
school children reporting being bullied, according to the National
Center for Education Statistics [6]. However, only 43% of bullying
victims say they have reported the incident to a school official [10],
indicating that victims of bullying may be uncomfortable disclosing
to adults who might be able to provide the support they need.</p>
      <p>Recent research has shown that many people feel more
comfortable talking about embarrassing or personal issues with virtual
agents than with strangers or even family or friends [8], [9].
Virtual agents could be a cost-effective means to provide entry-level
support and allow a counselor to reach more troubled students. A
virtual agent, in a VR environment, with monitoring by a (possibly
remote) counselor, might be able to provide such support for
children who may not know how to communicate their struggles with
the adults in their life, or know how best to respond to an ongoing
bullying situation at their school.</p>
      <p>Over the last decade, there has been a considerable amount
of success in creating interactive, conversational, virtual agents,
including Ada and Grace, a pair of virtual Museum guides at the
Boston Museum of Science [14], the INOTS and ELITE training
systems at the Naval Station in Newport and Fort Benning [2], and
the SimSensei system designed for healthcare support [4]. There
is also precedent for the use of virtual agents in facilitating
bullying education, such as the FearNot! application developed by
Aylett et al. [1].</p>
      <p>In this paper, we introduce a system called PRIMER that
attempts to explore the user interface issues in creating a tool that a
counselor could use to deploy virtual agents in helping bullied
students. The main student interface involves the student interacting
with a virtual character, who takes on the counselor role in guiding
the session while listening and reacting to the student’s semantic and
emotional expressions. There is also a counselor interface that
provides information that the system has gathered from the interaction,
aggregated on a "dashboard", and allows intervention.</p>
      <p>In addition to the successes of previous virtual agents, immersive
VR environments have also been shown to be helpful in the context
of mental health services and resources. One such application is the
use of Virtual Reality Exposure Therapy (VRET) as a treatment for
soldiers suffering from PTSD. VR systems such as the Bravemind
system [11] have proven effective VRET tools, creating VR
environments that allow for the replication of traumatic events without
exposing the patient to any real physical danger. Based on the
efficacy of these VR tools, it was decided that the PRIMER student
interface should be deployed in an immersive VR environment.</p>
    </sec>
    <sec id="sec-3">
      <title>USER EXPERIENCE</title>
      <p>The user experience (UX) for PRIMER was designed to be engaging
by creating a User Perceived State model (see section on Emotion
Tracking) that detects and tracks the user’s emotional state, and
drives the system’s reactive responses. Inside the HMD, users
become immersed in the application’s virtual environment, in this
case a counselor’s office (see Figure 1). Analyzed user input
affects Ellie’s behavior and dialogue responses, as well as the virtual
environment itself. Ellie is capable of displaying a range of different
reactive emotions, gestures, body language, and linguistic behavior,
depending on the user’s current perceived emotional state, and the
emotional connotations of their utterances.</p>
    </sec>
    <sec id="sec-4">
      <title>Emotionally Adaptive Behaviors</title>
      <p>The system behavior was developed focusing on three core adaptive
behaviors for Ellie: facial expressions, posture, and gestures. Three
base posture poses (leaning back, sitting upright, leaning forward)
were created with transition animations to blend between them.
Gesture animations were created with three intensities (low, neutral,
high). Additionally, the facial expressions adapt to the perceived
emotional state of the user.</p>
      <p>Ellie was designed to express the primary seven emotions as
outlined by noted psychologist Paul Ekman: happiness, disgust,
fear, surprise, anger, sadness, and contentedness; as well as blend
between expressions. For example, if the system perceives the user
to be in a depressed mood, this would cause her to lean forward,
display a gentle encouraging facial expression, and move with small,
slow gestures. Ellie takes the initiative and drives the interaction
based on a pre-determined plan that can take a variety of paths
depending on the user’s responses and their overall emotional state.</p>
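      <p>As a rough illustration of this mapping, the sketch below (in Python) pairs each perceived emotional state with a posture, a gesture intensity, and a facial expression. The quadrant labels follow the UPS grid described later; the specific behavior values are assumptions for illustration, not PRIMER’s actual rule set.</p>
      <preformat><![CDATA[
# Illustrative sketch: mapping a perceived emotional state to Ellie's
# nonverbal behavior. The quadrant labels and behavior values are
# assumptions for illustration, not PRIMER's actual rule set.

POSTURES = {"sad": "lean_forward", "angry": "sit_upright",
            "happy": "lean_back", "content": "sit_upright"}
GESTURE_INTENSITY = {"sad": "low", "angry": "neutral",
                     "happy": "high", "content": "neutral"}
EXPRESSIONS = {"sad": "gentle_encouraging", "angry": "calm_concern",
               "happy": "smile", "content": "relaxed_smile"}

def nonverbal_behavior(quadrant: str) -> dict:
    """Return posture, gesture intensity and facial expression for a quadrant."""
    return {"posture": POSTURES[quadrant],
            "gesture_intensity": GESTURE_INTENSITY[quadrant],
            "expression": EXPRESSIONS[quadrant]}

if __name__ == "__main__":
    # A user perceived as sad/depressed yields lean_forward, low-intensity
    # gestures and a gentle, encouraging expression.
    print(nonverbal_behavior("sad"))
]]></preformat>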
    </sec>
    <sec id="sec-5">
      <title>Emotionally Adaptive Dialogue</title>
      <p>Ellie is also capable of being adaptive to the user’s perceived
emotional state by changing the tone of her dialogue responses. This
means that she will provide appropriate feedback to user
utterances, as well as appropriate prompting questions. For example,
in response to a very positive user utterance, she may say
something like "That’s great!", while a very negative user utterance may
prompt her to say something like "That’s rough." She is also
capable of providing very nuanced responses if a user’s perceived
emotional state shifts suddenly in the conversation. For example,
if the system senses the user is happy overall, but registers that a
user’s last utterance was indicative of sadness, Ellie can respond
with a highly nuanced response such as "I understand things are
worse for you this week, but I’m also sensing that you are in a good
mood. Tell me what’s going on."</p>
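      <p>The following sketch illustrates, in simplified form, how such valence-aware feedback could be chosen when the last utterance diverges from the overall mood. The thresholds and canned strings are assumptions; PRIMER itself selects pre-recorded responses through its classifiers and dialogue manager.</p>
      <preformat><![CDATA[
# Illustrative sketch of valence-aware feedback selection. The thresholds
# and canned strings are assumptions; PRIMER selects recorded responses
# through its classifiers and dialogue manager rather than this logic.

def feedback(global_valence: float, utterance_valence: float) -> str:
    diverges = abs(global_valence - utterance_valence) > 0.5  # assumed threshold
    if diverges and global_valence > 0 > utterance_valence:
        return ("I understand things are worse for you this week, but I'm "
                "also sensing that you are in a good mood. Tell me what's going on.")
    if utterance_valence > 0.3:
        return "That's great!"
    if utterance_valence < -0.3:
        return "That's rough."
    return "I see."

# A sudden negative utterance from a user whose overall mood is positive
# produces the nuanced response.
print(feedback(0.6, -0.4))
]]></preformat>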
      <p>PRIMER takes a mixed-initiative approach to dialogue
interaction, meaning not only does Ellie respond to the user’s utterances,
and the perceived emotional connotations of those utterances, but
she will also take initiative to ask the user questions as well. In this
way, PRIMER can provide the semi-structured dialogue necessary
to emulate a counseling session with a bullied child, creating a
unique user experience.</p>
    </sec>
    <sec id="sec-6">
      <title>Emotionally Reactive VR Environment</title>
      <p>Not only does Ellie herself adapt and respond to the user’s emotional
state, the virtual environment itself is also capable of changing in
a number of ways. Based on the user’s perceived emotional state,
changes to the environment work in tandem with Ellie. Users may
experience changes to lighting, background ambience, room color,
or even a total change of environment from an office to a park, the
woods, or any other desired environment. Pre-selected multimedia
options can be played or shown to help illustrate points, teach
lessons, or serve any number of other purposes depending
on the application. This adaptive virtual environment is one of
the features that sets the PRIMER system apart from some of the
previously mentioned similar systems, such as SimSensei.</p>
    </sec>
    <sec id="sec-7">
      <title>SYSTEM ARCHITECTURE</title>
      <p>The PRIMER system comprises a number of components, all
of which are integrated and/or rendered in a Unity environment.
The following components were developed using the VH Toolkit:
• Virtual Agent: The virtual agent (Ellie) for the PRIMER
system was created using the USC ICT Virtual Human (VH)
Toolkit [5].
• NPCEditor: Utterance classification was carried out using
the NPCEditor, a component of the Virtual Human Toolkit.
NPCEditor is a text classification and dialogue management
system that serves as the core response selection component
of the system. The NPCEditor itself provides the text
classification, and integrates a Dialogue Manager script which
uses these classification results in the response selection
process (more about the DM below). NPCEditor is also a data
editor that allows us to collect, organize, and annotate the
linguistic data. The NPCEditor text classification algorithm
is based on cross-language relevance models and has been
used in a number of successfully deployed VH systems [7].
The NPCEditor is the only component of PRIMER which
is an external process, not fully encapsulated by the Unity
environment.</p>
      <p>In addition, the following new components were developed
specifically for PRIMER:
• ASR: Speech Recognition is handled by the Unity Dictation
Recognizer library, which is a light wrapper around the
Windows.Speech API, the same engine used by Cortana. This
makes the app Windows 10 specific.
• The VR Interface: The user interface was developed in
Unity (see section on User Experience.) This is the interface
the user interacts with, which displays Ellie (the Virtual
Agent) in the virtual environment.
• The Graphical User Interface, or "Dashboard": In
addition to the virtual environment in which the user interacts
with Ellie, a second interface, called the Dashboard, was also
developed in Unity. This is a separate interface designed to
be viewed by a third party observer, such as a school
counselor. This interface contains a host of information about
the user’s interaction and system decision making processes
(see section on Graphical User Interface).
• Dialogue Manager: The Dialogue Manager (DM) for PRIMER
makes use of a persistent user affect model and a
conversational model, as well as sensing of current affect and
linguistic input. In addition to implementing a rule-based dialogue
policy and selecting a next utterance for Ellie, the dialogue
manager’s information state can be updated by an observer
using the Dashboard, and it sends explanations of its decisions
to the Dashboard.</p>
    </sec>
    <sec id="sec-8">
      <title>SYSTEM FUNCTIONS</title>
      <p>In this section, we outline several interface developments that go
beyond previous systems using the Virtual Human Toolkit: user-perceived
emotion tracking, the "dashboard" observer interface,
and mixed-initiative, emotion-sensitive dialogue processing.</p>
    </sec>
    <sec id="sec-9">
      <title>Emotion Tracking: the User Perceived State</title>
      <p>The User Perceived State (UPS) is the metric by which PRIMER tracks
the user’s emotional state. The UPS is represented in the system as
an ordered pair of two values: Valence and Arousal. These values are
detected through the use of lexical sentiment analysis, in a process
that will be further elaborated below in the section on the Dialogue
System.</p>
      <p>Valence refers to the overall emotional polarity of a given
utterance, which can be positive, negative, or neutral. PRIMER represents
this emotional valence as a value between -0.9 (very negative) and
0.9 (very positive), with a valence value of 0.0 being neutral. An
example of a positively valenced utterance would be "I’m feeling
great today", a negatively valenced utterance would be "I’m feeling
terrible", and a neutrally valenced utterance would be "I’m ok".</p>
      <p>Arousal refers to the level of physiological arousal represented by
a given utterance. In other words, Arousal is a measure of the energy
or intensity with which something is said. As with Valence, PRIMER
represents Arousal as a value between -0.9 (low energy) and 0.9
(high energy), with an arousal value of 0.0 being neutral. While
physiological arousal or affect can be difficult to predict from purely text-based
input, recent work suggests that lexical markers of arousal do exist,
based on psychological word norms [3].</p>
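      <p>A minimal sketch of such a lexical estimate is shown below, using a tiny invented table of word norms; PRIMER itself obtains valence and arousal scores from NPCEditor text classifiers trained on annotated utterances (see the section on Dialogue Processing).</p>
      <preformat><![CDATA[
# Minimal sketch of lexical valence/arousal estimation. The tiny norms
# table below is invented for illustration; PRIMER itself obtains these
# scores from NPCEditor text classifiers trained on annotated utterances.

WORD_NORMS = {            # (valence, arousal), both in [-0.9, 0.9]
    "great":    ( 0.8,  0.5),
    "terrible": (-0.8,  0.4),
    "ok":       ( 0.0, -0.2),
    "tired":    (-0.3, -0.6),
}

def utterance_ups(text: str) -> tuple[float, float]:
    """Average the norms of known words; unknown words are ignored."""
    hits = [WORD_NORMS[w] for w in text.lower().split() if w in WORD_NORMS]
    if not hits:
        return 0.0, 0.0
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return valence, arousal

print(utterance_ups("I'm feeling great today"))   # positive valence, raised arousal
]]></preformat>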
      <p>UPS values fall into one of the four quadrants of the UPS grid,
representing a simplified range of human emotions: Happy, Sad, Angry
and Content (see Figure 3). In this figure, the circles represent a
history of the UPS values for all utterances so far, allowing a third
party observer to track the user’s emotional state as the dialogue
progresses.</p>
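      <p>The mapping from a (Valence, Arousal) pair onto these quadrants can be sketched as a simple sign test; the boundary handling below is an assumption made for illustration.</p>
      <preformat><![CDATA[
# Sketch of mapping a (valence, arousal) pair onto the four UPS quadrants
# shown in Figure 3. The sign-based rule is an assumption made for
# illustration; boundary handling in PRIMER may differ.

def ups_quadrant(valence: float, arousal: float) -> str:
    if valence >= 0:
        return "Happy" if arousal >= 0 else "Content"
    return "Angry" if arousal >= 0 else "Sad"

print(ups_quadrant(0.4, 0.6))    # Happy
print(ups_quadrant(-0.5, -0.3))  # Sad
]]></preformat>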
      <p>PRIMER tracks the UPS at two levels: the utterance level, which
represents the UPS of a given utterance, and the global level, which
is representative of the user’s overall emotional state. Global UPS
is calculated based on the functions in Equations 1 and 2. It is this
tracking of both the Global UPS and utterance level UPS that allows
PRIMER to exhibit the kind of emotionally nuanced responses
previously mentioned.</p>
      <p>Arousal<sub>new</sub> = α · Arousal<sub>old</sub> + (1 − α) · Arousal<sub>utt</sub>   (1)</p>
      <p>Valence<sub>new</sub> = α · Valence<sub>old</sub> + (1 − α) · Valence<sub>utt</sub>   (2)</p>
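      <p>Equations 1 and 2 amount to an exponential moving average over the per-utterance scores, as in the sketch below. The value of the smoothing factor α is not reported here, so the one used is an assumption.</p>
      <preformat><![CDATA[
# Global UPS update from Equations 1 and 2: an exponential moving average
# over the per-utterance scores. The value of the smoothing factor alpha is
# not reported in the paper; 0.7 below is an assumption for illustration.

ALPHA = 0.7

def update_global_ups(old, utterance):
    """old and utterance are (valence, arousal) pairs in [-0.9, 0.9]."""
    valence = ALPHA * old[0] + (1 - ALPHA) * utterance[0]   # Eq. (2)
    arousal = ALPHA * old[1] + (1 - ALPHA) * utterance[1]   # Eq. (1)
    return valence, arousal

print(update_global_ups((0.5, 0.2), (-0.6, 0.4)))  # -> (0.17, 0.26)
]]></preformat>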
      <p>Like [12], we started with a lexical approach to sentiment
analysis in order to detect a user’s perceived emotional state (UPS).
However, the system is designed so that other, more sophisticated
means of emotion detection could be introduced, and the range of
emotions the system is designed to detect could be expanded,
allowing for more nuanced responses to the user’s perceived emotional
state.</p>
      <p>In addition to the VR environment described above, PRIMER
includes a separate Graphical User Interface, referred to as the
"Dashboard" (see Figure 4). The goal in designing the Dashboard
visualization was to aid a third party, in this case a counselor or teacher,
in understanding the emotional state of the user, as well as to inform
them as to how the system derived that information. It is a visual,
easy-to-understand look behind the curtain of the system. The
Dashboard was designed to be monitored in real time, and to allow
for real-time adjustments to be made on the fly. Among its controls are
Manual Override sliders that affect the user’s valence and
arousal scores (see Figure 8).</p>
      <p>The UPS was designed to be calculated from both the user’s
input, in this case a bullied child, as well as by manual input, in
this case a counselor or teacher. For this initial prototype, it was of
particular importance for a trained, experienced clinician to have
the ability to include observations of the user’s emotional state in
the calculation of the UPS.</p>
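      <p>The paper does not specify exactly how the Dashboard’s manual override combines with the sensed values; one plausible sketch, shown below, treats the sliders as additive offsets clamped to the UPS range. This is a hypothetical illustration, not PRIMER’s implementation.</p>
      <preformat><![CDATA[
# Hypothetical sketch of the Dashboard's manual override. The paper does not
# specify how the sliders combine with the sensed values; treating them as
# additive offsets clamped to the [-0.9, 0.9] range is an assumption.

def clamp(x, lo=-0.9, hi=0.9):
    return max(lo, min(hi, x))

def apply_override(sensed, offsets):
    """sensed and offsets are (valence, arousal) pairs."""
    return (clamp(sensed[0] + offsets[0]), clamp(sensed[1] + offsets[1]))

# A counselor who judges the child to be lower in valence than the sensors
# suggest can pull the valence score down without touching arousal.
print(apply_override((0.2, -0.1), (-0.5, 0.0)))
]]></preformat>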
    </sec>
    <sec id="sec-10">
      <title>DIALOGUE PROCESSING</title>
      <p>The dialogue system for PRIMER was developed using the
NPCEditor. As mentioned above, the NPCEditor is a text classification and
dialogue management system. It receives the text of the user’s
utterance transcribed by the ASR, and produces a ranked list of potential
responses for each classification domain specified in the training
data. NPCEditor is capable of classifying any given response in a
number of different classification domains, and for PRIMER these
classification domains included two sentiment classifiers (valence
and arousal), and a response classifier. Each classifier is trained on
a corpus of richly annotated training data. Training data consists
of a set of potential user utterances (inputs), which are linked to
appropriate system outputs, and may also be annotated with
additional information (more on the process of crafting the training
data corpus in a later section).</p>
      <p>For PRIMER, three separate text classifiers were trained
for each phase of the dialogue using NPCEditor (see section on
Dialogue Structure). The first two (sentiment classifiers) use the
text content of the user’s utterances to establish the utterance’s
emotional content, i.e., Valence and Arousal. The third classifier
(response classifier) ranks all system responses based on their
appropriateness to the user’s utterance. In this way, for any given user
utterance, the NPCEditor produces a ranked list of system outputs
for each classifier.</p>
      <p>The NPCEditor also incorporates a customizable rule-based
dialogue manager (DM). For PRIMER, the DM combines the outputs
from all three classifiers to decide which response is the most
appropriate to present to the user. The DM models a conversation as
a finite state chart with the individual chart states corresponding
to dialogue progress. It tracks the user’s progress by switching
between the states as it detects or initiates shifts in the dialogue. It
also tracks the Global UPS by incrementally updating the valence
and arousal values using the sentiment classifiers’ outputs. The
response selection is based on (1) the response appropriateness score
from the response classifier; (2) the emotional state of the
conversation; and (3) the current dialogue state and the set of state-specific rules
associated with it.</p>
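      <p>A simplified sketch of this selection step is given below; the field names, scoring rule, and example rule function are illustrative assumptions rather than PRIMER’s actual state-specific rules.</p>
      <preformat><![CDATA[
# Illustrative sketch of the dialogue manager's response selection: combine
# the response classifier's ranked list with the conversation's emotional
# state and per-state rules. Field names and the scoring formula are
# assumptions; PRIMER's actual rules are defined per dialogue state.

def select_response(ranked_responses, global_ups, state, state_rules):
    """ranked_responses: list of dicts with 'text', 'score', 'domain', 'type'."""
    allowed = [r for r in ranked_responses
               if r["domain"] == state and state_rules(r, global_ups)]
    # Fall back to the classifier's top choice if no rule-compatible response exists.
    pool = allowed or ranked_responses
    return max(pool, key=lambda r: r["score"])

# Example rule: when the user seems sad, prefer supportive or probing responses.
def example_rules(response, global_ups):
    valence, _ = global_ups
    if valence < -0.3:
        return response["type"] in ("positive_feedback", "probe_for_info")
    return True

candidates = [
    {"text": "That's rough.", "score": 0.8, "domain": "discuss_feelings",
     "type": "positive_feedback"},
    {"text": "Sounds like you're feeling angry.", "score": 0.9,
     "domain": "review_and_probe", "type": "confirm_emotion"},
]
# Picks "That's rough." because only it matches the current state and rules.
print(select_response(candidates, (-0.5, 0.2), "discuss_feelings", example_rules))
]]></preformat>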
    </sec>
    <sec id="sec-11">
      <title>SYSTEM DEVELOPMENT</title>
      <p>In addition to the development of the Virtual agent, environment,
and interfaces detailed above, the development of the PRIMER
system involved content development with subject matter experts,
the development of the dialogue structure, and the development of
the training data for the classifiers and DM.</p>
    </sec>
    <sec id="sec-12">
      <title>Content Development</title>
      <p>In order to inform the development of a virtual agent who could
be adaptive to the needs of children who have been the victims
of bullying, the content development process for PRIMER began
with subject matter expert (SME) consultations with a psychologist
and school counselor who were familiar with the issue of school
bullying. The SMEs provided invaluable insight into the specific
issues surrounding providing outreach to children who have been
the victims of bullying, and helped inform the training data and
dialogue structure of the system.</p>
      <sec id="sec-12-1">
        <title>Speaker</title>
      </sec>
      <sec id="sec-12-2">
        <title>Ellie</title>
      </sec>
      <sec id="sec-12-3">
        <title>Taylor</title>
      </sec>
      <sec id="sec-12-4">
        <title>Ellie</title>
      </sec>
      <sec id="sec-12-5">
        <title>Ellie</title>
      </sec>
      <sec id="sec-12-6">
        <title>Taylor</title>
      </sec>
      <sec id="sec-12-7">
        <title>Ellie</title>
        <p>
          Utterance
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) Sounds like you’re feeling angry.
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) I’m not angry, I just don’t get it, I just
wish things would go back to how they were
last year.
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) That’s understandable.
(
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) What do you do to feel better when
you feel sad?
(
          <xref ref-type="bibr" rid="ref5">5</xref>
          ) What do you mean?
(
          <xref ref-type="bibr" rid="ref6">6</xref>
          ) People do all diferent things to help
themselves feel better.
        </p>
        <p>Is there anything you do to make yourself
feel better?</p>
      <p>The system content for PRIMER is divided into three sections: user
utterances, system responses, and valence and arousal scores. The
core content for this system consisted of a demo script developed
to be representative of a typical counseling session between a high
school student and a school counselor. This script represents the
user utterances and the system responses. For the purposes of this
demo, the script was written to represent the second session between
Ellie and Taylor. An excerpt of this script can be found in Table 1.</p>
        <p>The user utterances consist of a small set of 99 utterances, most of
which were hand authored during the script development process
to be representative of how a young person might talk to the system.
Additionally, some utterances were authored during the system
testing process, and tended to be slight variations or rephrasings of
the user utterances taken from the demo script. These variations
were added to the system content in order to add robustness to the
UPS detection. Table 1 shows some examples of the user utterances
in the training data.</p>
        <p>The system responses consist of a set of 97 utterances which were
authored during the script development process as responses to
the user utterances (refer again to Table 1 for examples). The set
of possible system responses was designed to enable Ellie not only
to respond appropriately to the user, given their current UPS, but
also to allow her to lead the user through the planned dialogue
phases, achieving the mixed-initiative dialogue interaction that is
so vital to the user experience PRIMER aims to provide. Once the
demo script was finalized, the system responses were recorded, so
that Ellie would have a real, emotionally nuanced human voice.</p>
        <p>The valence and arousal scores represent the set of all possible
values of valence and arousal which could occur in each dialogue
phase, explained in further detail in the discussion of the Dialogue
Structure. This set of valence and arousal values was represented in the
dialogue system in the same manner as Ellie’s verbal responses, as
part of the training data corpus.</p>
    </sec>
    <sec id="sec-13">
      <title>Mixed-Initiative Dialogue Structure</title>
      <p>Part of the motivation behind making PRIMER a mixed-initiative
dialogue system was to ensure that Ellie could guide the user through
a pre-determined set of seven conversational phases, which would mimic
a typical counselling session provided by a school counselor. The
goal was to create a semi-structured conversation in which the
user was free to say whatever they would like at any given point,
while still enabling the system to exert a measure of control over
which topics were being discussed, and the direction in which the
interaction progressed. The conversational phases are shown in
Figure 9. In each phase, the system tracks the user’s UPS, as well as
the number of user utterances and the length of those utterances,
and there is a certain threshold of emotional and linguistic
information that must be reached before it will advance to the next phase.
These thresholds differed from phase to phase and were defined in
the dialogue manager.</p>
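      <p>A sketch of such a phase-advance check is shown below; the threshold values are invented for illustration, since the actual values are defined per phase in the dialogue manager.</p>
      <preformat><![CDATA[
# Sketch of the per-phase advancement check: a phase is left once enough
# emotional and linguistic information has accumulated. The threshold values
# below are invented for illustration; PRIMER defines them per phase in the
# dialogue manager.

PHASE_THRESHOLDS = {
    "ask_status":       {"min_utterances": 2, "min_total_words": 10, "min_abs_valence": 0.2},
    "review_and_probe": {"min_utterances": 3, "min_total_words": 25, "min_abs_valence": 0.4},
}

def ready_to_advance(phase, utterances, global_valence):
    t = PHASE_THRESHOLDS[phase]
    total_words = sum(len(u.split()) for u in utterances)
    return (len(utterances) >= t["min_utterances"]
            and total_words >= t["min_total_words"]
            and abs(global_valence) >= t["min_abs_valence"])

print(ready_to_advance("ask_status",
                       ["Good.", "I had a pretty rough week at school this time."],
                       -0.35))   # True: enough utterances, words and valence signal
]]></preformat>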
      <p>It should be reiterated that, for the purposes of this
proof-of-concept system, the dialogue structure was designed with the
assumption that the system was interacting with a user with whom
it had previous interactions. This is apparent in the description
of the "Review and Probe" phase below, in which the system will
inquire about the user’s current emotional state, as compared to
previous sessions. A future developmental goal of PRIMER is to
create user profiles for each individual user including UPS
information from past sessions, in order to build on that information in
each following session. This was, however, not implemented in the
proof-of-concept.</p>
      <sec id="sec-13-1">
        <title>Phase 1: Ask Status</title>
        <p>In the first phase of the conversation, Ellie attempts to establish
a baseline UPS for the user by asking them about their current
mood. In this phase, if the user gives only very short, neutrally
valenced responses, Ellie will prompt the user for more information.
Once the system is satisfied that an emotional baseline has been
established, it will prompt an initiative and Ellie will ask the user
how they feel compared to last week. This initiative signals the
beginning of the Review and Probe phase.</p>
      </sec>
      <sec id="sec-13-2">
        <title>Phase 2: Review and Probe</title>
        <p>
          During the Review and Probe phase, Ellie probes the user to
provide more detailed information about their current mood, as
compared to the last session. In this phase of the dialogue, once the
user’s global UPS has reached a certain threshold in any quadrant,
an initiative will be prompted in which Ellie will attempt to confirm
with the user what their current emotional state is (see Table 1, line
(1)). This is done with the intent to encourage the user to discuss their
feelings in more detail, and signals the beginning of the Discuss
Feelings phase.
        </p>
      </sec>
      <sec id="sec-13-3">
        <title>Phase 3: Discuss Feelings</title>
        <p>In the Discuss Feelings phase, Ellie will continue to prompt
the user to talk about their feelings for a few dialogue turns. In
this phase the system pays attention to the length of the user’s
utterances and Ellie will prompt the user for more information if they are
only providing very short responses to her inquiries. Once the
user has provided a few utterances of sufficient length, a system
initiative will be triggered in which Ellie asks the user how they
deal with their feelings, signaling the beginning of the Discuss
Strategies phase.</p>
      </sec>
      <sec id="sec-13-4">
        <title>Phase 4: Discuss Strategies</title>
        <p>In the Discuss Strategies phase, the system attempts to get the
user to talk about the strategies they use to cope with their negative
feelings. In this phase the system is once again paying attention
to the length of the user’s utterances, and will continue to prompt
the user for more information until it feels it has received sufficient
information from the user. At that time, a system initiative will
be triggered in which the system asks the user if they have any
questions, signaling the beginning of the Probe for Questions phase.</p>
      </sec>
      <sec id="sec-13-5">
        <title>Phase 5: Probe for Questions</title>
        <p>The Probe for Questions phase is a very brief phase of the
conversation designed to give the user the option to ask Ellie specific
questions about coping with negative emotions or with bullying
behavior. If the user’s responses in this phase indicate they have no
questions for the system, this will trigger an initiative in which
Ellie suggests the user partake in a coping exercise, signaling the
beginning of the Coping Exercise phase.</p>
      </sec>
      <sec id="sec-13-6">
        <title>Phase 6: Coping Exercise</title>
        <p>In this proof-of-concept version of the system, this particular
dialogue phase was not fully implemented, and the system operates
under the assumption that a user will respond in the negative when
Ellie proposes a coping exercise. The current system could easily be
modified to support this interaction during this phase; however, at
present, the system expects to move through the phase without
such an interaction. In this phase, all user input will prompt simple
feedback from Ellie and then trigger an initiative designed to bring
the conversation to a close (see Table 2). This signals the beginning
of the Closing phase.</p>
      </sec>
      <sec id="sec-13-7">
        <title>Phase 7: Closing</title>
        <p>In the closing phase, the system will end the conversation, but
encourage the user to come back for further support if needed.</p>
        <p>In addition to the conversational initiatives mentioned above,
the system was also capable of taking initiative during long periods
of silence in which no utterance was received from the user. During
periods of silence, Ellie may prompt the user for more information
about their previous statement, or make a general inquiry such
as "Is there anything you want to tell me?" which is designed to
get the user talking again. This feature was designed to keep the
interaction going during times when the user may not know what
to say, and is another way in which the system itself drives the
conversation.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption><p>Transition between the Coping Exercise and Closing phases.</p></caption>
          <table>
            <thead>
              <tr><th>Speaker</th><th>Utterance</th></tr>
            </thead>
            <tbody>
              <tr><td>Ellie</td><td>(1) I recommend we do a roleplay exercise to practice strategies for the next time you’re in a similar situation. Okay?</td></tr>
              <tr><td>Taylor</td><td>(2) Not now. I’ve gotta go to class.</td></tr>
              <tr><td>Ellie</td><td>(3) Your parents, teachers, counselors, and I are all here to help you.</td></tr>
              <tr><td>Taylor</td><td>(4) Ok, bye.</td></tr>
              <tr><td>Ellie</td><td>(5) Bye for now, I’m here when you need me.</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>Training Data</title>
      <p>Once the initial content development process was done, the demo
script was used to create an annotated corpus of training data.
Training data for PRIMER consisted of the set of user utterances
annotated with valence and arousal scores, and in some cases with
appropriate system responses. The valence and arousal scores for
each utterance were chosen based on the authors’ intuition. In a
small number of specific cases, user utterances were linked directly
to a system response, but the vast majority of user utterances were
annotated solely with information about valence and arousal scores.
In this way, the main classifier output for any given user utterance
consists of ranked lists of arousal and valence scores, as well as a ranked
list of potential system responses. The top arousal and valence
scores are used to update the UPS according to Equations 1 and 2,
respectively. The information provided by these three classifiers is
then used by the DM in order to choose an appropriate response at any
given phase of the dialogue.</p>
      <p>In order to facilitate this process, as well as the
mixed-initiative dialogue style, the system utterances were annotated with
information about their type, domain, and whether they signify the
beginning of a new conversational phase (toss). Examples of system
responses and their type and domain annotations can be found in
Table 3.</p>
      <sec id="sec-14-1">
        <title>Response Text</title>
        <p>You wanna tell
me anything?
That’s good.</p>
      </sec>
      <sec id="sec-14-2">
        <title>Type</title>
        <p>probe_for_info</p>
      </sec>
      <sec id="sec-14-3">
        <title>Domain</title>
        <p>ask_status
positive_feedback
discuss_feelings</p>
        <sec id="sec-14-3-1">
          <title>Type</title>
          <p>The type refers to the general intention of the system response.
Table 3 shows the type "probe_for_info", in which the system asks
the user for more information, and "positive_feedback", which is
related to a response of positive feedback, such as "That’s good".
By annotating utterances with type information, the DM is able to
choose a random utterance from within the set of utterances of a
given type. This provides a more realistic interaction, by ensuring
that each time a user interacts with the system, it provides slightly
different responses, even to the exact same input.</p>
        </sec>
        <sec id="sec-14-3-2">
          <title>Domain</title>
          <p>The domain is a constraint set on each response as to which
phase of the conversation it can appear in. In the examples in Table
3, the first response "You wanna tell me anything?" is constrained
to the initial phase of the dialogue, "Ask Status" (see Figure 9). These
domain annotations enabled the system to provide nuanced
responses to similar user utterances during different phases of the
conversation. It should be noted, however, that the same system
response could appear in two different domains. In order to achieve
this, there would need to be two instances of this response in the
training data, each with its own unique domain annotation. In this
way, the system was not limited to a unique set of responses in
each conversational phase.</p>
          <p>Additionally, a subset of a certain type of utterance could be
specified for each given conversational phase. For example, responses of
the type probe_for_info can be found in both the ask_status domain,
and the review_and_probe domain. The probe_for_info response in
the ask_status domain is very general, such as "You wanna tell me
anything?". The probe_for_info responses in the review_and_probe
domain ask the user to elaborate on something they have already
said, such as "Can you tell me more about that?". This strategy
allowed for the training data to define broader "types" of utterances
that can appear in more than one conversational phase, and
decisions about which utterances can appear in each phase can be
further constrained by using domain annotations.</p>
        </sec>
        <sec id="sec-14-3-3">
          <title>Toss</title>
          <p>Certain system responses were annotated with a "toss", which
would indicate to the system that it should enter the next
conversational phase. When the dialogue manager detects a toss on a certain
response, it will update the internal state tracking, which specifies
the particular set of rules for choosing responses within a given
conversational phase. This is the mechanism that allows the system
to provide different and nuanced responses to the user input in
different conversational phases, even if that input is identical to
input received in a previous phase.</p>
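          <p>Taken together, the type, domain, and toss annotations could drive response selection roughly as in the sketch below; the records mirror Table 3, while the selection logic is an illustrative assumption rather than NPCEditor’s implementation.</p>
          <preformat><![CDATA[
# Sketch of how the type, domain and toss annotations could drive response
# selection and phase transitions. The records below mirror Table 3; the
# selection logic is an illustrative assumption, not NPCEditor's code.
import random

RESPONSES = [
    {"text": "You wanna tell me anything?", "type": "probe_for_info",
     "domain": "ask_status", "toss": None},
    {"text": "Can you tell me more about that?", "type": "probe_for_info",
     "domain": "review_and_probe", "toss": None},
    {"text": "That's good.", "type": "positive_feedback",
     "domain": "discuss_feelings", "toss": None},
    {"text": "How do you feel compared to last week?", "type": "probe_for_info",
     "domain": "ask_status", "toss": "review_and_probe"},
]

def pick(response_type, current_phase):
    """Pick a random response of the requested type allowed in this phase."""
    pool = [r for r in RESPONSES
            if r["type"] == response_type and r["domain"] == current_phase]
    choice = random.choice(pool)
    next_phase = choice["toss"] or current_phase   # a toss advances the phase
    return choice["text"], next_phase

print(pick("probe_for_info", "ask_status"))
]]></preformat>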
        </sec>
    </sec>
    <sec id="sec-15">
      <title>INITIAL TESTING</title>
      <p>In initial testing of this demo system PRIMER showed promise at
detecting and tracking the user’s emotional state, and providing
appropriate verbal and non-verbal feedback. Testing of PRIMER
using the script developed specifically for the initial proof-of-concept
demonstration revealed the system to be capable of appropriate
emotional detection and adaptation to the user’s emotional state.
Further, a broader, albeit informal, test of the system suggests that
PRIMER also shows promise in correctly classifying the emotional
context of novel utterances, and providing appropriately adaptive
feedback as well.</p>
      <p>Table 4 shows an excerpt from an "off-script" interaction with
the PRIMER system. In this interaction, the majority of the user
utterances were not a word-for-word match to any of the training
data, yet the system was still capable of having a reasonably
coherent conversation, and showing appropriate emotional awareness.
The system was also successful at guiding the user through the
conversational phases, although it should be noted that the
interaction does not represent a novel user with no prior knowledge of
the system. This interaction displays the moderate robustness of
PRIMER, which can produce a coherent interaction outside of the
general domain of bullying, but still within the general dialogue
framework for which it was designed, namely that of a counseling
session.</p>
      <table-wrap id="tab4">
        <label>Table 4</label>
        <caption><p>An excerpt from an off-script interaction with PRIMER.</p></caption>
        <table>
          <thead>
            <tr><th>Speaker</th><th>Utterance</th></tr>
          </thead>
          <tbody>
            <tr><td>User</td><td>(1) Hi.</td></tr>
            <tr><td>Ellie</td><td>(2) Hello, how are you today?</td></tr>
            <tr><td>User</td><td>(3) Good.</td></tr>
            <tr><td>Ellie</td><td>(4) Please continue.</td></tr>
            <tr><td>User</td><td>(5) There’s not that much more to say, it’s just a normal day.</td></tr>
            <tr><td>Ellie</td><td>(6) Can you tell me if things have improved for you?</td></tr>
            <tr><td>User</td><td>(7) I think things are generally better, yes.</td></tr>
            <tr><td>Ellie</td><td>(8) Nice!</td></tr>
            <tr><td>Ellie</td><td>(9) I’d like to hear more.</td></tr>
            <tr><td>User</td><td>(10) Well, all my exhausting travel is finally behind me.</td></tr>
            <tr><td>Ellie</td><td>(11) Can you tell me more about that?</td></tr>
            <tr><td>User</td><td>(12) Yeah, I’ve been travelling a lot for work lately.</td></tr>
            <tr><td>Ellie</td><td>(13) Wow! That’s great!</td></tr>
            <tr><td>User</td><td>(14) Yeah, it has been a lot of fun, but it is also pretty exhausting to travel so much.</td></tr>
            <tr><td>Ellie</td><td>(15) That’s totally normal.</td></tr>
            <tr><td>User</td><td>(16) Yeah, I guess.</td></tr>
            <tr><td>Ellie</td><td>(17) Sounds like you’re feeling angry.</td></tr>
            <tr><td>User</td><td>(18) No, I’m not really angry at all.</td></tr>
            <tr><td>Ellie</td><td>(19) Okay!</td></tr>
            <tr><td>Ellie</td><td>(20) Can you elaborate on that?</td></tr>
            <tr><td>User</td><td>(21) Well you said you thought I was feeling angry, I’m definitely not angry.</td></tr>
            <tr><td>Ellie</td><td>(22) Okay! That’s good!</td></tr>
            <tr><td>Ellie</td><td>(23) How do you make yourself feel better when you’re down?</td></tr>
            <tr><td>User</td><td>(24) I guess I like to read.</td></tr>
            <tr><td>Ellie</td><td>(25) Please tell me more.</td></tr>
            <tr><td>User</td><td>(26) Reading is just a really good way to forget about all the pressures of real life.</td></tr>
            <tr><td>Ellie</td><td>(27) Awesome!</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>During this interaction, there were no instances of
completely incoherent utterances from the system; however, it did
incorrectly gauge the user’s global UPS, as evidenced by lines (17)
and (18). Here, we can see that the system incorrectly gauged the
user’s UPS as being in the "Angry" quadrant. However, most of
the system’s responses are coherent, and display an appropriate
emotional valence in response to the user’s utterances.</p>
        <p>
          This example also shows how the system leads the user through
the conversation by switching between the different conversational
phases. This can be seen in line (6) when the system moves from
the Ask Status phase into the Review and Probe phase by asking the
user "Can you tell me if things have improved for you?" The system
again leads the conversation in line (17) when it coaxes the user
into the Discuss Feelings phase by saying "Sounds like you’re feeling
angry." Although this was not a correct assessment of the user’s
mood, it quickly recovers from this mistake, continuing to provide
appropriately valenced feedback, and soon takes initiative to guide
the user into the Discuss Strategies phase of the conversation by
asking about the user’s coping strategies in line (23).
        </p>
        <p>The initial version of PRIMER was given only limited training
data, and so was tested mainly on a scripted interaction and some
variations rather than a full test with the target population. This
limited informal testing did reveal that the purely lexically based
sentiment analysis would probably not be a sufficient means of
emotion detection, at least not given the relatively small amount
of training data used to train the classifiers for this task. However,
for demonstration purposes, lexical sentiment analysis and the
small training data corpus were sufficient to provide the necessary
emotional awareness, and were also robust enough to handle some
off-script interactions as well.</p>
    </sec>
    <sec id="sec-16">
      <title>FUTURE WORK</title>
      <p>
        The PRIMER system proved to be a successful proof-of-concept
system, capable of correctly executing the planned demo, and showing
promise as having broader applicability. The next phase of
development for the PRIMER system would include:
• Expanded Training Data and Dialogue Manager: As the
training data was modeled from a script representing the
second interaction of user with the agent, we would need
transition diagrams and training data to support additional
sessions with slightly different interaction plans.
• Enabling User Profiles: For a multi-session interaction, we
would want to use information gathered from the user in
previous sessions (e.g. name, main complaint, prior emotional
states and coping strategies), rather than assume it, as was
the case in our initial script for session 2. This is evidenced
by line (6) in Table 4. Here the system asks if things have
"improved", capitalizing on information in the user profile
about their emotional state during the last session.
• Enhanced Emotion Tracking: In order to make the
emotional detection and tracking more robust, PRIMER could be
adapted to accommodate additional means of emotional
detection and tracking, including audio and visual input from
the user. This would allow PRIMER to implement a number
of more sophisticated methods such as eye gaze and head
tracking, voice analysis, and non-verbal behavior analysis [13].
• Increased Emotional Granularity: Implementing more
advanced methods of emotion tracking would enable the
reifnement of the currently limited range of emotions PRIMER
is capable of detecting. Using the methods mentioned above,
PRIMER could be modified to detect and adapt to a far
broader range of more subtle human emotional states.
      </p>
    </sec>
    <sec id="sec-17">
      <title>CONCLUSION</title>
      <p>We have presented PRIMER, a novel user interface for virtual
human interactions in virtual reality that are sensitive to the user’s
emotional state and dialogue behavior, and can be monitored and
guided by an external human observer. A proof of concept
scenario for counseling victims of bullying was presented in which the
agent used mixed initiative dialogue to guide and respond to the
user, while a counselor could observe diagnostic information about
the system’s behavior and the interaction. Results seem promising,
though building out a full system will require additional
development, training data, and possibly improved techniques.</p>
    </sec>
    <sec id="sec-18">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research was sponsored by CableLabs. Some of the authors
were supported in part by the U.S. Army; statements and opinions
expressed do not necessarily reflect the position or the policy of
the United States Government, and no official endorsement should
be inferred. Special thanks to our subject matter experts (Todd
Adamson PsyD, California School of Professional Psychology and
Bill Lemiueux, Milwaukee inner-city teacher/counselor), the team
responsible for the creation of the virtual agent and VR environment
(Jamison Moore, Adam Reilly, Dimitar Tzvetanov, Robert Weaver,
Peter Walters, Wendy Whitcup, Joe Yip), and Angela Nazarian
(UCSC), the vocal talent who brought life to our virtual human,
Ellie.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ruth</given-names>
            <surname>Aylett</surname>
          </string-name>
          , Marco Vala, Pedro Sequeira, and
          <string-name>
            <given-names>Ana</given-names>
            <surname>Paiva</surname>
          </string-name>
          .
          <year>2007</year>
          . FearNot!
          <article-title>- An Emergent Narrative Approach to Virtual Dramas for Anti-bullying Education</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          <volume>4871</volume>
          (
          <year>2007</year>
          ),
          <fpage>202</fpage>
          -
          <lpage>205</lpage>
          . https://doi.org/10.1007/ 978-3-
          <fpage>540</fpage>
          -77039-8_
          <fpage>19</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Julia</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Campbell</surname>
          </string-name>
          , Matthew Jensen Hays, Mark Core, Mike Birth, Matt Bosack, and Richard E. Clark.
          <year>2011</year>
          .
          <article-title>Interpersonal and Leadership Skills: Using Virtual Humans to Teach New Officers</article-title>
          .
          <source>In Proceedings of Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC)</source>
          <year>2011</year>
          . IITSEC.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Houwei</given-names>
            <surname>Cao</surname>
          </string-name>
          , Arman Savran, Ragini Verma, , and
          <string-name>
            <given-names>Ani</given-names>
            <surname>Nenkova</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Acoustic and lexical representations for affect prediction in spontaneous conversations</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          <volume>29</volume>
          (
          <year>January 2015</year>
          ),
          <fpage>203</fpage>
          -
          <lpage>217</lpage>
          . Issue 1. https://doi. org/10.1016/j.csl.
          <year>2014</year>
          .
          <volume>04</volume>
          .002
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>David</surname>
            <given-names>DeVault</given-names>
          </string-name>
          , Ron Artstein, Grace Benn, Teresa Dey, Ed Fast, Alesia Gainer, Kallirroi Georgila, Jon Gratch, Arno Hartholt, Margaux Lhommet, Gale Lucas, Stacy Marsella, Fabrizio Morbini, Angela Nazarian, Stefan Scherer, Giota Stratou, Apar Suri, David Traum,
          <string-name>
            <given-names>Rachel</given-names>
            <surname>Wood</surname>
          </string-name>
          , Yuyu Xu,
          <string-name>
            <given-names>Albert Rizzo</given-names>
            , and
            <surname>Louis-Philippe Morency</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>SimSensei Kiosk: A Virtual Human Interviewer for Healthcare Decision Support. In Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems</article-title>
          . IEEE,
          <fpage>1061</fpage>
          -
          <lpage>1068</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Arno</given-names>
            <surname>Hartholt</surname>
          </string-name>
          , David Traum, Stacy C. Marsella, Ari Shapiro,
          <string-name>
            <given-names>Giota</given-names>
            <surname>Stratou</surname>
          </string-name>
          , Anton Leuski,
          <string-name>
            <surname>Louis-Philippe Morency</surname>
            , and
            <given-names>Jonathan</given-names>
          </string-name>
          <string-name>
            <surname>Gratch</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>All Together Now: Introducing the Virtual Human Toolkit</article-title>
          .
          <source>In International Conference on Intelligent Virtual Humans. Edinburgh</source>
          , UK. http://ict.usc.edu/pubs/All%20Together%
          <fpage>20Now</fpage>
          . pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Deborah</given-names>
            <surname>Lessne</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christine</given-names>
            <surname>Yanez</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Student Reports of Bullying: Results From the 2015 School Crime Supplement to the National Crime Victimization Survey</article-title>
          . National Center for Education Statistics, U.S. Department of Education, and Bureau of Justice Statistics, Office of Justice Programs, U.S. Department of Justice. Washington, DC. (
          <year>December 2016</year>
          ). Retrieved from https://nces.ed.gov/ pubsearch/pubsinfo.asp?pubid=
          <fpage>2017015</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Anton</given-names>
            <surname>Leuski</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Traum</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>NPCEditor: Creating Virtual Human Dialogue Using Information Retrieval Techniques</article-title>
          .
          <source>AI Magazine</source>
          <volume>32</volume>
          ,
          <issue>2</issue>
          (
          <year>2011</year>
          ),
          <fpage>42</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Gale</given-names>
            <surname>Lucas</surname>
          </string-name>
          , Jonathan Gratch,
          <string-name>
            <given-names>Aisha</given-names>
            <surname>King</surname>
          </string-name>
          , and
          <string-name>
            <surname>Louis-Philippe Morency</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>It's only a computer: Virtual humans increase willingness to disclose</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>37</volume>
          (
          <year>2014</year>
          ),
          <fpage>94</fpage>
          -
          <lpage>100</lpage>
          . https://doi.org/10.1016/j.chb.
          <year>2014</year>
          .
          <volume>04</volume>
          .043
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Gale</given-names>
            <surname>Lucas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Albert Rizzo</given-names>
            , Jonathan Gratch, Stefan Scherer, Giota Stratou, Jill Boberg, and
            <surname>Louis-Philippe Morency</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Reporting Mental Health Symptoms: Breaking Down Barriers to Care with Virtual Human Interviewers</article-title>
          .
          <source>Frontiers in Robotics and AI</source>
          <volume>4</volume>
          (
          <year>2017</year>
          ). Issue 51. https://doi.org/10.3389/frobt.
          <year>2017</year>
          .00051
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Lauren</given-names>
            <surname>Musu-Gillette</surname>
          </string-name>
          , Anlan Zhang, Ke Wang, Jizhi Zhang, and
          <string-name>
            <surname>Barbara</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Oudekerk</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Indicators of School Crime and Safety: 2016</article-title>
          . National Center for Education Statistics, U.S. Department of Education, and Bureau of Justice Statistics, Office of Justice Programs, U.S. Department of Justice. Washington, DC. (May
          <year>2017</year>
          ). Retrieved from https://nces.ed.gov/pubs2017/2017064.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Albert 'Skip'</given-names>
            <surname>Rizzo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Russell</given-names>
            <surname>Shilling</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Clinical Virtual Reality tools to advance the prevention, assessment, and treatment of PTSD</article-title>
          .
          <source>European Journal of Psychotraumatology</source>
          <volume>8</volume>
          (
          <year>2017</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          . https://doi.org/10.1080/20008198.
          <year>2017</year>
          . 1414560
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Antonio</given-names>
            <surname>Roque</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Traum</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A model of compliance and emotion for potentially adversarial dialogue agents</article-title>
          .
          <source>In Proceedings of the 8th annual SIGDIAL Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Marc</given-names>
            <surname>Schroder</surname>
          </string-name>
          , Elisabetta Bevacqua, Roddy Cowie, Florian Eyben, Hatice Gunes, Dirk Heylen, Mark ter Maat,
          <string-name>
            <surname>Gary</surname>
            <given-names>McKeown</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Sathish</given-names>
            <surname>Pammi</surname>
          </string-name>
          , Maja Pantic, Catherine Pelachaud, Bjorn Schuller, Etienne de Sevin, Michel Valstar, and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Wollmer</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Building Autonomous Sensitive Artificial Listeners</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          <volume>3</volume>
          (
          <year>October 2011</year>
          ),
          <fpage>165</fpage>
          -
          <lpage>183</lpage>
          . Issue 2. https://doi.org/10.1109/T-AFFC.
          <year>2011</year>
          .34
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>William</given-names>
            <surname>Swartout</surname>
          </string-name>
          , David Traum,
          <string-name>
            <given-names>Ron</given-names>
            <surname>Artstein</surname>
          </string-name>
          , Dan Noren, Paul Debevec, Kerry Bronnenkant,
          <string-name>
            <given-names>Josh</given-names>
            <surname>Williams</surname>
          </string-name>
          , Anton Leuski, Shrikanth Narayanan, Diane Piepol,
          <string-name>
            <given-names>H. Chad</given-names>
            <surname>Lane</surname>
          </string-name>
          , Jacquelyn Morie, Priti Aggarwal, Matt Liewer,
          <string-name>
            <surname>Jen-Yuan</surname>
            <given-names>Chiang</given-names>
          </string-name>
          , Jillian Gerten, Selina Chu,
          <string-name>
            <given-names>and Kyle</given-names>
            <surname>White</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Virtual Museum Guides demonstration</article-title>
          .
          <source>In Proceedings of the 2010 IEEE Spoken Language Technology Workshop</source>
          . IEEE. https://doi.org/10.1109/JPROC.
          <year>2012</year>
          .2236291
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>