<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Situation Awareness of Conversational Assistants in the Age of LLMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shih-Hong Huang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chieh-Yang Huang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hua Shen</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuxin Deng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ting-Hao 'Kenneth' Huang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Carnegie Mellon University</institution>
          ,
          <addr-line>Pittsburgh, PA 15213</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MetaMetrics Inc.</institution>
          ,
          <addr-line>Durham, NC 27701</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Pennsylvania State University</institution>
          ,
          <addr-line>University Park, PA 16802</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Washington</institution>
          ,
          <addr-line>Seattle, WA 98195</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Large language models (LLMs) like ChatGPT enable near-human interaction, yet meaningful, lengthy dialogues need more than just delivering information. This paper argues that future conversational assistants should be aware of users' situations and adapt the conversation format based on those real-world situations. Through a Patrol Study, we demonstrate that users modify their communication approaches depending on their situations. Participants engaging in information-seeking conversations via WhatsApp while patrolling a building preferred voice messaging over text. This paper lays the groundwork for situation awareness in conversational assistants. The enhanced AI capabilities of LLMs make addressing HCI challenges essential to enable human-like, meaningful conversations beyond just providing information and generating fluent responses.</p>
      </abstract>
      <kwd-group>
        <kwd>Conversational Assistant</kwd>
        <kwd>Situation Awareness</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Background</title>
      <p>With tools powered by large language models (LLMs) becoming increasingly accessible, more users are
turning to them for assistance with various tasks. One of the most common ways users interact with
LLMs is through agents or assistants that understand natural language. These agents allow users to
either outsource tasks entirely or request partial assistance. However, most existing LLM assistants
rely heavily on text interaction and require users to type their requests. Such reliance on text-based
interactions often assumes that users are fully focused on a single task, seated at a computer, or have
easy access to the necessary resources. Popular systems like ChatGPT, Claude, Gemini, DeepSeek, and
Grok allow users to navigate their information needs, but still primarily operate in a text-based format
as mentioned.</p>
      <p>
        Users can converse with advanced LLMs almost as if with another human. However, many challenges
emerge when incorporating LLMs into deployed conversational systems. A study showed that over
30% of conversations were erroneous, and nearly 30% of those erroneous conversations resulted in
breakdowns when users tried to talk to GPT-3.5 via an Echo device [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Holding human-like, lengthy
conversations requires more than just delivering information and producing fluent responses. In this
paper, we argue that future conversational assistants, designed to assist users through text or voice in a
turn-taking fashion, should learn to be aware of users’ situations and adapt the conversation
format based on the situation. We define “format” as the general attributes of a conversation, such
as input modality and conversation length, separate from the primary content the conversation aims to
convey. For example, if the assistant detects that a user is walking outdoors, it should deliver shorter
sentences, speak louder, and expect voice input rather than text messages from the user. When the
assistant knows the user is near their computer, the system might send an email or a Slack message
rather than a notification via their Echo device. Furthermore, if the topic of conversation is perceived
as less urgent or unengaging, the assistant should try to keep the exchange brief. We call this capability
situation awareness in conversational agents. This capability differs from the personalization in
recommender or dialogue systems: In the cases of personalization, the same query from different users
with varied contexts (such as search history, preferences, location, and age) yields different results [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Our focus is not on personalizing the system’s responses. Instead, we emphasize that once a system
has formulated a response, it should determine the best way to deliver it based on the user’s situation.
Although changing the conversation format can sometimes modify the content, it does not typically
alter the core message and can be handled through simple paraphrasing or minor adjustments.
      </p>
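      <p>To make this separation concrete, the sketch below illustrates one possible way a format layer could sit on top of a content layer: the content layer (e.g., an LLM) produces the response, and a separate component inspects the user’s situation to choose delivery attributes such as modality, channel, and sentence length. This is a minimal illustration under assumed names (Situation, FormatDecision, choose_format, deliver); it is not an implementation from this paper.</p>
      <preformat>
# Minimal sketch of separating conversation format from content.
# All names and thresholds are illustrative assumptions, not the system described in this paper.
from dataclasses import dataclass

@dataclass
class Situation:
    activity: str        # e.g., "walking" or "at_desk"
    noise_level: float   # 0.0 (quiet) to 1.0 (loud)
    near_computer: bool

@dataclass
class FormatDecision:
    modality: str              # "voice" or "text"
    channel: str               # "phone", "slack", "email", "smart_speaker"
    max_words_per_sentence: int
    speak_louder: bool

def choose_format(situation: Situation) -> FormatDecision:
    """Pick delivery attributes from the user's situation, independent of content."""
    if situation.activity == "walking":
        return FormatDecision("voice", "phone", 10, situation.noise_level > 0.5)
    if situation.near_computer:
        return FormatDecision("text", "slack", 25, False)
    return FormatDecision("text", "phone", 15, False)

def deliver(content: str, decision: FormatDecision) -> None:
    # The content layer has already produced `content`; this step only adapts
    # its presentation (e.g., paraphrasing into shorter sentences for voice).
    print(f"[{decision.channel}/{decision.modality}] {content}")

deliver("The meeting room on the third floor is free until 2 pm.",
        choose_format(Situation(activity="walking", noise_level=0.7, near_computer=False)))
      </preformat>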
      <p>
        In the field of conversational assistants, much research has been devoted to understanding the
broader context in which users interact with these systems, including their use in everyday household
scenarios [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Within Ubiquitous Computing, a significant body of work has been done on
“context-aware computing” [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This research often revolves around enhancing conversational assistants with
the ability to sense user context [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. For example, using ambient acoustic data to detect human
activities like typing or walking [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and utilizing a respiration sensor to assist users in managing breathing
patterns [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Likewise, in Affective Computing, efforts have been made to adjust conversational strategies
based on detected user emotions, in which the “context” is the emotion; the goal was often to
resonate with users’ emotions and thus enhance user engagement [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10, 11, 12, 13</xref>
        ]. However, despite
intriguing research that took user context into account, the majority of dialogue systems research,
including recent LLM-based chatbots, remained largely disconnected from such considerations [
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15,
16</xref>
        ]. In contrast, the natural
language processing (NLP) and dialogue system communities focus primarily on producing accurate
responses without considering the user’s specific situation; most benchmark datasets for dialogue
systems lack user context information [
        <xref ref-type="bibr" rid="ref17 ref18 ref19">17, 18, 19</xref>
        ]. The phrase “context-aware” conversational systems
in NLP literature often refers to those that consider the domain or user’s chat history, emphasizing
the content of the conversation over the user’s situation [
        <xref ref-type="bibr" rid="ref20 ref21 ref22 ref23">20, 21, 22, 23</xref>
        ]. Consequently, the systems
considering user context have typically been developed only as ad-hoc projects with specific sensing
capabilities, such as emotion-detection or environmental-sensing features [
        <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
        ].
      </p>
      <p>In this paper, we respond to this gap by advocating a separation of conversation format from
content, with situational information mainly influencing format. This separation offers two-fold
advantages: First, it ensures that effective existing dialogue systems like ChatGPT maintain their
focus on content. This way, they can continue enhancing their capabilities using current datasets,
model frameworks, and infrastructure while potentially benefiting from additional situation awareness.
Second, it provides a straightforward path for researchers focused on user context to leverage the
advancements of LLM-powered chatbots in their studies. With the impressive capabilities of modern
LLMs, tailoring a response to suit a user’s situation is now more attainable than ever.</p>
      <p>To emphasize our unique approach, we have chosen to use “situation” instead of more commonly
used terms like “context” or “scenarios.” In dialogue systems and natural language processing literature,
“context” often includes dialogue content, like previous conversation history, which is not our focus in this work.</p>
      <p>
        We establish our argument through a study focusing on a specific attribute of conversation format,
to highlight the necessity for conversational assistants to adapt their formats according to user situations.
Namely, users modify their communication approaches based on their situations, so situation
awareness in assistants enhances their communicative reciprocity. The Patrol Study validates this by
involving participants in a walking task where they patrolled a building’s interior while concurrently
engaging in information-seeking conversations with a remote human helper via WhatsApp, using either
voice or text messages. Despite the predominant use of text messaging in WhatsApp (over 90%) [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ],
the majority of participants, when placed in this situation, preferred voice messaging.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Patrol Study: Efects of Using Voice Interfaces to Receive Remote</title>
    </sec>
    <sec id="sec-3">
      <title>Help</title>
      <p>
        The Patrol Study focused on one attribute of conversation format: the user’s input modality. The
objective of this study is to substantiate the argument that users adapt their communication
behavior according to their situations, in particular, altering their preferred input modality in
specific situations. WhatsApp Messenger, a widely-used instant messaging application, was utilized as
the platform for this study. Notably, over 90% of messages were conveyed in text form, with a mere 7 billion
out of every 100 billion messages being voice communications [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. We hypothesize that users will
deviate from their default preferences under specific circumstances.
      </p>
      <p>Study Design. Figure 1 shows the overview of the Patrol Study. The two main components of the
study were (i) information-seeking and (ii) room-checking. The information-seeking task required
users to interact with a remote helper to ask for help on certain tasks and questions. Room-checking
required users to walk around a three-story university building looking for certain rooms or research
labs to check the availability of the rooms. We used the room-checking task to create a realistic, daily
scenario, like conversing while navigating a building, and to potentially make participants prefer
voice over text. For each session, we asked participants to simultaneously reach out to a remote helper
for help on the information-seeking task while they were doing the room-checking task. A shared
Google Doc was updated by the remote helper and checked by the user in order to give further requests
or ask clarifying questions. Each session took a total of 25 minutes. Users spent the first 15 minutes
performing the room-checking and information-seeking tasks simultaneously. One research team
member, dubbed the in-person helper, accompanied the users while they walked around the building to
provide help if necessary, waiting at the end of the hallway. The in-person helper kept track of the time
and told users when the 15 minutes were up. The in-person helper provided only logistics-related help
outside the scope of the study, such as unexpectedly locked entrances. After the 15-minute patrol, the
participant returned to the research lab and prepared for a short oral presentation on the information
they obtained with the help of the remote helper. Users then had up to seven minutes for preparation
and three minutes to present their findings. The details of information-seeking (Appendix A.2) and
room-checking (Appendix A.1) tasks can be found in the Appendix.
We intentionally informed participants that they were conversing with a human rather than conducting a Wizard-of-Oz study.
This decision stemmed from our pilot studies, which indicated that, in such open-ended conversation settings, concealing the
fact that participants were interacting with a human proved to be notably challenging.</p>
      <p>Study Procedure. The study consisted of five sessions: a pre-study session, two interactive sessions,
one multitasking session, and one evaluation session (see Figure 3 in Appendix A).
1. The pre-study session introduced the study and included a short tutorial.
2. Two Interactive Sessions (Text Condition / Voice Condition): At each interactive session,
users were asked to perform the information-seeking and room-checking tasks and to reach out
to the remote helper via WhatsApp to help solve the information-seeking task. They used texting
in one session and voice messaging in the other. They were able to check the progress of the
remote helper through a shared Google Doc. Additional questions regarding the remote helper’s
progress were communicated through WhatsApp. At the end of each interactive session, users
were asked to verbally summarize what they learned from the information-seeking task.
3. Multitasking Session (No-Help Condition): The multitasking session required users to
perform both the information-seeking and room-checking tasks without the help of the remote
helper and to verbally summarize the information they gathered.
4. Users filled out questionnaires about the study during the evaluation session.</p>
      <p>The order of the interactive session with text input (Text Condition) and interactive session with
voice input (Voice Condition) was randomized for each participant. Conversation logs between the
user and the remote helper were collected for both interactive sessions. Verbal summarization given by
users during each session was recorded in audio form for further analysis. It took each participant 1.5-2
hours to complete the entire study, and participants were compensated with $30.00. This study was
approved by the IRB office of the authors’ institution.</p>
      <p>Participants. The participants for the Patrol Study were recruited through personal networks and
university mailing lists. A total of 16 individuals were recruited as users of the study (participants were
ID-coded P1-P16 in the remainder of this paper): seven males and nine females. Fourteen of the participants
were between 18 and 35; two participants were above the age of 36. The majority of the participants
were undergraduate and graduate students at the university. Among all participants, only two had no
prior experience using virtual assistants of any kind (e.g., Siri, Google Assistant, or Alexa). Participants
were informed that they would be interacting with a human assistant during the study instead of an
automated agent. Participants were not aware of whether our study focused on voice or text assistance.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Results</title>
      <sec id="sec-4-1">
        <title>3.1. The voice interface was preferred over the text interface.</title>
        <p>In the post-study survey, we asked participants to rate how satisfied they were when using text and
voice on a five-point Likert scale from Very Dissatisfied (1) to Very Satisfied (5). Results shown in
Table 1 indicate that the participants were satisfied with both conditions. We then asked participants
to directly compare using text and voice to interact with the assistant in the context of the study
on a five-point Likert scale from Much Worse (1) to Much Better (5). The average score was 3.857.
Figure 2 shows the histogram of the responses and indicates that although users were satisfied with
both communication interfaces, they preferred the voice interface over the text interface. These results
validate our hypothesis: Despite WhatsApp’s predominant use of text messaging, participants,
when placed in certain specific situations, diverge from their default behaviors to favor voice
messaging.</p>
        <p>We asked participants to elaborate on their ratings for the comparison between text and voice
interaction in the survey. P6 commented, “I needed to take care of typos to deliver my message clearly,
which is time-consuming. Also, I felt difficult to text in walking, while not hard to send a voice message in
walking.” Others shared similar experiences when comparing using text or voice to communicate with
the remote helper. P7 said, “It’s hard to type and walk at the same time! I prefer voice because I can talk
and walk much easier.” P9 also said, “When texting, I needed to concentrate on what I typed, when using
voice, it was easier to speak the query I wanted and needed less effort.”</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. The participants tended to speak more when using voice.</title>
        <p>To understand user behavior when using different communication interfaces, we calculated the number
of sentences and words used in the conversation. Table 2 shows conversation statistics. We found
that when using voice, participants tended to say more words in each sentence (number of words per
sentence: 7.379 for text vs. 10.728 for voice) but use slightly fewer sentences in each conversation
(number of sentences per session: 6.833 for text vs. 5.833 for voice). Overall, participants said more
words in each session (number of words per session: 45.333 for text vs. 62.250 for voice). We include
a few conversations in Appendix B. These findings suggest that users modify their behavior in
different situations.</p>
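        <p>To illustrate how such per-session statistics can be computed from conversation logs, a minimal sketch follows. It assumes a plain list-of-utterances transcript per session and naive sentence splitting on punctuation; it is not the analysis script used in this study, and the example utterances are drawn from the text conversation in Appendix B.</p>
        <preformat>
# Minimal sketch: per-session conversation statistics from user utterances.
# The transcript format and splitting rules are illustrative assumptions.
import re

def session_stats(user_utterances):
    """Return (num_sentences, num_words, words_per_sentence) for one session."""
    sentences = []
    for utterance in user_utterances:
        # Naive split on sentence-ending punctuation followed by whitespace.
        sentences.extend(s for s in re.split(r"[.?!]+\s+", utterance.strip()) if s)
    num_words = sum(len(s.split()) for s in sentences)
    return len(sentences), num_words, (num_words / len(sentences) if sentences else 0.0)

text_session = [
    "how to start a baking business",
    "what permits and licenses are necessary for a baking business?",
]
print(session_stats(text_session))  # (2, 16, 8.0)
        </preformat>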
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Positive preferences about using voice for other similar situations.</title>
        <p>We also asked, “Think about day-to-day scenarios like running errands and walking between buildings,
but you need to complete some other tasks at the same time. How likely would you choose to use
voice instead of text to interact with a remote assistant to seek for help?” with a five-point Likert scale
from Very Unlikely (1) to Very Likely (5). The average score was 4.357, suggesting that most of the
participants felt positive about using voice in other situations. Participants in general agreed that the
voice interface is easier to use when doing other tasks. For example, P8 said, “Because voice is easier
to communicate and lesser efforts over text;” P7 reported, “When using voice I can concentrate on my
surroundings better. Using text requires me to look at the screen and type;” P9 said, “Using a voice assistant
reduces the amount of work you need to do in terms of typing;” and P12 commented, “It is much easier
to use voice than text while performing other tasks.” However, participants also pointed out that the
voice interface may be inappropriate in some situations. For example, P4 said, “Voice may be difficult
to use in public;” and P3 said “I think it would still depend on how comfortable I am speaking given my
surroundings.” P15 adopted a neutral stance: “In my opinion, texting and using voice is identical. It doesn’t
have much diference between them. Because, with current technology, the voice technology is not that
good enough with accuracy. So it might be the same as well.”</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Discussion</title>
      <p>This paper introduces a study to advocate integrating situation awareness in conversational assistants.
We demonstrate that users adjust their communication methods according to their situations, validating
that assistants can communicate more effectively by recognizing and adapting to these situations.</p>
      <p>The concept of “situation” warrants clarification. While “situation” and “context” can have varied
meanings across different domains, like ubiquitous computing, affective computing, and NLP/dialogue
systems, this work intentionally separates the content and format of conversation. We focus on adjusting
the conversation format using factors external to content, countering the mainstream dialogue system
research’s emphasis on content, and underscoring the importance of meta aspects. Enabling extensive,
human-like conversations involves more than merely disseminating information and crafting fluent
responses.</p>
      <sec id="sec-5-1">
        <title>4.1. Limitations</title>
        <p>
          Generalizability. Although we attempt to create a scenario that emulates real life, there are still
intricacies that are difficult to replicate, such as the level of urgency under which the task is being
completed. Another significant factor is personal preference, as some users are simply inclined to use one
modality over others, which was reflected in the post-study questionnaire. While we chose questions
that are more complex than typical voice commands, they were still generic and did not relate to the
users on a personal level. We are aware that people care about highly social and personal questions [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ],
which can lead to reduced engagement with the questions we provided. Furthermore, privacy is a major
concern if we were to ask highly personal questions.
        </p>
        <p>Limitations of Human Ratings. In our study, we had MTurk workers and Toloka workers rate the
quality of transcriptions of the conversations as opposed to the audio. This setup did not capture the
extra contextual information passed to remote helpers via voice. Furthermore, the third-party ratings of
a conversation only reflected the perceived quality rather than the speaker’s experience. We attempted
to mitigate this gap by averaging the ratings collected from ten distinct workers, but it is still possible
that the owner of the message disagrees with aggregated social perception.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>This paper introduces a study that advocates for the situation awareness of conversational assistants.
The Patrol Study illustrates that users adapt their communication strategies based on their respective
situations, suggesting that assistants, aware of these situations, could better fulfill user needs. We
argue for developing future conversational assistants that can recognize user situations and accordingly
adjust conversational formats. Looking ahead, we aim to explore the potential for conversational
assistants to automatically detect situations and make strategic communication decisions. Just as we do
not explicitly instruct our friends about conversational preferences, we should not need to configure
conversational assistants during each interaction. Leveraging the enhanced AI capabilities of LLMs to
address HCI challenges is crucial for facilitating human-like, meaningful conversations that transcend
mere information provision and fluent response generation.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT in order to: Grammar and spelling
check.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Details of PATROL STUDY</title>
      <sec id="sec-8-1">
        <title>A.1. Room-checking Task</title>
        <p>In the room-checking task, we asked users to conduct a 15-minute patrol task inside a university
building. How we created the room list, provided introductions to users, and conducted the patrolling
process is described below.</p>
        <p>Creating the Room Lists. Firstly, we created a list of room numbers of the building. The building
has 200,000 square feet of floor area and three floors. Users could access two elevators and multiple
staircases at all times. A total of 30 rooms were selected for users to navigate. The selected rooms
were classrooms, research labs, and meeting rooms spread across all three floors of the building; all
had windows with a view from the hallway, as the study required participants to look in the window
to take notes on how many people were in the room. The 30 rooms were randomly distributed into
three lists containing similar numbers of rooms on each floor so that each user was required to travel
approximately the same distance. The order of rooms within each list was randomized. The goal was to
have all the lists require the same level of physical and cognitive effort from the participants. The study
consisted of three room-checking sessions; therefore, it was ideal to keep the variation in difficulty
between lists as small as possible. We also asked users to take their time and check as many rooms
on the list as they possibly could without rushing, even though we did not intend for the users to finish
the entire list within the 15-minute time frame. Empirical experience showed that five to seven rooms
can be checked in one session under such room arrangements.</p>
        <p>User Instruction Details. We instructed the users to find the listed rooms. Particularly, we asked
them to record the number of people inside the room and record the time they checked. Additionally, we
suggested but did not require that they follow the order of rooms on the list. Users were informed that
the results of the room-checking and information-seeking would be treated equally, and encouraged to
complete both tasks as much as possible.</p>
        <p>Patrolling Process. After the instruction, the users started to navigate the building based on their
room list. We constrained the building navigation task to last 15 minutes. We specifically asked one
author of the paper to provide optional in-person assistance during the user’s patrolling process. Before
the navigation began, the in-person helper briefly introduced the building layout if the users were
not familiar with it. Thereafter, the in-person helper remained in a convenient position (e.g., around
the elevator) to respond if the users asked for help. When the navigation had lasted 15 minutes, the
in-person helper found the users and told them their time was up.</p>
      </sec>
      <sec id="sec-8-2">
        <title>A.2. Information-seeking Task</title>
        <p>While users performed the room-checking task, they were asked to reach out to the remote helper via
WhatsApp on their smart phone in order to answer the question they were given for this task. We chose
the WhatsApp platform because it can take both text and voice input and can export the conversation
log (both text and voice messages) for further analysis. The remote helper updated their progress to a
shared Google Doc accessible by users via their smart phone, so the user could monitor the progress of
the remote helper as they gathered data at the user’s instruction.</p>
        <p>We asked users to give a short oral debriefing after they finished the 15 minutes of room-checking
and information-seeking. The purpose of the presentation was to assess the amount and accuracy
of information obtained through an information-seeking task. We hypothesized that using a better
communication channel could potentially enhance the efectiveness of information retrieval and improve
the accuracy of the information gathered. We considered the real-life scenario in which people reach out
to others for help: they are likely to go through the received information and do further preparation
and validation before actually using it. Users were asked to compose an oral presentation
on how to carry out the task based on the information in the Google Doc. They were allowed to use the
web links provided by the remote helper on the shared document for details and search for additional
information if necessary. Users were not prohibited from searching for additional information on
the internet. We believe that it is more realistic to acknowledge that people may want to verify the
information they have been provided with or might have additional thoughts after taking the time to
gather their thoughts. However, due to the time constraints, participants had to prioritize their efforts
between searching for new information and organizing existing information. Users had up to seven
minutes to prepare for their presentation and were encouraged to speak for three minutes, or as long
as they needed to convey the message they wanted to deliver. The debriefing was audio-recorded for
further analysis.</p>
        <p>In the following, we describe how we prepared the questions and the workflow of the remote helper.</p>
        <sec id="sec-8-2-1">
          <title>Selecting the Questions and Preparing the Answers</title>
          <p>
            The questions for the information-seeking tasks were selected from the MSComplexTasks dataset [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ], where a list of complex tasks is broken down
into subtasks. A complex task was defined as one requiring two or more individual steps for its
completion. The individual steps needed to complete a complex task were considered its subtasks. The
three topics were (i) how to write a business report, (ii) how to write a nonfiction book, and (iii) how to
start a baking business. The number of subtasks required for each task was 14, 12, and 15 respectively.
According to the subtasks and dependencies of the subtasks provided in MSComplexTasks [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ], the
subtasks for each selected task were pre-arranged into a list in ideal order. The remote helper did not
collect additional information aside from pre-arranging the listed subtasks and listing the corresponding
source web links provided in the dataset. As the users reached out to the remote helper for information
on a certain task, the remote helper first updated the shared document with the pre-arranged subtasks
and then asked if the user needed any additional information. All requests from users outside the scope
of the pre-arranged subtasks were researched by the remote helper and discussed with the users in real
time. The helper also had access to the results of prior searches and could reuse the information.
          </p>
          <p>
            The Workflow of the Remote Helper. Upon receiving a request from a user, the remote helper
first listed the subtasks required to finish the requested task according to MSComplexTasks [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ]. The
remote helper then followed the user’s instructions to look for further information on the internet or
answer follow-up questions. While the remote helper performed the search, the user was in charge
of directing their eforts. For the conversation in WhatsApp, the remote helper always responded in
the form of text. Users, on the other hand, communicated with the remote helper using text in one
session and voice messaging in the other. The remote helper used the desktop version of WhatsApp to
communicate with users and was able to hear users’ voice messages.
          </p>
        </sec>
      </sec>
      <sec id="sec-8-3">
        <title>A.3. Pilot Study</title>
        <p>Before the formal study, we conducted a small set of pilot studies with four participants to test the
procedure. The first three participants were asked to complete the two interaction sessions (Text and
Voice conditions) but not the multitasking session. Informal discussions with the participants inspired
us to add the No-Help condition. We also adjusted the room list to avoid some rooms that were too
hard to find or required special access. In response to user feedback, slight changes to the room
arrangement were made in order to balance the room distribution across different lists. The session
outline and room list were finalized after considering the feedback provided by the fourth participant.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>B. Example Conversations for Study 1</title>
      <p>We show complete example conversations from both the voice and text conditions. As we can see, the remote helper’s
responses were mostly short and typically confirmed what information was being searched for.</p>
      <sec id="sec-9-1">
        <title>Voice conversation on topic “How to start a baking business.” (P7)</title>
        <p>User: Where is the best bank to try to get a loan from? Like which one has the best interest rate?
Helper: I will also look for loan options</p>
        <p>User: What licenses do I need in order to start my business?</p>
        <p>User: Also, where can I find information on how to apply for these licenses?
Helper: checking licenses needed for baking business</p>
        <p>User: Will I need to have anything notarized
User: Also I’m curious how much money I should have saved up in order to start my business. Like
my own personal money just in case
Helper: okay, also looking for where to apply licenses
Helper: Will look up on that part also</p>
      </sec>
      <sec id="sec-9-2">
        <title>Text conversation on topic “How to start a baking business.” (P12)</title>
        <p>User: how to start a baking business
Helper: let me do the search, I will update in the google doc</p>
        <p>User: what permits and licenses are necessary for a baking business?
Helper: looking into the permits and licenses required</p>
        <p>User: what equipment is necessary?
Helper: I will look into that</p>
        <p>User: good advertising strategies for a first time bakery owner?
Helper: okay, looking for advertising strategies for first time owner</p>
      </sec>
      <sec id="sec-9-3">
        <title>Voice conversation on topic “How to write a nonfiction book.” (P10)</title>
        <p>User: What are the components of a nonfiction book?
Helper: let me update the information in the shared document</p>
        <p>User: What are the tips to write a good nonfiction?
Helper: Updated some topics of nonfiction books and the definition of it
Helper: looking for tips to write a nonfiction book</p>
        <p>User: Give me some tips on how to write nonfiction books.</p>
        <p>Helper: listing some steps to write nonfiction books</p>
        <p>User: Who are some of the famous nonfiction writers in English language?
Helper: looking for famous nonfiction writer in English language</p>
        <p>User: Who are the famous nonfiction writers in English language?</p>
        <p>User: And how to publish a nonfiction book?
Helper: updated the top selling nonfiction books and writers
Helper: listing publishing methods for nonfiction books</p>
        <p>User: What are a fee of rhe most successful non fiction titles</p>
        <p>User: What kind of audiences read non fiction
Helper: are you thinking about the price of them?</p>
        <p>User: What are a few of the most</p>
        <p>User: Looking for topics and audience
Helper: I see, let me look them up</p>
        <p>User: Could you look up as well, common non fiction topics
Helper: Will do
Helper: [Reply to “Looking for topics and audience”] I do not think there are specific target groups for nonfiction
books</p>
        <p>User: could u find average length and intensity of a non fiction book?
Helper: okay
Helper: can you elaborate on the intensity aspect you are looking for?</p>
        <p>User: yes. if theyre mostly narrative and if so what kind of narrative, or mostly informative
Helper: I see
Helper: let me check</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-M. Huang</surname>
          </string-name>
          ,
          <article-title>Llm-powered conversational voice assistants: Interaction patterns, opportunities, challenges, and design guidelines</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.13879.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y. Zhang,</surname>
          </string-name>
          <article-title>Conversational recommender system</article-title>
          ,
          <source>in: The 41st international acm sigir conference on research &amp; development in information retrieval</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>235</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Porcheron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reeves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharples</surname>
          </string-name>
          ,
          <article-title>Voice interfaces in everyday life</article-title>
          ,
          <source>in: proceedings of the 2018 CHI conference on human factors in computing systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Forlizzi</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. I. Hong</surname>
          </string-name>
          , ”
          <article-title>hey alexa, what's up?” a mixed-methods studies of in-home conversational agent usage</article-title>
          ,
          <source>in: Proceedings of the 2018 designing interactive systems conference</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>857</fpage>
          -
          <lpage>868</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <article-title>Context-aware computing, in: Ubiquitous computing fundamentals, Chapman</article-title>
          and Hall/CRC,
          <year>2018</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>U. G.</given-names>
            <surname>Acer</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          v. d. Broeck,
          <string-name>
            <given-names>C.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dasari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kawsar</surname>
          </string-name>
          ,
          <article-title>The city as a personal assistant: turning urban landmarks into conversational agents for serving hyper local information</article-title>
          ,
          <source>Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies</source>
          <volume>6</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sapkota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mathews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Nanayakkara, Prompto:
          <article-title>Investigating receptivity to prompts based on cognitive load from memory training conversational agent</article-title>
          ,
          <source>Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies 4</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kawsar</surname>
          </string-name>
          ,
          <article-title>Augmenting conversational agents with ambient acoustic contexts</article-title>
          ,
          <source>in: 22nd International Conference on Human-Computer Interaction with Mobile Devices and Services</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shamekhi</surname>
          </string-name>
          , T. Bickmore,
          <article-title>Breathe deep: A breath-sensitive interactive meditation coach</article-title>
          ,
          <source>in: Proceedings of the 12th EAI International Conference on Pervasive Computing Technologies for Healthcare</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>108</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghandeharioun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McDuff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rowan</surname>
          </string-name>
          ,
          <string-name>
            <surname>Emma:</surname>
          </string-name>
          <article-title>An emotion-aware wellbeing chatbot</article-title>
          ,
          <source>in: 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Casas</surname>
          </string-name>
          , T. Spring,
          <string-name>
            <given-names>K.</given-names>
            <surname>Daher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mugellini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Khaled</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          ,
          <article-title>Enhancing conversational agents with empathic abilities</article-title>
          ,
          <source>in: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Samrose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Anbarasu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          , T. Mishra,
          <article-title>Mitigating boredom using an empathetic conversational agent</article-title>
          ,
          <source>in: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aurisicchio</surname>
          </string-name>
          , W. Baxter,
          <article-title>Understanding affective experiences with conversational agents</article-title>
          ,
          <source>in: proceedings of the 2019 CHI conference on human factors in computing systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jang</surname>
          </string-name>
          , K.-E. Kim,
          <article-title>End-to-end neural pipeline for goal-oriented dialogue systems using GPT-2, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</article-title>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>583</fpage>
          -
          <lpage>592</lpage>
          . URL: https://aclanthology.org/2020.acl-main.54. doi:10.18653/v1/2020.acl-main.54.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          , GPT-4 technical report, ArXiv abs/2303.08774 (2023). URL: https://arxiv.org/abs/2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          , Introducing ChatGPT, 2022. URL: https://openai.com/blog/chatgpt.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Towards identifying social bias in dialog systems: Framework, dataset, and benchmark</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2022</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>3576</fpage>
          -
          <lpage>3591</lpage>
          . URL: https://aclanthology.org/2022.findings-emnlp.262. doi:10.18653/v1/2022.findings-emnlp.262.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dziri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Linzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Reitter</surname>
          </string-name>
          ,
          <article-title>Evaluating attribution in dialogue systems: The BEGIN benchmark</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>1066</fpage>
          -
          <lpage>1083</lpage>
          . URL: https://aclanthology.org/2022.tacl-1.62. doi:10.1162/tacl_a_00506.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lowe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pow</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Serban</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Pineau,</surname>
          </string-name>
          <article-title>The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems</article-title>
          ,
          <source>in: Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue</source>
          , Association for Computational Linguistics, Prague, Czech Republic,
          <year>2015</year>
          , pp.
          <fpage>285</fpage>
          -
          <lpage>294</lpage>
          . URL: https://aclanthology.org/W15-4640. doi:10.18653/v1/W15-4640.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>O.</given-names>
            <surname>Dušek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jurčíček</surname>
          </string-name>
          ,
          <article-title>A context-aware natural language generator for dialogue systems</article-title>
          ,
          <source>in: Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue</source>
          , Association for Computational Linguistics, Los Angeles,
          <year>2016</year>
          , pp.
          <fpage>185</fpage>
          -
          <lpage>190</lpage>
          . URL: https://aclanthology.org/W16-3622. doi:10.18653/v1/W16-3622.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Context-aware natural language generation for spoken dialogue systems</article-title>
          ,
          <source>in: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers</source>
          , The COLING 2016 Organizing Committee
          , Osaka, Japan,
          <year>2016</year>
          , pp.
          <fpage>2032</fpage>
          -
          <lpage>2041</lpage>
          . URL: https://aclanthology.org/C16-1191.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <article-title>Multimodal dialog system: Relational graph-based context-aware question understanding</article-title>
          ,
          <source>in: Proceedings of the 29th ACM International Conference on Multimedia, MM '21</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>695</fpage>
          -
          <lpage>703</lpage>
          . URL: https://doi.org/10.1145/3474085.3475234. doi:10.1145/3474085.3475234.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>C.</given-names>
            <surname>Snell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <article-title>Context-aware language modeling for goal-oriented dialogue systems</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: NAACL 2022</source>
          , Association for Computational Linguistics
          , Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>2351</fpage>
          -
          <lpage>2366</lpage>
          . URL: https://aclanthology.org/2022.findings-naacl.181. doi:10.18653/v1/2022.findings-naacl.181.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>N.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hazarika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cambria</surname>
          </string-name>
          ,
          <article-title>DialogueRNN: An attentive RNN for emotion detection in conversations</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>33</volume>
          (
          <year>2019</year>
          )
          <fpage>6818</fpage>
          -
          <lpage>6825</lpage>
          . URL: https://ojs.aaai.org/index.php/AAAI/article/view/4657. doi:10.1609/aaai.v33i01.33016818.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Takanobu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>ConvLab: Multi-domain end-to-end dialog system platform</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>
          , Association for Computational Linguistics
          , Florence, Italy,
          <year>2019</year>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>69</lpage>
          . URL: https://aclanthology.org/P19-3011. doi:10.18653/v1/P19-3011.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>WhatsApp tops 7 billion daily voice messages</article-title>
          ,
          <year>2022</year>
          . URL: https://techcrunch.com/2022/03/30/people-are-sending-7-billion-voice-messages-on-whatsapp-every-da.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-F.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-H. K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>What types of questions require conversation to answer? A case study of AskReddit questions</article-title>
          ,
          <source>in: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, CHI EA '23</source>
          ,
          Association for Computing Machinery,
          <year>2023</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Jauhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiseleva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Learning to decompose and organize complex tasks</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>