<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CHIIR</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Data Sets for Spoken Conversational Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Johanne Trippas</string-name>
          <email>johanne.trippas@rmit.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Thomas</string-name>
          <email>pathom@microsoft.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Microsoft</institution>
          ,
          <addr-line>Canberra</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>RMIT University</institution>
          ,
          <addr-line>Melbourne</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>14</volume>
      <abstract>
<p>There is increasing interest in spoken conversational search (multi-turn interactions with a search engine, spoken in natural language), but until recently there was little public data to support research. We describe our experiences building two data sets for spoken conversational search: the Microsoft Information-Seeking Conversation set (“MISC”) and the Spoken Conversational Search set (“SCSdata”). Each data set contains recordings of spoken interactions between two people collaborating on web search tasks, but relatively small differences in protocol have led to observably different data. We discuss some consequences of these differences, and describe attempts to reproduce analyses from one set to the other.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>DATA SETS OVERVIEW</title>
      <p>The increasing capability for natural-language, voice interactions
with computers poses a range of research and engineering questions.
To address these questions we need corresponding data—for
example, recordings of conversations with information-gathering agents.
Unfortunately, current systems cannot maintain a lengthy exchange,
have trouble tracking context, and are largely unaware of
nonverbal communication and of users’ emotional state. In 2016–17 two
separate groups tried to bridge the gap by recording
information-seeking conversations between people, looking for structures which
would help build new systems or evaluate old ones [cf. 5, 9, 19].</p>
    </sec>
    <sec id="sec-2">
      <title>MISC</title>
      <p>
        The Microsoft Information-Seeking Conversation data (MISC) is a
set of recordings of spoken conversation between human “seekers”
and “intermediaries” [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. It was designed to support research on
questions such as: do human intermediaries show behaviours which
correlate with seeker satisfaction?; do seekers show behaviours
which we could use as a baseline for online metrics, appropriate
to conversational agents?; what role is played by politeness or
other conversational norms?; what tactics do we see in
information-seeking conversation, and do particular structures help or impede
progress or satisfaction? MISC has been used in unpublished work
on these questions, in work on conversational style [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], on
multimodal collaboration [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and on conversational structures
described below.
      </p>
      <p>The study. The overall setup for both the MISC and SCSdata
recordings is shown in Figure 1. Tasks were assigned to a “seeker”,
who was responsible for assembling information and deciding on a
final answer. They were connected over an audio link to an
“intermediary”, who stood in for a future software agent (SCSdata
participants were located in the same room). The intermediary
had unrestricted access to the web, including search engines. We
recorded video and audio from both participants.</p>
      <p>The data. The MISC data includes audio and video signals;
transcripts; prosodic and linguistic signals; entry questions on
demographics and personality; and post-task surveys on emotion,
engagement, and effort. Screen recordings are also available, as is data
on affective and physiological signals.</p>
      <p>Reuse and reusability. We designed the MISC data with regard to
our own future research, but intended from the start that it could be
used by other researchers. Our participants consented to possible
reuse and sharing, and were informed of their right to withdraw
consent at any time, including post-hoc. The study was approved
by our internal ethics review board.</p>
      <p>
        Although MISC includes a good deal of derived data, we have
chosen to include the raw data wherever possible so as to enable
(a) replication and (b) further unanticipated analyses. For example,
we include the raw audio, from which we derived the included
transcripts; and we include these transcripts, from which we derived
data on word use. The only processing of the “raw” video and audio
has been to segment by task. The full text of each pre-experiment
and post-task question is also included. This policy has already
enabled reuse inside our research group: for example, work by
McDuff et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], on the effect of facial expressivity and multimodal
communication, was not anticipated when we collected MISC. We
are not aware of any attempts to re-process the audio or video
streams, but we hope this policy also makes reuse outside our own
research group more likely.
      </p>
      <p>Workshop on Barriers to Interactive IR Resources Re-use at the ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2019), 14 March 2019, Glasgow, UK</p>
      <p>
        We used standard instruments and standard processing tools
where available:
• To help interpret physiological and affective signals, we
used the UPPS Impulsive Behaviour Scale [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] and Cohen
et al.’s perceived stress scale [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]1. These are commonly-used
instruments and should be comparable across studies.
• MISC includes five tasks, one of which was used as a warm-up.
      </p>
      <p>
        We believed there may be a difference in behaviour and
self-reports depending on the complexity and difficulty of the
task, so we varied these in a controlled manner. We also
wanted tasks that elicited an emotional response, which
ruled out those from most past collections; instead we
selected tasks from the Repository of Assigned Search Tasks
(RepAST)2. Participants addressed the tasks using the open
web, which may make it hard to reproduce some results but
did allow intermediaries to use the full range of web search
features.
• We measured effort with the NASA task load index (TLX) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        This is a commonly-used and well-validated scale which we
were able to adopt with minimal modification (only omitting
the question on physical effort). Post-hoc tests validated this
modified scale (Krippendorff’s α = 0.84 [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]).
• We measured engagement using a subset of the user
engagement scale (UES) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. This proved very useful for our
purposes, and again the modifications were validated post
hoc (α = 0.85 [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]).
• Questions on per-task emotion used a widely recognised set
of basic emotions, as well as a separate question about other
emotions which we considered more likely during our tasks.
• Processing used standard tools, both to reduce effort and
to aid reproducibility. We used OpenSMILE [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for audio
analysis; OpenFace [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for coding facial actions; Microsoft
Cognitive Services3 to produce transcripts; and Linguistic
Inquiry and Word Count (LIWC) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] for lexical analysis.
      </p>
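      <p>The reliability figures reported above (Krippendorff’s α of 0.84 and 0.85) can be recomputed from released ratings with a short, self-contained script. The following is a minimal sketch of Krippendorff’s α for nominal data, with made-up ratings for illustration; it is not the authors’ analysis code, and the function name is our own.</p>
      <preformat>
```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(ratings):
    """Krippendorff's alpha for nominal data.

    ratings: one list per coder, one entry per item; None marks a
    missing rating. Items rated by fewer than two coders are ignored.
    """
    n_items = len(ratings[0])
    coincidences = Counter()  # coincidence matrix: (value_a, value_b) -> count
    for item in range(n_items):
        values = [coder[item] for coder in ratings if coder[item] is not None]
        m = len(values)
        if m < 2:
            continue
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)
    totals = Counter()  # marginal total per value
    for (a, _b), c in coincidences.items():
        totals[a] += c
    n = sum(totals.values())
    # Observed disagreement: off-diagonal mass of the coincidence matrix.
    d_o = sum(c for (a, b), c in coincidences.items() if a != b)
    # Expected disagreement under chance, for nominal data.
    d_e = sum(totals[a] * totals[b]
              for a in totals for b in totals if a != b) / (n - 1)
    return 1.0 - d_o / d_e

# Two hypothetical coders labelling six items in perfect agreement.
print(krippendorff_alpha_nominal([[1, 1, 2, 2, 3, 3],
                                  [1, 1, 2, 2, 3, 3]]))  # prints 1.0
```
      </preformat>
      <p>Off-the-shelf implementations (e.g. the krippendorff package on PyPI) cover interval and ordinal data as well, and would be the safer choice when comparing results across studies.</p>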
      <p>We are happy to make many of our processing scripts available
for other researchers (a small number use in-house tools), although
again there have been no requests so far.</p>
      <p>
        Reporting and availability. We described our protocol in detail in
our first publication [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. That paper includes details of participants,
the wording for all tasks and questions, and descriptive statistics
including reliability measures.
      </p>
      <p>The MISC data is available online at http://aka.ms/MISCv1.</p>
    </sec>
    <sec id="sec-3">
      <title>SCSdata</title>
      <p>
        The Spoken Conversational Search data set (SCSdata) contains the
utterance transcriptions of a spoken information-seeking process
between two actors. To the best of our knowledge, the SCSdata was
the first data set created in this experimental setup. It is
also the first SCS data set for which the actions or
utterances have been labelled, albeit only for the first three turns [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. However, the
release of the fully labelled data set is planned.
1See also e.g. http://www.mindgarden.com/documents/PerceivedStressScale.pdf.
2https://ils.unc.edu/searchtasks/
3https://www.microsoft.com/cognitive-services/en-us/speech-api
      </p>
      <p>
        The SCSdata was created to investigate the interaction behaviour
between the two actors, and to help us understand
questions such as: what is the impact of audio-only interactions on
search?; how are information-dense documents transferred in an
audio-only setting?; what are the components or actions of an
information-seeking process via audio?; and what is the impact of
query complexity on the interactions and interactivity in spoken
conversational search? The SCSdata has been used in research
published by the creators of the data set [
        <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
        ] and has also been used
recently in a study by the broader IR community [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>
        The study. The SCSdata was created in a controlled laboratory
study at RMIT University. We recorded the spoken interactions
between seeker and intermediary (as explained in Section 1.1). We
then transcribed the recordings with transcription principles and
protocols described by Trippas et al. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Much detailed work went
into creating highly accurate transcriptions, with the aim of
increasing the reusability of the data set, including for indexing [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>The data. The data includes the transcriptions of the audio
signals, the codebook and labels for the first three utterances, and the
backstories used in the setup. Other data such as the audio, video,
pre- and post-task questionnaires are not available due to ethics
regulations.</p>
      <p>The data is maintained by an author of this paper (Trippas).</p>
      <p>
        Reuse and reusability. The SCSdata reuses nine backstories based
on TREC Q02, R03, and T04 as described by Bailey, Moffat, Scholer,
and Thomas [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These backstories follow the cognitive complexity
framework of the Taxonomy of Learning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Participants completed a pre-test questionnaire before starting
the study. This questionnaire gathered demographic data
such as age, gender, highest level of education, employment, and
computer and search engine usage. Participants were also asked to
complete a modified version of the Search Self-Efficacy Scale [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
and to rate their own overall search skills.
Participants were asked if they had experience with intelligent personal
assistants such as Google Now, Siri, Amazon Alexa, or Cortana.
Seekers and intermediaries were asked to complete pre- and
post-task questionnaires throughout the study measuring interest in and
knowledge about the task, experienced task difficulty, experienced
conversational difficulty, experienced collaboration difficulty,
experienced search presentation difficulty, overall difficulty, overall
satisfaction, and open questions. Some of these questions were
adapted or reproduced from Kelly, Arguello, Edwards, and Wu [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>The SCSdata was designed with our own research questions in
mind, while optimising the transcriptions and labelling for future
use. We believe that the labelled data set is very valuable for the
research community. The data set was recently updated, and we plan
to release the full labelling annotations and the label-creation
methodology in the near future.</p>
      <p>
        Reporting and availability. We described our experimental setup
in the preliminary data analysis paper [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The transcription protocol and labelling
process are fully documented in Trippas et al. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. That paper aimed to establish a
protocol for transcribing spoken search interactions, minimising the
likelihood that transcripts produced subsequently are inconsistent
with each other.
      </p>
      <p>Other details such as the procedure of the study or questionnaire
results have not yet been published.</p>
      <p>The SCSdata is available online via https://jtrippas.github.io/Spoken-Conversational-Search/.</p>
    </sec>
    <sec id="sec-4">
      <title>COMPARING MISC AND SCSdata</title>
      <p>In recent, unpublished work, one of us (Trippas) has developed a
code schema for annotating utterances in spoken conversational
search. Initial development used the SCSdata, but since MISC is very
similar it has been reused to validate the schema. We offer below
some observations on re-using MISC and the SCSdata, based on this
experience.</p>
      <p>It is clearly valuable to have two data sets collected with such
similar protocols, and for similar purposes. Coding conversations
relies on having lengthy, naturalistic exchanges, and both SCSdata
and MISC have several exchanges running to ten minutes. Both sets
distinguish the “seeker” and “intermediary” roles, allowing direct
comparison, and both include transcripts which could be coded
more or less directly. However, some differences across the data
sets did hamper reuse, or led to unexpected findings.</p>
    </sec>
    <sec id="sec-5">
      <title>Protocol differences</title>
      <p>First, while the SCSdata was manually transcribed, the MISC data
is about ten times larger but has only been transcribed with a
commercial speech-to-text system. Although the automatic speech
recognition (ASR) system was state of the art, it was still prone to
errors. (One common error was to inject “speech” when a
participant was typing, as if the ASR was confused by keyboard noise.)
These errors were discovered because a close reading was needed
to label the MISC utterances for the validation of an annotation
schema. The difference in transcription techniques also gave a
different notion of utterance or turn: in the SCSdata these are divided
manually, while in MISC they are separated by pauses in the audio
signal. Utterance-level statistics may therefore not be directly comparable.</p>
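      <p>Because MISC utterances were produced automatically, the segmentation rule can be stated precisely: a new utterance begins wherever the audio contains a sufficiently long pause. The sketch below illustrates this kind of pause-based segmentation over word-level ASR timestamps; the function name and the 0.5-second threshold are illustrative assumptions, not the values used for MISC.</p>
      <preformat>
```python
def segment_by_pauses(words, pause_threshold=0.5):
    """Group word-level ASR output into utterances at long pauses.

    words: (token, start_sec, end_sec) tuples in time order.
    Returns a list of utterances, each a list of tokens.
    """
    utterances, current, prev_end = [], [], None
    for token, start, end in words:
        # Start a new utterance when the silence gap exceeds the threshold.
        if prev_end is not None and start - prev_end > pause_threshold:
            utterances.append(current)
            current = []
        current.append(token)
        prev_end = end
    if current:
        utterances.append(current)
    return utterances

words = [("could", 0.0, 0.2), ("you", 0.25, 0.4), ("search", 0.5, 0.9),
         ("yes", 2.1, 2.3), ("sure", 2.4, 2.7)]
print(segment_by_pauses(words))  # [['could', 'you', 'search'], ['yes', 'sure']]
```
      </preformat>
      <p>Manual segmentation, as used for the SCSdata, need not follow these acoustic boundaries, which is one reason utterance counts from the two sets may not line up.</p>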
      <p>The sets also differ in the pre- and post-task questions. The
MISC questions and responses are part of the released data, and
the published description includes descriptive statistics and basic
validity checks. We hope this is useful for future work. The SCSdata
protocol also added many pre- and post-task items (see section 1.2),
on overlapping themes but with different instruments. These have
not been examined to date, so they may or may not be useful or
comparable. Future SCSdata releases will not include this data.</p>
      <p>Some apparently small differences between the SCSdata and
MISC protocols have led to observable differences in the collected
data. SCSdata participants were expressly prohibited from reading
out the task statement verbatim and had to verbalise their
information request; MISC participants were given no instruction on this
matter. As a result, the MISC data include seekers reading out and
repeating the task statements, verbatim. More importantly, once
both participants have the same statement, the roles of “seeker” and
“intermediary” are blurred and the two act much more like peers.
This has influenced the interactions in MISC, and the distribution
of conversational moves.</p>
      <p>The two protocols also differed at the end of each task. For MISC,
“seekers” were asked to record an answer: this was meant partly to
encourage participants to properly complete each task, and partly
so researchers could look for differences in answer correctness
or completeness4. SCSdata participants were not asked to record
an answer, but were asked to say “stop search” when they were
satisfied with the found information and could answer the
information need. This again led to differences in behaviour, such as MISC
“seekers” confirming spelling in order to write down the answer.</p>
      <p>These differences were an unexpected nuisance: even with
such similar protocols, it took some work to understand and
account for the substantial differences in the data. However, familiarity
with the data meant that once we had observed the differences,
they were easy to understand. A close reading of the published
descriptions would have given the same hints. Further, it is likely
that the differences were in fact useful for the validation, as they
gave more variety and tested the coding schema in slightly different
exchanges.</p>
      <p>We also note some smaller differences. For the SCSdata
recordings, a researcher was in the room with the participants; the MISC
researchers were not. This may have led to some differences in
the data, although we have not yet explored this. There is also a
difference in audio quality. The audio files from the SCSdata are
poor, because they were recorded through a video camera; using
those recordings was never part of the experimental setup.</p>
      <p>Finally, there are details of the protocol which may have resulted
in minor differences between the sets. MISC featured a warm-up
task, while the SCSdata did not; MISC participants used a Windows PC,
while SCSdata participants used a Mac; and MISC intermediaries
started with Bing, SCSdata intermediaries with Google, although
all were allowed to switch to any other site.</p>
    </sec>
    <sec id="sec-6">
      <title>Terminology</title>
      <p>
        There has been some inconsistency in terminology. Initially, the two
actors of the SCSdata were referred to as the “user” (the participant
with the search task) and “retriever” (the participant with the search
engine) [
        <xref ref-type="bibr" rid="ref22 ref24">22, 24</xref>
        ]. In later publications describing the SCSdata, “user”
became “seeker” and “retriever” became “intermediary” [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. These
latter terms match MISC.
      </p>
      <p>Other terminology is not standard. Trippas et al. used
“spoken conversational search” to emphasise the spoken channel, as
opposed to multi-turn interactions with e.g. typing or selecting
buttons. For the same scenario, Thomas et al. used the phrase
“information-seeking conversation” to encourage a broader
understanding encompassing negotiation and clarification, not just a
traditional query/response “search” model. Still other terms are
used elsewhere in the literature. Presumably this
terminology, as well as the names of the different roles, will be
standardised in the near future.</p>
    </sec>
    <sec id="sec-7">
      <title>Task design</title>
      <p>
        As explained in section 1.2, the tasks used for the SCSdata were
reused from research by Bailey et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and are based on the
Taxonomy of Learning. Three of the five cognitive dimensions were
used: Remember, Understand, and Analyse. However, it has been
suggested that there are no clear interaction differences between
Understand and Analyse tasks [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], which is consistent with the
difficulties Moffat et al. reported when classifying tasks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
4In the event, we have not been able to code the answers with any degree of reliability.
      </p>
    </sec>
    <sec id="sec-8">
      <title>OBSERVATIONS</title>
      <p>Two sets of spoken conversational searches—SCSdata and MISC—
were collected independently, by different teams, in different
geographical locations, to support different research. It is fortunate
that the data sets are similar enough that we can make direct
comparisons, and use one set to verify observations from the other.</p>
      <p>Despite being collected with very similar goals and methods,
relatively small differences in protocol made observable differences
to the data, and we have had to be careful with reuse and
comparisons. This was made much easier by our familiarity with the data;
another researcher could quite reasonably choose these two data
sets, compare them, and have difficulty. That this is possible
despite careful design and description, and despite close similarity in
protocol, may perhaps caution us about reuse in interactive studies
generally.</p>
      <p>We were, however, helped by the decision to explicitly allow the
release of MISC’s raw data (not just, e.g., transcripts). Because audio
was available, the transcription errors could be detected.
Unfortunately, ethical clearance precludes a similar release for the SCSdata, and
this may limit reuse.</p>
      <p>
        Communication between two people is very culture-specific [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Even though both MISC and the SCSdata were collected in English-speaking
countries, and all participants claimed native or high-level
English, we cannot rule out that cultural differences played a
role in the differences between the two data sets. Similarly, the difference
in participant populations (more uniform in SCSdata, more varied
in MISC) may have resulted in differences in communication.
Spoken conversational search is still an immature field of inquiry,
and we should exercise some caution re-using data sets. Nuances
of data collection are not always easy to describe in a paper, but the
protocols for SCSdata and MISC were relatively simple and the data
can be re-used with care. It has been interesting and informative to
compare the two sets of transcripts, and we hope to continue this
to investigate other conversational questions.
      </p>
    </sec>
    <sec id="sec-9">
      <title>ACKNOWLEDGMENTS</title>
      <p>We thank Daniel McDuff, Mary Czerwinski, and Nick Craswell for
their effort assembling MISC, and Penny Analytis for auditing the
SCSdata transcriptions. We are grateful to our participants for their
time.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. W.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Krathwohl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Bloom</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives</article-title>
          .
          <source>Longman</source>
          , New York.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Bailey</surname>
          </string-name>
          , Alistair Moffat, Falk Scholer, and Paul Thomas.
          <year>2016</year>
          .
          <article-title>UQV100: A test collection with query variability</article-title>
          .
          <source>In Proc. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval</source>
          .
          <fpage>725</fpage>
          -
          <lpage>728</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Tadas</given-names>
            <surname>Baltrušaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Robinson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Louis-Philippe</given-names>
            <surname>Morency</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>OpenFace: An open source facial behavior analysis toolkit</article-title>
          .
          <source>In Proc. IEEE Winter Conf. Applications of Computer Vision</source>
          . 1-
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Kathy</given-names>
            <surname>Brennan</surname>
          </string-name>
          , Diane Kelly, and
          <string-name>
            <given-names>Yinglong</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Factor analysis of a search self-efficacy scale</article-title>
          .
          <source>In Proc. ACM SIGIR Conf. on Human Information Interaction and Retrieval</source>
          .
          <volume>241</volume>
          -
          <fpage>244</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Brooks</surname>
          </string-name>
          and
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          .
          <year>1983</year>
          .
          <article-title>Using discourse analysis for the design of information retrieval interaction mechanisms</article-title>
          .
          <source>In Proc. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval</source>
          .
          <fpage>31</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Ramona</given-names>
            <surname>Broussard</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yan</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Seeking treatment options: Consumers' search behaviors and cognitive activities</article-title>
          .
          <source>J. American Society for Information Science and Technology 50</source>
          ,
          <issue>1</issue>
          (
          <year>2013</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Eric R.</given-names>
            <surname>Buhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ellen M.</given-names>
            <surname>Daley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hollie J.</given-names>
            <surname>Fuhrmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sarah A.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>An observational study of how young people search for online sexual health information</article-title>
          .
          <source>J American College Health</source>
          <volume>58</volume>
          ,
          <issue>2</issue>
          (
          <year>2009</year>
          ),
          <fpage>101</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Sheldon</given-names>
            <surname>Cohen</surname>
          </string-name>
          , Tom Kamarck, and
          <string-name>
            <given-names>Robin</given-names>
            <surname>Mermelstein</surname>
          </string-name>
          .
          <year>1983</year>
          .
          <article-title>A global measure of perceived stress</article-title>
          .
          <source>J. Health and Social Behavior</source>
          <volume>24</volume>
          ,
          <issue>4</issue>
          (Dec.
          <year>1983</year>
          ),
          <fpage>385</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Penny J.</given-names>
            <surname>Daniels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Brooks</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          .
          <year>1985</year>
          .
          <article-title>Using problem structures for driving human-computer dialogues</article-title>
          .
          <source>In RIAO-85: Actes: Recherche d'Informations Assistée par Ordinateur</source>
          .
          <fpage>645</fpage>
          -
          <lpage>660</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Birgit</given-names>
            <surname>Endrass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Rehm</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Elisabeth</given-names>
            <surname>André</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Culture-specific communication management for virtual agents</article-title>
          .
          <source>In Proc. Int. Conf. on Autonomous Agents and Multiagent Systems - Volume 1</source>
          .
          <fpage>281</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Florian</given-names>
            <surname>Eyben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Felix</given-names>
            <surname>Weninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Florian</given-names>
            <surname>Gross</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Björn</given-names>
            <surname>Schuller</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor</article-title>
          .
          <source>In Proc. ACM Multimedia</source>
          .
          <fpage>835</fpage>
          -
          <lpage>838</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Diane</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jaime</given-names>
            <surname>Arguello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ashlee</given-names>
            <surname>Edwards</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Wan-ching</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Development and evaluation of search tasks for IIR experiments using a cognitive complexity framework</article-title>
          .
          <source>In Proc. Int. Conf. on the Theory of Information Retrieval</source>
          .
          <fpage>101</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Martha</given-names>
            <surname>Larson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gareth J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Spoken content retrieval: A survey of techniques and technologies</article-title>
          .
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>5</volume>
          ,
          <issue>4-5</issue>
          (
          <year>2012</year>
          ),
          <fpage>235</fpage>
          -
          <lpage>422</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>McDuff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mary</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nick</given-names>
            <surname>Craswell</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Multimodal analysis of vocal collaborative search: a public corpus and results</article-title>
          .
          <source>In Proc. ACM Int. Conf. on Multimodal Interaction</source>
          .
          <fpage>456</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Alistair</given-names>
            <surname>Moffat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Falk</given-names>
            <surname>Scholer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Thomas</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Assessing the cognitive complexity of information needs</article-title>
          .
          <source>In Proc. Australasian Document Computing Symposium. ACM</source>
          ,
          <fpage>97</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <collab>National Aeronautics and Space Administration Human Systems Integration Division</collab>
          .
          <year>2016</year>
          .
          <article-title>TLX @ NASA Ames</article-title>
          . Retrieved January
          <year>2017</year>
          from https://humansystems.arc.nasa.gov/groups/TLX/
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Heather L.</given-names>
            <surname>O'Brien</surname>
          </string-name>
          and
          <string-name>
            <given-names>Elaine G.</given-names>
            <surname>Toms</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The development and evaluation of a survey to measure user engagement</article-title>
          .
          <source>J American Society for Information Science and Technology</source>
          <volume>61</volume>
          ,
          <issue>1</issue>
          (
          <year>2010</year>
          ),
          <fpage>50</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>James W.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ryan L.</given-names>
            <surname>Boyd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kayla</given-names>
            <surname>Jordan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kate</given-names>
            <surname>Blackburn</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The development and psychometric properties of LIWC2015</article-title>
          .
          <source>Technical Report</source>
          . University of Texas at Austin.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Rachel</given-names>
            <surname>Reichman</surname>
          </string-name>
          .
          <year>1985</year>
          .
          <article-title>Getting computers to talk like you and me</article-title>
          . MIT Press, Cambridge, Massachusetts.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mary</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daniel</given-names>
            <surname>McDuff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nick</given-names>
            <surname>Craswell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gloria</given-names>
            <surname>Mark</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Style and alignment in information-seeking conversation</article-title>
          .
          <source>In Proc. ACM SIGIR Conf. on Human Information Interaction and Retrieval</source>
          .
          <fpage>42</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daniel</given-names>
            <surname>McDuff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mary</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nick</given-names>
            <surname>Craswell</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>MISC: A data set of information-seeking conversations</article-title>
          .
          <source>In Proc. Int. Workshop on Conversational Approaches to Information Retrieval</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Johanne R.</given-names>
            <surname>Trippas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Cavedon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Damiano</given-names>
            <surname>Spina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Sanderson</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>How do people interact in conversational speech-only search tasks: A preliminary analysis</article-title>
          .
          <source>In Proc. ACM SIGIR Conf. on Human Information Interaction and Retrieval</source>
          .
          <fpage>325</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Johanne R.</given-names>
            <surname>Trippas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Damiano</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Cavedon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hideo</given-names>
            <surname>Joho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Sanderson</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Informing the design of spoken conversational search: Perspective paper</article-title>
          .
          <source>In Proc. ACM SIGIR Conf. on Human Information Interaction and Retrieval</source>
          .
          <fpage>32</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Johanne R.</given-names>
            <surname>Trippas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Damiano</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Cavedon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Sanderson</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A conversational search transcription protocol and analysis</article-title>
          .
          <source>In Proc. Int. Workshop on Conversational Approaches to Information Retrieval</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Svitlana</given-names>
            <surname>Vakulenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kate</given-names>
            <surname>Revoredo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Di Ciccio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Maarten</given-names>
            <surname>de Rijke</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>QRFA: A data-driven model of information-seeking dialogues</article-title>
          .
          <source>In Proc. European Conf. on Information Retrieval</source>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Ryen W.</given-names>
            <surname>White</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Implicit feedback for interactive information retrieval</article-title>
          .
          <source>Ph.D. Dissertation</source>
          . University of Glasgow.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Stephen P.</given-names>
            <surname>Whiteside</surname>
          </string-name>
          and
          <string-name>
            <given-names>Donald R.</given-names>
            <surname>Lynam</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Understanding the role of impulsivity and externalizing psychopathology in alcohol abuse: application of the UPPS impulsive behavior scale</article-title>
          .
          <volume>11</volume>
          ,
          <issue>3</issue>
          (
          <year>2003</year>
          ),
          <fpage>669</fpage>
          -
          <lpage>689</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>