Data Sets for Spoken Conversational Search

Johanne Trippas, RMIT University, Melbourne, Australia (johanne.trippas@rmit.edu.au)
Paul Thomas, Microsoft, Canberra, Australia (pathom@microsoft.com)

Workshop on Barriers to Interactive IR Resources Re-use at the ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2019), 14 March 2019, Glasgow, UK. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

ABSTRACT
There is increasing interest in spoken conversational search—multi-turn interactions with a search engine, spoken in natural language—but until recently there was little public data to support research.
   We describe our experiences building two data sets for spoken conversational search: the Microsoft Information-Seeking Conversation set ("MISC") and the Spoken Conversational Search set ("SCSdata"). Each data set contains recordings of spoken interactions between two people collaborating on web search tasks, but relatively small differences in protocol have led to observably different data. We discuss some consequences of these differences, and describe attempts to reproduce analyses from one set to the other.


1     DATA SETS OVERVIEW
The increasing capability for natural-language, voice interactions with computers poses a range of research and engineering questions. To address these questions we need corresponding data—for example, recordings of conversations with information-gathering agents. Unfortunately, current systems cannot maintain a lengthy exchange, have trouble tracking context, and are largely unaware of non-verbal communication and of users' emotional state. In 2016–17 two separate groups tried to bridge the gap by recording information-seeking conversations between people, looking for structures which would help build new systems or evaluate old ones [cf. 5, 9, 19].

1.1     MISC
The Microsoft Information-Seeking Conversation data (MISC) is a set of recordings of spoken conversation between human "seekers" and "intermediaries" [21]. It was designed to support research on questions such as: do human intermediaries show behaviours which correlate with seeker satisfaction?; do seekers show behaviours which we could use as a baseline for online metrics, appropriate to conversational agents?; what role is played by politeness or other conversational norms?; what tactics do we see in information-seeking conversation, and do particular structures help or impede progress or satisfaction? MISC has been used in unpublished work on these questions, in work on conversational style [20], on multimodal collaboration [14], and on conversational structures described below.

   The study. The overall setup for both the MISC and SCSdata recordings is shown in Figure 1. Tasks were assigned to a "seeker", who was responsible for assembling information and deciding a final answer. They were connected over an audio link to an "intermediary", who stood in for a future software agent (SCSdata participants were located in the same room). The intermediary had unrestricted access to the web, including search engines. We recorded video and audio from both participants.

[Figure 1: Recording setup for both MISC and SCSdata. Tasks were assigned to a "seeker", who communicated with an "intermediary" who had access to a browser. From Thomas et al. [21].]

   The data. The MISC data includes audio and video signals; transcripts; prosodic and linguistic signals; entry questions on demographics and personality; and post-task surveys on emotion, engagement, and effort. Screen recordings are also available, as is data on affective and physiological signals.

   Reuse and reusability. We designed the MISC data with regard to our own future research, but intended from the start that it could be used by other researchers. Our participants consented to possible reuse and sharing, and were informed of their right to withdraw consent at any time, including post-hoc. The study was approved by our internal ethics review board.
   Although MISC includes a good deal of derived data, we have chosen to include the raw data wherever possible so as to enable (a) replication and (b) further unanticipated analyses. For example, we include the raw audio, from which we derived the included transcripts; and we include these transcripts, from which we derived data on word use. The only processing of the "raw" video and audio has been to segment by task. The full text of each pre-experiment and post-task question is also included. This policy has already enabled reuse inside our research group: for example, work by McDuff et al. [14], on the effect of facial expressivity and multimodal communication, was not anticipated when we collected MISC. We are not aware of any attempts to re-process the audio or video

streams, but we hope this policy also makes reuse outside our own research group more likely.
   We used standard instruments and standard processing tools where available:
     • To help interpret physiological and affectual signals, we used the UPPS Impulsive Behaviour Scale [27] and Cohen et al.'s perceived stress scale [8]¹. These are commonly-used instruments and should be comparable across studies.
     • MISC includes five tasks, one of which was used as a warmup. We believed there may be a difference in behaviour and self-reports depending on the complexity and difficulty of the task, so we varied these in a controlled manner. We also wanted tasks that elicited an emotional response, which ruled out those from most past collections; instead we selected tasks from the Repository of Assigned Search Tasks (RepAST)². Participants addressed the tasks using the open web, which may make it hard to reproduce some results but did allow intermediaries to use the full range of web search features.
     • We measured effort with the NASA task load index (TLX) [16]. This is a commonly-used and well-validated scale which we were able to adopt with minimal modification (only omitting the question on physical effort). Post-hoc tests validated this modified scale (Krippendorff's α = 0.84 [21]).
     • We measured engagement using a subset of the user engagement scale (UES) [17]. This proved very useful for our purposes, and again the modifications were validated post hoc (α = 0.85 [21]).
     • Questions on per-task emotion used a widely recognised set of basic emotions, as well as a separate question about other emotions which we considered more likely during our tasks.
     • Processing used standard tools, both to reduce effort and to aid reproducibility. We used OpenSMILE [11] for audio analysis; OpenFace [3] for coding facial actions; Microsoft Cognitive Services³ to produce transcripts; and Linguistic Inquiry and Word Count (LIWC) [18] for lexical analysis. A minimal sketch of this kind of feature-extraction step follows this list.
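As a concrete illustration of this processing step, the sketch below shows how task-segmented audio files might be passed through openSMILE's SMILExtract tool from Python. It is only a sketch: the directory layout, file names, and the choice of configuration file are assumptions for illustration, not the released MISC pipeline, and the exact config path varies between openSMILE releases.

```python
# Illustrative only: batch audio feature extraction with openSMILE.
# Assumes SMILExtract is installed and on PATH; the file layout below is hypothetical.
import pathlib
import subprocess

AUDIO_DIR = pathlib.Path("audio")       # hypothetical location of task-segmented WAV files
OUT_DIR = pathlib.Path("features")
OUT_DIR.mkdir(exist_ok=True)

# Any stock openSMILE configuration could be used; treat this path as a placeholder.
CONFIG = "config/IS09_emotion.conf"

for wav in sorted(AUDIO_DIR.glob("*.wav")):
    out_file = OUT_DIR / (wav.stem + ".arff")
    # SMILExtract's documented flags: -C config file, -I input audio, -O output file.
    subprocess.run(
        ["SMILExtract", "-C", CONFIG, "-I", str(wav), "-O", str(out_file)],
        check=True,
    )
    print(f"extracted features for {wav.name} -> {out_file.name}")
```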
   We are happy to make many of our processing scripts available for other researchers—a small number use in-house tools—although again there have been no requests so far.
   Reporting and availability. We described our protocol in detail in our first publication [21]. This paper includes details of participants, the wording for all tasks and questions, and descriptive statistics including reliability measures.
   The MISC data is available online at http://aka.ms/MISCv1.

   ¹ See also e.g. http://www.mindgarden.com/documents/PerceivedStressScale.pdf.
   ² https://ils.unc.edu/searchtasks/
   ³ https://www.microsoft.com/cognitive-services/en-us/speech-api
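The reliability figures reported above for the modified TLX and UES scales were computed with Krippendorff's α. For researchers reusing the released survey responses, agreement of this kind can be recomputed along the following lines; this is a minimal sketch using the open-source `krippendorff` Python package with invented ratings, not the scripts used for MISC.

```python
# Minimal sketch: computing Krippendorff's alpha over a small ratings matrix.
# The numbers below are invented for illustration; a real analysis would load
# the released MISC survey responses instead.
import numpy as np
import krippendorff  # pip install krippendorff

# Rows are raters (or repeated scale items), columns are units (e.g. tasks);
# np.nan marks missing responses.
ratings = np.array([
    [3.0, 4.0, 2.0, 5.0, np.nan],
    [3.0, 5.0, 2.0, 4.0, 3.0],
    [4.0, 4.0, 1.0, 5.0, 3.0],
])

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="interval")
print(f"Krippendorff's alpha = {alpha:.2f}")
```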
1.2      SCSdata
The Spoken Conversational Search data set (SCSdata) contains the utterance transcriptions of a spoken information-seeking process between two actors. To the best of our knowledge, SCSdata was the first data set which was created in this experimental setup. It is also the first SCS data set which received labelling of the actions or utterances, albeit only for the first three turns [22]. However, the release of the fully labelled data set is planned.
   The SCSdata was created to investigate the interaction behaviour between the two actors, including helping us to understand questions such as: what is the impact of audio-only interactions for search?; how are information-dense documents transferred in an audio-only setting?; what are the components or actions of an information-seeking process via audio, and what is the impact of query complexity on the interactions and interactivity in spoken conversational search? The SCSdata has been used in research published by the creators of the data set [22, 23] and has also been used recently in a study by the broader IR community [25].
   The study. The SCSdata was created in a controlled laboratory study at RMIT University. We recorded the spoken interactions between seeker and intermediary (as explained in Section 1.1). We then transcribed the recordings with the transcription principles and protocols described by Trippas et al. [24]. Much detailed work went into creating highly accurate transcriptions, with the aim of increasing the reusability of the data set, including for indexing [13].
   The data. The data includes the transcriptions of the audio signals, the codebook and labels for the first three utterances, and the backstories used in the setup. Other data such as the audio, video, and pre- and post-task questionnaires are not available due to ethics regulations.
   The data is maintained by an author of this paper (Trippas).
   Reuse and reusability. The SCSdata reuses nine backstories based on TREC Q02, R03, and T04 as described by Bailey, Moffat, Scholer, and Thomas [2]. These backstories follow the cognitive complexity framework of the Taxonomy of Learning [1].
   Participants completed a pre-test questionnaire before starting the study. This pre-test questionnaire gathered demographic data such as age, gender, highest level of education, employment, and computer and search engine usage. Participants were also asked to complete a modified version of the Search Self-Efficacy Scale [4] and to rate their own overall search skills. Participants were asked if they had experience with intelligent personal assistants such as Google Now, Siri, Amazon Alexa, or Cortana. Seekers and intermediaries were asked to complete pre- and post-task questionnaires throughout the study measuring interest in and knowledge about the task, experienced task difficulty, experienced conversational difficulty, experienced collaboration difficulty, experienced search presentation difficulty, overall difficulty, overall satisfaction, and open questions. Some of these questions were adapted or reproduced from Kelly, Arguello, Edwards, and Wu [12].
   The SCSdata was designed with our own research questions in mind, while optimising the transcriptions and labelling for future use. We believe that the labelled data set is very valuable for the research community. The data set was recently updated, and we plan to add the full labelling annotations and the label-creation methodology in the near future.
   Reporting and availability. We described our experimental setup in the preliminary data analysis paper [22]. Fully documented information on the transcription protocol and labelling process is available in Trippas et al. [24]. That paper aimed to establish a protocol for spoken search interaction transcription, minimising the likelihood that subsequently produced transcripts are inconsistent with each other.
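To give a sense of what working with the transcripts involves, the sketch below shows one way a transcribed, labelled SCSdata utterance could be represented once loaded. The field names and label strings here are hypothetical illustrations, not the released file format or the actual codebook.

```python
# Hypothetical record structure for a transcribed, labelled utterance.
# SCSdata distinguishes "seeker" and "intermediary" roles and labels the first
# three utterances of each task; the field names and label values below are
# invented for illustration only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Utterance:
    task_id: str            # which backstory/task the exchange belongs to
    turn: int               # position of the utterance within the conversation
    speaker: str            # "seeker" or "intermediary"
    text: str               # manually produced transcription
    labels: List[str] = field(default_factory=list)  # codes from a (hypothetical) codebook

example = Utterance(
    task_id="backstory-01",
    turn=1,
    speaker="seeker",
    text="I'd like to find out about treatments for hay fever.",
    labels=["information request"],
)
print(example.speaker, "->", example.text)
```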


   Other details such as the procedure of the study or questionnaire results have not yet been published.
   The SCSdata is available online via https://jtrippas.github.io/Spoken-Conversational-Search/.

2     COMPARING MISC AND SCSdata
In recent, unpublished work, one of us (Trippas) has developed a code schema for annotating utterances in spoken conversational search. Initial development used SCSdata, but since MISC is very similar it has been reused to validate the schema. We offer below some observations on re-using MISC and SCSdata, based on this experience.
   It is clearly valuable to have two data sets collected with such similar protocols, and for similar purposes. Coding conversations relies on having lengthy, naturalistic exchanges, and both SCSdata and MISC have several exchanges running to ten minutes. Both sets distinguish the "seeker" and "intermediary" roles, allowing direct comparison, and both include transcripts which could be coded more or less directly. However, some differences across the data sets did hamper reuse, or led to unexpected findings.

2.1    Protocol differences
First, while the SCSdata was manually transcribed, the MISC data is about ten times larger but has only been transcribed with a commercial speech-to-text system. Although the automatic speech recognition (ASR) system was state of the art, it was still prone to errors. (One common error was to inject "speech" when a participant was typing, as if the ASR was confused by keyboard noise.) These errors were discovered because a close reading was needed to label the MISC utterances for the validation of an annotation schema. The difference in transcription techniques also gave a different notion of utterance or turn: in SCSdata these are divided manually, while in MISC they are separated by pauses in the audio signal. Utterance-level statistics may not be directly comparable.
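To make the distinction concrete, the sketch below shows one way pause-based segmentation of a recording could be done. It is purely illustrative of the idea, using the pydub library with arbitrary thresholds and a hypothetical file name; it is not the tool or the parameters used to segment the MISC audio.

```python
# Illustrative pause-based utterance segmentation (not the MISC tooling).
# Splits a recording wherever the signal stays quiet for a while; the
# thresholds below are arbitrary and would need tuning for real data.
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("conversation.wav")  # hypothetical file name

segments = split_on_silence(
    audio,
    min_silence_len=700,             # pause length (ms) treated as a turn boundary
    silence_thresh=audio.dBFS - 16,  # anything 16 dB below average loudness counts as silence
    keep_silence=200,                # keep a little padding around each segment
)

print(f"{len(segments)} pause-separated segments")
# Contrast: SCSdata turns were segmented manually by the transcribers,
# so counts like this are not directly comparable across the two sets.
```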
   The sets also differ in the pre- and post-task questions. The MISC questions and responses are part of the released data, and the published description includes descriptive statistics and basic validity checks. We hope this is useful for future work. The SCSdata protocol also added many pre- and post-task items (see Section 1.2), on overlapping themes but with different instruments. These have not been examined to date, so they may or may not be useful or comparable. Future SCSdata releases will not include this data.
   Some apparently small differences between the SCSdata and MISC protocols have led to observable differences in the collected data. SCSdata participants were expressly prohibited from reading out the task statement verbatim and had to verbalise their information request; MISC participants were given no instruction on this matter. As a result, the MISC data include seekers reading out and repeating the task statements, verbatim. More importantly, once both participants have the same statement, the roles of "seeker" and "intermediary" are blurred and the two act much more like peers. This has influenced the interactions in MISC, and the distribution of conversational moves.
   The two protocols also differed at the end of each task. For MISC, "seekers" were asked to record an answer: this was meant partly to encourage participants to properly complete each task, and partly so researchers could look for differences in answer correctness or completeness⁴. SCSdata participants were not asked to record an answer, but were asked to say "stop search" when they were satisfied with the found information and could answer the information need. This again led to differences in behaviour, such as MISC "seekers" confirming spelling in order to write down the answer.
   These differences were an unexpected nuisance, as even with such similar protocols it took some work to understand and account for the substantial differences in data. However, familiarity with the data meant that once we had observed the differences, they were easy to understand. A close reading of the published descriptions would have given the same hints. Further, it is likely that the differences were in fact useful for the validation, as they gave more variety and tested the coding schema in slightly different exchanges.
   We also note some smaller differences. For the SCSdata recordings, a researcher was in the room with the participant; the MISC researchers were not. This may have led to some differences in the data, although we have not yet explored this. There is also a difference in audio quality: the audio files from the SCSdata are poor, because they were recorded through a video camera. Using those recordings was never part of the experimental setup.
   Finally, there are details of the protocol which may have resulted in minor differences between the sets. MISC featured a warm-up task, while SCSdata did not; MISC participants used a Windows PC, while SCSdata participants used a Mac; and MISC intermediaries started with Bing, SCSdata intermediaries with Google, although all were allowed to switch to any other site.

   ⁴ In the event, we have not been able to code the answers with any degree of reliability.

2.2      Terminology
There has been some inconsistency in terminology. First, the two actors of the SCSdata were referred to as the "user" (the participant with the search task) and "retriever" (the participant with the search engine) [22, 24]. In later publications describing the SCSdata, "user" became "seeker" and "retriever" became "intermediary" [23]. These latter terms match MISC.
   Other terminology is not standard. Trippas et al. used "spoken conversational search" to emphasise the spoken channel, as opposed to multi-turn interactions with e.g. typing or selecting buttons. For the same scenario, Thomas et al. used the phrase "information-seeking conversation" to encourage a broader understanding encompassing negotiation and clarification, not just a traditional query/response "search" model. Other terms again are used elsewhere in the literature. Presumably in the near future this terminology, as well as the names of the different roles, will be standardised.

2.3      Task design
As explained in Section 1.2, the tasks used for the SCSdata were reused from research by Bailey et al. [2] and are based on the Taxonomy of Learning. Three of the five cognitive dimensions were used: Remember, Understand, and Analyse. However, it has been suggested that there are no clear interaction differences between Understand and Analyse tasks [22], which is consistent with the difficulties Moffat et al. reported when classifying tasks [15].

Table 1: MISC search tasks. These were controlled for complexity, difficulty, and likely emotional response.

      Difficulty   Complexity   Emotion    Task source
  0   Warm-up      (NA)         (NA)       Buhi et al. [7], via RepAST
  1   Low          Low          Positive   Modified TREC topic 442
  2   Low          High         Negative   Broussard and Zhang [6], via RepAST
  3   High         Low          (NA)       Newly created
  4   High         High         Positive   White [26], via RepAST

   The MISC tasks were gathered from different sources and one task was created specifically for this study (Table 1). More specifically, the tasks used in MISC were chosen to elicit positive and negative emotions and were based on two different levels of difficulty and complexity as seen in Table 1. Since MISC uses only two levels, it would perhaps make sense to consider Understand and Analyse as high complexity, and Remember as low complexity, if task-to-task comparisons were needed. Alternatively, differences in interaction patterns may let us align tasks across the two sets. We have not yet explored these possibilities.
3   OBSERVATIONS
Two sets of spoken conversational searches—SCSdata and MISC—were collected independently, by different teams, in different geographical locations, to support different research. It is fortunate that the data sets are similar enough that we can make direct comparisons, and use one set to verify observations from the other.
   Despite being collected with very similar goals and methods, relatively small differences in protocol made observable differences to the data and we have had to be careful with reuse and comparisons. This was made much easier by our familiarity with the data; another researcher could quite reasonably choose these two data sets, compare them, and have difficulty. That this is possible despite careful design and description, and despite close similarity in protocol, may perhaps caution us about reuse in interactive studies generally.
   We were, however, helped by the decision to explicitly allow the release of MISC's raw data (not just, e.g., transcripts). Because audio was available, the transcription errors could be detected. Unfortunately ethical clearance precludes a similar release for SCSdata, and this may limit reuse.
   Communication between two people is very culture-specific [10]. Even though both MISC and SCSdata were collected in English-speaking countries, and all participants claimed native or high-level English, we cannot exclude the possibility that cultural differences played a role in the differences in the two data sets. Similarly, the difference in participant populations (more uniform in SCSdata, more varied in MISC) may have resulted in differences in communication.
   Spoken conversational search is still an immature field of inquiry, and we should exercise some caution re-using data sets. Nuances of data collection are not always easy to describe in a paper, but the protocols for SCSdata and MISC were relatively simple and the data can be re-used with care. It has been interesting and informative to compare the two sets of transcripts, and we hope to continue this to investigate other conversational questions.

ACKNOWLEDGMENTS
We thank Daniel McDuff, Mary Czerwinski, and Nick Craswell for their effort assembling MISC, and Penny Analytis for auditing the SCSdata transcriptions. We are grateful to our participants for their time.

REFERENCES
 [1] L. W. Anderson, D. R. Krathwohl, and B. S. Bloom. 2001. A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman, New York.
 [2] Peter Bailey, Alistair Moffat, Falk Scholer, and Paul Thomas. 2016. UQV100: A test collection with query variability. In Proc. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval. 725–728.
 [3] Tadas Baltrušaitis, Peter Robinson, and Louis-Philippe Morency. 2016. OpenFace: An open source facial behavior analysis toolkit. In Proc. IEEE Winter Conf. Applications of Computer Vision. 1–10.
 [4] Kathy Brennan, Diane Kelly, and Yinglong Zhang. 2016. Factor analysis of a search self-efficacy scale. In Proc. ACM SIGIR Conf. on Human Information Interaction and Retrieval. 241–244.
 [5] H. M. Brooks and N. J. Belkin. 1983. Using discourse analysis for the design of information retrieval interaction mechanisms. In Proc. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval. 31–47.
 [6] Ramona Broussard and Yan Zhang. 2013. Seeking treatment options: Consumers' search behaviors and cognitive activities. J. American Society for Information Science and Technology 50, 1 (2013), 1–10.
 [7] Eric R. Buhi, Ellen M. Daley, Hollie J. Fuhrmann, and Sarah A. Smith. 2009. An observational study of how young people search for online sexual health information. J. American College Health 58, 2 (2009), 101–111.
 [8] Sheldon Cohen, Tom Kamarck, and Robin Mermelstein. 1983. A global measure of perceived stress. J. Health and Social Behavior 24, 4 (Dec. 1983), 385–396.
 [9] Penny J. Daniels, H. M. Brooks, and N. J. Belkin. 1985. Using problem structures for driving human-computer dialogues. In RIAO-85: Actes: Recherche d'Informations Assistée par Ordinateur. 645–660.
[10] Birgit Endrass, Matthias Rehm, and Elisabeth André. 2009. Culture-specific communication management for virtual agents. In Proc. Int. Conf. on Autonomous Agents and Multiagent Systems—Volume 1. 281–287.
[11] Florian Eyben, Felix Weninger, Florian Gross, and Björn Schuller. 2013. Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In Proc. ACM Multimedia. 835–838.
[12] Diane Kelly, Jaime Arguello, Ashlee Edwards, and Wan-ching Wu. 2015. Development and evaluation of search tasks for IIR experiments using a cognitive complexity framework. In Proc. Int. Conf. on the Theory of Information Retrieval. 101–110.
[13] Martha Larson and Gareth J. F. Jones. 2012. Spoken content retrieval: A survey of techniques and technologies. Foundations and Trends in Information Retrieval 5, 4–5 (2012), 235–422.
[14] Daniel McDuff, Paul Thomas, Mary Czerwinski, and Nick Craswell. 2017. Multimodal analysis of vocal collaborative search: A public corpus and results. In Proc. ACM Int. Conf. on Multimodal Interaction. 456–463.
[15] Alistair Moffat, Peter Bailey, Falk Scholer, and Paul Thomas. 2014. Assessing the cognitive complexity of information needs. In Proc. Australasian Document Computing Symposium. ACM, 97–100.
[16] National Aeronautics and Space Administration Human Systems Integration Division. 2016. TLX @ NASA Ames. Retrieved January 2017 from https://humansystems.arc.nasa.gov/groups/TLX/
[17] Heather L. O'Brien and Elaine G. Toms. 2010. The development and evaluation of a survey to measure user engagement. J. American Society for Information Science and Technology 61, 1 (2010), 50–69.
[18] James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of LIWC2015. Technical Report. University of Texas at Austin.
[19] Rachel Reichman. 1985. Getting computers to talk like you and me. MIT Press, Cambridge, Massachusetts.
[20] Paul Thomas, Mary Czerwinski, Daniel McDuff, Nick Craswell, and Gloria Mark. 2018. Style and alignment in information-seeking conversation. In Proc. ACM SIGIR Conf. on Human Information Interaction and Retrieval. 42–51.


[21] Paul Thomas, Daniel McDuff, Mary Czerwinski, and Nick Craswell. 2017. MISC:
     A data set of information-seeking conversations. In Proc. Int. Workshop on Con-
     versational Approaches to Information Retrieval.
[22] Johanne R. Trippas, Lawrence Cavedon, Damiano Spina, and Mark Sanderson.
     2017. How do people interact in conversational speech-only search tasks: A
     preliminary analysis. In Proc. ACM SIGIR Conf. on Human Information Interaction
     and Retrieval. 325–328.
[23] Johanne R. Trippas, Damiano Spina, Lawrence Cavedon, Hideo Joho, and Mark
     Sanderson. 2018. Informing the design of spoken conversational search: Per-
     spective paper. In Proc. ACM SIGIR Conf. on Human Information Interaction and
     Retrieval. 32–41.
[24] Johanne R. Trippas, Damiano Spina, Lawrence Cavedon, and Mark Sanderson.
     2017. A conversational search transcription protocol and analysis. In Proc. Int.
     Workshop on Conversational Approaches to Information Retrieval.
[25] Svitlana Vakulenko, Kate Revoredo, Claudio Di Ciccio, and Maarten de Rijke.
     2019. QRFA: A data-driven model of information-seeking dialogues. In Proc.
     European Conf. on Information Retrieval. To appear.
[26] Ryen W. White. 2004. Implicit feedback for interactive information retrieval. Ph.D.
     Dissertation. University of Glasgow.
[27] Stephen P. Whiteside and Donald R. Lynam. 2003. Understanding the role of
     impulsivity and externalizing psychopathology in alcohol abuse: application of
     the UPPS impulsive behavior scale. 11, 3 (2003), 669–689.