=Paper=
{{Paper
|id=Vol-2935/paper4
|storemode=property
|title=Non-humorous Use of Laughter in Spoken Dialogue Systems
|pdfUrl=https://ceur-ws.org/Vol-2935/paper4.pdf
|volume=Vol-2935
|authors=Vladislav Maraev,Jean-Philippe Bernardy,Christine Howes
}}
==Non-humorous Use of Laughter in Spoken Dialogue Systems==
Vladislav Maraev∗, Jean-Philippe Bernardy and Christine Howes
Centre for Linguistic Theory and Studies in Probability (CLASP), Department of Philosophy,
Linguistics and Theory of Science, University of Gothenburg
{vladislav.maraev, jean-philippe.bernardy, christine.howes}@gu.se

Abstract
In this paper we argue that laughter, an ambiguous yet ubiquitous signal in everyday interactions, can act as an important feature for task-oriented dialogue systems. We show which components of a dialogue system should be affected and modified, and, more specifically, how particular types of laughter can be accounted for in a dialogue manager as instances of short answers, feedback, and vocalisations accompanying them.

1 Introduction
Laughter is very frequent in everyday interactions: for instance, in the Switchboard Dialogue Act Corpus [Jurafsky et al., 1997] laughter occurs about once every 200 words. Laughter is an ambiguous social signal. In addition to communicating the joy and pleasure intuitively associated with humour, it can also communicate embarrassment, be used to smooth and soften everyday interactions, and bear pragmatic functions such as marking irony or the use of a word in a specific sense [Poyatos, 1993; Mazzocconi, 2019; Ginzburg et al., 2020].

For a spoken dialogue system, laughter is an important signal to account for because of its contribution to the naturalness of automated dialogue. In chit-chat dialogue, laughter can be used for its potential to build rapport and establish a para-social bond between the user and the artificial agent.

There have been attempts to produce laughs as a way to mimic human behaviour and align with it [Urbain et al., 2010; El Haddad et al., 2019], as well as laughing avatars, mainly focused on laughter as a reaction to jokes [Ochs and Pelachaud, 2013; Ding et al., 2014]. In this paper we take a rather different approach. We start from examples of laughter in real task-oriented dialogue and then propose ways in which these behaviours can be reproduced in a dialogue system, and, more specifically, in its dialogue management component.

Example (1) below is an excerpt from a role-play dialogue collected by Howes et al. [2019] for their Directory Enquiries Corpus (DEC) [Bondarenko et al., 2020]. Dialogue participants played the roles of a caller and an operator, with the caller asking for the phone numbers of certain named businesses. Half of the dialogues took place in a noisy environment, which induced many mishearings and laughs. This paper addresses the following research question: how can these laughs be accounted for in a dialogue system that implements a similar scenario?

(1) DEC:22_KL_loc2
56 Caller er the next one is er tanfield chambers
57 Operator santias?
58 Caller tanfield like t- T A N
59 Operator sorry i don't hear you again please?
60 Caller er T A N
61 Operator C?
62 Caller tanfield
63 Operator A
64 Operator N
65 Caller yeah
66 Caller and then field
67 Operator and then seal?
68 Caller chambers
69 Operator sorry i hear you quite poorly
70 Operator let's try again
71 Operator C?
72 Caller yeah sorry the traffic is crazy around here
73 Operator I know don't worry
74 Operator so C
75 Operator A
76 Caller er
77 Caller tanfield T like thomas

Consider the first laugh (line 69). The operator's question "and then seal?" (l.67) was not addressed, so this piece of information was not grounded. "C?" (l.71) marks a restart from the beginning (the name was "Tanfield", but the operator heard "C"). The negative feedback provided by the operator (l.69) entails extra effort from the caller: she needs to restart her request from the beginning. This obligation is somewhat intrusive and may require extra smoothing [Mazzocconi, 2019; Raclaw and Ford, 2017]. For our purposes, we will treat this laughter as accompanying negative feedback.

∗ Contact Author
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

For a dialogue system designer, this poses an empirical question: would it be useful to soften negative feedback with laughter? This could apply, for instance, to feedback associated with a local failure (e.g. a speech recognition failure), such as "Sorry, I didn't understand" or "Sorry, I didn't hear you". It may also be useful where negative feedback is the result of an external query, for example when something is not found in the database, and laughter can accompany a system request to start over, as in example (1).

The reaction to an apology can also be accompanied by laughter, as with the second laugh in (1) (l.73). We do not think that users often apologise to dialogue systems these days, as it is usually the dialogue system that is at fault, but this might be different for special cases of systems that aim at more naturalistic behaviour.

In this paper we consider laughter from a utilitarian perspective and attempt to determine which kinds of laughs can be relevant for dialogue systems. In particular, we look at laughter from the point of view of providing feedback, either positive or negative.

In Section 2 we start with background on our approach to dialogue, dialogue management and laughter. Next, Section 3 presents a small typology of laughter types that we think should be accounted for in a task-oriented dialogue system. In Section 4 we describe our own dialogue management framework, and in Section 5 we give a formal account of the aforementioned types of laughter. We conclude with a brief discussion of our findings and further laughter-related issues in Section 6.

2 Background
2.1 Dialogue
A key aspect of dialogue systems is the coherence of the system's responses. In this respect, a key component of a dialogue system is the dialogue manager, which selects appropriate system actions depending on the current state and the external context.

Two families of approaches to dialogue management can be distinguished: hand-crafted dialogue strategies [Allen et al., 1995; Larsson, 2002; Jokinen, 2009] and statistical modelling of dialogue [Rieser and Lemon, 2011; Young et al., 2010; Williams et al., 2017]. Frameworks for hand-crafted strategies range from finite-state machines and form-filling to more complex dialogue planning and logical inference systems, such as the Information State Update (ISU) approach [Larsson, 2002] that we employ here. Although there has been a lot of development in dialogue systems in recent years, only a few approaches reflect advancements in dialogue theory. Our aim is to closely integrate dialogue systems with work in the theoretical semantics and pragmatics of dialogue. In this paper we do so by employing our own implementation of the KoS theoretical dialogue framework [Ginzburg, 2012], which we discussed in [Maraev et al., 2020]. In this work we extend our implementation with rudimentary support for grounding, thereby allowing the implementation to be further extended to support certain types of laughter.

In KoS (and many other dynamic approaches to meaning), language is treated as a game, containing players (interlocutors), goals and rules. KoS represents language interaction by a dynamically changing context. The meaning of an utterance is then how it changes the context. Compared to most approaches, which represent a single context for both dialogue participants, KoS keeps separate representations for each participant, using the Dialogue Game Board (DGB). Thus, the information state of each participant comprises a private part and the dialogue gameboard, which represents information arising from publicised interactions. The DGB tracks, at least, shared assumptions/visual field, moves (= utterances, form and content), and questions under discussion.

In dialogue, especially in dialogue with a machine, which involves the uncertainty of automatic speech recognition (ASR) and natural language understanding (NLU) components, we cannot assume perfect communication. While communicating, especially over an unreliable channel, humans give each other evidence that their contributions are understood to a certain extent, sufficient for current purposes. Clark [1996] and Allwood [1995] distinguish four levels of action related to different degrees of grounding. Here we list them according to the action ladder [Clark, 1996], from the hearer's perspective.
1. Acceptance level determines whether the content of the utterance was accepted or rejected by the hearer.
2. Understanding level specifies whether the utterance was understood by the hearer.
3. Perception level determines whether the utterance was perceived by the hearer.
4. Contact level determines whether the interlocutors have established a channel of communication.

The action ladder assumes that if a level is complete, then all levels below it are complete. For instance, if Bob asks "Do you like Paris?" and Mary replies "Yes", then Bob's utterance is accepted (and also understood and perceived, and contact has been established). If she instead asks "Paris?", this might signal that Bob's utterance was perceived but not understood (and thus not accepted).

Larsson [2002] accounts for different levels of action within the IBiS2 dialogue management framework, using a set of rules to update the common ground represented in the information state of the system. He uses "Interactive Communication Management" (ICM) moves [Allwood, 1995] as explicit signals for communicating updates to the common ground, and sequencing moves, e.g. for restarting a dialogue.

2.2 Laughter
Our focus on laughter is motivated by its ubiquity in natural dialogue. In the British National Corpus, laughter is a frequent signal regardless of gender and age: the spoken dialogue part of the British National Corpus (UK English, unscripted interactions recorded by volunteers in various social settings, balanced for age, region and social class) contains approximately one occurrence of laughter every 14 utterances. In the Switchboard Dialogue Act corpus [Jurafsky et al., 1997] (US English, one-on-one
interactions over the phone in which participants who are not familiar with each other discuss a potentially controversial subject, such as gun control or the school system), non-verbally vocalised dialogue acts (whole utterances that are marked as non-verbal) constitute 1.7% of all dialogue acts, and 65% of them contain laughter. Laughter tokens make up 0.5% of all the tokens that occur in the Switchboard Dialogue Act corpus.

Laughter production in conversation is not exclusively related to humour. But, perhaps unsurprisingly, the study of laughter has often been linked to the study of humour, and the two terms are frequently used interchangeably. However, laughter does not occur only in response to humour or in order to frame it. Many studies, particularly in conversation analysis, have shown its crucial role in managing conversations at several levels: dynamics (turn-taking and topic change), lexical (signalling problems of lexical retrieval or imprecision in lexical choice), pragmatic (marking irony, disambiguating meaning, managing self-correction) and social (smoothing and softening difficult situations or showing (dis)affiliation) [Glenn, 2003; Jefferson, 1984; Mazzocconi, 2019; Petitjean and González-Martínez, 2015].

There have been several approaches to classifying types of laughter [e.g., Poyatos, 1993; Vettin and Todt, 2004; Mazzocconi, 2019]. Mazzocconi [2019] claims that the most problematic issue with existing taxonomies is that they mix types of laughter functions with types of laughter triggers, so she bases her proposal on the function of laughter and the propositional content of the laughable: the argument the laughter predicates about, an event or state referred to by an utterance or exophorically [Glenn, 2003]. In this paper we look at laughter not exclusively from the perspective of a taxonomy that can be used as a theoretical framework, but from a utilitarian perspective, looking at which kinds of laughs can be relevant for dialogue systems.

Laughter as a way for an embodied conversational agent (ECA) to provide an emotional response has gained some attention from the Affective Computing and other research communities. Becker-Asano and Ishiguro [2009] evaluated the role of laughter in the perception of social robots and indicated that the situational context, determined by linguistic and non-verbal cues (such as gaze), played an important role. Nijholt [2002] discusses the challenges of integrating humour into ECAs, and existing integration of smiling and laughter in ECAs is typically triggered by a joke told by a user or an agent [Ding et al., 2014; Ochs and Pelachaud, 2013]. El Haddad et al. [2019] looked at the mimicry of smiles and laughs between interlocutors, which might also be used as the basis for an ECA's behaviour. Urbain et al. [2010] take a similar perspective, equipping ECAs with the capability to join their conversational partner's laugh. In this work we take a contrasting approach, looking at pragmatic functions of some types of laughter, namely providing feedback and answering questions, and we provide a formal account of such behaviour within a dialogue management framework.

3 Types of laughter
In this section we outline some types of laughter that can be of special interest to task-oriented dialogue systems and can be accounted for within our proposed framework.

3.1 Laughter as a component of grounding
As mentioned in Section 2, and in accord with Allwood [1995], Clark [1996] and Larsson [2002], we consider four action levels that are involved in a dialogue. Here we discuss what can happen at each level of action (contact, perception, understanding and reaction) with respect to laughter.

Contact and perception levels
Troubles related to establishing and maintaining a stable communication channel can lead to laughter. One example would be delays in communication, for instance over an unreliable network, which might lead to a person already speaking at the moment when the communication is only supposed to be established. Obvious examples of such cases are caused by signal jitter on video conference platforms like Zoom.

A lack of perception indicates things that have not been heard correctly (cases similar to (1)). Also, interruptions and related events can be quite surprising, and laughter can be a natural reaction to surprise (see Section 6).

Understanding level
A lack of pragmatic understanding relates to the kinds of incongruities caused by violations of the principle of conversational relevance. This is very relevant for dialogue systems because they are prone to errors in this realm. Incorrect NLU or ASR can often lead to prioritising irrelevant results (for example, in the case of out-of-scope user queries), which can cause the user's confusion and, therefore, laughter. This type of laughter can be treated as negative feedback.

This accounts for examples (2) and (3) below. Larsson [2002] subdivides this level into three categories of negative feedback (context-dependent, context-independent and pragmatic). Examples (2) and (3) relate to the pragmatic level of misunderstanding.

(2) from the dialogue between a virtual assistant (Diana) and a person with ASD (Mark):
Mark: Diana, what is money?
Diana: I am Diana, a virtual interlocutor.
Audience: (laugh)

(3) constructed example
Brian: Would you like tea or coffee?
Katie: yes
Brian: (laughs)

A dialogue system can also be unsure about what has been understood. In such cases, the system should show a lower degree of commitment to what has been said as part of its display of understanding. For example, when the system gives feedback on user input by repeating that input back to the user, it can be useful to include laughter in these verbatim repeats, which would mean: yes, I heard (understood) this, but I might be wrong. This can also be useful for system actions taken on the basis of low-confidence results.
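
The four levels walked through in this section follow the action ladder introduced in Section 2.1. As a minimal sketch (the names `Level`, `completed_levels` and `highest_complete_given_trouble` are hypothetical and not part of the authors' system), the ladder assumption can be encoded with an ordered enumeration: completing a level implies completing all levels below it, so negative feedback at a given level, with or without laughter, leaves at most the levels beneath it grounded.

```python
from enum import IntEnum
from typing import Optional, Set


class Level(IntEnum):
    """Levels of action from the hearer's perspective, bottom (contact) to top (acceptance)."""
    CONTACT = 1
    PERCEPTION = 2
    UNDERSTANDING = 3
    ACCEPTANCE = 4


def completed_levels(highest: Optional[Level]) -> Set[Level]:
    """Ladder assumption: if a level is complete, every level below it is complete too."""
    if highest is None:
        return set()
    return {level for level in Level if level <= highest}


def highest_complete_given_trouble(trouble_at: Level) -> Optional[Level]:
    """Negative feedback (possibly accompanied by laughter) at some level means that
    level, and hence every level above it, is not complete; at most the levels
    below it are."""
    below = [level for level in Level if level < trouble_at]
    return max(below) if below else None


# Mary replies "Yes": Bob's question is accepted, hence also understood,
# perceived, and contact has been established.
print([level.name for level in sorted(completed_levels(Level.ACCEPTANCE))])

# Mary replies "Paris?": perceived but not understood (and thus not accepted).
print(highest_complete_given_trouble(Level.UNDERSTANDING).name)
```

Using an integer-valued enumeration makes "below" a plain comparison, which is all the ladder assumption needs.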

Reaction (consider for acceptance) level
On this level, what has been understood can be either accepted or rejected for the current purpose. Acceptance laughter can typically be related to a reaction to humour, which is out of the scope of the current paper, or to an apology (see next section).

Ginzburg et al. [2020] consider some uses of standalone laughter as cases of negative response to a polar question (4) or as a signal of disbelief in a previously uttered assertion (5).

(4) From Ginzburg et al. [2020], context: Bayern München goalkeeper Manuel Neuer faces the press after his team's three-at-the-back defence (Dreierkette) has proved highly problematic in the game just played (which they won 3-2 against Paderborn).
Journalist: (smile) Dreierkette auch 'ne Option? (Is the three-at-the-back also an option?)
Manuel Neuer: fuh fuh fuh (brief laugh)

(5) From Ginzburg et al. [2020] (biblical example rephrased as a dialogue)
God: You will at age 99 with your aged wife Sarah have a son.
Abraham: (laughs)
→ I don't think I will at age 99 have a son

In Section 5 we show how this kind of laughter as a negative response, as in (4), can be handled by the dialogue manager.

3.2 Laughter and intrusion
In natural dialogue, an intrusion is frequently associated with laughter. In the Switchboard Dialogue Act corpus (SWDA) [Jurafsky et al., 1997], the Apology dialogue act is more strongly associated with laughter than other dialogue acts. In Figure 1 we show, for each dialogue act, the proportion of utterances¹ containing laughter, both for the utterance bearing that dialogue act and for the preceding and following utterances, depending on the speaker. In addition to Apology, we show its adjacency counterpart (the second element of the adjacency pair, produced by the other speaker [Schegloff and Sacks, 1973]), Downplayer, realised, for instance, by utterances like "Don't worry" or "It's alright".

¹ In SWDA each utterance is typically mapped to a single dialogue act.

[Figure 1 (bar chart): proportion of utterances containing laughter for the dialogue acts Statement-non-opinion, Apology and Downplayer, in five conditions: within the given DA; in previous utterance by self; in next utterance by self; in previous utterance by other; in next utterance by other.]
Figure 1: Comparison of the most common dialogue act in SWDA, "Statement-Non-Opinion" (33.27% of all utterances), with the dialogue acts "Apology" (0.04%) and "Downplayer" (0.05%). The proportion of utterances that contain laughter is shown in association with each dialogue act.

In (6), the caller reacts with compassionate laughter to the apology given by the operator. This instance is similar to the second laugh in (1) (l.73), which shows that the same reaction can also be expected from the operator.

(6) DEC:16_HG_loc2
162 Operator still not finding it
163 Operator having problems with this one
164 Caller okay
165 Caller er maybe i can find
166 Caller er the place myself but thank you very much for the information
167 Operator no problem sorry for not finding the the last one
168 Caller
169 Caller no worries
170 Caller thank you

We also observe that laughter can clearly accompany a request for a favour by the same speaker. In example (7) the operator asks the caller whether they can start from the beginning, which can be treated as a kind of intrusion and therefore as asking for a favour; the accompanying apology comes with laughter.

(7) DEC:24_LK_loc2
59 Caller B as in bicycle
60 Operator yeah
61 Caller then you have R
62 Caller I
63 Operator R
64 Caller G
65 Operator I
66 Operator okay sorry no- now i lost the track okay can we it start from the beginning sorry
67 Caller okay
68 Caller yes we can
69 Operator maybe you can just say the uh say words
70 Caller yeah no no problem
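
Figure 1 summarises, for each dialogue act, how often laughter occurs in the utterance itself and in neighbouring utterances by the same or the other speaker. The sketch below shows one way such proportions could be computed; it is not the authors' script. It assumes a conversation encoded as a list of (speaker, dialogue act, text) triples with laughter marked inline by a hypothetical token, and it interprets "previous/next utterance by self/other" as the nearest preceding/following utterance by the same/other speaker. The toy conversation is invented only to exercise the code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (speaker, dialogue_act, text), laughter marked inline.
Utterance = Tuple[str, str, str]
LAUGH_MARKER = "<laughter>"  # hypothetical; adapt to the corpus's transcription convention


def has_laughter(utt: Utterance) -> bool:
    return LAUGH_MARKER in utt[2].lower()


def neighbour(conv: List[Utterance], i: int, step: int,
              same_speaker: bool) -> Optional[Utterance]:
    """Nearest utterance before (step=-1) or after (step=+1) position i that was
    produced by the same speaker as utterance i (or by another speaker)."""
    j = i + step
    while 0 <= j < len(conv):
        if (conv[j][0] == conv[i][0]) == same_speaker:
            return conv[j]
        j += step
    return None


CONDITIONS = {
    "within the given DA": None,
    "in previous utterance by self": (-1, True),
    "in next utterance by self": (1, True),
    "in previous utterance by other": (-1, False),
    "in next utterance by other": (1, False),
}


def laughter_proportions(conv: List[Utterance], act: str) -> Dict[str, float]:
    """Share of utterances tagged `act` for which laughter occurs in the
    utterance itself or in the neighbour named by each condition."""
    indices = [i for i, utt in enumerate(conv) if utt[1] == act]
    result = {}
    for name, spec in CONDITIONS.items():
        hits = 0
        for i in indices:
            target = conv[i] if spec is None else neighbour(conv, i, *spec)
            if target is not None and has_laughter(target):
                hits += 1
        # A missing neighbour counts as "no laughter", not as a skipped case.
        result[name] = hits / len(indices) if indices else 0.0
    return result


# Toy conversation invented only to exercise the code (not corpus data).
conv = [
    ("A", "Statement-non-opinion", "still looking for it"),
    ("A", "Apology", "sorry about that <laughter>"),
    ("B", "Downplayer", "<laughter> no worries"),
]
print(laughter_proportions(conv, "Apology"))
```

Counting a missing neighbour as a non-laugh keeps the denominator equal to the number of utterances with the given dialogue act; excluding such cases instead would be an equally defensible choice.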

4 Dialogue manager architecture
We believe that it is crucial to use the formal tools most appropriate for the task: one should be able to express rules of various genres of dialogue in a concise way, free, to the greatest possible extent, of irrelevant technical details. In the view of Dixon et al. [2009], this is best done by representing the information state of the agents as updatable sets of propositions. Very often, dialogue-management rules update subsets (propositions) of the information state independently of the rest. A suitable and flexible way to represent such updates is as function types in linear logic. The domain of the function is the subset of propositions to update, and the co-domain is the (new) set of propositions which replaces it.

By using well-known techniques that correspond well with the intuition of information-state-based dialogue management, we are able to provide a fully working prototype of the components of our framework:
1. a proof-search engine based on linear logic, modified to support inputs from external systems (representing inputs and outputs of the agent);
2. a set of rules which function as a core framework for dialogue management (in the style of KoS [Ginzburg, 2012]);
3. several examples which use the above to construct potential applications of the system.

4.1 Linear rules and proof search
Typically, and in particular in the archetypal logic programming language Prolog [Bratko, 2001], axioms and rules are expressed within the general framework of first-order logic. However, several authors [Dixon et al., 2009; Martens, 2015] have proposed using linear logic [Girard, 1995] instead. For our purpose, the crucial feature of linear logic is that hypotheses may be used only once.

In general, the linear arrow corresponds to destructive state updates. Thus, the hypotheses available for proof search correspond to the state of the system. In our application, they correspond to the information state of the dialogue participant.

In linear logic, firing a linear rule normally corresponds to triggering an action of an agent, and a complete proof corresponds to a scenario, i.e. a sequence of actions, possibly involving actions from several agents. However, the information state (typically in the literature, and in this paper as well) corresponds to the state of a single agent. Thus, a scenario is conceived as a sequence of actions and updates of the information state of a single agent a, even though such actions can be attributed to any other dialogue participant b. (That is, they are a's representation of the actions of b.) Scenarios can be realised as a sequence of actual actions and updates. That is, an action can result in sending a message to the outside world (in the form of speech, movement, etc.). Conversely, events happening in the outside world can result in extra-logical updates of the information state (through a model of the perceptual subsystem).

In our implementation, we treat the information state as a multiset of linear hypotheses that can be queried. Because they are linear, these hypotheses can also be removed from the state. The rules themselves, by contrast, form a fixed set (they remain available even after being used). Each such rule manipulates a part of the information state (captured by its premises) and leaves everything else in the state alone.

Our dialogue manager (DM) models the information state of only one participant. Nevertheless, this participant can record its own beliefs about the state of other participants. In general, the core of the DM comprises a set of linear-logic rules which depend on the domain of application. However, many rules will be domain-independent (such as the generic processing of answers). We show examples of such rules in Section 4.4.

4.2 Questions and answers
In this paper, the essential components of the representation of a question are a type A and a predicate P over A. Using a typed intuitionistic logic, we write:
A : Type     P : A → Prop
The intent of the question is to find out about a value x of type A which makes P x true, or at least entertained by the other participant. We provide several examples in Table 1. It is worth stressing that the type A can be large (for example, when asking for any location) or as small as a boolean (if one requires a simple yes/no answer). We note in passing that polar questions can typically be answered not just by a boolean but also by qualifying the predicate in question, for example "maybe", "on Tuesdays", etc. (Table 1, last two rows). This is formalised by letting A = Prop → Prop.

4.3 Representation of questions with metavariables
In this subsection we show how a metavariable can represent what is being asked, as the unknown in a proposition. A first use for metavariables is to represent the requested answer to a question.

Within the state of the agent, if the value of the requested answer is represented as a metavariable x, then the question can be represented as Q A x (P x). That is, the pending question (Q denotes a question constructor) is a triple of a type, a metavariable x, and a proposition in which x occurs. We stress that P x is not yet part of the information state of the agent; rather, what is recorded as a fact is that the above question is under discussion. For example, after asking "Where does John live?", we have:
haveQud : QUD (Q Location x (Live John x))

Resolving a question can be done by communicating an answer. An answer to a question (A : Type; P : A → Prop) can take either of the following two forms: i) a ShortAnswer, which is a pair of an element X : A and its type A, represented as ShortAnswer A X; or ii) an Assertion, which is a proposition R : Prop, represented as Assert R. One way to process a short answer is therefore the processShort rule:
processShort : (a : Type) → (x : a) → (p : Prop) → ShortAnswer a x ⊸ QUD (Q a x p) ⊸ p
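
The behaviour of processShort can be mimicked outside a proof-search engine. The following minimal Python sketch is not the authors' implementation: it assumes the information state is a multiset (a `Counter`) of hypotheses encoded as nested tuples, with metavariables written as strings starting with "?", and it performs a single rule application with first-order unification rather than full linear-logic proof search. It illustrates linear consumption (Section 4.1) and the binding of the metavariable x to the value supplied by the short answer.

```python
from collections import Counter
from typing import Any, Dict, Optional

# Hypothetical encoding: atoms are strings, metavariables are strings starting
# with "?", and compound propositions/moves are tuples.
Term = Any
Subst = Dict[str, Term]


def is_var(t: Term) -> bool:
    return isinstance(t, str) and t.startswith("?")


def walk(t: Term, s: Subst) -> Term:
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a: Term, b: Term, s: Subst) -> Optional[Subst]:
    """First-order unification (no occurs check); returns the extended substitution or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def resolve(t: Term, s: Subst) -> Term:
    """Apply the substitution throughout a term."""
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t


def process_short(state: Counter) -> Optional[Counter]:
    """One application of processShort : ShortAnswer a x ⊸ QUD (Q a x p) ⊸ p.
    Consumes one ShortAnswer and one matching QUD hypothesis and adds p with the
    metavariable instantiated. Returns None if the rule does not fire."""
    for ans in state:
        if ans[0] != "ShortAnswer":
            continue
        for qud in state:
            if qud[0] != "QUD":
                continue
            _, (_, q_type, q_var, q_prop) = qud
            # Sharing a and x between premises: the answer's type must match
            # the question's type, and x gets bound to the answered value.
            s = unify(("ShortAnswer", q_type, q_var), ans, {})
            if s is not None:
                new = state.copy()
                new[ans] -= 1  # linear hypotheses are used up
                new[qud] -= 1
                new[resolve(q_prop, s)] += 1
                return +new  # unary + drops entries whose count fell to zero
    return None


state = Counter([
    ("QUD", ("Q", "Location", "?x", ("Live", "John", "?x"))),
    ("ShortAnswer", "Location", "Paris"),
])
print(process_short(state))
```

The input `Counter` is copied rather than mutated, so a failed or alternative rule application can still start from the original state, loosely mirroring how proof search may explore alternatives.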

{| class="wikitable"
! question !! A !! P !! reply !! x
|-
| Where does John live? || Location || λx. Live John x || in London || ShortAnswer Location London
|-
| Does John live in Paris? || Bool || λx. if x then (Live John Paris) else Not (Live John Paris) || yes || ShortAnswer Bool True
|-
| What time is it? || Time || λx. IsTime x || It is 5am. || Assert (IsTime 5.00)
|-
| Does John live in Paris? || Prop → Prop || λm. m (Live John Paris) || yes || ShortAnswer (Prop → Prop) (λx. x)
|-
| Does John live in Paris? || Prop → Prop || λm. m (Live John Paris) || from January || ShortAnswer (Prop → Prop) (λx. FromJanuary (x))
|}
Table 1: Examples of questions and possible corresponding answers. The type A is the type of possible short answers. The proposition P x is the interpretation of a short answer x. The last column (x) shows the formal representation of a possible answer, either in short form or in assertion form.
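
The rows of Table 1 can be mimicked with ordinary functions standing in for the predicate P. The sketch below assumes a hypothetical encoding (propositions as nested tuples, types as strings) and only evaluates P applied to a short answer; it does not model assertions or unification. It shows how the same polar question admits either a boolean answer or, with A = Prop → Prop, a qualifying answer such as "from January".

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Hypothetical encoding: propositions as nested tuples, types as plain strings.
Prop = Tuple[Any, ...]


@dataclass(frozen=True)
class Question:
    answer_type: str                  # the type A of possible short answers
    predicate: Callable[[Any], Prop]  # P : A → Prop


def interpret_short(q: Question, answer_type: str, value: Any) -> Prop:
    """Interpret ShortAnswer A X against question (A, P) as P X.
    The answer's type must match the question's type."""
    if answer_type != q.answer_type:
        raise TypeError(f"expected an answer of type {q.answer_type}, got {answer_type}")
    return q.predicate(value)


LIVE_PARIS: Prop = ("Live", "John", "Paris")

where_does_john_live = Question("Location", lambda x: ("Live", "John", x))
# Boolean encoding of the polar question (Table 1, second row).
lives_in_paris_bool = Question("Bool", lambda x: LIVE_PARIS if x else ("Not", LIVE_PARIS))
# A = Prop → Prop: a short answer is a modifier of the questioned proposition.
lives_in_paris_mod = Question("Prop -> Prop", lambda m: m(LIVE_PARIS))

print(interpret_short(where_does_john_live, "Location", "London"))
print(interpret_short(lives_in_paris_bool, "Bool", True))
print(interpret_short(lives_in_paris_mod, "Prop -> Prop", lambda p: p))  # "yes"
print(interpret_short(lives_in_paris_mod, "Prop -> Prop",
                      lambda p: ("FromJanuary", p)))  # "from January"
```

With the Prop → Prop encoding, "yes" is simply the identity modifier, which is why both of the last two rows of Table 1 share the same predicate λm. m (Live John Paris).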

Above we use Π-type binders to declare (meta)variables (written here (a : Type) →, (x : a) →, etc.). This terminology will make sense to readers familiar with dependent types. For others, such binders can be thought of as universal quantification (∀a, ∀x, etc.), the difference being that the type of the bound variable is specified.²

² The reader worried about any theoretical difficulty regarding mixing linear and dependent types is directed to Atkey [2018] and Abel and Bernardy [2020].

We demand in particular that the types in the answer and in the question match (a occurs in both places). Additionally, because x occurs in p, the information state will mention the concrete x which was provided in the answer. For example, if the QUD was (Q Location x (Live John x)) and the system processes the answer ShortAnswer Location Paris, then x unifies with Paris, and the new state will include Live John Paris.

To process assertions, we can use the following rule:
processAssert : (a : Type) → (x : a) → (p : Prop) → Assert p ⊸ QUD (Q a x p) ⊸ p

That is, if (1) p was asserted, (2) the proposition q is part of a question under discussion, and (3) p can be unified with q (we ensure this unification by simply using the same metavariable p in both roles in the above rule), then the assertion resolves the question. Additionally, the metavariable x is made ground to a value provided by p, by virtue of the unification of p and q. For example, "John lives in Paris" answers both of the questions "Where does John live?" and "Does John live in Paris?" (there is unification), but not, for example, "What time is it?" (there is no unification). Note that in both cases (processAssert and processShort), the information state is updated with the proposition posed in the question.

4.4 Dialogue management
In this section we integrate our question-answering framework within a more complete dialogue manager (DM). We stress that this DM models the information state of only one participant. Nevertheless, this participant can record its own beliefs about the state of other participants. In general, the core of the DM comprises a set of linear-logic rules which depend on the domain of application. However, many rules will be domain-independent (such as the generic processing of answers).

To be useful, a DM must interact with the outside world, and this interaction cannot be represented using logical rules, which can only manipulate data that is already integrated in the information state. Here, we assume that information coming from sources external to the dialogue manager is expressed in terms of semantic interpretations of moves, and contains information about the speaker and the addressee in a structured way. As an illustration, we provide five basic types of moves, each specified with a speaker and an addressee:
Greet spkr addr
CounterGreet spkr addr
Ask question spkr addr
ShortAnswer vtype v spkr addr
Assert p spkr addr

These moves can either be received as inputs or produced as outputs. If they are inputs, they come from the NLU component, and they enter the context via the predicate Heard : Move → Prop. For example, if one hears a greeting, the proposition Heard (Greet S A) is added to the information state/context without any rule being fired; this is what we mean by an external source.

If they are outputs, to be further used by the NLG component, some rule will place them in the Agenda. For example, to issue a counter-greeting, a rule will place the proposition (CounterGreet A S) in the Agenda part of the information state, which is a Cons-list.

Each move is thus accompanied by information about who uttered it and to whom it was addressed. All the moves are recorded in the Moves part of the participant's dialogue gameboard, as a Cons-list (stack).

Additionally, we record any move m which one has yet to
actively react to, in a hypothesis of the form Pending m. We cannot use the Moves part of the state for this purpose, because it is meant to be static (not to be consumed). Pending thus allows one to distinguish between a move which is fully processed and a pending one.
Here we provide a few examples of the rules which are implemented in our system, and we refer the reader to [Maraev et al., 2020] for a more detailed description.

Examples
We can show how basic move-adjacency can be defined using the example of a counter greeting preconditioned by a greeting from the other party:³

  counterGreeting : (x y : DP) → HasTurn x _
    Agenda as ⊸ Pending (Greet y x) ⊸
    Agenda (Cons (CounterGreet x y) as)

Another important rule accounts for pushing the content of any received Ask move on top of the stack of questions under discussion (QUD).

  pushQUD : (q : Question) → (qs : List Question) →
    (x y : DP) → Pending (Ask q x y) ⊸
    QUD qs ⊸ QUD (Cons q qs)

If the user asserts something that relates to the top QUD, then the QUD can be resolved and therefore removed from the stack. The corresponding proposition p is saved as a PendingUserFact.⁴ The following rule⁵ is an extended dialogue management version of the rule previously introduced in Section 4.3.

  processAssert : (a : Type) → (x : a) → (p : Prop) →
    (qs : List Question) →
    (dp dp1 : DP) → Pending (Assert p dp1 dp) ⊸
    QUD (Cons (Q dp a x p) qs) ⊸
    [_ :: PendingUserFact p; _ :: QUD qs]

Then, other rules will take the PendingUserFact p into account in a system-specific way. In the simplest case, the system may treat p as a true proposition. (In this paper we will consider meta-level pending user facts instead.)
Short answers are processed in a very similar way to assertions:

  processShort : (a : Type) → (x : a) → (p : Prop) →
    (qs : List Question) → (dp dp1 : DP) →
    Pending (ShortAnswer a x dp1 dp) ⊸
    QUD (Cons (Q dp a x p) qs) ⊸
    [_ :: PendingUserFact p; _ :: QUD qs]

If the system has a fact p in its database, it can produce an answer or a domain-specific clarification request, depending on whether the fact is unique and concrete or not (defined by the operators →! and →? respectively; see Maraev et al., 2020 for further details).

  produceAnswer :
    (a : Type) → (x : a) →! (p : Prop) →
    (qs : List Question) →
    QUD (Cons (Q USER a x p) qs) ⊸ p _
    [_ :: Agenda (ShortAnswer a x SYSTEM USER);
     _ :: QUD qs;
     _ :: Answered (Q USER a x p)]

4.5 Extending the dialogue manager with grounding strategies
In this subsection we provide a sketch of basic grounding strategies and the moves related to them, which will be further used to model laughter.
Dialogue systems deal with confidence scores from ASR and NLU components, which reflect the uncertainty in user queries. For simplicity, we represent a confidence score t as one of three levels determined by two confidence thresholds (T1 < T2): RED corresponds to t < T1, YELLOW to T1 < t < T2, and GREEN to T2 < t. Colour-coded confidence scores accompany user moves; e.g. an Ask move such as “What time is it?” can be represented as follows:

  Ask (Q U Time t0 (IsTime t0)) U S YELLOW

Here we illustrate the possibility of extending the system with Interactive Communication Management (ICM) moves and grounding strategies, replicating Larsson’s [2002] account of grounding and feedback. ICM moves are used for coordinating the common ground in dialogue; they express, for instance, explicit signals for integrating the incoming information and updating the common ground (the dialogue gameboard in our implementation). The basic type for the ICM move is the following:

  ICM level polarity content

where level corresponds to the level of grounding (contact, perception, understanding, acceptance), polarity is either positive or negative, and the optional value content corresponds to a component of the common ground in question. For instance, the move (ICM Per Neg None) would correspond to the utterance “I didn’t understand what you said” or “Pardon”, and the move (ICM Und Pos q) can be realised as the utterance “You are asking me what time it is” if the QUD q corresponds to the question from the Ask move exemplified above.
Next, we modify our basic pushQUD rule defined in Section 4.4 to support different system behaviours depending on the confidence score. In the GREEN case, the question from the user’s Ask move is integrated into the QUD, and an ICM move displaying positive acceptance feedback, i.e. “okay” (ICM Acc Pos None), is put on the Agenda. In the YELLOW case, the system should additionally report positive understanding, e.g. “You want to know about time”, so it also adds an (ICM Und Pos q) move to the Agenda.

³ Taking a linear argument and producing it again is a common pattern, which can be spelled out as A ⊸ (A ⊗ P). From here on we use the syntactic sugar A _ P for it.
⁴ For current purposes we only remove the top QUD, but in the more general case we can implement a policy that can potentially resolve any QUD on the stack.
⁵ Note the use of the single colon (:) for metavariables and the double colon (::) for information-state hypotheses.
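As an illustration only, the confidence-dependent integration of Ask moves described in Section 4.5 can be sketched procedurally in Python. This is a minimal sketch under stated assumptions and not the authors' proof-theoretic implementation. The names `Level`, `classify`, `ICM`, `InfoState` and `integrate_ask` are hypothetical. Questions are plain strings, and each stack is a Python list with its top at index 0. The linear consumption of a `Pending` hypothesis is approximated by popping it from a list. The GREEN and YELLOW branches mirror pushQUDGreen and pushQUDYellow. The RED branch mirrors icmINTConfirm: it pushes a yes/no question about whether q was understood, rather than q itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Level(Enum):
    RED = "RED"
    YELLOW = "YELLOW"
    GREEN = "GREEN"


def classify(t: float, t1: float, t2: float) -> Level:
    """Map a confidence score t to a colour level, given thresholds t1 < t2.

    Assumption of this sketch: boundary values t == t1 and t == t2 go to the
    higher level, since the text only specifies strict inequalities.
    """
    if not t1 < t2:
        raise ValueError("thresholds must satisfy T1 < T2")
    if t < t1:
        return Level.RED
    if t < t2:
        return Level.YELLOW
    return Level.GREEN


@dataclass(frozen=True)
class ICM:
    """Hypothetical encoding of an ICM move: level, polarity, optional content."""
    level: str                     # e.g. "Per", "Und", "Acc"
    polarity: str                  # "Pos", "Neg" or "Int" (interrogative)
    content: Optional[str] = None  # None stands for the rules' None content


@dataclass
class InfoState:
    """Hypothetical information state; index 0 is the top of each stack."""
    qud: List[str] = field(default_factory=list)
    agenda: List[ICM] = field(default_factory=list)
    pending: List[Tuple[str, Level]] = field(default_factory=list)


def integrate_ask(state: InfoState) -> None:
    """Consume the oldest pending Ask move and update QUD and Agenda."""
    q, level = state.pending.pop(0)  # the Pending hypothesis is used up
    if level is Level.GREEN:
        # pushQUDGreen: push q and plan "okay" (ICM Acc Pos None)
        state.qud.insert(0, q)
        state.agenda.insert(0, ICM("Acc", "Pos"))
    elif level is Level.YELLOW:
        # pushQUDYellow: as GREEN, plus ICM Und Pos q under the acceptance move
        state.qud.insert(0, q)
        state.agenda[0:0] = [ICM("Acc", "Pos"), ICM("Und", "Pos", q)]
    else:
        # icmINTConfirm: push a yes/no question "was q understood?" instead of q
        state.qud.insert(0, f"understood? {q}")
        state.agenda.insert(0, ICM("Und", "Int", q))


if __name__ == "__main__":
    state = InfoState(pending=[("what time is it", classify(0.55, 0.3, 0.8))])
    integrate_ask(state)
    print(state.qud)  # ['what time is it']
    print(state.agenda)
```

With the illustrative thresholds 0.3 and 0.8, a score of 0.55 falls in the YELLOW band. The agenda then holds the acceptance move on top of the understanding move, matching the nesting Cons (ICM Acc Pos None) (Cons (ICM Und Pos q) as) in pushQUDYellow.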
  pushQUDGreen : (q : Question) →
    (qs : List Question) → (x y : DP) →
    Pending (Ask q x y GREEN) ⊸ Agenda as ⊸
    QUD qs ⊸
    [_ :: QUD (Cons q qs);
     _ :: Agenda (Cons (ICM Acc Pos None) as)]

  pushQUDYellow : (q : Question) →
    (qs : List Question) → (x y : DP) →
    Pending (Ask q x y YELLOW) ⊸ Agenda as ⊸
    QUD qs ⊸
    [_ :: QUD (Cons q qs);
     _ :: Agenda (Cons (ICM Acc Pos None)
                   (Cons (ICM Und Pos q) as))]

For a RED confidence score, the system issues an interrogative ICM query, such as “I understood you’re asking me about the time, is that correct?”. In this case a special type of QUD is introduced, namely a question about whether question q is correctly understood.

  icmINTConfirm : (q : Question) → (x y : DP) →
    Pending (Ask q x y RED) ⊸ Agenda as ⊸
    QUD qs ⊸
    [_ :: QUD (Cons (Q Bool x
                       (if x then UND q
                        else UNDN q)) qs);
     _ :: Agenda (Cons (ICM Und Int q) as)]

Answers related to this type of QUD are processed as usual. For instance, a short “yes” or “no” is treated here as a boolean, and depending on the answer the context will contain either PendingUserFact (UND q) or PendingUserFact (UNDN q).
In this sketch implementation, we do not take confidence scores for these answers into account, leaving them underspecified, but further, more specific dialogue rules are possible.
Regardless of the particular answer, once the ICM question is answered, it is removed from the QUD stack, so that the top of the QUD stack is restored to the originally asked question. In our system, this is taken care of by the generic handling of ShortAnswers. Thus, in the case of a positive answer to such a query, there is nothing particular to do.
In the negative case, the ICM move conveying the understanding that the question was not q is issued.

  icmINTneg : (q : Question) → (x y : DP) →
    (c : Confidence) →
    PendingUserFact (UNDN q) ⊸
    Agenda as ⊸
    Agenda (Cons
      (ICM Und Pos (QuestionIsNot q)) as)

How ICM moves are converted to natural language utterances, depending on q, is a natural language generation (NLG) issue. For instance,

  ICM Und Pos
    (QuestionIsNot
      (Q U Time t0 (IsTime t0)))

can become the (rather tedious) utterance “So, you are not asking me what time it is”, whereas more sophisticated queries with more arguments can be resolved in a shorter utterance, depending on the arguments that are made ground. For instance, in the context of an interaction at a food kiosk:

  ICM Und Pos
    (QuestionIsNot
      (Q U (Prop → Prop) m0 (m0 WantOlives)))

could become a simple “Sorry, let’s forget olives.”

5 Formal treatment of certain types of laughter
5.1 Laughter as a rejection signal
Laughter as a reaction to interrogative feedback in the case of a low-confidence ASR/NLU result can be illustrated by the following dialogue.

  (8)  U: I would like to order a vegan bean burger.        Ask q
       S: I understood you’d like to order a beef burger.
          Is that correct?                                   ICM Und Int q
       U: HAHAHA                                             ShortAnswer Bool False

Here we can treat laughter as a short negative answer, similar to “No”. In the case of an interrogative ICM move, such an answer can be processed using the icmINTneg rule defined above.
This can be treated as a recovery strategy for various system outputs not desired by dialogue system designers. This approach can be extended to other cases of user feedback, for instance, to cover the cases with a higher confidence score where the system produces an ICM Und Pos q move, but this is out of the scope of the current paper.
Returning to the more sophisticated (4), it can be handled by our generic rules for integrating QUDs (pushQUD). For that we need to consider polar questions as expecting an answer of type Prop → Prop (see Table 1). Recalling the example:

  (4)  Journalist:    (smile) Dreierkette auch ‘ne Option?
                      (Is the three-in-the-back also an option?)
       Manuel Neuer:  fuh fuh fuh
                      (brief laugh)

and the type for questions:

  A : Type    P : A → Prop

In this case,

  A = Prop → Prop
  P = λm. m IsOptionDreierkette

The brief laughter by Manuel Neuer can be represented as:

  ⟦fuh fuh fuh⟧ = ShortAnswer (Prop → Prop) (λx. Laughable x)

where the modification of the proposition, resulting in (Laughable IsOptionDreierkette), has a very basic meaning: this proposition is the laughable, without being more
specific about the laughter function. One can also consider being more specific, simply treating laughter as a negation (ShortAnswer (Prop → Prop) (λx. Not x)), but in general laughter has a more nuanced meaning.

5.2 Laughter which accompanies feedback
Laughter can act as a part of the realisation of ICM moves performed by the natural language generation (NLG) component. It seems to us that ICM moves, in particular, are a context in which the use of laughter can be considered “safe”. For instance, an ICM move of the form (ICM Und Pos (QuestionIsNot (Q U (Prop → Prop) m0 (m0 WantOlives)))) can be realised as a natural language utterance like “Okay, let’s forget olives, hehe”, where laughter is used as a smoothing device to mitigate the awkwardness of system failure. Larsson [2002] often included an apology “Sorry” in some of the ICM moves, e.g. “Sorry, I didn’t understand that”. With some caveats, we can sometimes include slight laughter in such moves, especially if a system is getting a bit repetitive and produces (ICM Und Neg) too often. Considering the evidence for laughter often accompanying apologies (as a separate dialogue act) presented in Section 3.2, this can mimic natural behaviour in dialogue.

6 Discussion and future work
In this paper we have shown how some types of laughter can be accounted for in a task-oriented spoken dialogue system. We proposed our own proof-theoretic architecture of a dialogue manager based on the KoS framework and extended it with some grounding strategies. Based on this, we have shown how certain types of laughter can be processed within the dialogue manager and natural language generator, namely: laughter as negative feedback, laughter as a negative answer to a polar question, and laughter as a signal accompanying system feedback.
In the following subsections we discuss several issues related to laughter in spoken dialogue systems that only touch on the main subject of the paper.

6.1 Humour
We start with humour, which is usually considered in relation to jokes generated by dialogue systems, but here we present more subtle incongruities related to humour in task-oriented dialogue.

  (9) DEC:28_NM_loc2
  17 Caller    okay so it starts with a
  18 Caller    L
  19 Operator  L?
  20 Caller    as in london
  21 Operator  yes
  22 Caller    A as in america
  23 Operator  america
  24 Caller    er U
  25 Caller    as in er ((pause: 1.2s))
  26 Caller    er under
  27 Caller    (laughs)
  28 Operator  under yes

In (9) the caller has difficulty coming up with phonetic spellings for certain words. The first laugh (line 27) deserves attention, as it seems to reflect both pleasant incongruity and social incongruity (smoothing), according to the taxonomy of [Mazzocconi, 2019]. The pleasant incongruity is due to the fact that the phonetic spelling of “U” as in “under” is incongruous with the preceding ones: a preposition vs. proper nouns. The way to spell things phonetically is typically culturally specific, the most typical cases being cities or countries. Stereotypes and conversational conventions can be expressed with the formal notions of enthymemes and topoi, following the work of Breitholtz [2020] on reasoning in conversation. Breitholtz and Maraev [2019] used these notions to analyse conversational humour as well as canned jokes, and we find it potentially helpful to integrate them into our framework in order to account for humour in dialogue systems. Dybala et al. [2010] emphasise the importance of the “two-stage” approach to humour in dialogue systems, where the system tracks the emotional state of the user, produces humour as a reaction to certain states and analyses the user’s further emotional reaction.

6.2 Surprise
Intuitively, laughter is related to events that are not expected in interaction. One way to establish some degree of natural behaviour for a dialogue system would be to react sincerely to these kinds of surprising events. A possible measure of a system’s surprisal is how confused it is by the user input. A natural measure for this from information theory is perplexity, a probability-based metric. For N words in an evaluation set $W = w_1 w_2 \dots w_N$, the average perplexity per word is computed as follows:

$$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 \dots w_{i-1})}} \qquad (1)$$

Given a language model, we can employ a perplexity threshold above which the system acts surprised, e.g. by saying “Ha-ha, I did not expect this!”
Similarly, perplexity can be inferred from tracking a dialogue state in a Dialogue State Tracking task [Mrkšić et al., 2017], which is a common task in statistical approaches to dialogue systems. Alternatively, following Noble and Maraev [2021], an RNN trained on a large dialogue corpus as a representation of dialogue context can be used to calculate perplexity.
Laughter as a reaction of surprise can relate to the levels of feedback; for example, a user surprised by a pragmatically incoherent system reply can laugh (Section 5.1). But here surprise is taken in isolation, as a measure in its own right.

6.3 Awkwardness and time-saving
In (9), “under” is produced after a long pause (l.25), which indicates awkwardness: the difficulty of producing the phonetic spelling made the operator wait, making the situation uncomfortable for the caller, so laughter was used to smooth it over.
In the follow-up excerpt (10) from the same dialogue, the user’s awkwardness continues and she accompanies it with
laughter. Firstly, she laughs (l.139), demonstrating that she has given up finding any phonetic spelling for “K”, releasing the turn and allowing the operator to carry on. Her second laugh smooths over her slight embarrassment after the situation was resolved by the operator.

  (10) DEC:28_NM_loc2
  134 Caller    O for oslo
  135 Operator  O for oslo
  136 Caller    again O for oslo
  137 Operator  O for oslo
  138 Caller    and K for er ((pause: 1.6s))
  139 Caller    (laughs)
  140 Operator  as in king?
  141 Caller    k- king yeah
  142 Operator  yes
  143 Caller    thank you
  144 Operator  that’s it?
  145 Caller    that’s it

We can hypothesise that in a dialogue system these examples can be handled as follows. For a system, there are operations which the developer knows are going to take time due to technical constraints, but which the user expects to be immediate. In this case, a system can produce behaviour similar to the one in (9) (l.25–27): “er... (pause) [comes up with an answer] (laughter)”. A system can also detect patterns of a filled pause followed by laughter from the user and treat them as turn-release cues. Such a pattern can signal either that something confused the user, or that she genuinely could not come up with an answer due to certain difficulties. A downplayer dialogue act (e.g. “don’t worry”) or laughter in response can also be appropriate as system feedback in such a situation. We consider these ideas a subject for further empirical investigation.
Laughter related to smoothing retrieval difficulties can be indicative. Consider the case of language tutoring. In the Anki “flashcard” app, the system provides users with a word in one language on the front side of the card and the user should provide a translation. The user then gets the correct response from the back of the card and evaluates her own response (was this card Hard, Good or Easy to recall). If we consider making a similar conversational app, indications of retrieval issues (filled pauses such as “er em...” and follow-up smoothing by laughter) can lead to the decision to flag this card as “Hard” and provide corresponding feedback, as in (11).

  (11)  S: What is the Swedish for donkey?
        U: er em ... åsna?..
        S: Yes, that was tough, but it is correct!
           (system marks the card as “Hard”)

6.4 Approaches to evaluation
Each of the aforementioned improvements has to be evaluated within the dialogue system. We expect these improvements to be reflected in the following evaluation criteria.
Some of the improvements would fall under objective, checklist-style criteria, like being able to understand laughter as negative feedback, or as a signal of surprise. The same goes for the system’s laughter as an appropriate reaction to conversational humour.
Other features can only be evaluated subjectively; for example, it is a question of user preference whether it is okay for a system to accompany asking for a favour (e.g. “Let’s start over!”) with laughter. For this purpose, we can employ subjective evaluation methods such as the more task-oriented SASSI [Hone and Graham, 2000] or the more chatterbot-oriented methodology proposed by Dybala et al. [2009], which was used for humour-equipped chatbots. We optimistically expect that characteristics such as naturality and likeability would increase and annoyance would decrease.

Acknowledgments
The research reported in this paper was supported by grant 2014-39 from the Swedish Research Council, which funds the Centre for Linguistic Theory and Studies in Probability (CLASP) in the Department of Philosophy, Linguistics, and Theory of Science at the University of Gothenburg. In addition, we would like to thank Staffan Larsson, Jonathan Ginzburg and our anonymous reviewers for their useful comments.

References
Andreas Abel and Jean-Philippe Bernardy. A unified view of modalities in type systems. Proceedings of the ACM on Programming Languages, 4(ICFP), 2020.
James F Allen, Lenhart K Schubert, George Ferguson, Peter Heeman, Chung Hee Hwang, Tsuneaki Kato, Marc Light, Nathaniel Martin, Bradford Miller, Massimo Poesio, et al. The TRAINS project: A case study in building a conversational planning agent. Journal of Experimental & Theoretical Artificial Intelligence, 7(1):7–48, 1995.
Jens Allwood. An activity based approach to pragmatics. 1995.
Robert Atkey. Syntax and semantics of quantitative type theory. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, Oxford, UK, pages 56–65, 2018.
Christian Becker-Asano and Hiroshi Ishiguro. Laughter in social robotics - no laughing matter. In Intl. Workshop on Social Intelligence Design, pages 287–300. Citeseer, 2009.
Anastasia Bondarenko, Christine Howes, and Staffan Larsson. Directory enquiries corpus, Feb 2020.
Ivan Bratko. Prolog programming for artificial intelligence. Pearson education, 2001.
Ellen Breitholtz and Vladislav Maraev. How to put an elephant in the title: Modeling humorous incongruity with topoi. In Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue - Full Papers, London, United Kingdom, September 2019. SEMDIAL.
Ellen Breitholtz. Enthymemes and Topoi in Dialogue: The Use of Common Sense Reasoning in Conversation. Brill, Leiden, The Netherlands, 2020.
Herbert H Clark. Using language. Cambridge University Press, 1996.
Yu Ding, Ken Prepin, Jing Huang, Catherine Pelachaud, and Thierry Artières. Laughter animation synthesis. In Proc. AAMAS 2014, pages 773–780. International Foundation for Autonomous Agents and Multiagent Systems, 2014.
Lucas Dixon, Alan Smaill, and Tracy Tsang. Plans, actions and dialogues using linear logic. Journal of Logic, Language and Information, 18(2):251–289, 2009.
Pawel Dybala, Michal Ptaszynski, Rafal Rzepka, and Kenji Araki. Subjective, but not worthless - non-linguistic features of chatterbot evaluations. In 6th IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, page 87. Citeseer, 2009.
Pawel Dybala, Michal Ptaszynski, Rafal Rzepka, and Kenji Araki. Extending the chain: humor and emotions in human computer interaction. International Journal of Computational Linguistics Research, 1(3):116–125, 2010.
Kevin El Haddad, Sandeep Nallan Chakravarthula, and James Kennedy. Smile and laugh dynamics in naturalistic dyadic interactions: Intensity levels, sequences and roles. In 2019 International Conference on Multimodal Interaction, pages 259–263, 2019.
Jonathan Ginzburg, Chiara Mazzocconi, and Ye Tian. Laughter as language. Glossa: a journal of general linguistics, 5(1), 2020.
Jonathan Ginzburg. The Interactive Stance. Oxford University Press, 2012.
J.-Y. Girard. Linear Logic: its syntax and semantics, pages 1–42. London Mathematical Society Lecture Note Series. Cambridge University Press, 1995.
Phillip Glenn. Laughter in Interaction. Cambridge University Press, Cambridge, UK, 2003.
Kate S Hone and Robert Graham. Towards a tool for the subjective assessment of speech system interfaces (SASSI). 2000.
Christine Howes, Anastasia Bondarenko, and Staffan Larsson. Good call! Grounding in a Directory Enquiries Corpus. In Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue, London, United Kingdom, Sep 2019. SEMDIAL.
Gail Jefferson. On the organization of laughter in talk about troubles. In Structures of Social Action: Studies in Conversation Analysis, pages 346–369. 1984.
Kristiina Jokinen. Constructive dialogue modelling: Speech interaction and rational agents, volume 10. John Wiley & Sons, 2009.
D Jurafsky, E Shriberg, and D Biasca. Switchboard dialog act corpus. International Computer Science Inst. Berkeley CA, Tech. Rep, 1997.
Staffan Larsson. Issue-based dialogue management. PhD thesis, University of Gothenburg, 2002.
Vladislav Maraev, Jean-Philippe Bernardy, and Jonathan Ginzburg. Dialogue management with linear logic: the role of metavariables in questions and clarifications. Traitement Automatique des Langues (TAL), 61(3):43–67, 2020.
Chris Martens. Programming Interactive Worlds with Linear Logic. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 2015.
Chiara Mazzocconi. Laughter in interaction: semantics, pragmatics and child development. PhD thesis, Université de Paris, 2019.
Nikola Mrkšić, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. Neural belief tracker: Data-driven dialogue state tracking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1777–1788, 2017.
Anton Nijholt. Embodied agents: A new impetus to humor research. In The April Fools Day Workshop on Computational Humour, volume 20, pages 101–111. In: Proc. Twente Workshop on Language Technology, 2002.
Bill Noble and Vladislav Maraev. Large-scale text pre-training helps with dialogue act recognition, but not without fine-tuning. In Proceedings of the 14th International Conference on Computational Semantics - Short Papers, Groningen, Netherlands, 2021.
Magalie Ochs and Catherine Pelachaud. Socially aware virtual characters: The social signal of smiles [social sciences]. IEEE Signal Processing Magazine, 30(2):128–132, Mar 2013.
Cécile Petitjean and Esther González-Martínez. Laughing and smiling to manage trouble in French-language classroom interaction. Classroom Discourse, 6(2):89–106, 2015.
Fernando Poyatos. Paralanguage: A linguistic and interdisciplinary approach to interactive speech and sounds, volume 92. John Benjamins Publishing, 1993.
Joshua Raclaw and Cecilia E Ford. Laughter and the management of divergent positions in peer review interactions. Journal of Pragmatics, 113:1–15, 2017.
Verena Rieser and Oliver Lemon. Reinforcement learning for adaptive dialogue systems: a data-driven methodology for dialogue management and natural language generation. Springer Science & Business Media, 2011.
Emanuel A Schegloff and Harvey Sacks. Opening up closings. Semiotica, 8(4):289–327, 1973.
Jérôme Urbain, Radoslaw Niewiadomski, Elisabetta Bevacqua, Thierry Dutoit, Alexis Moinet, Catherine Pelachaud, Benjamin Picart, Joëlle Tilmanne, and Johannes Wagner. AVLaughterCycle. J. Multimodal User Interfaces, 4(1):47–58, 2010.
Julia Vettin and Dietmar Todt. Laughter in conversation: Features of occurrence and acoustic structure. Journal of Nonverbal Behavior, 28(2):93–115, 2004.
Jason D Williams, Kavosh Asadi, and Geoffrey Zweig. Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning. arXiv preprint arXiv:1702.03274, 2017.
Steve Young, Milica Gašić, Simon Keizer, François Mairesse, Jost Schatzmann, Blaise Thomson, and Kai Yu. The hidden information state model: A practical framework for POMDP-based spoken dialogue management. Computer Speech & Language, 24(2):150–174, 2010.