Non-humorous use of laughter in spoken dialogue systems

Vladislav Maraev¹*, Jean-Philippe Bernardy¹ and Christine Howes¹
¹Centre for Linguistic Theory and Studies in Probability (CLASP), Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg
{vladislav.maraev, jean-philippe.bernardy, christine.howes}@gu.se

* Contact Author

Abstract

In this paper we argue that laughter, an ambiguous yet ubiquitous signal in everyday interactions, can act as an important feature for task-oriented dialogue systems. We show which components of a dialogue system should be affected and modified, and, more specifically, how particular types of laughter can be accounted for in a dialogue manager as instances of short answers, feedback and the vocalisations accompanying them.

1 Introduction

Laughter is very frequent in everyday interactions: in the Switchboard Dialogue Act Corpus [Jurafsky et al., 1997], laughter occurs about every 200 words. Laughter is an ambiguous social signal: in addition to communicating the joy and pleasure intuitively associated with humour, it can also communicate embarrassment, be used to smooth and soften everyday interactions, and bear pragmatic functions such as marking irony or the use of a word in a specific sense [Poyatos, 1993; Mazzocconi, 2019; Ginzburg et al., 2020].

For a spoken dialogue system, laughter is an important signal to account for because of its contribution to the naturalness of automated dialogue. Laughter can be used in chit-chat dialogue for its potential to build rapport and establish a para-social bond between the user and the artificial agent.

There have been attempts to produce laughs as a way to mimic human behaviour and align with it [Urbain et al., 2010; El Haddad et al., 2019], as well as laughing avatars mainly focussed on laughter as a reaction to jokes [Ochs and Pelachaud, 2013; Ding et al., 2014]. In this paper we take a rather different approach: we start from examples of the use of laughter in real task-oriented dialogue and then propose ways in which these behaviours can be reproduced in a dialogue system and, more specifically, in its dialogue management component.

Example (1) below is an excerpt from a role-play dialogue collected by Howes et al. [2019] for their Directory Enquiries Corpus (DEC) [Bondarenko et al., 2020]. Dialogue participants were playing the roles of a caller and an operator, respectively asking for the phone numbers of certain named businesses. Half of the dialogues happened in a noisy environment, with many mishearings and laughs induced. This paper addresses the following research question: how can these laughs be accounted for in a dialogue system which implements a similar scenario?

(1) DEC:22_KL_loc2
    56  Caller    er the next one is er tanfield chambers
    57  Operator  santias?
    58  Caller    tanfield like t- T A N
    59  Operator  sorry i don't hear you again please?
    60  Caller    er T A N
    61  Operator  C?
    62  Caller    tanfield
    63  Operator  A
    64  Operator  N
    65  Caller    yeah
    66  Caller    and then field
    67  Operator  and then seal?
    68  Caller    chambers
    69  Operator  sorry i hear you quite poorly
    70  Operator  let's try again
    71  Operator  C?
    72  Caller    yeah sorry the traffic is crazy around here
    73  Operator  I know don't worry
    74  Operator  so C
    75  Operator  A
    76  Caller    er
    77  Caller    tanfield T like thomas

Let's look at the first laughter (line 69). We can see that the operator's question "and then seal?" (l.67) was not addressed, and this piece of information was not grounded. "C?" (l.71) refers to the restart from the beginning (it was "Tanfield", but she has heard "C"). The negative feedback provided by the operator (l.69) entails extra effort from the caller—she needs to restart her request from the beginning—and this obligation is somewhat intrusive and may require extra smoothing [Mazzocconi, 2019; Raclaw and Ford, 2017]. For our purposes, we will treat this laughter as accompanying negative feedback.
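The two laughter functions just identified (softening intrusive negative feedback, and accompanying the reaction to an apology) can be given a toy operational form. The following sketch is ours, not part of the paper's implementation; the Move representation, the function labels and the softening policy are assumptions made for illustration only:

```python
# Illustrative sketch (not the paper's implementation): dialogue moves
# carrying the laughter functions observed in example (1).
from dataclasses import dataclass

@dataclass
class Move:
    function: str            # hypothetical labels, e.g. "negative-feedback"
    text: str
    intrusive: bool = False  # does the move oblige the hearer to redo work?

def realise(move: Move) -> str:
    """Attach a laughter token where extra smoothing is plausibly needed:
    intrusive negative feedback (cf. l.69) and downplayers (cf. l.73)."""
    needs_smoothing = move.intrusive or move.function == "downplayer"
    return ("(laughs) " if needs_smoothing else "") + move.text

print(realise(Move("negative-feedback",
                   "sorry i hear you quite poorly, let's try again",
                   intrusive=True)))
print(realise(Move("downplayer", "I know don't worry")))
```

A real system would of course decide this from richer context; the point is only that the laughter-bearing moves can be isolated as first-class objects for the dialogue manager.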
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

For a dialogue system designer, this poses an empirical question, namely: would it be useful to soften negative feedback with laughter? For instance, laughter could soften the feedback associated with a local failure (e.g. a speech recognition failure), such as "Sorry, I didn't understand" or "Sorry, I didn't hear you". It may also be useful where negative feedback is the result of an external query, for example, when something is not found in the database, and can accompany a system request to start over, as in example (1).

The reaction to an apology can also be accompanied by laughter, as with the second laugh in (1) (l.73). We do not think that these days users often apologise to a dialogue system, as it is usually the dialogue system which is at fault, but this might be different for special cases of systems that aim at more naturalistic behaviour.

In this paper we consider laughter from the utilitarian perspective and attempt to determine which kinds of laughs can be relevant for dialogue systems. In particular, we will look at laughter from the point of view of providing feedback, either positive or negative.

Section 2 starts with the background on our approach to dialogue, dialogue management and laughter. Next, Section 3 presents a small typology of laughter types that we think should be accounted for in a task-oriented dialogue system. In Section 4 we describe our own dialogue management framework, and in Section 5 we show a formal account of the aforementioned types of laughter. We conclude with a brief discussion of our findings and further laughter-related issues in Section 6.

2 Background

2.1 Dialogue

A key aspect of dialogue systems is the coherence of the system's responses. In this respect, a key component of a dialogue system is the dialogue manager, which selects appropriate system actions depending on the current state and the external context.

Two families of approaches to dialogue management can be considered: hand-crafted dialogue strategies [Allen et al., 1995; Larsson, 2002; Jokinen, 2009] and statistical modelling of dialogue [Rieser and Lemon, 2011; Young et al., 2010; Williams et al., 2017]. Frameworks for hand-crafted strategies range from finite-state machines and form-filling to more complex dialogue planning and logical inference systems, such as the Information State Update (ISU) approach [Larsson, 2002] that we employ here. Although there has been a lot of development in dialogue systems in recent years, only a few approaches reflect advancements in dialogue theory. Our aim is to closely integrate dialogue systems with work in the theoretical semantics and pragmatics of dialogue. In this paper we do so by employing our own implementation of the KoS theoretical dialogue framework [Ginzburg, 2012], which we discussed in [Maraev et al., 2020]. In this work we extend our implementation with rudimentary support for grounding, therefore allowing the implementation to be further extended to support certain types of laughter.

In KoS (and many other dynamic approaches to meaning), language is treated as a game, containing players (interlocutors), goals and rules. KoS represents language interaction by a dynamically changing context. The meaning of an utterance is then how it changes the context. Compared to most approaches, which represent a single context for both dialogue participants, KoS keeps separate representations for each participant, using the Dialogue Game Board (DGB). Thus, the information states of the participants comprise a private part and the dialogue gameboard that represents information arising from publicised interactions. The DGB tracks, at least, shared assumptions/visual field, moves (= utterances, form and content), and questions under discussion.

In dialogue, especially dialogue with a machine, which involves the uncertainty of automatic speech recognition (ASR) and natural language understanding (NLU) components, we cannot assume perfect communication. While communicating, especially over an unreliable communication channel, humans give each other evidence that their contributions are understood to a certain extent, sufficient for current purposes. Clark [1996] and Allwood [1995] distinguish four levels of action related to different degrees of grounding. Here we list them according to the action ladder [Clark, 1996], from the hearer's perspective.

1. Acceptance level determines whether the content of the utterance was accepted or rejected by the hearer.
2. Understanding level specifies whether the utterance was understood by the hearer.
3. Perception level determines whether the utterance was perceived by the hearer.
4. Contact level determines whether the interlocutors have established a channel of communication.

The action ladder assumes that if the level above is complete, then all levels below are complete. For instance, if Bob asks "Do you like Paris?" and Mary replies "Yes", then Bob's utterance is accepted (and also understood and perceived, and their contact has been established). If she replies "Paris?", then this might signal that Bob's utterance was perceived but not understood (and thus not accepted).

Larsson [2002] accounts for the different levels of action within the IBiS2 dialogue management framework using a set of rules to update the common ground represented in the information state of the system. He uses "Interactive Communication Management" (ICM) moves [Allwood, 1995] as explicit signals concerned with communicating updates to the common ground, and sequencing moves, e.g. restarting a dialogue.

2.2 Laughter

Our attention to laughter is motivated by its ubiquity in natural dialogue. In the British National Corpus, laughter is quite a frequent signal regardless of gender and age—the spoken dialogue part of the British National Corpus (UK English, unscripted interactions that were recorded by volunteers in various social settings, balanced for age, region and social class) contains approximately one occurrence of laughter every 14 utterances. In the Switchboard Dialogue Act corpus [Jurafsky et al., 1997] (US English, one-on-one interactions over the phone where participants who are not familiar with each other discuss a potentially controversial subject, such as gun control or the school system), non-verbally vocalised dialogue acts (whole utterances that are marked as non-verbal) constitute 1.7% of all dialogue acts, and 65% of them contain laughter. Laughter tokens make up 0.5% of all the tokens that occur in the Switchboard Dialogue Act corpus.

Laughter production in conversation is not exclusively related to humour. But, perhaps unsurprisingly, the study of laughter has often been linked to the study of humour, and the two terms are frequently used interchangeably. However, laughter does not occur only in response to humour or in order to frame it. Many studies, particularly in conversation analysis, have shown its crucial role in managing conversations at several levels: dynamics (turn-taking and topic change), lexical (signalling problems of lexical retrieval or imprecision in the lexical choice), pragmatic (marking irony, disambiguating meaning, managing self-correction) and social (smoothing and softening difficult situations or showing (dis)affiliation) [Glenn, 2003; Jefferson, 1984; Mazzocconi, 2019; Petitjean and González-Martínez, 2015].

There have been several approaches to classifying types of laughter [e.g., Poyatos, 1993; Vettin and Todt, 2004; Mazzocconi, 2019]. Mazzocconi [2019] claims that the most problematic issue with existing taxonomies is that they mix types of laughter functions with types of laughter triggers, so she roots her proposal in the function of laughter and the propositional content of the laughable—the argument the laughter predicates about, an event or state referred to by an utterance or exophorically [Glenn, 2003]. In this paper we look at laughter not exclusively from the perspective of a taxonomy that can be used as a theoretical framework, but from the utilitarian perspective, looking at which kinds of laughs can be relevant for dialogue systems.

Laughter as a way for an embodied conversational agent (ECA) to provide an emotional response has gained some attention from the Affective Computing and other research communities. Becker-Asano and Ishiguro [2009] evaluated the role of laughter in the perception of social robots and indicated that the situational context, determined by linguistic and non-verbal cues (such as gaze), played an important role. Nijholt [2002] discusses the challenges of integrating humour into ECAs, and the existing integration of smiling and laughter in ECAs is typically triggered by a joke told by a user or an agent [Ding et al., 2014; Ochs and Pelachaud, 2013]. El Haddad et al. [2019] looked at the mimicry of smiles and laughs between interlocutors, which also might be used as the basis for an ECA's behaviour. Urbain et al. [2010] take a similar perspective, equipping ECAs with a capability to join in their conversational partner's laugh. In this work we take a contrasting approach, looking at the pragmatic functions of some types of laughter, namely providing feedback and answering questions, and provide a formal account of such behaviour within a dialogue management framework.

3 Types of laughter

In this section we outline some types of laughter that can be of special interest to task-oriented dialogue systems and can be accounted for within our proposed framework.

3.1 Laughter as a component of grounding

As we have mentioned in Section 2, and in accord with Allwood [1995], Clark [1996] and Larsson [2002], we consider four action levels that are involved in a dialogue. Here we discuss what can happen at each level of action—contact, perception, understanding and reaction—with respect to laughter.

Contact and perception levels

Troubles related to establishing and maintaining a stable communication channel can lead to laughter. One such example would be delays in communication, for instance over an unreliable network, which might lead to a person already speaking at the moment when the communication is only supposed to be established. Obvious examples of such cases are caused by signal jitter over video conference platforms like Zoom. The lack of perception indicates things that haven't been heard correctly (cases similar to (1)). Also, it seems that interruptions and related events can be quite surprising, and laughter can be a natural reaction to a surprise (see Section 6).

Understanding level

The lack of pragmatic understanding relates to the kinds of incongruities that are caused by the violation of the principle of conversational relevance. This is very useful for dialogue systems because they are prone to errors in this realm. It is often the case that incorrect NLU or ASR can lead to prioritising irrelevant results (for example, in cases of out-of-scope user queries), which can cause the user's confusion and, therefore, laughter. This type of laughter can be treated as negative feedback.

This accounts for examples (2) and (3) below. Larsson [2002] subdivides this level into three categories for negative feedback (context-dependent, context-independent and pragmatic). Examples (2) and (3) would relate to the pragmatic level of misunderstanding.

(2) from the dialogue between a virtual assistant (Diana) and a person with ASD (Mark):
    Mark      Diana, what is money?
    Diana     I am Diana, a virtual interlocutor.
    Audience  (laugh)

(3) constructed example
    Brian  Would you like tea or coffee?
    Katie  yes
    Brian  (laughs)

A dialogue system can also be unsure about what has been understood. In such cases, the system should demonstrate a lower degree of commitment to what has been said as part of a display of understanding. For example, in the case of feedback regarding the user input, when the system repeats the input after the user, it can be useful to include laughter in verbatim repeats, which would mean: yes, I heard (understood) this, but I might be wrong. This can also be useful for a system's actions taken based on low-confidence results.

Reaction (consider for acceptance) level

On this level, what has been understood can be either accepted or rejected for the current purpose. Acceptance laughter can typically be related to a reaction to humour, which is out of the scope of the current paper, or to an apology (see the next section).

Ginzburg et al. [2020] consider some uses of standalone laughter as cases of a negative response to a polar question (4) or a signal of disbelief in a previously uttered assertion (5).

(4) From Ginzburg et al. [2020], context: Bayern München goalkeeper Manuel Neuer faces the press after his team's (Dreierkette—three-in-the-back) defence has proved highly problematic in the game just played (which they won 3-2 against Paderborn).
    Journalist:    (smile) Dreierkette auch 'ne Option?
                   (Is the three-at-the-back also an option?)
    Manuel Neuer:  fuh fuh fuh (brief laugh)

(5) From Ginzburg et al. [2020] (biblical example rephrased as a dialogue)
    God:      You will at age 99 with your aged wife Sarah have a son.
    Abraham:  (laughs)
              → I don't think I will at age 99 have a son

In Section 5 we show how this kind of laughter as a negative response, as in (4), can be handled by the dialogue manager.

3.2 Laughter and intrusion

In natural dialogue, an intrusion is frequently associated with laughter. In the Switchboard Dialogue Act corpus (SWDA) [Jurafsky et al., 1997], the Apology dialogue act is more related to laughter than other dialogue acts are. In Figure 1 we show how many dialogue acts are associated with utterances¹ containing laughter, for the current dialogue act and for the preceding and following utterances, depending on the speaker. In addition to Apology, we show its adjacency counterpart (the second element of the utterance pair, produced by the other speaker [Schegloff and Sacks, 1973])—Downplayer—realised, for instance, by utterances like "Don't worry" or "It's alright".

¹ In SWDA each utterance is typically mapped to a single dialogue act.

Figure 1: Comparison of the most common dialogue act in SWDA—"Statement-Non-Opinion" (33.27% of all utterances)—with the dialogue acts "Apology" (0.04%) and "Downplayer" (0.05%). The proportion of utterances that contain laughter is shown in association with each dialogue act.

In (6), the caller reacts with compassionate laughter to the apology given by the operator. This is a similar instance of laughter to the one seen in (1): the second laugh shows that the same reaction as in (6) can be expected from the operator.

(6) DEC:16_HG_loc2
    162  Operator  still not finding it
    163  Operator  having problems with this one
    164  Caller    okay
    165  Caller    er maybe i can find
    166  Caller    er the place myself but thank you very much for the information
    167  Operator  no problem sorry for not finding the the last one
    168  Caller
    169  Caller    no worries
    170  Caller    thank you

We also observe that laughter can clearly accompany asking for a favour by the same speaker. In example (7) the operator asks the caller if they can start from the beginning, which can be treated as an intrusion of some sort, therefore asking for a favour, and the apology is accompanied by laughter.

(7) DEC:24_LK_loc2
    59  Caller    B as in bicycle
    60  Operator  yeah
    61  Caller    then you have R
    62  Caller    I
    63  Operator  R
    64  Caller    G
    65  Operator  I
    66  Operator  okay sorry no- now i lost the track okay can we it start from the beginning sorry
    67  Caller    okay
    68  Caller    yes we can
    69  Operator  maybe you can just say the uh say words
    70  Caller    yeah no no problem

4 Dialogue manager architecture

We believe that it is crucial to use formal tools which are most appropriate for the task: one should be able to express the rules of various genres of dialogue in a concise way, free, to any possible extent, of irrelevant technical details. In the view of Dixon et al. [2009], this is best done by representing the information state of the agents as updatable sets of propositions. Very often, dialogue-management rules update subsets (propositions) of the information state independently from the rest. A suitable and flexible way to represent such updates is as function types in linear logic. The domain of the function is the subset of propositions to update, and the co-domain is the (new) set of propositions which replaces it.

By using well-known techniques which correspond well with the intuition of information-state based dialogue management, we are able to provide a fully working prototype of the components of our framework:

1. a proof-search engine based on linear logic, modified to support inputs from external systems (representing the inputs and outputs of the agent);
2. a set of rules which function as a core framework for dialogue management (in the style of KoS [Ginzburg, 2012]);
3. several examples which use the above to construct potential applications of the system.

4.1 Linear rules and proof search

Typically, and in particular in the archetypal logic programming language Prolog [Bratko, 2001], axioms and rules are expressed within the general framework of first-order logic. However, several authors [Dixon et al., 2009; Martens, 2015] have proposed using linear logic [Girard, 1995] instead. For our purpose, the crucial feature of linear logic is that hypotheses may be used only once.

In general, the linear arrow corresponds to destructive state updates. Thus, the hypotheses available for proof search correspond to the state of the system. In our application, they correspond to the information state of the dialogue participant.

In linear logic, firing a linear rule normally corresponds to triggering an action of an agent, and a complete proof corresponds to a scenario, i.e. a sequence of actions, possibly involving actions from several agents. However, the information state (typically in the literature, and in this paper as well) corresponds to the state of a single agent. Thus, a scenario is conceived as a sequence of actions and updates of the information state of a single agent a, even though such actions can be attributed to any other dialogue participant b. (That is, they are a's representation of the actions of b.) Scenarios can be realised as a sequence of actual actions and updates. That is, an action can result in sending a message to the outside world (in the form of speech, movement, etc.). Conversely, events happening in the outside world can result in extra-logical updates of the information state (through a model of the perceptory subsystem).

In our implementation, we treat the information state as a multiset of linear hypotheses that can be queried. Because they are linear, these hypotheses can also be removed from the state. In particular, we have a fixed set of rules (they remain available even after being used). Each such rule manipulates a part of the information state (captured by its premisses) and leaves everything else in the state alone.

Our dialogue manager (DM) models the information state of only one participant. Regardless, this participant can record its own beliefs about the state of the other participants. In general, the core of the DM is comprised of a set of linear-logic rules which depend on the domain of application. However, many rules will be domain-independent (such as the generic processing of answers). We show examples of such rules in Section 4.4.

4.2 Questions and answers

In this paper, the essential components of the representation of a question are a type A and a predicate P over A. Using a typed intuitionistic logic, we write:

    A : Type        P : A → Prop

The intent of the question is to find out about a value x of type A which makes P x true, or at least entertained by the other participant. We provide several examples in Table 1. It is worth stressing that the type A can be large (for example, when asking for any location) or as small as a boolean (if one requires a simple yes/no answer). We note in passing that, typically, polar questions can be answered not just by a boolean but by qualifying the predicate in question, for example, "maybe", "on Tuesdays", etc. (Table 1, last two rows). This is formalised by letting A = Prop → Prop.

4.3 Representation of questions with metavariables

In this subsection we show how a metavariable can represent what is being asked, as the unknown in a proposition. A first use for metavariables is to represent the requested answer to a question.

Within the state of the agent, if the value of the requested answer is represented as a metavariable x, then the question can be represented as Q A x (P x). That is, the pending question (Q denotes a question constructor) is a triple of a type, a metavariable x, and a proposition in which x occurs. We stress that P x is not part of the information state of the agent yet; rather, what is a fact is that the above question is under discussion. For example, after asking "Where does John live?", we have:

    haveQud : QUD (Q Location x (Live John x))

Resolving a question can be done by communicating an answer. An answer to a question (A : Type; P : A → Prop) can be of either of the two following forms: i) a ShortAnswer, which is a pair of an element X : A and its type A, represented as ShortAnswer A X; or ii) an Assertion, which is a proposition R : Prop, represented as Assert R. Therefore, one way to process a short answer is by the processShort rule:

    processShort : (a : Type) → (x : a) → (p : Prop) →
      ShortAnswer a x ⊸ QUD (Q a x p) ⊸ p

    question                 | A           | P                               | reply        | x
    Where does John live?    | Location    | λx. Live John x                 | in London    | ShortAnswer Location London
    Does John live in Paris? | Bool        | λx. if x then (Live John Paris) | yes          | ShortAnswer Bool True
                             |             |     else Not (Live John Paris)  |              |
    What time is it?         | Time        | λx. IsTime x                    | It is 5am.   | Assert (IsTime 5.00)
    Does John live in Paris? | Prop → Prop | λm. m (Live John Paris)         | yes          | ShortAnswer (Prop → Prop) (λx. x)
    Does John live in Paris? | Prop → Prop | λm. m (Live John Paris)         | from January | ShortAnswer (Prop → Prop) (λx. FromJanuary (x))

Table 1: Examples of questions and the possible corresponding answers. The type A is the type of possible short answers. The proposition P x is the interpretation of a short answer x.
The x column shows the formal representation of a possible answer, either in short form or in assertion form.

Above we use Π type binders to declare (meta)variables (written here (a : Type) →, (x : a) →, etc.). This terminology will make sense to readers familiar with dependent types. For others, such binders can be thought of as universal quantification (∀a, ∀x, etc.); the difference is that the type of the bound variable is specified.²

² The reader worried about any theoretical difficulty regarding mixing linear and dependent types is directed to Atkey [2018] and Abel and Bernardy [2020].

We demand in particular that the types in the answer and in the question match (a occurs in both places). Additionally, because x occurs in p, the information state will mention the concrete x which was provided in the answer. For example, if the QUD was (Q Location x (Live John x)) and the system processes the answer ShortAnswer Location Paris, then x unifies with Paris, and the new state will include Live John Paris.

To process assertions, we can use the following rule:

    processAssert : (a : Type) → (x : a) → (p : Prop) →
      Assert p ⊸ QUD (Q a x p) ⊸ p

That is, if (1) p was asserted, and (2) the proposition q is part of a question under discussion, and (3) p can be unified with q (we ensure this unification by simply using the same metavariable p in both roles in the above rule), then the assertion resolves the question. Additionally, the metavariable x is made ground to a value provided by p, by virtue of the unification of p and q. For example, "John lives in Paris" answers both of the questions "Where does John live?" and "Does John live in Paris?" (there is unification), but not, for example, "What time is it?" (there is no unification). Note that, in both cases (processAssert and processShort), the information state is updated with the proposition posed in the question.

4.4 Dialogue management

In this section we integrate our question/answering framework within a more complete dialogue manager (DM). We stress that this DM models the information state of only one participant. Regardless, this participant can record its own beliefs about the state of the other participants. In general, the core of the DM is comprised of a set of linear-logic rules which depend on the domain of application. However, many rules will be domain-independent (such as the generic processing of answers).

To be useful, a DM must interact with the outside world, and this interaction cannot be represented using logical rules, which can only manipulate data which is already integrated in the information state. Here, we assume that the information that comes from sources which are external to the dialogue manager is expressed in terms of semantic interpretations of moves, and contains information about the speaker and the addressee in a structured way. We provide five basic types of moves, specified with a speaker and an addressee, as an illustration:

    Greet spkr addr
    CounterGreet spkr addr
    Ask question spkr addr
    ShortAnswer vtype v spkr addr
    Assert p spkr addr

These moves can either be received as inputs or produced as outputs. If they are inputs, they come from the NLU component, and they enter the context via the Heard : Move → Prop predicate. For example, if one hears a greeting, the proposition Heard (Greet S A) is added to the information state/context, without any rule being fired—this is what we mean by an external source.

If they are outputs, to be further used by the NLG component, some rule will place them in the Agenda. For example, to issue a counter-greeting, a rule will place the proposition (CounterGreet A S) in the Cons-list Agenda part of the information state.

Thereby each move is accompanied by the information about who has uttered it and towards whom it was addressed. All the moves are recorded in the Moves part of the participant's dialogue gameboard, as a Cons-list (stack).

Additionally, we record any move m which one has yet to
and NLU components, which reflects the uncertainty in user queries. For simplicity we will represent the confidence pushQUD : (q : Question) → (qs : List Question) → score t in on the basis of three confidence threshold lev- (x y : DP ) → Pending (Ask q x y) ( els (T1 < T2 ), where RED would correspond to t < T1 , QUD qs ( QUD (Cons q qs) YELLOW to T1 < t < T2 , and GREEN to T2 < t. Colour- If the user asserts something that relates to the top QUD, coded confidence scores would accompany user moves, e.g. then the QUD can be resolved and therefore removed from the Ask move such as “What time is it?” can be represented the stack. The corresponding proposition p is saved as a as follows: PendingUserFact.4 The following rule5 is an extended di- Ask (Q U Time t0 (IsTime t0 )) U S YELLOW alogue management version of the rule previously introduced in Section 4.3. Here we illustrate the possibility of extending the system with Interactive Communication Management (ICM) moves processAssert : (a : Type) → (x : a) → (p : Prop) → and grounding strategies, replicating Larsson’s [2002] ac- (qs : List Question) → count for grounding and feedback. ICM moves are used for (dp dp1 : DP ) → Pending (Assert p dp1 dp) ( coordination of the common ground in dialogue, which ex- QUD (Cons (Q dp a x p) qs) ( presses, for instance, explicit signals for integrating the in- [_ :: PendingUserFact p; _ :: QUD qs ] coming information and updating the common ground (dia- Then, other rules will take into account the logue gameboard in our implementation). The basic type for PendingUserFact p in a system-specific way. In the the ICM move is the following: simplest case, the system may treat p as a true proposition. (In this paper we will consider meta-level pending user facts ICM level polarity content instead.) 
where level corresponds to the level of grounding (contact, perception, understanding, acceptance), polarity is either positive or negative, and the optional value content corresponds to the component of the common ground in question. For instance, the move (ICM Per Neg None) would correspond to the utterance "I didn't understand what you said" or "Pardon?", and the move (ICM Und Pos q) can be realised as the utterance "You are asking me what time it is" if the QUD q corresponds to the question from the Ask move exemplified above.

Short answers are processed in a very similar way to assertions:

    processShort : (a : Type) → (x : a) → (p : Prop) →
      (qs : List Question) → (dp dp1 : DP) →
      Pending (ShortAnswer a x dp1 dp) ⊸
      QUD (Cons (Q dp a x p) qs) ⊸
      [_ :: PendingUserFact p; _ :: QUD qs]

If the system has a fact p in its database it can produce an answer or a domain-specific clarification request, depending on whether the fact is unique and concrete or not (cf. the produceAnswer rule above).

³ Taking a linear argument and producing it again is a common pattern, which can be spelled out A ⊸ (A ⊗ P). From here on we use the syntactic sugar A ↠ P for it.
⁴ For current purposes we only remove the top QUD, but in a more general case we could implement a policy that can resolve any QUD from the stack.
⁵ Note the use of a single colon (:) for metavariables and a double colon (::) for information-state hypotheses.

Next, we modify our basic pushQUD rule, defined in Section 4.4, to support different system behaviours depending on the confidence score. In the GREEN case, the question from the user's Ask move is integrated into the QUD, and an ICM move displaying positive acceptance feedback, i.e. "okay" (ICM Acc Pos None), is put on the Agenda. In the YELLOW case, the system should additionally report positive understanding, e.g. "You want to know about the time", so it also adds an (ICM Und Pos q) move to the Agenda.
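Operationally, this confidence-dependent dispatch can be sketched in a few lines of Python before turning to the formal rules. This is our own illustration, not the paper's implementation; the threshold values are invented, and the tuples merely mimic the shape of the moves.

```python
# Sketch of confidence colour-coding and the resulting ICM moves.
# Thresholds T1, T2 are illustrative values, not from the paper.
T1, T2 = 0.4, 0.8

def colour(t):
    if t < T1:
        return "RED"
    return "YELLOW" if t < T2 else "GREEN"

def integrate_ask(q, t, qud, agenda):
    """Mimics the pushQUDGreen / pushQUDYellow / icmINTConfirm dispatch."""
    c = colour(t)
    if c == "GREEN":                      # integrate q, acknowledge: "okay"
        return [q] + qud, [("ICM", "Acc", "Pos", None)] + agenda
    if c == "YELLOW":                     # additionally display understanding
        return [q] + qud, [("ICM", "Acc", "Pos", None),
                           ("ICM", "Und", "Pos", q)] + agenda
    # RED: push a question about whether q was correctly understood
    return [("Und?", q)] + qud, [("ICM", "Und", "Int", q)] + agenda

qud, agenda = integrate_ask("what-time-is-it", 0.6, [], [])
```

With a mid-range score (YELLOW) the question is integrated and both acceptance and understanding feedback end up on the agenda, matching the behaviour described above.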
    pushQUDGreen : (q : Question) →
      (qs : List Question) → (x y : DP) →
      Pending (Ask q x y GREEN) ⊸
      Agenda as ⊸
      QUD qs ⊸
      [_ :: QUD (Cons q qs);
       _ :: Agenda (Cons (ICM Acc Pos None) as)]

    pushQUDYellow : (q : Question) →
      (qs : List Question) → (x y : DP) →
      Pending (Ask q x y YELLOW) ⊸
      Agenda as ⊸
      QUD qs ⊸
      [_ :: QUD (Cons q qs);
       _ :: Agenda (Cons (ICM Acc Pos None)
                    (Cons (ICM Und Pos q) as))]

For a RED confidence score, the system issues an interrogative ICM query, such as "I understood you're asking me about the time, is that correct?". In this case a special type of QUD is introduced, namely a question about whether the question q was correctly understood:

    icmINTConfirm : (q : Question) → (x y : DP) →
      Pending (Ask q x y RED) ⊸
      Agenda as ⊸
      QUD qs ⊸
      [_ :: QUD (Cons (Q Bool x
            (if x then UND q else UNDN q)) qs);
       _ :: Agenda (Cons (ICM Und Int q) as)]

Processing answers related to this type of QUD is done as usual; for instance, a short "yes" or "no" is treated as a boolean answer.

More sophisticated queries with more arguments can be realised in shorter utterances, depending on the arguments that have already been grounded. For instance, in the context of interaction at a food kiosk, (ICM Und Pos (QuestionIsNot (Q U (Prop → Prop) m0 (m0 WantOlives)))) could become simply "Sorry, let's forget olives.".

5 Formal treatment of certain types of laughter

5.1 Laughter as a rejection signal

Laughter as a reaction to interrogative feedback in the case of a low-confidence ASR/NLU result can be illustrated by the following dialogue:

(8) U: I would like to order a vegan bean burger.
    S: I understood you'd like to order a beef burger. Is that correct? (ICM Und Int q)
    U: HAHAHA

Here we can treat laughter as a short negative answer, similar to "No". In the case of an interrogative ICM move, such an answer can be processed using the icmINTneg rule.
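A hedged sketch of this treatment, assuming an upstream laughter detector that emits a LAUGHTER token (all names are our own, not the paper's implementation): when an interrogative ICM question is pending, a laugh is mapped to the same update as a short "no".

```python
# Sketch: interpreting a laugh after an interrogative ICM move
# as a short negative answer (ShortAnswer Bool False).
# Hypothetical names; assumes an upstream laughter detector.

def interpret(user_signal, pending_icm_question):
    """Map a user signal to a short answer when an 'ICM Und Int q'
    question is on top of the QUD stack."""
    if pending_icm_question is None:
        return None
    if user_signal in ("no", "LAUGHTER"):      # laughter treated like "No"
        return ("ShortAnswer", bool, False)
    if user_signal == "yes":
        return ("ShortAnswer", bool, True)
    return None

# S: "I understood you'd like to order a beef burger. Is that correct?"
# U: HAHAHA  -> a negative answer; a rule like icmINTneg would then fire.
answer = interpret("LAUGHTER", ("Und?", "order-beef-burger"))
```

Outside the scope of an interrogative ICM question the laugh is left uninterpreted here, reflecting the paper's point that laughter is ambiguous and only gets this negative reading in context.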
Depending on the user's answer to this boolean question, the context will then contain either PendingUserFact (UND q) or PendingUserFact (UNDN q). In this sketch implementation we do not track confidence scores for these answers, leaving them underspecified, but further, more specific dialogue rules are possible.

Regardless of the particular answer, once the ICM question is answered it is removed from the QUD stack, so that the QUD stack is restored to the originally asked question. In our system this is taken care of by the generic handling of ShortAnswers. Thus, in the case of a positive answer to such a query, there is nothing particular to do. In the negative case, an ICM move is issued conveying that the question understood was not q:

    icmINTneg : (q : Question) → (x y : DP) →
      (c : Confidence) →
      PendingUserFact (UNDN q) ⊸
      Agenda as ⊸
      Agenda (Cons (ICM Und Pos (QuestionIsNot q)) as)

This can be treated as a recovery strategy for system outputs not desired by the dialogue system designers. The approach can be extended to other cases of user feedback, for instance to cases with a higher confidence score where the system produces the (ICM Und Pos q) move, but this is out of the scope of the current paper.

Returning to the more sophisticated (4), it can be handled by our generic rules for integrating QUDs (pushQUD). For that we need to consider polar questions as expecting an answer of type Prop → Prop (see Table 1). Recall the example:

(4) Journalist: (smile) Dreierkette auch 'ne Option?
        (Is the three-in-the-back also an option?)
    Manuel Neuer: fuh fuh fuh (brief laugh)

and a type for the question:

    A : Type    P : A → Prop
    A = Prop → Prop
    P = λm. m IsOptionDreierkette

How ICM moves are converted to natural language utterances, depending on q, is a natural language generation (NLG) issue.
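A toy template-based realiser for the ICM moves used in this paper might look as follows. The utterances are taken from the paper's own examples; the function itself is our own sketch, not part of the described system.

```python
# Toy template-based NLG for ICM moves (level, polarity, content).
# Realisations follow the examples in the text; the code is a sketch.

def realise_icm(level, polarity, content=None):
    if (level, polarity) == ("Per", "Neg"):
        return "Pardon?"
    if (level, polarity) == ("Acc", "Pos"):
        return "Okay."
    if (level, polarity) == ("Und", "Pos"):
        return f"You are asking me {content}."
    if (level, polarity) == ("Und", "Int"):
        return f"I understood you're asking me {content}, is that correct?"
    return "Sorry, I didn't understand that."

print(realise_icm("Und", "Int", "about the time"))
```

A real realiser would of course inspect the question q itself rather than receive a pre-rendered string, which is exactly the NLG issue noted above.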
For instance, ICM Und Pos (QuestionIsNot (Q U Time t0 (IsTime t0))) can become the (rather tedious) utterance "So, you are not asking me what time it is", whereas more sophisticated realisations are also possible.

Returning to (4): the brief laughter by Manuel Neuer can be represented as

    ⟦fuh fuh fuh⟧ = ShortAnswer (Prop → Prop) (λx. Laughable x)

where the modification of the proposition, resulting in (Laughable IsOptionDreierkette), has a very basic meaning: this proposition is the laughable, without being more specific about the laughter function. One could also be more specific, simply treating laughter as a negation (ShortAnswer (Prop → Prop) (λx. Not x)), but in general laughter has a more nuanced meaning.

5.2 Laughter which accompanies feedback

Laughter can act as part of the realisation of ICM moves performed by the natural language generation (NLG) component. It seems to us that in ICM moves, in particular, the use of laughter can be considered "safe". For instance, an ICM move of the form (ICM Und Pos (QuestionIsNot (Q U (Prop → Prop) m0 (m0 WantOlives)))) can be realised as a natural language utterance like "Okay, let's forget olives, hehe", where laughter is used as a smoothing device.

In (9) below, the caller experiences issues with coming up with phonetic spellings for certain words. The first laugh (line 27) deserves attention, as it seems to reflect both pleasant incongruity and social incongruity (smoothing), according to the taxonomy of Mazzocconi [2019]. The pleasant incongruity is due to the fact that the phonetic spelling of "U" as in "under" is incongruous with the preceding ones: a preposition vs. proper nouns. The way to spell things phonetically is typically culturally specific, with the most typical cases being cities or countries. Stereotypes and conversational conventions can be expressed with the formal notions of enthymemes and topoi, following the work of Breitholtz [2020] on reasoning in conversation.
Breitholtz and Maraev [2019] used these notions to analyse conversational humour as well as canned jokes, and we find them potentially helpful to integrate into our framework in order to account for humour in dialogue systems. Dybala et al. [2010] emphasise the importance of a "two-stage" approach to humour in dialogue systems, where the system tracks the emotional state of the user, produces humour as a reaction to certain states, and analyses the user's further emotional reaction.

Laughter in such feedback moves serves as a smoothing device, mitigating the awkwardness of system failure. Larsson [2002] often included an apology ("Sorry") in some of the ICM moves, e.g. "Sorry, I didn't understand that". With some possible caveats, we can sometimes include slight laughter in such moves, especially if the system is getting a bit repetitive and produces (ICM Und Neg) too often. Considering the evidence, presented in Section 3.2, that laughter often accompanies apologies (as a separate dialogue act), this can mimic natural behaviour in dialogue.

6 Discussion and future work

In this paper we have shown how some types of laughter can be accounted for in a task-oriented spoken dialogue system. We proposed our own proof-theoretic architecture of a dialogue manager based on the KoS framework and extended it with grounding strategies. Based on this, we have shown how certain types of laughter can be processed within the dialogue manager.

6.2 Surprise

Intuitively, laughter is related to events that are not expected in interaction. One way to establish some degree of natural behaviour for a dialogue system would be to react sincerely to such surprising events. A possible measure of a system's surprise is how confused it is by the user input. A natural measure for this from information theory is perplexity, a probability-based metric. For N words in an evaluation set W = w1 w2 . . .
wN , the average perplexity per word is computed as follows:

    PP(W) = ( ∏_{i=1}^{N} 1 / P(w_i | w_1 . . . w_{i−1}) )^{1/N}    (1)

Given a language model, we can employ a threshold on perplexity which the system can use to decide to act surprised, e.g. by saying "Ha-ha, I did not expect this!". Similarly, perplexity can be inferred from tracking the dialogue state, as in the Dialogue State Tracking task [Mrkšić et al., 2017], a common task in statistical approaches to dialogue systems; or, following Noble and Maraev [2021], an RNN trained on a large dialogue corpus as a representation of dialogue context can be used to calculate perplexity.

Laughter as a reaction of surprise can relate to the levels of feedback: for example, a user surprised by a pragmatically incoherent system reply can laugh (Section 5.1). But here surprise is taken in isolation, as a measure in its own right.

The types of laughter covered are: laughter as negative feedback, laughter as a negative answer to a polar question, and laughter as a signal accompanying system feedback. In the following subsections we discuss several further issues related to laughter in spoken dialogue systems, only briefly touching on the main subject of the paper.

6.1 Humour

We start with humour, which is usually considered in relation to jokes generated by a dialogue system; here, however, we present more subtle incongruities related to humour in task-oriented dialogue.

(9) DEC:28_NM_loc2
17 Caller okay so it starts with a
18 Caller L
19 Operator L?
20 Caller as in london
21 Operator yes
22 Caller A as in america
23 Operator america
24 Caller er U
25 Caller as in er ((pause: 1.2s))
26 Caller er under
27 Caller ((laughter))
28 Operator under yes

6.3 Awkwardness and time-saving

In (9), "under" is produced after a long pause (l.25) and therefore indicates awkwardness: producing the phonetic spelling made the operator wait, making the situation uncomfortable for the caller, so laughter was used for smoothing it.

In the follow-up excerpt (10) from the same dialogue, the user's awkwardness continues and she accompanies it with laughter. Firstly, she laughs (l.139), demonstrating that she has given up finding any phonetic spelling for "K", releasing the turn and allowing the operator to carry on. Her second laugh smooths her slight embarrassment after the situation was resolved by the operator.

(10) DEC:28_NM_loc2
134 Caller O for oslo
135 Operator O for oslo
136 Caller again O for oslo
137 Operator O for oslo
138 Caller and K for er ((pause: 1.6s))
139 Caller ((laughter))
140 Operator as in king?
141 Caller k- king yeah
142 Operator yes
143 Caller thank you
144 Operator that's it?
145 Caller that's it

The same goes for the system's laughter as an appropriate reaction to conversational humour.

Another portion of the features can be evaluated only subjectively: for example, it is a question of user preference whether it is okay for a system to accompany asking for a favour (e.g. "Let's start over!") with laughter. For this purpose, we can employ subjective evaluation methods such as the more task-oriented SASSI [Hone and Graham, 2000] or the more chatterbot-oriented methodology proposed by Dybala et al. [2009], which was used for humour-equipped chatbots. We optimistically expect that characteristics such as naturalness and likeability would increase and annoyance would decrease.

Acknowledgments

The research reported in this paper was supported by grant
2014-39 from the Swedish Research Council, which funds the Centre for Linguistic Theory and Studies in Probability (CLASP) in the Department of Philosophy, Linguistics, and Theory of Science at the University of Gothenburg. In addition, we would like to thank Staffan Larsson, Jonathan Ginzburg and our anonymous reviewers for their useful comments.

References

Andreas Abel and Jean-Philippe Bernardy. A unified view of modalities in type systems. Proceedings of the ACM on Programming Languages, 4(ICFP), 2020.

James F Allen, Lenhart K Schubert, George Ferguson, Peter Heeman, Chung Hee Hwang, Tsuneaki Kato, Marc Light, Nathaniel Martin, Bradford Miller, Massimo Poesio, et al. The TRAINS project: A case study in building a conversational planning agent. Journal of Experimental & Theoretical Artificial Intelligence, 7(1):7–48, 1995.

Jens Allwood. An activity based approach to pragmatics. 1995.

We can hypothesise that in a dialogue system these examples can be handled as follows. For a system, there are operations which the developer knows are going to take time due to technical constraints, but which the user expects to be immediate. In this case, a system can produce behaviour similar to the one in (9) (l.25–27): "er. . . ((pause)) [comes up with an answer]". A system can also detect patterns of filled pause + laughter from the user and treat them as turn-release cues. Such a pattern can signal either that something has confused the user, or that she genuinely could not come up with an answer due to certain difficulties. A downplayer dialogue act (e.g. "don't worry") or laughter in response can also be appropriate system feedback in such a situation. We consider these ideas a subject for further empirical investigation.

Laughter related to smoothing retrieval difficulties can also be indicative. Consider the case of language tutoring. In the Anki "flashcard" app, the system provides users with a word
in one language on the front side of the card, and the user should provide a translation. The user then gets the correct response from the back of the card and evaluates her own response (was this card Hard, Good or Easy to recall?). If we consider making a similar conversational app, indications of retrieval issues, i.e. filled pauses ("er em. . . ") with follow-up smoothing by laughter, can lead to the decision to flag the card as "Hard" and provide corresponding feedback (11).

(11) S What is the Swedish for donkey?
     U er em . . . åsna?..
     S Yes, that was tough, but it is correct!
       (system marks the card as "Hard")

6.4 Approaches to evaluation

Each of the aforementioned improvements has to be subject to evaluation within the dialogue system. We expect these improvements to be reflected in the following evaluation criteria.

Some of the improvements fall under objective, checklist-style criteria, such as being able to understand laughter as negative feedback, or as a signal of surprise.

Robert Atkey. Syntax and semantics of quantitative type theory. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, Oxford, UK, pages 56–65, 2018.

Christian Becker-Asano and Hiroshi Ishiguro. Laughter in social robotics – no laughing matter. In Intl. Workshop on Social Intelligence Design, pages 287–300. Citeseer, 2009.

Anastasia Bondarenko, Christine Howes, and Staffan Larsson. Directory Enquiries Corpus, February 2020.

Ivan Bratko. Prolog Programming for Artificial Intelligence. Pearson Education, 2001.

Ellen Breitholtz and Vladislav Maraev. How to put an elephant in the title: Modeling humorous incongruity with topoi. In Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue – Full Papers, London, United Kingdom, September 2019. SEMDIAL.

Ellen Breitholtz. Enthymemes and Topoi in Dialogue: The Use of Common Sense Reasoning in Conversation. Brill, Leiden, The Netherlands, 2020.
Herbert H Clark. Using Language. Cambridge University Press, 1996.

Yu Ding, Ken Prepin, Jing Huang, Catherine Pelachaud, and Thierry Artières. Laughter animation synthesis. In Proc. AAMAS 2014, pages 773–780. International Foundation for Autonomous Agents and Multiagent Systems, 2014.

Lucas Dixon, Alan Smaill, and Tracy Tsang. Plans, actions and dialogues using linear logic. Journal of Logic, Language and Information, 18(2):251–289, 2009.

Pawel Dybala, Michal Ptaszynski, Rafal Rzepka, and Kenji Araki. Subjective, but not worthless: non-linguistic features of chatterbot evaluations. In 6th IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, page 87. Citeseer, 2009.

Pawel Dybala, Michal Ptaszynski, Rafal Rzepka, and Kenji Araki. Extending the chain: humor and emotions in human computer interaction. International Journal of Computational Linguistics Research, 1(3):116–125, 2010.

Vladislav Maraev, Jean-Philippe Bernardy, and Jonathan Ginzburg. Dialogue management with linear logic: the role of metavariables in questions and clarifications. Traitement Automatique des Langues (TAL), 61(3):43–67, 2020.

Chris Martens. Programming Interactive Worlds with Linear Logic. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, 2015.

Chiara Mazzocconi. Laughter in interaction: semantics, pragmatics and child development. PhD thesis, Université de Paris, 2019.

Nikola Mrkšić, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. Neural belief tracker: Data-driven dialogue state tracking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1777–1788, 2017.

Anton Nijholt. Embodied agents: A new impetus to humor research. In The April Fools' Day Workshop on Computational Humour, volume 20, pages 101–111. Proc. Twente Workshop on Language Technology, 2002.

Kevin El Haddad, Sandeep Nallan Chakravarthula, and James Kennedy. Smile and laugh dynamics in naturalistic dyadic
interactions: Intensity levels, sequences and roles. In 2019 International Conference on Multimodal Interaction, pages 259–263, 2019.

Jonathan Ginzburg, Chiara Mazzocconi, and Ye Tian. Laughter as language. Glossa: a journal of general linguistics, 5(1), 2020.

Jonathan Ginzburg. The Interactive Stance. Oxford University Press, 2012.

J.-Y. Girard. Linear logic: its syntax and semantics, pages 1–42. London Mathematical Society Lecture Note Series. Cambridge University Press, 1995.

Phillip Glenn. Laughter in Interaction. Cambridge University Press, Cambridge, UK, 2003.

Kate S Hone and Robert Graham. Towards a tool for the subjective assessment of speech system interfaces (SASSI). 2000.

Christine Howes, Anastasia Bondarenko, and Staffan Larsson. Good call! Grounding in a Directory Enquiries Corpus. In Proceedings of the 23rd Workshop on the Semantics and Pragmatics of Dialogue, London, United Kingdom, September 2019. SEMDIAL.

Bill Noble and Vladislav Maraev. Large-scale text pre-training helps with dialogue act recognition, but not without fine-tuning. In Proceedings of the 14th International Conference on Computational Semantics – Short Papers, Groningen, Netherlands, 2021.

Magalie Ochs and Catherine Pelachaud. Socially aware virtual characters: The social signal of smiles [social sciences]. IEEE Signal Processing Magazine, 30(2):128–132, March 2013.

Cécile Petitjean and Esther González-Martínez. Laughing and smiling to manage trouble in French-language classroom interaction. Classroom Discourse, 6(2):89–106, 2015.

Fernando Poyatos. Paralanguage: A Linguistic and Interdisciplinary Approach to Interactive Speech and Sounds, volume 92. John Benjamins Publishing, 1993.

Joshua Raclaw and Cecilia E Ford. Laughter and the management of divergent positions in peer review interactions. Journal of Pragmatics, 113:1–15, 2017.

Verena Rieser and Oliver Lemon. Reinforcement learning
for adaptive dialogue systems: a data-driven methodology for dialogue management and natural language generation. Springer Science & Business Media, 2011.

Gail Jefferson. On the organization of laughter in talk about troubles. In Structures of Social Action: Studies in Conversation Analysis, pages 346–369. 1984.

Kristiina Jokinen. Constructive Dialogue Modelling: Speech Interaction and Rational Agents, volume 10. John Wiley & Sons, 2009.

D Jurafsky, E Shriberg, and D Biasca. Switchboard dialog act corpus. International Computer Science Inst., Berkeley, CA, Tech. Rep., 1997.

Staffan Larsson. Issue-based dialogue management. PhD thesis, University of Gothenburg, 2002.

Emanuel A Schegloff and Harvey Sacks. Opening up closings. Semiotica, 8(4):289–327, 1973.

Jérôme Urbain, Radoslaw Niewiadomski, Elisabetta Bevacqua, Thierry Dutoit, Alexis Moinet, Catherine Pelachaud, Benjamin Picart, Joëlle Tilmanne, and Johannes Wagner. AVLaughterCycle. Journal on Multimodal User Interfaces, 4(1):47–58, 2010.

Julia Vettin and Dietmar Todt. Laughter in conversation: Features of occurrence and acoustic structure. Journal of Nonverbal Behavior, 28(2):93–115, 2004.

Jason D Williams, Kavosh Asadi, and Geoffrey Zweig. Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning. arXiv preprint arXiv:1702.03274, 2017.

Steve Young, Milica Gašić, Simon Keizer, François Mairesse, Jost Schatzmann, Blaise Thomson, and Kai Yu. The hidden information state model: A practical framework for POMDP-based spoken dialogue management. Computer Speech & Language, 24(2):150–174, 2010.