Silence, Please! Interrupting In-Car Phone Conversations

Soledad López Gambino¹, Casey Kennington², and David Schlangen¹

¹ CITEC, Bielefeld University, Universitätsstraße 25, Bielefeld, Germany
² Boise State University, 1910 University Dr., Boise, Idaho, USA

Abstract. Holding phone conversations while driving is dangerous not only because it occupies the hands, but also because it requires attention. Whereas driver and passenger can adapt their conversational behavior to the demands of the situation, e.g. by interrupting themselves when more attention is needed, an interlocutor on the phone cannot adjust as easily. We present a dialogue assistant which acts as a ‘bystander’ in phone conversations between a driver and an interlocutor, interrupting them and temporarily cutting the line during potentially dangerous situations. The assistant also informs both conversation partners when the line has been cut, as well as when it has been reestablished. We show that this intervention improves drivers’ performance in a standard driving task.

Keywords: in-car dialogue · driver distraction · cell phone · interruptions

1 Introduction

Talking on the phone while driving introduces risks which may result in accidents [11,14], and research has also shown that cognitive load is higher when talking on a phone than when talking to a passenger [2]. This difference seems to correlate with co-location: Whereas passengers are aware of the surroundings and can adapt their speech to the demands of the driving situation, a non-situated interlocutor in a telephone conversation does not have enough information to perform this type of adjustment. It has been suggested that this lack of situational awareness can be addressed, to some extent, by providing the interlocutor with real-time visual information of the driving situation [10].
This kind of telepresence results in speech which interferes less with the task of driving and it could be described as a way of “bringing the interlocutor into the scene”.

Other efforts have focused on the potential usefulness of interrupting conversations when the circumstances on the road require it. In a Wizard-of-Oz experiment, [5] explored the effects of putting phone calls on hold while the driver needs to perform a more demanding maneuver, as well as of uttering spoken alerts about upcoming situations. The latter alerts proved effective in reducing errors when turning left/right. In the area of human-system dialogue, [8] implemented an information-providing system which interrupts its speech in the case of a demanding driving situation and resumes once the situation has passed. This system not only reduced impact on driving performance in comparison to a system which did not pause its speech, but it also enabled drivers to better remember the information presented by the system.

In the light cast by these studies, we explore the effects of employing an actual system to achieve this adaptive dynamics in human/human phone conversations: a “bystander” agent that interrupts the conversation in potentially dangerous situations. Subsequently, we test the impact of this system on performance in a driving task.

2 Method

2.1 The System

We developed an Interrupting Dialogue Assistant (IDA) which mediates between two participants in a telephone conversation when required by the driving situation. Interruption is triggered by a signal from a component that evaluates the driving situation and judges that the undivided attention of the driver is needed. This immediately cuts the audio line between driver and caller (D and C, respectively). The system then informs D and C about this, as described below.
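
As a reading aid, the mediation logic of this section (the interruption trigger, the periodic reminders, the suppression of caller speech, and the resumption) can be sketched as a two-state machine. This is a minimal illustration under stated assumptions, not the implementation used in the study: the state, event and action names are hypothetical, the reminder timer is modelled as a plain event, and the order of actions within a transition is assumed.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    LINE_OPEN = auto()  # D and C can hear each other
    LINE_CUT = auto()   # audio between D and C is cut


class Event(Enum):
    ATTENTION_NEEDED = auto()  # driving situation needs the driver's full attention
    SITUATION_CLEAR = auto()   # driving situation is clear again
    REMINDER_DUE = auto()      # hypothetical timer for the periodic wait prompt
    CALLER_SPEECH = auto()     # voice activity detected on the caller's side


@dataclass
class Interrupter:
    state: State = State.LINE_OPEN
    actions: list = field(default_factory=list)  # ordered log of system actions

    def handle(self, event: Event) -> State:
        """Apply one event and return the resulting state."""
        if self.state is State.LINE_OPEN:
            if event is Event.ATTENTION_NEEDED:
                # Cut the line first, then inform driver (bell) and caller (speech).
                self.actions += ["cut_line", "driver:bell",
                                 "caller:interruption_prompt"]
                self.state = State.LINE_CUT
        elif self.state is State.LINE_CUT:
            if event is Event.REMINDER_DUE:
                self.actions.append("caller:wait_prompt")
            elif event is Event.CALLER_SPEECH:
                self.actions.append("caller:stay_quiet_prompt")
            elif event is Event.SITUATION_CLEAR:
                # Assumed order: reopen the line, then announce it to both sides.
                self.actions += ["open_line", "driver:resume_cue",
                                 "caller:resumption_prompt"]
                self.state = State.LINE_OPEN
        # Events that have no meaning in the current state are ignored.
        return self.state


ida = Interrupter()
for ev in (Event.ATTENTION_NEEDED, Event.CALLER_SPEECH,
           Event.REMINDER_DUE, Event.SITUATION_CLEAR):
    ida.handle(ev)
print(ida.actions)
```

Keeping the actions in a log rather than playing audio directly makes the transition table easy to check against a state diagram; a real system would replace the strings with calls into its audio and speech synthesis components.
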
Until it receives a signal indicating that the situation is clear again, the system periodically re-informs C about the line status, and also suppresses any attempts to speak. Finally, it notifies D and C when the line is open again.

The states of the system are shown as a diagram in Figure 1, together with the events that trigger transitions and the actions performed by the system at each of these transitions.

Fig. 1. Interaction states

Informing the Caller. The system informs the caller of the state of the interaction (line open / closed) through a set of utterances which are synthesized with MaryTTS³ [13]. Corresponding to the states of the system, there are four types of system acts (see Table 1 for some example utterances):

– Interrupting the conversation: As soon as the line is cut, the IDA informs C by stating the need for a pause in the dialogue and/or the fact that the driver is busy.
– Asking for more time: While the audio line between both participants remains cut, the IDA regularly reminds C to continue waiting, in order to avoid long periods of silence and ensure clarity about the state of the line.
– Preventing the caller from speaking: If C speaks at any time while the line is cut, the assistant detects this through Voice Activity Detection (VAD) and informs C of the need to wait for a few more seconds.
– Resuming the conversation: On receiving the appropriate signal, the system announces that the line is open again.

³ http://mary.dfki.de/index.html

Table 1. System utterances informing the caller about the state of the interaction (German original and English translation)

German original                                           English translation

Interrupting the conversation (interruption prompt)
Das Gespräch muss einige Sekunden unterbrochen werden.    The conversation has to be interrupted for a few seconds.
Ihr Gesprächspartner ist gerade wieder beschäftigt.       Your conversational partner is busy again.
Diese Unterhaltung muss nochmal kurz pausiert werden.     This conversation needs to be paused briefly again.

Asking for more time (wait prompt)
Bitte eine Sekunde mehr Geduld.                           Please be patient for one more second.
Ihr Gesprächspartner kann Sie noch nicht hören.           Your conversational partner can’t hear you yet.
Die Leitung ist bald wieder offen.                        The line will soon be reconnected.

Preventing the caller from speaking (stay-quiet prompt)
Moment bitte.                                             One moment, please.
Noch nicht.                                               Not yet.
Bitte warten.                                             Please wait.

Resuming the conversation (resumption prompt)
Sie können weiter sprechen.                               You can go on speaking.
Jetzt kann der Fahrer wieder hören.                       The driver can now hear you again.
Die Unterhaltung geht jetzt weiter.                       The conversation now continues.

Informing the Driver. The system also provides information to the driver, although it does so in a different way. Interruption of the audio line is communicated through a bell sound instead of verbally, as we considered that additional speech would be more distracting than a sound [3]. Once drivers have finished maneuvering and are able to resume the conversation, the system produces a short utterance such as Los geht’s ("Off we go").

2.2 Tasks

Driving Task. To test driving performance, we used a variant of the standard Lane Change Task (LCT) [4], implemented in a driving simulation environment (OpenDS⁴). This task consists in reacting to a signal positioned on a gate above the road. The driver, otherwise instructed to stay in the middle lane of a straight five-lane road, must move to the lane indicated by the light, remain there until a tone is sounded, and then return to the middle lane. Following [6], we introduced an extra level of difficulty by instructing drivers to perform lane changes at a speed of 60 km/h, whereas the default driving speed was 40 km/h and the maximum possible speed was 70 km/h. The driving equipment consisted of a 40-inch 16:9 screen and a Thrustmaster PC Racing Wheels Ferrari GT Experience steering wheel and pedals.

⁴ http://www.opends.eu

Fig. 2.
Lane change signal as presented on screen; the green light above the far right lane instructs the driver to move into that lane.

Speaking Task. To ensure that a lively, continuous conversation would take place between our experiment participants (driver and caller), we instructed them to engage in a role play activity. They were provided with discussion topics beforehand, and instructed to express opposing opinions about them, i.e. to contradict each other. Discussion topics were selected to be related to the experience and interests of the subject population (university students) and to be engaging but not extremely sensitive. The caller was given responsibility for the flow of the discussion and instructed to keep it as entertaining as possible and to switch between topics when necessary.

2.3 Experiment Structure and Conditions

To evaluate our hypothesis that an assistant such as the one described above would result in better driving, we designed three experimental conditions:

NO-TALK: Driving only (control condition; including lane changes), no conversation.
UNINTERRUPTED: Simultaneous driving (including lane changes) and conversation.
INTERRUPTED: Simultaneous driving and conversation, but the latter gets interrupted as soon as a lane change is announced and resumed when this maneuver is completed.

The conditions were presented in blocks, as shown in Figure 3. The first block was always NO-TALK, whereas the order of the second and third conditions varied: For half of the participants, the second block corresponded to the UNINTERRUPTED condition and the third one to the INTERRUPTED condition, whereas for the other half these two blocks were inverted. Each of the three blocks lasted approximately 10 minutes and was made up of 11 trials: three practice trials and eight experiment trials. Only experiment trials are considered in the results. The approximate duration of a whole experiment was 40 minutes.
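
The block structure just described can be written down as a small scheduling sketch. It is illustrative only: the text states that half of the participants received each order, but not how orders were assigned, so the alternation by driver index below is an assumption, as are the function and constant names. The count of 16 drivers is taken from the participant description.

```python
PRACTICE_TRIALS = 3    # practice trials per block
EXPERIMENT_TRIALS = 8  # experiment trials per block


def block_order(driver_index: int) -> list:
    """Return the three condition blocks for one driver; NO-TALK is always first."""
    talk_blocks = ["UNINTERRUPTED", "INTERRUPTED"]
    if driver_index % 2 == 1:
        # Assumed rule: every second driver gets the inverted order.
        talk_blocks.reverse()
    return ["NO-TALK"] + talk_blocks


def trial_plan(driver_index: int) -> list:
    """Expand the block order into (condition, phase) pairs, one per trial."""
    plan = []
    for condition in block_order(driver_index):
        plan += [(condition, "practice")] * PRACTICE_TRIALS
        plan += [(condition, "experiment")] * EXPERIMENT_TRIALS
    return plan


orders = [block_order(i) for i in range(16)]
for i in (0, 1):
    print(i, orders[i], len(trial_plan(i)), "trials")
```
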
Before each phase, participants were given instructions. After completion of all three phases, they filled out a questionnaire. Finally, they swapped roles (the caller became the driver and vice versa) and the whole process was repeated.

Fig. 3. Experiment stages

2.4 Participants and Setup

Sixteen subjects participated in the study, which results in eight pairs participating twice each (due to role-swapping). Driver and caller were placed in two separate rooms, and audio was sent between them through networked computers via Robotic Service Bus (RSB) [15].⁵ All participants were students between 20 and 29 years of age and native speakers of German. Ten were female and six male. All of them except for one had a driver’s license.

The interrupter was developed using the control component of OpenDial [9] incorporated into InproTK⁶,⁷ [1,7], implementing the state machine described above.

3 Results

The total number of trials recorded was 528: eleven in each of the three conditions, for each of the 16 participants. We excluded practice trials from the analysis of driving performance, which left us with 384. In addition, it was necessary to exclude some episodes where the driver never reached the target lane, since this made it impossible to calculate lane-changing time. This resulted in 365 trials usable for analysis. Furthermore, given that two of the road lanes are adjacent to the middle whereas the other two (the external lanes) are not, some further episodes had to be excluded in order to ensure an equal number of changes to adjacent and non-adjacent lanes in all conditions. This left us with 342 trials: 114 for each condition, out of which 64 were changes to adjacent lanes and 50 to non-adjacent ones.
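
The trial counts in this paragraph follow from simple arithmetic, which the short check below reproduces. Only the figures stated in the text are used as inputs; the numbers of episodes dropped at each exclusion step are derived by subtraction and are not reported separately in the text. Variable names are illustrative.

```python
PARTICIPANTS = 16
CONDITIONS = 3
TRIALS_PER_BLOCK = 11
PRACTICE_PER_BLOCK = 3

# Counts implied by the design.
recorded = PARTICIPANTS * CONDITIONS * TRIALS_PER_BLOCK
experiment = PARTICIPANTS * CONDITIONS * (TRIALS_PER_BLOCK - PRACTICE_PER_BLOCK)

# Counts reported after each exclusion step.
usable = 365                    # target lane was reached
balanced = 342                  # equal adjacent/non-adjacent split per condition
adjacent, non_adjacent = 64, 50

# Derived by subtraction.
never_reached_target = experiment - usable
dropped_for_balance = usable - balanced
per_condition = balanced // CONDITIONS

print(recorded, experiment, never_reached_target, dropped_for_balance, per_condition)
```
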
⁵ https://code.cor-lab.org/projects/rsb
⁶ http://opendial.googlecode.com
⁷ https://bitbucket.org/inpro/inprotk

3.1 Interruptions

There were eleven interruptions for each driver (one per trial of the INTERRUPTED block): three correspond to the practice trials and eight to the experiment trials. Out of a total of 176 interruptions for the 16 participants, the driver was speaking at the moment of the interruption in 75 instances and the caller in 88; both were speaking simultaneously in three cases, and both were silent in 10 cases. From the moment when callers started hearing the interruption prompt, it took them an average of 1.01 seconds to stop talking (SD 1.07). In addition, the caller spoke during the interrupted phase and was told by the system to wait in 26 instances. When callers were interrupted, they left the ongoing word incomplete in 22.7% of the cases, finished the word but left an incomplete syntactic clause in 34.1%, and produced a full clause in 43.2%. The mean duration of the interrupted periods (from the interruption prompt to the resumption prompt) was 11.321 seconds (SD 0.637); this was, of course, subject to how fast the driver was able to complete the maneuver.

3.2 Driving Performance

For every trial, we calculated lane changing time, which we defined as the time from the moment the lane changing signal appears until the driver reaches the target lane. Lane changing times were almost half a second shorter for the INTERRUPTED condition (4.059 s, SD 1.349) than for the UNINTERRUPTED condition (4.552 s, SD 1.646), i.e. drivers were able to complete the change faster when the interrupter was employed. This difference is significant (t-test, t(15) = 3.37, p < 0.01). On the other hand, no statistically significant difference was found between lane changing times in the INTERRUPTED and in the NO-TALK (4.381 s, SD 1.423) conditions, which suggests that performance when the interrupter was employed was comparable to that in the driving-only task, in which no speech was involved.
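
For readers who want to reproduce this kind of comparison, the sketch below computes a paired t statistic from per-driver mean lane changing times. The reported 15 degrees of freedom are consistent with a paired test over the 16 drivers, although the text does not state which variant of the t-test was used, so treating the test as paired is an assumption. The four value pairs are made-up placeholders chosen to keep the arithmetic easy to follow; they are not data from the study.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t, degrees of freedom) for paired samples x and y.

    t = mean(d) / (stdev(d) / sqrt(n)), where d holds the pairwise
    differences x[i] - y[i] and stdev is the sample standard deviation.
    Raises ZeroDivisionError if all differences are identical.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Placeholder per-driver means in seconds (hypothetical, NOT study data).
uninterrupted = [5.0, 4.0, 6.0, 5.0]
interrupted = [4.0, 3.5, 5.0, 4.5]

t, df = paired_t(uninterrupted, interrupted)
print(f"t({df}) = {t:.3f}")
```

A positive t here means that the first condition has the longer times. With real data, the p-value would be read from a t distribution with the returned degrees of freedom.
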
These results can be interpreted as suggesting that our interrupting assistant enabled drivers to complete lane changes sooner by granting them the possibility to concentrate only on the driving, which would constitute an advantage in real life, since swiftness is normally associated with minimization of risks in overtaking maneuvers. On the other hand, it is also possible that the presence of an auditory stimulus (the bell) simultaneous with the visual lane changing signal (the arrow) might have contributed to a faster reaction in the interrupted condition than in the non-interrupted one, in which the moment to change lanes is only announced visually.

3.3 Subjective Evaluation

At the end of each phase of the experiment, participants filled out a questionnaire, rating how pleasant they had found the interruptions, on a scale with 1 meaning extremely unpleasant and 5 meaning extremely pleasant. They rated both the experience of being interrupted as caller and that of being interrupted as driver. The results are shown in Figure 4. Whereas ratings for drivers were varied (M = 2.94, SD = 0.97), the majority of subjects rated the interruptions for the caller as a 2 (M = 2.13, SD = 0.7), and hence were more unanimously displeased with them.

In an open post-experiment question, some participants suggested that other interruption modes (such as a sound signal only), interruption manners (with more forewarning), or interruption utterances might be more acceptable. This remains to be evaluated.

Fig. 4. Scores for pleasantness of interruptions, as rated by drivers and callers (two histograms of number of votes per rating from 1 to 5: caller's rating and driver's rating)

It is also important to note that interruptions leave callers temporarily without any tasks to perform, whereas drivers still have their main task to concentrate on: This could also be a reason why callers get more frustrated by interruptions.
Finally, pleasantness scores for drivers were correlated neither with number of successful trials nor with the frequency with which the driver had been interrupted (as opposed to the caller).

4 Discussion and Further Work

The results presented show that verbal interjections coming from a system do efficiently interrupt an ongoing conversation. They also show that, in a driving situation, doing so improves performance during difficult driving tasks. Further research needs to be done in order to better understand its influence on performance and to find ways in which it can be enhanced. It is essential to cast more light on users’ emotional responses to these interruptions (as callers as well as drivers) and to find ways to minimize frustration and stress.

The feedback obtained through the questionnaires raises issues regarding both the content and the mode of interruptions. Among the suggestions, it is possible to identify two trends: Some participants recommended being more explicit as to the reasons behind the need for the interruption, whereas others suggested ideas which might appear (at least initially) precisely the opposite, such as shortening the utterances or using a sound signal instead of speech. Exploration of these different strategies and their effects on users will be a next step. Furthermore, it might be possible to combine these seemingly opposed suggestions, for example by producing shorter utterances which still convey more precise information about the situation of the driver ("driver overtaking, please wait") or sounds which stand for specific driving events.

Secondly, it is necessary to explore ways in which speakers can be helped, when resuming the conversation, to remember what was being said before they were interrupted.
This kind of assistance might also be beneficial for driving performance, as some drivers in our experiment reported that they had to make a considerable effort during interrupted lane changes to keep the state of the conversation in their minds. It might here prove beneficial to monitor the speaker’s production and, depending on the severity of the dangerous situation, decide whether to interrupt as soon as the alert signal becomes available or to wait for a specific moment in the dialogue, such as a transition-relevance place [12], in a similar way to [5].

Finally, some participants reported a desire for increased control over the system, for example by giving drivers the possibility to activate the interruptions themselves. This is related to [8], who found that granting users control over when a dialogue system resumes its speech after an interruption can improve user satisfaction without harming driving performance. It clearly remains a challenge to find ways in which this can be done without introducing too much additional cognitive effort for the driver.

5 Acknowledgments

This work was supported by the Cluster of Excellence Cognitive Interaction Technology ‘CITEC’ (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG). We gratefully acknowledge Sina Zarrieß’s help with results and her always insightful remarks, Oliver Eickmeier’s assistance with scenario generation and Robert Eickhaus’ help with recording the interactions. Finally, thanks to Julian Hough, Ting Han and Spyros Kousidis for valuable discussions and tips.

References

1. Baumann, T., Schlangen, D.: The InproTK 2012 release. In: Proceedings of the NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012). pp. 29–32. ACL (2012)
2. Drews, F., Pasupathi, M., Strayer, D.: Passenger and cell phone conversations in simulated driving. Journal of Experimental Psychology: Applied 14(4), 392–400 (2008)
3. Graham, R.: Use of auditory icons as emergency warnings: evaluation within a vehicle collision avoidance application. Ergonomics 42, 1233–1248 (1999)
4. International Organization for Standardization (ISO): Road vehicles – ergonomic aspects of transport information and control systems – simulated lane change test to assess in-vehicle secondary task demand. Standard (2010)
5. Iqbal, S.T., Horvitz, E., Ju, Y.C., Mathews, E.: Hang on a sec!: effects of proactive mediation of phone conversations while driving. In: CHI (2011)
6. Kennington, C., Kousidis, S., Baumann, T., Buschmeier, H., Kopp, S., Schlangen, D.: Better Driving and Recall When In-car Information Presentation Uses Situationally-Aware Incremental Speech Output Generation. In: AutomotiveUI 2014: Proceedings of the 6th International Conference on Automotive User Interfaces and Interactive Vehicular Applications. pp. 7:1–7:7 (2014)
7. Kennington, C., Kousidis, S., Schlangen, D.: InproTKs: A toolkit for incremental situated processing. In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). pp. 84–88. Association for Computational Linguistics, Philadelphia, PA, U.S.A. (June 2014), http://www.aclweb.org/anthology/W14-4312
8. Kousidis, S., Kennington, C., Baumann, T., Buschmeier, H., Kopp, S., Schlangen, D.: A Multimodal In-Car Dialogue System That Tracks The Driver’s Attention. In: Proceedings of the 16th International Conference on Multimodal Interfaces. pp. 26–33 (2014)
9. Lison, P.: A hybrid approach to dialogue management based on probabilistic rules. Computer Speech & Language 34(1), 232–255 (2015)
10. Maciej, J., Nitsch, M., Vollrath, M.: Conversing while driving: The importance of visual information for conversation modulation. Transportation Research Part F: Traffic Psychology and Behaviour 14, 512–524 (2011)
11. McEvoy, S.P., Stevenson, M.R., McCartt, A.T., Woodward, M., Haworth, C., Palamara, P., Cercarelli, R.: Role of mobile phones in motor vehicle crashes resulting in hospital attendance: A case-crossover study. BMJ 331, 428 (2005)
12. Sacks, H., Schegloff, E.A., Jefferson, G.: A simplest systematics for the organization of turn-taking for conversation. Language 50(4), 696–735 (1974), http://www.jstor.org/stable/412243
13. Schröder, M., Trouvain, J.: The German text-to-speech synthesis system MARY: A tool for research, development and teaching. International Journal of Speech Technology 6, 365–377 (2003)
14. Strayer, D.L., Drews, F.A., Crouch, D.J.: A comparison of the cell phone driver and the drunk driver. Human Factors 48, 381–391 (2006)
15. Wienke, J., Wrede, S.: A Middleware for Collaborative Research in Experimental Robotics. In: IEEE/SICE International Symposium on System Integration (SII2011). pp. 1183–1190. IEEE (2011)