Socially-Aware Interfaces for Supporting Collocated Interaction

Gianluca Schiavo
CIMeC, Center for Mind/Brain Sciences, University of Trento, and FBK – Fondazione Bruno Kessler, Trento, Italy
gianluca.schiavo@unitn.it

Abstract. One of the important challenges in ubiquitous computing is to improve the computer's access to information available in the social context. The goal of my PhD project is to investigate how to design interfaces that support collocated multi-user interactions, taking into account users' non-verbal behaviour (specifically, gaze, postures, and body movements). The research activities are twofold: to understand which non-verbal cues and social signals reflect cooperation, engagement, and group cohesion in collocated group activities, and to design systems that can handle and utilise this information. To this end, I present an integrated research approach for designing multi-user interactions based on social signal processing. I also discuss the progress to date towards the development of systems able to sense and react to social context.

1 Introduction

Multi-user interfaces are collaborative computing interfaces that support the simultaneous participation of multiple users who are collocated in the same place. Examples of technologies that integrate multi-user interfaces are interactive tabletops, public displays (e.g. large wall displays) and peripheral displays. These technologies represent a further step beyond the traditional desktop metaphor: multi-user systems differ from standard personal computers in providing a shared workspace and a platform for the cooperation and collaboration of several users engaged in the same task. Since they are co-located technologies, multi-user interfaces preserve users' freedom to interact and communicate naturally with one another.
For these reasons, they are assumed to be better suited for group activities, since they can support more equal and flexible forms of collaboration than individual technologies [17]. However, designing multi-user interfaces raises different issues than designing single-user ones. There are in fact a number of social dynamics characterizing the interaction of several users with the same interface that should be taken into account in the design process. Many researchers [13, 17] have examined the social interactions that occur while users are engaged with multi-user interfaces, investigating dynamics such as: group formation, how users approach the screen and share the space around the display; territoriality, the ways in which the space of the surface and around the device is spatially organised into personal and shared regions for interaction; the importance of awareness of others' activity and the impact that the public visibility of the interaction has on communication and on task organization; and styles of interaction, how people use the technology in parallel or collaborative interaction and the extent to which participation and tasks are distributed across group members. These are all relevant social dynamics that should be addressed when dealing with multi-user interfaces. In fact, multi-user interfaces, more than single-user ones, should be designed with the social context in mind, leveraging the benefits of group interactions. Yet one notable limitation of most multi-user interfaces is their limited ability to adapt to the complex and dynamic social context in which they are used.

Proc. of CHItaly 2013 Doctoral Consortium, Trento (Italy), September 16th 2013 (published at http://ceur-ws.org). Copyright © 2013 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
My research project aims to investigate systems' capabilities for sensing the social context, based on the assumption that multi-user interfaces should provide information in a way that best suits the users' needs, both as individuals and as group members. The research considers both psychological and computational models of group behaviour, with an emphasis on multi-user interfaces that integrate these models in order to sense and react to their surrounding social environment. From this perspective, automatic machine sensing converges with psychological and social research, addressing how multi-user technology can be enabled to sense and properly react to the social context.

1.1 Exploring Social Signals

Context-awareness is one of the core concepts in ambient intelligence computing. Indeed, ambient intelligence envisions computing systems that can handle information by modelling, inferring and learning what is going on in the contexts where such technologies are situated. In designing context-aware systems, the notion of context should include both physical and social features [14]. Physical aspects encompass location and environmental conditions, whereas social context comprises all the characteristics of a social group co-present in the same environment, mainly social interactions and group dynamics. But while considerable research on methods and applications for sensing physical factors has been conducted, the social context is comparatively under-investigated in HCI research, given the difficulty of sensing and handling information available from the social domain. Extensive research has been conducted at the intersection of psychology and computer science to investigate social context and the behaviour of groups of people interacting with technology, notably in the research fields of non-verbal communication [10, 2], affective computing [9] and the recent area of social signal processing (SSP) [16, 8].
Research in psychology and social cognition on the mechanisms of non-verbal behaviour has suggested that some social cues are the result of automatic processes. Experimental studies have shown that many social constructs and actions are determined by the display and interpretation of specific behavioural cues such as gaze, postures, and body movements [2]. Affective computing is a domain within computer science, pioneered by the research of Rosalind Picard [9], that studies the development of machines able to recognize and respond appropriately to emotions. Advances in this field have provided systems with the ability to detect and recognize different emotional states from various verbal and non-verbal behavioural cues, such as facial expressions, body postures, and physiological signals. Following these results, new approaches have been developed to further explore applications of affective interfaces, moving from an individual to a group perspective. In line with this view is the social signal analysis approach [16]. Social signal processing (SSP) is the area of research that investigates the automatic identification and analysis of the social signals that humans display during their social interactions, adopting systems that sense multimodal behaviours and recognise the underlying socio-emotional patterns. According to the related literature, social signals are verbal and non-verbal behaviours that, directly or indirectly, convey information about social facts, i.e. social actions, social interactions, social emotions and relationships [10]. In this view, typical social signals are turn-taking and backchannel behaviours, mimicry, dominance and mutual synchronization.
These social signals are produced and expressed through different modalities, including verbal and vocal features, facial expressions, head movements, gaze, and body and spatial behaviour. Social signal processing focuses mainly on the computational analysis of verbal and non-verbal behaviour, and on modelling these signals in terms of significant social dynamics. More specifically, SSP research explores methods for the recognition and identification of human behaviour in naturalistic data, using algorithms and computing systems able to infer social signals from simple features extracted from acoustic and visual scene analysis. This is made possible by systems that can automatically sense and interpret cues to infer and analyse human behaviour. The domain draws on methods and theoretical approaches from different disciplines: on the one hand, cognitive science and ethnography help to define and understand behavioural cues [10]; on the other hand, computer science, artificial intelligence and engineering allow the development of context-aware systems. In the SSP research domain, investigations have explored computational models of many social interactions in small groups, such as joint attention and interest, social relations and group cohesion [4], but applications and studies in the HCI domain are still scarce. My research project adopts social signal processing (SSP) as a method for capturing multimodal human behaviours and recognizing the underlying socio-emotional patterns. The purpose is to investigate and design multi-user systems that adapt their interface, taking into account the users' social context. In the field of HCI, considerable research has been conducted on intelligent and context-aware interfaces, but only a few works have tried to automatically measure social aspects of human-human interaction and use this information to inform multi-user interfaces and support group activities.
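As an illustration of how a simple social signal can be inferred from low-level behavioural features, the sketch below computes a balance-of-participation score from per-member speaking time. This is a hypothetical example for this paper, not an algorithm from the SSP literature; the function name and scoring scheme are invented.

```python
"""Illustrative sketch: inferring a simple social signal -- balance of
participation -- from per-member vocal activity (seconds of speaking time).
The name and normalisation scheme are hypothetical."""

def participation_balance(speaking_time):
    """Return a 0..1 score: 1 = perfectly equal speaking time, 0 = one
    member monopolises the conversation."""
    if len(speaking_time) < 2:
        return 1.0
    total = sum(speaking_time.values())
    if total == 0:
        return 1.0
    shares = [t / total for t in speaking_time.values()]
    ideal = 1.0 / len(shares)
    # Total deviation from an equal split, normalised by its maximum value.
    deviation = sum(abs(s - ideal) for s in shares) / (2 * (1 - ideal))
    return 1.0 - deviation

# Example: one member dominates the conversation -> low balance score.
print(participation_balance({"a": 50.0, "b": 5.0, "c": 5.0}))  # 0.25
```

A single scalar like this is of course a crude summary; real SSP systems fuse many such features over time, but the principle of mapping raw cues to an interpretable group-level signal is the same.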
The research of Pentland et al. [8, 7] has used sociometric badges to monitor communication patterns and other social signals during team activity, reporting a graphical representation of the group dynamics to the members themselves. Similarly, DiMicco et al. [3] have investigated peripheral displays that visualize the amount of participation (in terms of vocal activity) of the members of a small group during a meeting, with the purpose of stimulating individual reflection on the on-going activity, thus harnessing social collaboration. Considering physiological signals, Slovak et al. [15] have investigated how communicating signals such as heart rate can implicitly change the experience of a social situation and the behaviour displayed towards the other people interacting. Notably, Balaam et al. [1] showed how a multi-user public display can enhance interactional synchrony by displaying subtle feedback about users' behaviour. The findings from this study suggest that social dynamics, like rapport, can be leveraged by machines to support group behaviour without requiring direct and exclusive interaction with the users. In fact, in this research, and similarly in other studies that have investigated persuasive and empathic technologies [6], the focus is specifically on human-human interaction rather than on explicit human-machine interaction. My research thus considers both psychological and computational models of group behaviour, with an emphasis on multi-user interfaces that integrate these models in order to sense and react to their surrounding social environment. In the next sections, the research challenges, the approach and the progress to date are presented.
2 Research Challenges

Among the research challenges, the following three are of particular relevance for this project:
• Defining the information that certain social cues can provide to multi-user systems. While many social cues have been described in the psychological literature, it is important to identify a basic subset of signals that promise to be relevant for the characterization of individual and group states, and for which automatic extraction seems feasible.
• Exploring novel interaction techniques for multi-user systems that leverage social signal processing. This includes aspects related to indirect interaction, such as mechanisms for communicating with the members of the group in a non-invasive but effective way.
• Designing guidelines for applications of socially-aware multi-user interfaces. The project should provide best practices and guidelines for the design and evaluation of socially-aware interfaces.

3 Research Approach

The project adopts an integrated approach for designing and implementing socially-aware multi-user interfaces.

A. Sensing Individual and Group Behaviour

The first step of the research approach is to define which verbal and non-verbal behaviours can be reliably detected by the system. This step comprises both theoretical and technical challenges. The theoretical challenge is to understand and define the relevant behaviours and social signals that should be addressed. To this end, research on non-verbal behaviour and communication can provide insight into which behavioural cues should be considered and which information can be extracted from them.
The technical challenges lie in how to accurately detect the intended behaviours in real-world scenarios, which may be characterized by high levels of noise and large inter-subject variability.

B. Modelling Behaviour and Social Signals

Once the intended non-verbal cues are extracted from the social context, models of the raw data are required to highlight significant patterns and to fuse cues that carry different amounts of information with respect to the social signals of interest. For this specific challenge, affective computing and social signal processing are relevant research fields dealing with models of the multimodal behaviour of single users and groups.

C. Integrating Social Signals in Multi-User Interfaces

The last step of the research approach is to integrate the information obtained from the previous steps ("sensing" and "modelling") in order to provide the interface with meaningful information about the social context. The interface should actively support both human-computer and human-human interaction, with the goal of augmenting the interaction process. To this end, the interface should enact feedback mechanisms in the background to support the activity. Such mechanisms should be investigated to explore which feedback modalities and temporal characteristics are most effective. Moreover, the factors determining the acceptability and effectiveness of the system should be assessed [5] using different methodologies: from questionnaires and interviews to observational methods based on behavioural coding of audio-video recordings.

4 Progress to Date

During the first year of my PhD I have made progress and presented my work at HCI-related conferences. In the following paragraphs I discuss the progress to date towards the development of systems that sense and react to social context. Three studies are presented, together with their design and evaluation processes.
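The three-step approach (sensing, modelling, integrating) can be pictured as a simple pipeline. The sketch below is purely illustrative: the class, the sensor stubs, the "cohesion" signal and the threshold are all hypothetical, not part of any of the systems described in this paper.

```python
"""Illustrative sketch of the sensing -> modelling -> integrating pipeline;
all names, signals and thresholds here are hypothetical placeholders."""

class SociallyAwareInterface:
    def __init__(self, sensors, model):
        self.sensors = sensors  # step A: callables producing behavioural cues
        self.model = model      # step B: mapping raw cues to social signals

    def step(self):
        cues = {name: read() for name, read in self.sensors.items()}
        signals = self.model(cues)
        return self.adapt(signals)  # step C: integrating signals in the UI

    def adapt(self, signals):
        # Background, non-invasive feedback rather than explicit commands.
        return "ambient-cue" if signals.get("cohesion", 1.0) < 0.5 else "no-change"

# Toy usage with stubbed sensors and a trivial averaging model:
ui = SociallyAwareInterface(
    sensors={"gaze": lambda: 0.4, "voice": lambda: 0.3},
    model=lambda cues: {"cohesion": (cues["gaze"] + cues["voice"]) / 2},
)
print(ui.step())  # cohesion 0.35 < 0.5 -> "ambient-cue"
```

The three studies below each instantiate parts of this pipeline: the engagement study covers sensing and modelling, while the Mediaplayer and the conversation table close the loop by adapting the interface.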
The Engagement Study

In this study, focused on the first two steps of the research approach, we investigated a novel approach to detecting three affective states that characterize video game playing: engagement, stress and boredom. We developed a system that estimates the player's affective state by recognizing non-verbal behavioural cues using off-the-shelf hardware: a webcam, a keyboard and a mouse. The system was tested in an empirical study aimed at gathering data and modelling the player's affective states. Facial expressions, head movements, and keyboard and mouse activity were recorded while participants played a shooter video game. We used an adapted version of the experience sampling methodology to gather the ground truth and trained an SVM (Support Vector Machine) that recognized the affective states with an accuracy of 73%. The results showed that affective states such as engagement, stress and boredom can be estimated by taking into consideration non-verbal behaviours such as head movements and facial expressions.

The Mediaplayer: Sensing and Reacting to Users' Interest

In this second study, the concept of engagement was considered from a broader perspective, taking into account the behaviour of groups of people and exploring socially-aware interfaces for public displays.
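The workflow of the engagement study can be sketched as follows. The feature values and state centroids below are invented toy numbers, and a trivial nearest-centroid classifier stands in for the SVM so the sketch has no external dependencies; it illustrates the train-on-ground-truth, predict-from-features structure only, not the actual study.

```python
"""Sketch of the affect-recognition workflow: feature vectors (e.g. head
movement, facial expression, input activity) labelled via experience
sampling, a simple classifier, prediction on new features. A nearest-centroid
classifier stands in for the SVM; all numbers are hypothetical."""
import math

def centroid(vectors):
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def train(samples):
    # samples: affective state -> list of labelled feature vectors
    return {state: centroid(vecs) for state, vecs in samples.items()}

def predict(model, features):
    return min(model, key=lambda s: math.dist(model[s], features))

# Toy ground truth gathered via experience sampling (hypothetical values):
training = {
    "engagement": [[0.8, 0.6, 0.9], [0.7, 0.5, 0.8]],
    "stress":     [[0.9, 0.1, 0.9], [0.8, 0.2, 1.0]],
    "boredom":    [[0.1, 0.1, 0.2], [0.2, 0.0, 0.1]],
}
model = train(training)
print(predict(model, [0.15, 0.05, 0.15]))  # close to the boredom centroid
```

In the actual study an SVM was trained on the recorded multimodal features, reaching the 73% accuracy reported above.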
Public displays are promising multi-user systems for public and semi-public spaces because of their: (i) ubiquitous potential, as they can provide ubiquitous access to information; (ii) socially-aware potential, as they are media that can support both individual and group interactions in public and social contexts; (iii) context-aware potential, as they are situated artefacts deeply embedded in their specific physical and social environment. In this work [11], we mostly focused on the two latter points, proposing a socially-aware public display that provides different levels of information according to the perceived interest of the user(s) and the social context. We designed and developed a public display, named the Mediaplayer, which detects the audience's interest and adapts the on-screen content accordingly. The Mediaplayer tracks the surrounding area by means of a 3D depth sensor (Microsoft's Kinect). The depth information is used to detect the audience's non-verbal cues, including users' spatial position, gaze orientation and group configuration (Fig. 1). This information is then used to estimate the level of attention and interest of the audience and to automatically adapt the interface, providing more information on the screen when users show more interest in the display. In order to explore in an ecological setting how users interact with a context-aware public display, a field study was carried out: the Mediaplayer was deployed during a cultural event in a public space.

Fig. 1. User in front of the Mediaplayer (left), the related depth image from the Kinect sensor (middle) and the final elaborated image with information about the user's distance and head orientation (right).
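The adaptation just described — more content as perceived interest grows — can be sketched as a simple mapping from depth-sensor cues to a content level. The thresholds, function names and content labels below are invented for illustration and are not the Mediaplayer's actual logic.

```python
"""Hypothetical sketch of interest-driven content adaptation: distance and
head orientation from a depth sensor are mapped to an interest level that
selects how much content to show. All thresholds and labels are invented."""

def interest_level(distance_m, facing_display):
    """Crude interest estimate from proximity and gaze orientation."""
    if not facing_display:
        return 0
    if distance_m < 1.5:
        return 2   # close and attending: high interest
    if distance_m < 4.0:
        return 1   # within range: possibly a passer-by glancing over
    return 0

def content_for(level):
    return {0: "ambient teaser", 1: "headline + image", 2: "full article"}[level]

print(content_for(interest_level(1.2, facing_display=True)))  # full article
```

A real deployment would also smooth these estimates over time and aggregate them across multiple tracked users before switching content.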
We compared the adaptive system with a control condition, i.e. a non-adaptive system offering the same content without any adaptation. In the control condition, the information about the visual scene was collected as described above but not used by the system. In order to minimize the influence of time of day, visitor flow and lighting conditions on the results, the two conditions were counterbalanced during the study, switching automatically every 60 minutes. Roughly 350 people interacted with the Mediaplayer, and ecological data about the behaviour of individuals and groups in front of the screen was collected [11]. The field study revealed that the context-aware public display is more appealing to the audience than the control condition: more people approached the display while in the adaptive mode, people showed higher levels of interest, and they considered the experience more engaging, as reported in a post-interaction questionnaire. This study showed that behavioural measures are valuable data for informing and adapting a public display in a socially-aware way, improving users' engagement with the technology.

Agora2.0: A Public Display for Civic Participation

Agora2.0 is a platform for civic participation composed of two equally relevant parts: an online system for voting on ideas and an interactive public display deployed in a public space relevant to the community [12]. The system allows public administration staff and the citizenry to post polls and gather opinions about local issues through questions that can be answered online or on site. Agora2.0 was evaluated in a realistic deployment in a public setting, where the system was used by actual citizens and their public administration (Fig. 2). The aim of this project was to encourage civic engagement by bridging a virtual space for public deliberation with a physical space typically used by the community to discuss local issues.
Agora2.0 is still research in progress; nevertheless, its deployment highlighted some relevant points to consider when designing and deploying a large display in a public space. The next step in this direction would be to provide a system like Agora2.0 with the adaptability of the Mediaplayer, in order to combine the potential of the two systems. Socially-aware public displays could attract and maintain the audience's attention, promoting users' engagement and participation in the platform. This scenario also opens opportunities for exploring new interaction techniques for public systems. These techniques can be either implicit or explicit, and they should support collaboration and cooperation among users.

Fig. 2. A user (left) and a group of users (right) interacting with Agora2.0.

Future Steps: the Conversation Table

Future steps in this project will involve the exploration of a socially-aware system for collocated group activities taking place around a table. In an on-going study, the focus shifts from a public situation to a semi-public context, targeting team activities. This scenario provides some advantages on the technical side and an optimal setting for controlled studies. The conversation table is a socially-aware system enhanced with sensors and peripheral displays that support the group's communication activity. The system is equipped with Kinect sensors to gather information about the non-verbal behaviour of the group's members. The conversation table continuously monitors the group's social dynamics and uses this knowledge to plan and deploy strategies to support and influence the group's behaviour through peripheral displays.
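The continuous monitor-and-influence cycle just described can be sketched as a sense-model-act loop. The sensor readings, the toy dominance measure and the feedback actions below are hypothetical placeholders, not the conversation table's actual implementation.

```python
"""A minimal sketch of a sense-model-act loop for a conversation table;
the cues, the dominance measure and the feedback strategies are all
hypothetical placeholders."""

def sense():
    # Placeholder for Kinect-derived cues: per-member vocal activity shares.
    return {"speaking": {"a": 0.7, "b": 0.2, "c": 0.1}}

def model(cues):
    # Toy dominance measure: share of the most talkative member.
    shares = cues["speaking"].values()
    return {"dominance": max(shares) / sum(shares)}

def act(signals):
    # Peripheral, non-invasive feedback instead of explicit instruction.
    return "dim-dominant-halo" if signals["dominance"] > 0.6 else "idle"

def run(cycles=3):
    # A real system would run this loop continuously, in real time.
    return [act(model(sense())) for _ in range(cycles)]

print(run())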
The system follows the three main steps of the research approach:
• Recording natural individual and group interaction data, detecting multimodal behavioural cues;
• Learning models from these features to detect relevant social signals (e.g. dominance, equitable communication, subgroups);
• Generating appropriate feedback in real time, using the outputs of the models to influence the group activity.
Each of these steps presents challenges:
• The multimodal features extracted in real time from the recorded signals (including vocal activity, gaze orientation, and head movements) need to be identified and mapped to meaningful social signals such as group attention, mood and cohesion.
• By combining these multimodal features and interpretations, the system needs to reason about, plan and realize the actions to perform, under a series of constraints (e.g. correct timing, delivering the appropriate stimuli to the specific target). To this end, continuous perception and interpretation of the group dynamics is required to keep the system effective.
• The development and evaluation of such real-time continuous systems also requires careful attention to the research methodology, in terms of experimental design and evaluation methods.
We are currently in an advanced phase of development and plan to test the system in the near future.

5 Expected Contributions

I believe that attending the CHItaly 2013 Doctoral Consortium will give me the chance to share my work and vision on socially-aware systems with other students as well as with senior researchers in the consortium. Participation in the DC will provide me with an opportunity to get fresh perspectives on my project from researchers with different backgrounds and, hopefully, guidance on its future directions.
Short Bio

I am a psychologist, currently a second-year PhD candidate in cognitive science at the University of Trento (Italy) and at FBK - Fondazione Bruno Kessler, working under the supervision of Dr. Massimo Zancanaro. I completed my MSc degree in experimental psychology and cognitive sciences at the University of Padova (Italy) in 2011. During my studies, I worked at the Human Technology Lab (HTLab) at the University of Padova and completed an internship at the Helsinki Institute for Information Technology (HIIT), Finland. My current research interests are oriented towards human-computer interaction (HCI) and co-located multi-user systems (e.g. interactive tabletops, public displays and proximity-based technologies). The key point of my PhD project is to explore how to design socially-aware multi-user interfaces for supporting co-located interaction, applying theoretical frameworks and tools from cognitive science and machine learning, such as social signal processing. As part of my research, I am investigating the role of non-verbal behaviour in human-computer as well as in human-human interaction, to study how multi-user systems can leverage this information.

References
1. Balaam, M., Fitzpatrick, G., Good, J., & Harris, E. Enhancing interactional synchrony with an ambient display. In Proc. CHI 2011, ACM Press (2011), 867-876.
2. Bargh, J. A., Schwader, K. L., Hailey, S. E., Dyer, R. L., & Boothby, E. J. Automaticity in social-cognitive processes. Trends in Cognitive Sciences 16, 12 (2012), 593-605.
3. DiMicco, J. M., Hollenbach, K. J., Pandolfo, A., & Bender, W. The impact of increased awareness while face-to-face. Human–Computer Interaction 22, 1-2 (2007), 47-96.
4. Gatica-Perez, D. Automatic non-verbal analysis of social interaction in small groups: A review. Image and Vision Computing 27, 12 (2009), 1775-1787.
5. Höök, K.
Steps to take before intelligent user interfaces become real. Interacting with Computers 12, 4 (2000), 409-426.
6. Janssen, J. H. A three-component framework for empathic technologies to augment human interaction. Journal on Multimodal User Interfaces 6, 3-4 (2012), 143-161.
7. Kim, T., Hinds, P., & Pentland, A. S. Awareness as an antidote to distance: Making distributed groups cooperative and consistent. In Proc. CSCW 2012, ACM Press (2012), 1237-1246.
8. Pentland, A., with Heibeck, T. Honest Signals: How They Shape Our World. MIT Press, 2008.
9. Picard, R. W. Affective Computing. MIT Press, 2000.
10. Poggi, I., & D'Errico, F. Social signals: A psychological perspective. In Salah, A.A., Gevers, T. (eds.) Computer Analysis of Human Behavior. Springer (2011), 185-225.
11. Schiavo, G., Mencarini, E., Vovard, K., & Zancanaro, M. Sensing and reacting to users' interest: an adaptive public display. In Proc. CHI EA 2013, ACM Press (2013), 1545-1550.
12. Schiavo, G., Milano, M., Saldivar, J., Nasir, T., Zancanaro, M., & Convertino, G. Agora 2.0: Enhancing civic participation through a public display. In Proc. C&T 2013, ACM Press (2013).
13. Scott, S.D., Grant, K.D., & Mandryk, R.L. System guidelines for co-located, collaborative work on a tabletop display. In Proc. ECSCW 2003, Springer (2003).
14. Schmidt, A. Context-aware computing: Context-awareness, context-aware user interfaces, and implicit interaction. In Soegaard, M., and Dam, R. F. (eds.) The Encyclopedia of Human-Computer Interaction. The Interaction Design Foundation, 2013.
15. Slovák, P., Janssen, J., & Fitzpatrick, G. Understanding heart rate sharing: towards unpacking physiosocial space. In Proc.
CHI 2012, ACM Press (2012), 859-868.
16. Vinciarelli, A., Pantic, M., Heylen, D., Pelachaud, C., Poggi, I., D'Errico, F., & Schröder, M. Bridging the gap between social animal and unsocial machine: A survey of social signal processing. IEEE Transactions on Affective Computing 3, 1 (2012), 69-87.
17. Yuill, N., & Rogers, Y. Mechanisms for collaboration: A design and evaluation framework for multi-user interfaces. ACM Transactions on Computer-Human Interaction 19, 1 (2012), 1-25.