=Paper=
{{Paper
|id=Vol-2760/paper3
|storemode=property
|title=Person-Independent Multimodal Emotion Detection for Children with High-Functioning Autism
|pdfUrl=https://ceur-ws.org/Vol-2760/paper3.pdf
|volume=Vol-2760
|authors=Annanda Sousa,Mathieu d'Aquin,Manel Zarrouk,Jennifer Holloway
|dblpUrl=https://dblp.org/rec/conf/ijcai/SousadZH20
}}
==Person-Independent Multimodal Emotion Detection for Children with High-Functioning Autism==
Annanda Sousa¹, Mathieu d'Aquin¹, Manel Zarrouk² and Jennifer Holloway³

¹ Data Science Institute - National University of Ireland - Galway
² Institut Galilée - Université Paris 13
³ School of Psychology - National University of Ireland - Galway

{a.defreitassousa1, mathieu.daquin, jennifer.holloway}@nuigalway.ie, zarrouk@lipn.univ-paris13.fr

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

The use of affect-sensitive interfaces carries the promise of enhancing human-computer interaction by delivering a system capable of identifying a user's emotions and adapting its content accordingly. Today's technology shows great potential to support children with autism, for example by using computer systems to improve their social skills. Generally, however, this technology does not encompass the potential of affect-sensitive interfaces. This is mainly due to Emotion Detection (ED) models built for the general population usually not performing well when applied to children with autism, who express emotions differently. The aim of this project is therefore to build a person-independent Multimodal Emotion Detection system tailored for children with high-functioning autism, with the ultimate goal of applying it to the design of affect-sensitive interfaces dedicated to children with autism. This is a work in progress, and the project expects to build upon the current body of knowledge on methods for applying ED systems to this specific subset of the general population. We expect to apply the overall theoretical and practical design perspectives that arise from this research investigation (e.g. analysis of modalities and feature extraction, behavioural-cue-based features, fusion layers and classifier techniques) to propose a guiding framework for future studies.

1 Introduction

Automatic Emotion Detection (ED) aims to automatically identify people's cognitive states or emotions, e.g. happiness, anger or fear, using different types of media input such as text, video, audio and sensor signals. Systems combining more than one type of data are called Multimodal Emotion Detection systems and usually outperform unimodal systems.

Automatic ED is advancing to become an important component of Human-Computer Interaction (HCI) through affect-sensitive systems. An affect-sensitive system detects the user's emotions and automatically adapts its interaction with the human based on those emotions. This kind of feature has the potential to enhance HCI, creating an individualised experience for the user in a more human-to-human-like interaction.

Even with all the advancement of ED for users with typical neurological development, usually referred to as neurotypical, those systems do not perform well when applied to children with autism, mainly because of this particular population's way of expressing emotions [Liu et al., 2008], motivating the need to develop ED systems specifically tailored for children with autism. Autism Spectrum Disorder (ASD) is a developmental disorder with spectrum manifestation of traits, characterised by impairments in social interaction and communication and by repetitive patterns of behaviour and interests. High-Functioning Autism (HFASD) is defined as ASD without significant cognitive and language impairments [Gaus, 2011].

Among the results of a recent meta-analysis [Trevisan et al., 2018] comparing facial expression production between a typical development (TD) population and people with ASD, we can find evidence that people with ASD display facial expressions less often and less frequently than people with TD. Also, their expressions are found to be lower in quality and less accurate. In the work of [Grossard et al., 2020], the results show that a Random Forest model needs more facial landmarks to classify facial expressions from children with ASD than it needs for children with typical development, providing more evidence that ED systems developed for children with typical development do not perform well when applied to children with ASD.
Nowadays, the development of computer-based intervention tools for the treatment of children with autism has increased, turning technology into an important ally when it comes to teaching those children abilities they lack in social and emotional areas [Frauenberger et al., 2012]. There are several examples of computer systems [Hopkins et al., 2011], virtual reality (VR) environments [Boyd et al., 2018], tablet and mobile applications [Hourcade et al., 2012], and even robotic agents that interact with children with ASD as intervention tools [Rudovic et al., 2018; Marinoiu et al., 2018]. Studies have shown evidence demonstrating the effectiveness of such tools to support ASD [Ma et al., 2019]. Additionally, new methods are emerging on the use of technology to support people on the autism spectrum beyond assistive and intervention tools, shifting the focus from just "fixing the problem" to a more holistic approach [Frauenberger et al., 2016]. This includes, for instance, investigating ways to design technologies to support children with autism considering their special interests and strengths.

Being able to automatically identify emotions from children with autism can play an important role in enhancing and individualising HCI between children with ASD and computer interfaces specially designed to support their needs and particularities [Sharmin et al., 2018]. Regardless, most technological tools that have been developed to support children on the autism spectrum do not use automatic ED, which could be of great relevance in turning them into significant supplementary support to classic interventions, which are usually expensive and very much dependent on human presence. Another point of importance is that creating ED systems tailored for children with autism is another step towards inclusion: tools based on ED are being developed with a focus on neurotypical people, and they will not be usable by children with ASD if not adapted to their ways of expressing emotions. Some examples of ED application areas are gaming, health and mental health, which currently do not include the population on the autism spectrum.

In the field of Emotion Detection, creating a person-independent model is one of many well-known challenges [Cambria et al., 2017]. This challenge refers to building a model that performs well at identifying emotions from people whose data were not present in the model's training dataset. At a high level, it is related to the fact that people express emotions in an individualised manner. General patterns of expressing emotions typically apply to most people, e.g. smiling usually means happiness; however, only considering general patterns is not enough to build an Emotion Detection (ED) system that takes into consideration individual and specific cues for expressing emotions.

This fact is still true for people on the autism spectrum. On the one hand, people with ASD do not express emotions in a similar way to people with typical development. On the other hand, as for the general population, there is not a uniform way in which people with autism express their emotions. As a consequence, creating a person-independent ED system that models and reflects how this specific population expresses emotions is needed.
2 Related Work

Previous studies have developed ED systems tailored to children with autism. The studies created their ED models envisioning different applications: to allow the creation of affect-sensitive computer-based intervention tools [Liu et al., 2008] and affect-sensitive e-learning platforms for children with ASD [Dawood et al., 2018; Chu et al., 2018], to generate knowledge to support the assessment of autism [Samad et al., 2018], to support the treatment of anxiety, a common co-occurring condition in people with ASD [Kushki et al., 2015], and also to allow the creation of a VR-based platform as an intervention tool [Bekele et al., 2016; Saadatzi et al., 2013].

They all have in common that they did not focus on identifying the 7 basic emotions (i.e. fear, happiness, sadness, anger, disgust, surprise and contempt) because they argued that, although the ED field focuses on those basic emotions, they are not the most relevant in the context of autism. Therefore, they chose to target different emotion states more suitable to the autism context, e.g. liking, anxiety, engagement [Liu et al., 2008] and calmness [Chu et al., 2018]. Another common characteristic is the fact that they only used one modality of data input for emotion identification: physiological signals (e.g. heart rate, skin conductivity) [Liu et al., 2008; Bekele et al., 2016; Kushki et al., 2015; Sarabadani et al., 2018] or video media input (e.g. facial expressions, eye gaze, head movement) [Dawood et al., 2018; Chu et al., 2018; Ahmed and Goodwin, 2017]. Also, they all used machine learning techniques to create the classifier model, which is the state of the art of general Emotion Detection models (i.e. models for the neurotypical population). One more common point is that all of them needed to develop and conduct an experiment to elicit emotions from children with autism in order to create an annotated dataset. Despite that, none of the datasets were made available to the research community, mostly due to privacy issues.

Together, these studies provide important evidence showing that it is viable to model and automatically identify emotions of children with ASD. However, such studies remain limited when considering two points: input multimodalities and generalisability of the model. To the best of our knowledge, none of their models used multimodal input data for emotion identification, and most of the works created models that are individual-specific.

Multimodal inputs have been used and explored in the Emotion Detection field, where studies showed that multimodal Emotion Detection models usually outperform unimodal ones. Regarding the individual-specific approach, it means that the ED model was trained separately for each individual child, becoming very good at identifying emotions from that specific child but not performing well when applied to other children. This creates a huge impediment to using person-specific Emotion Detection systems in realistic conditions, because every time a new child used the system, the model would have to be trained on their annotated data.
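To make the notion of person independence concrete, the sketch below illustrates leave-one-child-out cross-validation, in which the model is always evaluated on a child whose data were excluded from training. This is a minimal, illustrative sketch, not any of the cited systems' implementations; all data, feature dimensions, labels and the choice of classifier are synthetic placeholders.

```python
# Minimal sketch of person-independent evaluation: each child's windows are
# held out in turn, so the test child never appears in the training data.
# Features, zone labels and child identifiers below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_windows, n_features = 600, 20
X = rng.normal(size=(n_windows, n_features))      # per-window multimodal features
y = rng.integers(0, 4, size=n_windows)            # 4 emotion zones: 0=green, 1=yellow, 2=red, 3=blue
children = rng.integers(0, 12, size=n_windows)    # which of the 12 children produced each window

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=children):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))  # accuracy on the unseen child

print(f"person-independent accuracy: {np.mean(scores):.2f} ± {np.std(scores):.2f}")
```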
3 Research Objectives

This research seeks to advance ED systems tailored to children with autism by exploring ways to design and develop a person-independent Multimodal Emotion Detection system to be used by children with autism. The ultimate goal of this research is to enable the benefits of ED on HCI for children with ASD. Hence, during this project, we aim to answer the following Research Question:

RQ1: How to create a multimodal Emotion Detection system which: i) is tailored to how children with high-functioning autism express emotions; and ii) is person-independent, i.e. reaches an equivalent or higher accuracy than state-of-the-art person-independent ED systems for the neurotypical population when applied to children with ASD not involved in training the model?

To be able to answer RQ1, we further need to explore answers to the following research questions:

RQ2: How to build a ground truth dataset annotated with the emotion states we aim to identify?

RQ3: Which modality input(s) and features are more relevant for cue extraction in the context of multimodal Emotion Detection for children with autism?

RQ4: Which data fusion methods work better in the context of multimodal Emotion Detection for children with autism?

4 The proposed system

Considering the objectives stated above, the proposed multimodal ED system tailored for children with high-functioning autism will involve four input modalities: video, audio, text and physiological signals (i.e. heart rate measures). These four modalities were selected for the feasibility of data acquisition by the user's family and because they are widely used in the field of ED. Based on those inputs, our model will use features extracted from facial expressions, body movements, the word content of the speech, the tone of the voice and heart rate values. All of them are broadly used in the ED field.
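As an illustration of how such per-modality features could be combined, the sketch below assembles one early-fused feature vector per analysis window. It is a minimal sketch only: the field names, dimensions and the early-fusion choice are assumptions for illustration, not the project's final design.

```python
# Minimal sketch of assembling per-window features from the five cue types
# named above into one multimodal vector (early fusion). Dimensions are made up.
from dataclasses import dataclass
import numpy as np

@dataclass
class WindowFeatures:
    face: np.ndarray        # e.g. facial expression / action unit intensities
    body: np.ndarray        # e.g. body and hand movement statistics
    prosody: np.ndarray     # e.g. pitch and energy statistics of the voice
    text: np.ndarray        # e.g. representation of the words spoken in the window
    heart_rate: np.ndarray  # e.g. mean heart rate and its variability

    def concatenate(self) -> np.ndarray:
        """Stack all modality features into a single vector for a classifier."""
        return np.concatenate([self.face, self.body, self.prosody, self.text, self.heart_rate])

window = WindowFeatures(
    face=np.zeros(17), body=np.zeros(8), prosody=np.zeros(6),
    text=np.zeros(32), heart_rate=np.zeros(2),
)
print(window.concatenate().shape)  # (65,)
```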
Following the previous related works, we will not focus on identifying the 7 basic emotions, i.e. surprise, happiness, anger, disgust, contempt, sadness and fear. Instead, we will use a framework of emotion zones for regulation [Kuypers, 2013] that is extensively used in psychology to help children with ASD learn emotion regulation. It is common for children with ASD to present impairments in emotion regulation, manifested by their finding it hard both to understand their emotions and to calm down after they leave a calming state [Scarpa and Reyes, 2011]. A child with ASD needs to be in a calming emotional state to be able to listen, to interact and to learn.

The zones of regulation framework has four different zones, represented by colours (see Figure 1). One of the emotion zones is the calming zone, represented by the colour green. This is the ideal state, where the child is calm, relaxed and ready to work, to listen and to interact. Another emotion zone is the warning zone (yellow). In this state, the child presents signals of agitation or excitement. This state can originate from both positive and negative emotions: it can start from intense happiness or excitement, and also from frustration. The following zone is the high-agitation zone (red). Here the child is really upset or angry, presenting serious difficulties in keeping control of their emotions. The last zone is the slowing zone (blue), in which the child has low energy and shows emotional signals of being sad, tired, sick or bored. The child here might move more slowly than usual, stop speaking or show delays in responding to interaction.

Figure 1: The four emotion zones of the zones of regulation framework.

This project is developing a classifier able to identify which of the four emotion zones a child with HFASD is engaged with, using multimodal inputs of data. By choosing to use this emotion zones framework we obtain some benefits. Firstly, the framework additionally includes guidelines on activities to lead children back to the calming zone, making it easy to incorporate such activities within an affect-sensitive interface. Secondly, parents of children with ASD are more likely to be familiar with this framework because it is commonly used in the context of autism, hence making the tagging task more comfortable for the parents. Thirdly, considering the children's well-being, it is less harmful for the emotional comfort of children with HFASD, during the emotion elicitation experiment, to elicit the four emotion zones than other strong negative emotions, e.g. fear or anxiety (more about data collection in Section 6).

5 Methodology

The methodology pipeline to address our research goal is depicted in Figure 2, encompassing five different stages.

Figure 2: The pipeline of this research project.

In order to answer our Research Questions, we will follow this methodology:

1. to conduct a study with human participants to elicit, capture and tag different emotion zone expressions, for dataset creation (RQ2);

2. to use the standard ED methodology to build a multimodal ED classifier by iterating over the following steps (a small illustrative sketch of steps (b) and (c) is given after this list):

(a) feature extraction (RQ3);

(b) design of the information fusion layer (RQ4);

(c) training and testing of machine learning models using the annotated dataset (RQ1);

(d) evaluation, by designing experiments to analyse the relation between the type of data input, features and data fusion techniques and the accuracy of the model, and comparing with previous works.
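The sketch below shows one minimal way steps (b) and (c) could look with a decision-level (late) fusion layer: one classifier per modality whose predicted zone probabilities are averaged. The data, the modality split and the use of logistic regression are illustrative assumptions, not the fusion design this project will ultimately adopt.

```python
# Minimal late-fusion sketch: one classifier per modality, fused by averaging
# predicted zone probabilities. All data here is random and for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
modalities = {                      # per-window features for three example modalities
    "video": rng.normal(size=(n, 30)),
    "audio": rng.normal(size=(n, 12)),
    "heart_rate": rng.normal(size=(n, 2)),
}
y = rng.integers(0, 4, size=n)      # the four emotion zones

# Step (c): train one model per modality on the annotated windows.
models = {name: LogisticRegression(max_iter=1000).fit(X, y) for name, X in modalities.items()}

# Step (b): decision-level fusion = average the per-modality class probabilities.
probs = np.mean([models[name].predict_proba(X) for name, X in modalities.items()], axis=0)
fused_prediction = probs.argmax(axis=1)
print("fused training accuracy:", (fused_prediction == y).mean())
```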
Therefore, our first challenge to address is obtaining the ground truth dataset (phase 1). To achieve this, we have finished the design and planning of the experiment for data collection (see Section 6). For the subsequent phases of this research, we plan to follow the general approach of investigating the state-of-the-art methods applied to the population without ASD, evaluating their performance on our dataset and proposing how we can extend those methods to the population of children with HFASD.

6 Data collection

As a required component for meeting the aim of this research, we have to create an annotated dataset featuring children with ASD expressing emotions, because previous related works did not make any working dataset available. To do so, we need to conduct a behavioural experiment with human participants to elicit, capture and tag emotions.

There is no way to directly observe an emotion, because it is an internal experience of an individual; what we can do is define and capture behavioural indices of the presence of a given emotion. Also, emotions do not just appear out of nowhere: they are usually an individual's response to a physical or mental event, i.e. an event in the real world or a thought, and thus we need to evoke them.

During the experiment, we intend to collect the behavioural indices that children with ASD engage with when they are in different emotion zones, together with the measurement of their heart rate. Examples of behavioural indices are facial expressions and body movements such as smiling, flapping hands or head movements. We will ask the participants to perform tasks expected to evoke the emotion zones while we capture the participant's behaviour using different data inputs, i.e. video, audio and heart rate. We will extract features from these data to train a multimodal emotion detection system to identify the four emotion zones from a child with ASD.

For the study, we will recruit 12 children aged 8 to 12 years old and their parents/guardians as participants (http://emotion-asd.datascienceinstitute.ie/). The aimed participant number is an average of the number of subjects selected by the related studies (see Section 2). These works reported that it was challenging to recruit participants and that they had to operate with a small number of subjects for their models. To be considered part of this study, the child must 1) have a previous history of diagnosis of ASD, 2) not have a history of language or intellectual disability, and 3) have their parents' or guardian's consent to participation in the study. Participation in the study involves performing emotion-eliciting tasks during three different sessions. Each session is expected to last around 30 minutes. We will use a computer-based task environment, i.e. the child will interact with a computer for the task's execution.

We developed web-based software to serve as the task environment. During the experiment, the child will interact with a computer using the task environment interface. This software presents a sequence of tasks expected to elicit each of the emotion zones. Between each zone elicitation, we will add calming content to help the child calm down between emotion zone tasks, both to minimise any stress and to set a baseline of emotions between the elicitation parts. The session also finishes with calming content. The tasks for eliciting each zone are as follows: video content for the green and blue zones and for the calming activity, a game for the yellow zone, and a set of Maths questions for the red zone. We selected the eliciting tasks with the input of psychologists with vast experience of working with children with ASD.

We decided the emotion zones' elicitation order by considering first the participant's well-being. So, the green zone starts the session, to be sure we do not cause any negative emotion at the beginning and scare the child. We then create a crescendo of emotion zones by eliciting the yellow zone followed by the red zone. This way, by the time the child is asked to solve a demanding worksheet (red zone task), they will already be over-excited from having played the game before (yellow zone task). The blue zone was selected to be last because, by the end of the session, it is expected that the participant will already show signs of being tired, making it easier to elicit the blue zone. To be most effective in eliciting the four emotion zones, before the session we will ask the parents to answer a questionnaire outlining examples of content that usually makes their child move to a certain emotion zone. Based on the content of this questionnaire, we will adapt the task environment's content to be individualised for each child.
We will annotate the collected data into four different categories, each of them representing one of the emotion zones. The annotation will also include behavioural markers: we will require the annotator to select which behaviour they observed that supported their selection of a given emotion zone. None of the previous works used the emotion zones framework as target emotions to identify, hence comparing results with their works will not be straightforward. To minimise this gap and have some measure of comparison, we will in parallel annotate the dataset to include the happiness/unhappiness/neutral emotion. Happiness/unhappiness is a measure of Quality of Life (QoL) [Ramey et al., 2019] and has been used as an independent variable to analyse the effectiveness of interventions in Psychology. We decided not to target only the happiness/unhappiness emotion for this project because this emotion alone does not have the power to represent whether a child with ASD is in an optimal state for learning. A child with ASD can become overexcited and agitated because of happiness and not be able to stay still for learning until they calm down, for example.

The children's parents or guardians will perform the annotation task after the eliciting sessions. We will also recruit psychology students to act as blind annotators. It is part of our future work to define which agreement measure we will use to annotate the dataset. In this case, the parents are the specialists in identifying their child's emotions because they know them, but parents can also have biases that an annotator who does not know the child would not present. Thus, it is important to define metrics of which annotation has more weight in case of disagreement.
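One candidate agreement measure (an assumption for illustration, not a decision the project has made) is Cohen's kappa computed over the per-window zone labels given by the two annotators; a minimal sketch with made-up labels follows.

```python
# Minimal sketch of measuring parent vs. blind-annotator agreement with
# Cohen's kappa over per-window zone labels. The label sequences are made up.
from sklearn.metrics import cohen_kappa_score

ZONES = ["green", "yellow", "red", "blue"]
parent_labels  = ["green", "green", "yellow", "red", "blue", "green", "yellow", "yellow"]
student_labels = ["green", "yellow", "yellow", "red", "blue", "green", "green", "yellow"]

kappa = cohen_kappa_score(parent_labels, student_labels, labels=ZONES)
print(f"Cohen's kappa between annotators: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```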
analysis of modalities and features extraction, behavioural To create the multimodal annotated dataset, we will fol- cues based features, fusion layers and classifier techniques) low the methodology used by the authors of the RECOLA to propose a guiding framework for future studies. dataset [Ringeval et al., 2013]. RECOLA is a multimodal Currently, we had to temporally pause the experiments for annotated dataset that has the same modalities we intend to data collection. So, we are working on the next phases of include in this study, i.e. video, audio and physiological sig- the methodology pipeline, investigating the state-of-the-art nals, and it was used as a benchmark dataset for several mul- person-independent multimodal emotion detection systems timodal emotion detection challenges. They divided the ses- for the general population to later propose how to adapt them sions’ records into videos of 5 minutes and annotated fixed to the population with ASD. time windows of 400 milliseconds. They also balanced the training, validation and test datasets according to the annota- Acknowledgements tion distribution. This publication has emanated from research conducted with Before running the experiments, we are going to conduct the financial support of Science Foundation Ireland (SFI) un- pilot sessions with the participation of children with typical der Grant Number SFI/12/RC/2289 P2, co-funded by the Eu- development from the same age range of 8-12 years. By run- ropean Regional Development Fund. ning a pilot session, we intend to test the experiment proto- col, data collection, data synchronisation and data analysis We are grateful to Aindrias Cullen for providing us with steps. We expect to verify if the format of the data we will comprehensive advice on data protection legislation, so we collect can be used within the data transformation, analysis could design a project that is compliant with GDPR. We thank and creation of a multimodal emotion detection system. We Dr Ciara Gunning for providing us with specialised advice on will also test the task environment software and the annota- how to work with children with ASD and how to design the tion software. With the information collected during the pilot data collection experiment, as well as her support for recruit- sessions, we will iterate over the experiment protocol, to add ing participants for this study. any needed improvement identified during pilots. The col- lected pilot data will not be published and will be dealt with References the same planned measures for data protection and privacy as [Ahmed and Goodwin, 2017] Alex A Ahmed and Matthew S the data from the posterior study. The results from the pilot Goodwin. Automated detection of facial expressions dur- will not be included in the project’s results. ing computer-assisted instruction in individuals on the This research was reviewed by the Institution’s Research autism spectrum. In Proceedings of the 2017 CHI Con- Ethics Committee and the Data Protection Office at NUI Gal- ference on Human Factors in Computing Systems, pages way and obtained full approval. 6050–6055. ACM, 2017. [Bekele et al., 2016] Esubalew Bekele, Joshua Wade, Dayi 7 Conclusion Bian, Jing Fan, Amy Swanson, Zachary Warren, and Ni- We presented a work in progress of an emotion detection sys- lanjan Sarkar. Multimodal adaptive social interaction in tem tailored for children with high-functioning autism. 
To create the multimodal annotated dataset, we will follow the methodology used by the authors of the RECOLA dataset [Ringeval et al., 2013]. RECOLA is a multimodal annotated dataset that has the same modalities we intend to include in this study, i.e. video, audio and physiological signals, and it was used as a benchmark dataset for several multimodal emotion detection challenges. They divided the session recordings into videos of 5 minutes and annotated fixed time windows of 400 milliseconds. They also balanced the training, validation and test datasets according to the annotation distribution.
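A minimal sketch of that RECOLA-style segmentation, assuming a timeline of zone labels produced by the annotation tool, is shown below; the durations and the label timeline are made up for illustration.

```python
# Minimal sketch of slicing a recording into fixed 400 ms windows and assigning
# each window the zone active at its start time (RECOLA-style segmentation).
import numpy as np

WINDOW_S = 0.4        # 400 millisecond analysis windows
CHUNK_S = 5 * 60      # recordings split into 5-minute chunks

# (start_time_in_seconds, zone) pairs, e.g. produced by the annotation tool
label_timeline = [(0.0, "green"), (95.0, "yellow"), (210.0, "red"), (280.0, "blue")]

def zone_at(t: float) -> str:
    """Return the most recent zone label at time t (timeline assumed sorted)."""
    current = label_timeline[0][1]
    for start, zone in label_timeline:
        if start <= t:
            current = zone
    return current

window_starts = np.arange(0.0, CHUNK_S, WINDOW_S)
window_labels = [zone_at(t) for t in window_starts]
print(len(window_starts), "windows per 5-minute chunk;",
      "first/last labels:", window_labels[0], window_labels[-1])
```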
Before running the experiments, we are going to conduct pilot sessions with the participation of children with typical development from the same age range of 8-12 years. By running a pilot session, we intend to test the experiment protocol, data collection, data synchronisation and data analysis steps. We expect to verify whether the format of the data we will collect can be used within the data transformation, analysis and creation of a multimodal emotion detection system. We will also test the task environment software and the annotation software. With the information collected during the pilot sessions, we will iterate over the experiment protocol to add any needed improvements identified during the pilots. The collected pilot data will not be published and will be handled with the same planned measures for data protection and privacy as the data from the posterior study. The results from the pilots will not be included in the project's results.

This research was reviewed by the Institution's Research Ethics Committee and the Data Protection Office at NUI Galway and obtained full approval.

7 Conclusion

We presented a work in progress on an emotion detection system tailored for children with high-functioning autism. The model's novelty involves mainly two points: the inclusion of several data input modalities, and the fact that it is a person-independent model. The input modalities involved in the proposed model are video, audio and heart rate. The main foreseen contribution of this research work is the creation of a person-independent Multimodal Emotion Detection model to be integrated into affect-sensitive systems that support children with autism. Thanks to this research's work, such affect-sensitive systems will be able to identify the child's emotion zone and suggest/present activities to bring the child back to a calming emotional state, based on which emotional state the child is in at the moment.

Also, it is part of this project's scope to make the multimodal dataset available to the research community. In order to protect the data subjects' privacy rights, the dataset will be formed by the features extracted from the original raw audio/video files together with the heart rate measures. Therefore, it will only contain non-identifiable data.

Finally, this project expects to build upon the current body of knowledge on methods to apply Emotion Detection systems to this specific subset of the general population. We expect to apply the overall theoretical and practical design perspectives that arise from this research investigation (e.g. analysis of modalities and feature extraction, behavioural-cue-based features, fusion layers and classifier techniques) to propose a guiding framework for future studies.

Currently, we have had to temporarily pause the experiments for data collection. So, we are working on the next phases of the methodology pipeline, investigating state-of-the-art person-independent multimodal emotion detection systems for the general population, to later propose how to adapt them to the population with ASD.

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund.

We are grateful to Aindrias Cullen for providing us with comprehensive advice on data protection legislation, so we could design a project that is compliant with GDPR. We thank Dr Ciara Gunning for providing us with specialised advice on how to work with children with ASD and how to design the data collection experiment, as well as her support in recruiting participants for this study.
References

[Ahmed and Goodwin, 2017] Alex A. Ahmed and Matthew S. Goodwin. Automated detection of facial expressions during computer-assisted instruction in individuals on the autism spectrum. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 6050–6055. ACM, 2017.

[Bekele et al., 2016] Esubalew Bekele, Joshua Wade, Dayi Bian, Jing Fan, Amy Swanson, Zachary Warren, and Nilanjan Sarkar. Multimodal adaptive social interaction in virtual environment (MASI-VR) for children with Autism Spectrum Disorders (ASD). In Proceedings of IEEE Virtual Reality 2016, pages 121–130, 2016.

[Boyd et al., 2018] LouAnne E. Boyd, Saumya Gupta, Sagar B. Vikmani, Carlos M. Gutierrez, Junxiang Yang, Erik Linstead, and Gillian R. Hayes. vrSocial: Toward immersive therapeutic VR systems for children with autism. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, page 204. ACM, 2018.

[Cambria et al., 2017] Erik Cambria, Devamanyu Hazarika, Soujanya Poria, Amir Hussain, and R. B. V. Subramanyam. Benchmarking multimodal sentiment analysis. In International Conference on Computational Linguistics and Intelligent Text Processing, pages 166–179. Springer, 2017.

[Chu et al., 2018] Hui Chuan Chu, William Wei Jen Tsai, Min Ju Liao, and Yuh Min Chen. Facial emotion recognition with transition detection for students with high-functioning autism in adaptive e-learning. Soft Computing, 22(9):2973–2999, 2018.

[Dawood et al., 2018] Amina Dawood, Scott Turner, and Prithvi Perepa. Affective computational model to extract natural affective states of students with Asperger Syndrome (AS) in computer-based learning environment. IEEE Access, 6:67026–67034, 2018.

[Frauenberger et al., 2012] Christopher Frauenberger, Judith Good, Alyssa Alcorn, and Helen Pain. Supporting the design contributions of children with autism spectrum conditions. In Proceedings of the 11th International Conference on Interaction Design and Children, pages 134–143. ACM, 2012.

[Frauenberger et al., 2016] Christopher Frauenberger, Judith Good, and Narcis Pares. Autism and technology: Beyond assistance & intervention. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pages 3373–3378. ACM, 2016.

[Gaus, 2011] Valerie L. Gaus. Cognitive behavioural therapy for adults with autism spectrum disorder. Advances in Mental Health and Intellectual Disabilities, 5(5):15–25, 2011.

[Grossard et al., 2020] Charline Grossard, Arnaud Dapogny, David Cohen, Sacha Bernheim, Estelle Juillet, Fanny Hamel, Stéphanie Hun, Jérémy Bourgeois, Hugues Pellerin, Sylvie Serret, Kevin Bailly, and Laurence Chaby. Children with autism spectrum disorder produce more ambiguous and less socially meaningful facial expressions: An experimental study using random forest classifiers. Molecular Autism, 11(1):1–14, 2020.

[Hopkins et al., 2011] Ingrid Maria Hopkins, Michael W. Gower, Trista A. Perez, Dana S. Smith, Franklin R. Amthor, F. Casey Wimsatt, and Fred J. Biasini. Avatar assistant: Improving social skills in students with an ASD through a computer-based intervention. Journal of Autism and Developmental Disorders, 41(11):1543–1555, 2011.

[Hourcade et al., 2012] Juan Pablo Hourcade, Natasha E. Bullock-Rest, and Thomas E. Hansen. Multitouch tablet applications and activities to enhance the social skills of children with autism spectrum disorders. Personal and Ubiquitous Computing, 16(2):157–168, 2012.

[Kushki et al., 2015] Azadeh Kushki, Ajmal Khan, Jessica Brian, and Evdokia Anagnostou. A Kalman filtering framework for physiological detection of anxiety-related arousal in children with autism spectrum disorder. IEEE Transactions on Biomedical Engineering, 62(3):990–1000, 2015.

[Kuypers, 2013] Leah Kuypers. The zones of regulation: A framework to foster self-regulation. Sensory Integration Special Interest Section Quarterly, 36(4):1–4, 2013.

[Liu et al., 2008] Changchun Liu, Karla Conn, Nilanjan Sarkar, and Wendy Stone. Physiology-based affect recognition for computer-assisted intervention of children with Autism Spectrum Disorder. International Journal of Human-Computer Studies, 2008.

[Ma et al., 2019] Tengteng Ma, Hasti Sharifi, and Debaleena Chattopadhyay. Virtual humans in health-related interventions: A meta-analysis. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, page LBW1717. ACM, 2019.

[Marinoiu et al., 2018] Elisabeta Marinoiu, Mihai Zanfir, Vlad Olaru, and Cristian Sminchisescu. 3D human sensing, action and emotion recognition in robot assisted therapy of children with autism. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2158–2167, 2018.

[Ramey et al., 2019] Devon Ramey, Olive Healy, Russell Lang, Laura Gormley, and Nathan Pullen. Mood as a dependent variable in behavioral interventions for individuals with ASD: A systematic review. Review Journal of Autism and Developmental Disorders, pages 1–19, 2019.

[Ringeval et al., 2013] Fabien Ringeval, Andreas Sonderegger, Juergen Sauer, and Denis Lalanne. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2013), 2013.

[Rudovic et al., 2018] Ognjen Rudovic, Jaeryoung Lee, Miles Dai, Björn Schuller, and Rosalind W. Picard. Personalized machine learning for robot perception of affect and engagement in autism therapy. Science Robotics, 3(19), 2018.

[Saadatzi et al., 2013] Mohammad Nasser Saadatzi, Karla Conn Welch, Robert Pennington, and James Graham. Towards an affective computing feedback system to benefit underserved individuals: An example teaching social media skills. In International Conference on Universal Access in Human-Computer Interaction, pages 504–513. Springer, 2013.

[Samad et al., 2018] Manar D. Samad, Norou Diawara, Jonna L. Bobzien, John W. Harrington, Megan A. Witherow, and Khan M. Iftekharuddin. A feasibility study of autism behavioral markers in spontaneous facial, visual, and hand movement response data. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(2):353–361, 2018.

[Sarabadani et al., 2018] Sarah Sarabadani, Larissa Christina Schudlo, Ali-Akbar Samadani, and Azadeh Kushki. Physiological detection of affective states in children with autism spectrum disorder. IEEE Transactions on Affective Computing, 2018.

[Scarpa and Reyes, 2011] Angela Scarpa and Nuri M. Reyes. Improving emotion regulation with CBT in young children with high functioning autism spectrum disorders: A pilot study. Behavioural and Cognitive Psychotherapy, 39(4):495–500, 2011.

[Sharmin et al., 2018] Moushumi Sharmin, Md Monsur Hossain, Abir Saha, Maitraye Das, Margot Maxwell, and Shameem Ahmed. From research to practice: Informing the design of autism support smart technology. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, page 102. ACM, 2018.

[Trevisan et al., 2018] Dominic A. Trevisan, Maureen Hoskyn, and Elina Birmingham. Facial expression production in autism: A meta-analysis. Autism Research, 11(12):1586–1601, 2018.