<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparison of Human Experts and AI in Predicting Autism from Facial Behavior</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evangelos Sariyanidi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Casey J. Zampella</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ellis DeJardin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John D. Herrington</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert T. Schultz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Birkan Tunc</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Autism Research, The Children's Hospital of Philadelphia</institution>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pennsylvania</institution>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Advances in computational behavior analysis via artificial intelligence (AI) promise to improve mental healthcare services by providing clinicians with tools to assist diagnosis or measurement of treatment outcomes. This potential has spurred an increasing number of studies in which automated pipelines predict diagnoses of mental health conditions. However, a fundamental question remains unanswered: How do the predictions of AI algorithms correspond and compare with the predictions of humans? This is a critical question if AI technology is to be used as an assistive tool, because the utility of an AI algorithm would be negligible if it provides little information beyond what clinicians can readily infer. In this paper, we compare the performance of 19 human raters (8 autism experts and 11 non-experts) and that of an AI algorithm in terms of predicting autism diagnosis from short (3-minute) videos of N = 42 participants in a naturalistic conversation. Results show that the AI algorithm achieves an average accuracy of 80.5%, which is comparable to that of clinicians with expertise in autism (83.1%) and clinical research staff without specialized expertise (78.3%). Critically, diagnoses that were inaccurately predicted by most humans (experts and non-experts alike) were typically correctly predicted by AI. Our results highlight the potential of AI as an assistive tool that can augment clinician diagnostic decision-making.</p>
      </abstract>
      <kwd-group>
        <kwd>autism</kwd>
        <kwd>assistive healthcare technologies</kwd>
        <kwd>digital phenotyping</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Modern medical disciplines typically rely on a variety of technological tools to assist in diagnosis and monitor treatment progress. From brain imaging technologies to blood and genetic tests, instruments that assist medical decision-makers are a cornerstone of modern medicine. In the domain of psychiatry and psychology, however, medical decision-making relies nearly exclusively on observational or paper-and-pencil instruments. Thus, recent advances in computer vision and artificial intelligence (AI) are poised to rapidly advance research and clinical decision-making in psychiatry by introducing reliable and granular tools within a new paradigm: computational behavior analysis [1, 2, 3, 4, 5]. Such tools can capture and quantify human behavior with extraordinary precision, even from brief video recordings.</p>
      <p>Autism spectrum disorder (ASD), like nearly all psychiatric conditions, is defined by observable behavioral cues—what a person does well or not well, too little or too much. Its core traits include observable differences in social communication, social reciprocity, nonverbal communication, and relationships, as well as restricted patterns of interests and activities [6]. The current reliance on assessment and interpretation of overt behavior makes autism an excellent candidate for computational behavior analysis approaches. Coupling computationally derived biomarkers with expert clinician judgment may provide an extremely potent approach to autism care, by enhancing the currently limited reliability of clinical assessments (e.g., DSM-5 field trials Kappa = 0.69) [7], shortening lengthy diagnostic evaluations, and improving sensitivity for capturing change over the course of treatment and development.</p>
      <p>This potential has spurred a plethora of studies that aim to diagnose autism via AI pipelines based on various behavioral modalities and sensors [8]. Notably, to our knowledge, no study has directly compared AI algorithms and human raters with respect to overall predictive capacity or specific decisions on individual cases. A comparison of this kind is important when it comes to using AI as an assistive technology for clinical decision-making, as it can determine whether or not AI provides significant incremental utility beyond existing tools. AI algorithms can cooperate synergistically with human assessment by complementing and augmenting human decisions. On the other hand, clinicians would have little interest in or benefit from incorporating AI algorithms if their decisions—and errors—highly overlap with their own. We aim to address this issue by examining whether or not AI detects diagnostic indicators that may go unnoticed by human observation.</p>
      <p>In this paper, our main contribution is comparing the performance of AI and humans with knowledge of autism in accurately classifying autism from a 3-minute get-to-know-you conversation with a non-clinician conversation partner. Specifically, we implemented a computer vision pipeline for predicting autism using features of facial behavior during conversations with a sample of N = 42 adults: 15 individuals with autism spectrum disorder (ASD) and 27 neurotypical (NT) individuals. We then recruited a total of 19 human raters (8 expert clinicians, 11 non-experts with experience with autism) to predict the diagnostic status of the same participants. The expert raters were doctoral level clinicians with extensive training on autism, while most of the non-experts were BA level researchers still learning about autism. Raters watched the same videos of participants' faces during conversations that were fed to the computer vision pipeline, without sound to allow for a fairer comparison with the AI algorithm.</p>
      <p>Results suggest that the AI pipeline based on participant facial behavior predicts diagnostic status with 80.5% accuracy. This accuracy was comparable to the 80.3% overall accuracy achieved by human raters (83.1% for experts and 78.3% for non-experts), demonstrating the potential of AI to detect facial behavioral patterns that differentiate adults with autism from neurotypical peers in the context of a casual, get-to-know-you conversation. Moreover, we show that the prediction errors of AI and humans had little overlap, indicating that the AI can provide complementary information that could prompt and assist clinicians with their evaluations and decision-making. The fact that all the results of this paper are extracted from a brief naturalistic conversation is a significant contribution, as a 3-minute conversation with a non-expert is a highly scalable paradigm, and thus a promising option as a screening or (preliminary) diagnostic procedure. The results of this paper motivate further research efforts to understand the decision mechanisms of AI algorithms, particularly for uncovering subtle behavioral patterns in psychiatric conditions.</p>
    </sec>
    <sec id="sec-1b">
      <title>2. Participants and Procedure</title>
      <p>Forty-four adults participated in the present study (ASD: n=17, NT: n=27, all native and fluent English speakers). Participant groups did not differ significantly on mean chronological age, full-scale IQ estimates (WASI-II) [9], verbal IQ estimates, or sex ratio (Table 1). Participant diagnostic status (ASD or NT) was confirmed as part of this study using the Clinical Best Estimate process [10], informed by the Autism Diagnostic Observation Schedule, 2nd Edition, Module 4 (ADOS-2) [11] and adhering to DSM-5 criteria for ASD [12]. All aspects of the study were approved by the Institutional Review Board of The Children's Hospital of Philadelphia (CHOP). Two participants were excluded from analysis due to their lack of consent for this particular set of experiments or their data being unavailable for processing, yielding a final sample of 42 participants (ASD: N=15, NT: N=27).</p>
      <p>Participants underwent a battery of tasks that assessed social communication competence, including a slightly modified version of the Contextual Assessment of Social Skills (CASS) [13]. The CASS is a semi-structured assessment of conversational ability designed to mimic real-life first-time encounters. Participants engaged in two 3-minute face-to-face conversations with two different confederates (research staff, blind to participant diagnostic status and unaware of the dependent variables of interest). In the first conversation (interested condition), the confederate demonstrates social interest by engaging both verbally and non-verbally in the conversation. In the second conversation (bored condition), the confederate indicates boredom and disengagement both verbally (e.g., one-word answers, limited follow-up questions) and physically (e.g., neutral affect, limited eye contact and gestures). All analyses throughout this paper are based on the interested condition only.</p>
      <p>During the CASS, participants and confederates were seated facing one another. Audio and video of the CASS were recorded using an in-house device comprising two 1080p HD (30 fps) cameras (Fig. 1), which was placed between the participant and confederate on a floor stand. The two cameras of the device point in opposite directions to allow simultaneous recording of the participant and the confederate. However, the AI analyses in this paper are conducted on the video data of the participant only. In other words, even if the context of the conversation is dyadic, our AI-based analysis is not dyadic, since it discards the information from the confederate and focuses only on the participant. We refer to this type of analysis as monadic analysis.</p>
      <p>CASS confederates included 10 undergraduate students or BA-level research assistants (3 males, 7 females, all native English speakers). Confederates were semi-randomly selected, based on availability and clinical judgment. In order to provide opportunities for participants to initiate and develop the conversation, confederates were trained to speak for no more than 50% of the time and to wait 10s to initiate the conversation. If conversational pauses occurred, confederates were trained to wait 5s before re-initiating the conversation. Otherwise, confederates were told to simply naturally engage in the conversation. Prior to each conversation, study staff provided the following prompt to the participants and confederates before leaving the room: “Thank you both so much for coming in today. Right now, you will have 3 […]”</p>
    </sec>
    <sec id="sec-2">
      <title>3. Prediction of Autism Diagnosis</title>
      <sec id="sec-2-1">
        <title>3.1. Human Raters</title>
        <p>We recruited a total of 19 human raters to view the videos from the sample of N = 42 participants. Eight of the raters were autism clinical experts: doctoral level clinicians with extensive training at the Center for Autism Research (CAR) of CHOP. The remaining 11 (non-expert) raters had some familiarity with autism but no specialized training, and worked at CAR. Most of these non-expert raters were BA-level psychology students learning about autism.</p>
        <p>The videos that were shown to the human raters were prepared as follows. First, we cropped the videos of the participant and their corresponding confederate conversation partner so that only the heads and necks were visible. Next, we combined the synchronized videos of the heads/faces of the participant and confederate into a single video file per participant such that participant and confederate were positioned side by side (Fig. 1, right). The audio was removed in order to allow human raters to focus on the facial behavior, as was the case for the AI algorithm. The videos for all N = 42 participants were presented to human raters in a random order on high-resolution monitors.</p>
        <p>Raters were instructed to watch each video just once and to make a decision as to whether the study participant had autism or not. They were told that all participants were either confirmed to have autism through clinical evaluation by a licensed expert, or were recruited specifically as neurotypical controls (i.e., clear cases of individuals without autism). Raters were not allowed to go back and review earlier videos. They were instructed to watch all videos within 1 to 3 viewing sessions, with nearly all being completed in 1 or 2 sessions.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Computer vision</title>
        <sec id="sec-2-2-1">
          <title>3.2.1. Quantification of facial behavior</title>
          <p>Our goal is to quantify all observable facial behavior of a participant, which includes facial expressions and head movements. Also, we did not want to limit the analysis to emotion-related expressions (e.g., the six basic emotions), as other kinds of facial movements (e.g., communicative expressions, speech-related mouth movements) are also important for diagnosing autism [14]. Therefore, we quantify behavior using a 3D morphable model (3DMM) [15], as 3DMMs contain expression bases (e.g., [16]) that can quantify any facial movement. Moreover, 3DMMs can simultaneously model facial identity, pose, and expression, which increases the precision of parsing facial behavior. We fit the 3DMM with the 3DI method [18]; moreover, 3DI can take the parameters of the camera as input, which is critical for increasing the accuracy with which facial expressions and pose are decoupled [19].</p>
          <p>A 3DMM method produces a dense mesh of N three-dimensional points X ∈ R^(3×N) to represent the face in a given video frame I (N is 23,660 for the 3DI method). This 3D mesh is a function of the facial pose (i.e., a rotation matrix R ∈ R^(3×3) and a translation vector τ ∈ R^(3×1)), the facial identity of the person X̄, and the facial expression variation in the image ΔX ∈ R^(3×N):</p>
          <p>X = R(X̄ + ΔX) + T,   (1)</p>
          <p>where the columns of the matrix T ∈ R^(3×N) are identically τ. The matrices of interest in the scope of our study are the matrix of head rotation R and the expression variation ΔX. 3DMMs represent expression variation as a linear sum, ΔX = Wε, where ε ∈ R^(K×1) is the vector representing the expression and K is the number of expression basis components. The expression basis W used by the 3DI method is constructed via PCA [16], which limits interpretability, as PCA components are not localized—we cannot associate any PCA component with a specific facial region. To make the results of our study more interpretable, we modified the expression model so that the resultant expression model, W′, contains 60 localized basis components, as shown in Fig. 2.</p>
          <p>[Fig. 2 panel labels: 19 components, 7 components, 19 components, 15 components.]</p>
          <p>Using this model, we represent the expression variation in the image with the vector ε′ that minimizes the norm ‖ΔX − W′ε′‖². We ignore the 7 components that correspond to the nose and cheek regions (Fig. 2), and we finally represent the expression variation in a video of T frames with a matrix E of size 53 × T, obtained by horizontally concatenating the expression vectors from all the frames. Finally, using the rotation matrix R estimated at each frame, we compute the yaw, pitch and roll angles per frame, and represent head rotation throughout the video with a matrix Φ of size 3 × T. The facial movement variation and head rotation of a person throughout the video are represented together with a matrix Y of size 56 × T, obtained by stacking E and Φ vertically, Y = [E; Φ].</p>
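          <p>As a minimal illustration of this construction (not the authors' code), the sketch below assembles Y from per-frame quantities that a 3DMM fitting step is assumed to have already produced: the expression displacement ΔX per frame (flattened to length 3N), the localized basis W′, the indices of the 53 retained components, and the per-frame yaw/pitch/roll angles. All function and variable names are illustrative.</p>
          <preformat>
import numpy as np

def expression_coefficients(delta_x, W_loc, keep_idx):
    """Least-squares solution of min || delta_x - W_loc @ e ||^2, keeping only
    the 53 components outside the nose/cheek regions (keep_idx)."""
    e, *_ = np.linalg.lstsq(W_loc, delta_x, rcond=None)
    return e[keep_idx]

def build_behavior_matrix(delta_x_seq, W_loc, angles_seq, keep_idx):
    """Stack expression (53 x T) and pose (3 x T) time series into Y (56 x T)."""
    E = np.stack([expression_coefficients(dx, W_loc, keep_idx)
                  for dx in delta_x_seq], axis=1)   # 53 x T
    Phi = np.asarray(angles_seq).T                  # 3 x T (yaw, pitch, roll)
    return np.vstack([E, Phi])                      # 56 x T
          </preformat>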
          <p>An alternative would have been to represent facial behavior with facial action units (AUs). However, our analysis is based on correlation of time series (Section 3.2.2), which requires a representation where AU intensity is provided—binary AU labels would be very limiting. Since automated AU detection systems (e.g., OpenFace [20]) provide AU intensity only for a relatively small number of AUs, we preferred to use the 3DMM-based features instead of the AUs. One could also consider adding the AU features to the features Y above, but we refrained from doing so, because the number of our correlation features increases quadratically with the number of rows in Y (Section 3.2.2). This also explains why we refrained from adding the features from the nose and cheek regions, as the potential extra information that would be provided by these regions may not justify the increase in the dimensionality of the feature space. That said, the utility of all such extra information should be explored in future AI pipelines that can be trained with data from larger samples.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>3.2.2. Correlation features</title>
          <p>An important aspect of social communication is how different modalities of communicative behavior are integrated and coordinated. For example, the ADOS, the gold standard clinical assessment for autism diagnosis, includes criteria that evaluate how an individual combines speech with gestures and eye contact with facial expression [14]. Similarly, the coordination of behavior within a communicative modality (e.g., movements across different parts of the face) is important; for example, atypical aspects of facial expressions can be characteristic of autism [21, 22]. Thus, to capture coordination across different types of facial and head movements within a person, we apply windowed cross-correlation [23] on the matrix Y. That is, considering the i-th and j-th rows of Y as two time series, we compute the cross-correlation between the two over time windows of length w and a step size of w/2 (i.e., consecutive time windows have an overlap of 50%). We then compute the average μ_ij and standard deviation σ_ij of the maximal cross-correlation values (w.r.t. lag) per window. To distinguish the case where, say, a mouth movement is followed by a pose variation from the case in the opposite direction, we allow only forward lag on the second time series in the pair; thus (μ_ij, σ_ij) is in general different from (μ_ji, σ_ji). In sum, since Y has 56 rows, we have 56 × 56 ordered pairs, and with 2 features (i.e., mean and standard deviation) per pair, the total number of features that represent the behavior of a participant is D = 6272.</p>
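          <p>The sketch below is one way to implement this feature extraction. It assumes that the cross-correlation is the Pearson correlation of the two windows and that forward lags of up to one window length are considered; the paper does not spell out these details, so treat this as an illustration rather than the authors' implementation.</p>
          <preformat>
import numpy as np

def windowed_xcorr_features(Y, w, max_lag=None):
    """Mean and std of the per-window maximal cross-correlation for every
    ordered pair of rows of Y, with 50%-overlapping windows of length w.
    Only forward lags are allowed on the second series, so the features for
    (i, j) generally differ from those for (j, i). With 56 rows this yields
    2 * 56 * 56 = 6272 features."""
    n_rows, T = Y.shape
    step = max(w // 2, 1)
    max_lag = w - 1 if max_lag is None else max_lag
    starts = range(0, T - w + 1, step)
    feats = []
    for i in range(n_rows):
        for j in range(n_rows):
            peaks = []
            for s in starts:
                a = Y[i, s:s + w]
                best = -np.inf
                for lag in range(max_lag + 1):
                    if s + lag + w > T:
                        break
                    b = Y[j, s + lag:s + lag + w]
                    if a.std() == 0 or b.std() == 0:
                        continue
                    best = max(best, np.corrcoef(a, b)[0, 1])
                if np.isfinite(best):
                    peaks.append(best)
            peaks = np.asarray(peaks) if peaks else np.zeros(1)
            feats.extend([peaks.mean(), peaks.std()])
    return np.asarray(feats)
          </preformat>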
        </sec>
        <sec id="sec-2-2-3">
          <title>3.2.3. Classification</title>
          <p>We predict the diagnostic group of participants (ASD vs. NT) using a linear SVM classifier, simply using the default C value for the SVM (i.e., C = 1). We report results based on nested cross-validation, where the only hyperparameter being optimized is the time window length w, and we optimize over values of w = 1, 2, 4, 6 seconds. The time window length that was selected in most cross-validation folds was w = 2.</p>
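          <p>A minimal sketch of such a nested cross-validation loop is given below, assuming the windowed cross-correlation features of Section 3.2.2 are available through a stand-in function extract_features(video, window_frames). The fold counts, the scikit-learn usage, and the absence of any further preprocessing are assumptions made for illustration, not a description of the authors' exact pipeline.</p>
          <preformat>
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

FPS = 30
WINDOWS_SEC = [1, 2, 4, 6]

def nested_cv_accuracy(videos, labels, extract_features, n_outer=10, n_inner=5):
    """Outer folds estimate accuracy; inner folds select the window length w."""
    labels = np.asarray(labels)
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in outer.split(np.zeros((len(labels), 1)), labels):
        best_w, best_inner = None, -np.inf
        for w in WINDOWS_SEC:
            X_tr = np.stack([extract_features(videos[i], w * FPS) for i in train_idx])
            clf = SVC(kernel="linear", C=1.0)   # default C = 1
            inner = cross_val_score(clf, X_tr, labels[train_idx], cv=n_inner).mean()
            if inner > best_inner:
                best_w, best_inner = w, inner
        X_tr = np.stack([extract_features(videos[i], best_w * FPS) for i in train_idx])
        X_te = np.stack([extract_features(videos[i], best_w * FPS) for i in test_idx])
        clf = SVC(kernel="linear", C=1.0).fit(X_tr, labels[train_idx])
        scores.append(clf.score(X_te, labels[test_idx]))
    return float(np.mean(scores))
          </preformat>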
          <p>While more advanced AI models based on deep learning could be used, the sample size is insufficient for reliably training deep learning models from scratch. Moreover, to our knowledge, there is no publicly available pre-trained deep learning model that is directly applicable to our problem; thus, taking an existing model and re-training only a part of it (e.g., the classification layer) with our data is also not an approach within reach.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results and Discussion</title>
      <p>Table 2 shows the prediction accuracy of the human raters and the AI method. The results for the AI method are obtained via 10-fold cross-validation (repeated 100 times with shuffling of the participant order). The average accuracy of expert clinicians is slightly higher than that of non-experts. Of note, the average accuracy of all human raters (expert and non-expert) is similar to that of the AI approach. The average positive predictive value, negative predictive value, sensitivity and specificity of the AI model are, respectively, 0.86, 0.79, 0.55 and 0.95.</p>
      <p>We next investigate whether the errors of the human raters coincide with the errors of the AI algorithm. Table 3 shows the participants whose diagnoses were inaccurately predicted by most human raters (i.e., average prediction accuracy &lt; 50%), along with the correct diagnosis and the diagnosis predicted by the AI. Results show that four out of these five human mispredictions were correctly predicted by the AI, including the first participant in the list, whose diagnosis was predicted correctly by only 21% of the human raters. In other words, participants that were difficult for most human raters to accurately classify were not particularly difficult for the AI. This suggests that the decision mechanism of the AI is different from that of the humans, and the following results further support this point of view.</p>
      <p>Fig. 3 plots the average prediction accuracy of human raters against the average accuracy of the AI algorithm per participant. The correlation between these quantities is not strong (r = 0.35) and is mostly driven by the participants that are correctly classified by both humans and the AI (i.e., the top-right points of the plot). For example, if we remove the subjects that are correctly classified by at least 95% of the human raters, the correlation drops to r = 0.19. The lack of points in the lower-left quadrant of Fig. 3 supports the conclusion that the diagnoses that were difficult to predict for humans were not typically difficult for the AI, and vice versa.</p>
      <p>[Figure 3: The average prediction accuracy of human raters against the average prediction accuracy of the AI pipeline, per participant (horizontal axis: mean accuracy of the AI). The average predictions for the AI in this figure are computed by repeating 5-fold cross-validation 1000 times and averaging over the 1000 predictions per participant.]</p>
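      <p>A small sketch of how the quantities behind Fig. 3 can be computed is given below. It assumes that per-participant mean accuracies for the human raters and for the repeated AI cross-validation runs are already available, and that the reported correlation is a Pearson correlation (an assumption; the paper does not name the coefficient).</p>
      <preformat>
import numpy as np

def fig3_correlations(human_acc, ai_acc, threshold=0.95):
    """Correlation between per-participant mean human and AI accuracies,
    overall and after removing participants classified correctly by at
    least `threshold` of the human raters (cf. r = 0.35 and r = 0.19)."""
    human_acc = np.asarray(human_acc, dtype=float)
    ai_acc = np.asarray(ai_acc, dtype=float)
    r_all = np.corrcoef(human_acc, ai_acc)[0, 1]
    keep = np.less(human_acc, threshold)
    r_hard = np.corrcoef(human_acc[keep], ai_acc[keep])[0, 1]
    return r_all, r_hard
      </preformat>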
      <p>This outcome further supports that the decision mechanism of the AI is different from that of the humans. It is also a desirable outcome if AI is to be used as an assistive technology for human clinical decision-making, since it implies that human decisions can be augmented with the help of AI. For example, in a potential application for autism screening from similar short social videos, humans and AI could simultaneously make predictions, and humans could re-evaluate their decision if it is inconsistent with the decision of the AI algorithm. However, arguably, a scenario of this kind is conceivable only if the AI algorithm produces a semantically interpretable output—that is, the algorithm lists the detected behavioral patterns that lead to a diagnostic decision of autism vs. NT. Otherwise, without any explanation of the prediction, it would be difficult for a clinician to determine to what degree the result of the AI algorithm should be taken into account.</p>
      <p>In order to shed some light on the decision mechanism of the AI, we analyze the features that were dominant in the SVM classifier—the features that had greater weight. Fig. 4 shows the weights of all the features, and Fig. 5a shows the 10 features that had the greatest (absolute) weight across cross-validation folds, along with their names. While a complete analysis of the semantic interpretation of each feature is a difficult task, we can still gain some insight into the SVM decisions by inspecting these results. First, note that pose-pose features (i.e., features that summarize the correlation between two head rotation angles) have the greatest weight on average (Fig. 4, top), indicating that head movements are important for distinguishing behavioral patterns of autistic vs. NT participants. Moreover, correlation features combining pose and eye movements emerge as important both in Fig. 4 and in Fig. 5a, supporting previous literature suggesting that blinking and nodding are important nonverbal behaviors in conversations [24] and that head and eye movements are indicators of social attention [25]. Second, mouth-related features also emerged as important. For example, six out of the 10 correlation features in Fig. 5a are related to the mouth, with three of them being pairs of mouth-mouth features.</p>
      <p>[Figure 4 (top): Average and standard dev. of features per facial region.]</p>
      <p>We next analyze which, if any, of the four feature categories (eyes, brows, mouth, pose) have greater presence among the top-k features. Fig. 5b plots the proportion of the eye-, brow-, mouth- and pose-related features in the top-10, top-100 and top-1000 most important features, as well as their proportion in the entire pool of 6272 features. For example, while the baseline rate of pose features is only ∼5.3% (i.e., ∼5.3% of the entire set of 6272 features are pose-related), the top-10 features contain pose-related features at a ratio of ∼13.3% (see the caption of Fig. 5 for the computation), indicating that the pose features have ∼2.5 times more presence in the top-10 features compared to their baseline. Similarly, the baseline rate of mouth-related features is ∼25.5%, but ∼40% of the top-10 features are related to the mouth, indicating that mouth features also have greater representation in the set of important features compared to their baseline. In sum, our analyses suggest that the AI algorithm places high emphasis on pose- and mouth-related features when classifying between autism and NT groups. Further analysis to uncover why these features are important is beyond the scope of this study, as it would require more granular expression models (e.g., 3D versions of localized bases [26]); the approach that we designed from an existing model does not allow us to pinpoint the facial movements of interest beyond the level of the partitioned regions in Fig. 2—for example, we cannot distinguish between parts of the mouth, such as the upper lip or mouth corner. Still, our analyses allowed a degree of interpretation that corroborates previous findings on the importance of mouth-related movements [2, 4], as well as the central role that head movements have in social orienting, attention and backchannel feedback (e.g., nodding) [27, 28, 24, 25, 29].</p>
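      <p>The category-proportion analysis of Fig. 5b can be reproduced along the lines of the sketch below, assuming a fitted linear SVM coefficient vector (e.g., averaged over cross-validation folds) and a per-feature category labeling; both inputs and names are illustrative.</p>
      <preformat>
import numpy as np
from collections import Counter

def topk_category_proportions(weights, categories, ks=(10, 100, 1000)):
    """Proportion of each feature category among the top-k features ranked by
    absolute SVM weight, plus the baseline proportion over all features.
    `categories[d]` names the category of feature d (eye, brow, mouth, pose)."""
    weights = np.asarray(weights)
    categories = np.asarray(categories)
    order = np.argsort(-np.abs(weights))   # most important features first
    out = {"baseline": {c: n / len(categories)
                        for c, n in Counter(categories.tolist()).items()}}
    for k in ks:
        top = categories[order[:k]].tolist()
        out["top-%d" % k] = {c: n / k for c, n in Counter(top).items()}
    return out
      </preformat>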
    </sec>
    <sec id="sec-4">
      <title>5. Conclusions and Future Work</title>
      <p>In this paper, we studied the prediction of autism from facial movement behavior during casual conversations. Specifically, we compared the predictive accuracy of expert and non-expert human raters with that of an AI algorithm. Results show that, while both humans and the AI are capable of distinguishing individuals with autism spectrum disorder (ASD) from neurotypical (NT) individuals with high accuracy, their errors do not overlap, suggesting that the decision mechanism of an AI algorithm may be different than that of a human. Thus, AI technologies have the potential to provide complementary information to a clinician and become an assistive tool for decision making. Arguably, the most immediate application based on our results is a new, semi-automatic screening technology for autism, where an individual is advised for further diagnostic evaluation in the event that a (non-expert) human or the AI model predicts that the individual exhibits autism-specific behavior. However, in a real-life scenario the problem of interest would be more difficult, as a potential patient may not be NT but may not have ASD either. Thus, future research is needed to identify the performance of humans and AI models in predicting ASD diagnosis from neurodiverse samples.</p>
      <p>Our results directly motivate further research in multiple directions. The most pressing future direction, from the perspective of making AI an effective assistive tool, is examination of the behaviors that lead to a predicted diagnosis. Having interpretable outputs is necessary for using AI technologies in clinics, as clinicians should understand how the AI algorithm makes a prediction before taking this prediction into account. Furthermore, research on younger participants is needed, given that early diagnosis improves access to effective early interventions and thus can improve developmental outcomes. Another future direction is to investigate the benefits of dyadic analysis, where, unlike our monadic analysis (Section 2), the behavior of the confederate is also taken into account. Finally, user research is necessary to test if and to what degree clinician diagnoses can be improved through the use of AI assistive tools.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is partially funded by the National Institutes of Health (NIH), Office of the Director (OD), National Institute of Child Health and Human Development (NICHD), and National Institute of Mental Health (NIMH) of the US, under grants R01MH118327, R01MH122599, 5P50HD105354-02 and R21HD102078.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>M. S. Mast, D. Gatica-Perez, D. Frauendorfer, L. Nguyen, T. Choudhury, Social Sensing for Psychology: Automated Interpersonal Behavior Assessment, Current Directions in Psychological Science 24 (2015) 154–160. doi:10.1177/0963721414560811.</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>C. J. Zampella, L. Bennetto, J. D. Herrington, Computer Vision Analysis of Reduced Interpersonal Affect Coordination in Youth With Autism Spectrum Disorder, Autism Research 13 (2020) 2133–2142. URL: https://pubmed.ncbi.nlm.nih.gov/32666690/. doi:10.1002/AUR.2334.</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>C. J. Zampella, E. Sariyanidi, A. G. Hutchinson, G. K. Bartley, et al., … Learning Differences in Autism Spectrum Disorder, in: Companion Publication of the International Conference on Multimodal Interaction, Association for Computing Machinery, 2021, pp. 362–370. doi:10.1145/3461615.3485426.</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>J. Parish-Morris, E. Sariyanidi, C. Zampella, G. K. Bartley, E. Ferguson, A. A. Pallathra, L. Bateman, S. Plate, M. Cola, J. Pandey, E. S. Brodkin, R. T. Schultz, B. Tunc, Oral-Motor and Lexical Diversity During Naturalistic Conversations in Adults with Autism Spectrum Disorder, in: Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology, Stroudsburg, PA, USA, 2018, pp. 147–157. URL: http://aclweb.org/anthology/W18-0616. doi:10.18653/v1/W18-0616.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>D. Q. McDonald, C. J. Zampella, E. Sariyanidi, A. Manakiwala, E. DeJardin, J. D. Herrington, R. T. Schultz, B. Tunç, Head Movement Patterns during Face-to-Face Conversations Vary with Age, in: International Conference on Multimodal Interaction, 2022.</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>S. E. Levy, D. S. Mandell, R. T. Schultz, Autism, Lancet 374 (2009) 1627–1638.</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>D. A. Regier, W. E. Narrow, D. E. Clarke, H. C. Kraemer, S. J. Kuramoto, E. A. Kuhl, D. J. Kupfer, DSM-5 field trials in the United States and Canada, American Journal of Psychiatry 170 (2013) 59–70. URL: http://www.ncbi.nlm.nih.gov/pubmed/23111466. doi:10.1176/appi.ajp.2012.12070999.</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>R. A. J. de Belen, T. Bednarz, A. Sowmya, D. Del Favero, Computer vision in autism spectrum disorder research: a systematic review of published studies from 2009 to 2019, Translational Psychiatry 10 (2020) 1–20.</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation>D. Wechsler, Wechsler Abbreviated Scale of Intelligence, Second Edition (WASI-II), Pearson, San Antonio, TX, 2011.</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>10</label>
        <mixed-citation>C. Lord, E. Petkova, V. Hus, W. Gan, F. Lu, et al., A Multisite Study of the Clinical Diagnosis of Different Autism Spectrum Disorders, Archives of General Psychiatry 69 (2012) 306. URL: http://archpsyc.jamanetwork.com/article.aspx?doi=10.1001/archgenpsychiatry.2011.148. doi:10.1001/archgenpsychiatry.2011.148.</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>11</label>
        <mixed-citation>C. Lord, M. Rutter, P. S. DiLavore, S. Risi, K. Gotham, S. L. Bishop, Autism Diagnostic Observation Schedule, Second Edition (ADOS-2), Western Psychological Services, Torrance, CA, 2012.</mixed-citation>
      </ref>
      <ref id="ref12">
        <label>12</label>
        <mixed-citation>APA, Diagnostic and Statistical Manual of Mental Disorders, 5th Edition: DSM-5, American Psychiatric Association, Washington, D.C., 2013.</mixed-citation>
      </ref>
      <ref id="ref13">
        <label>13</label>
        <mixed-citation>A. B. Ratto, L. Turner-Brown, B. M. Rupp, G. B. Mesibov, D. L. Penn, Development of the Contextual Assessment of Social Skills (CASS): a role play measure of social skill for individuals with high-functioning autism, Journal of Autism and Developmental Disorders 41 (2011) 1277–1286. URL: http://link.springer.com/10.1007/s10803-010-1147-z. doi:10.1007/s10803-010-1147-z.</mixed-citation>
      </ref>
      <ref id="ref14">
        <label>14</label>
        <mixed-citation>V. Hus, C. Lord, The autism diagnostic observation schedule, module 4: revised algorithm and standardized severity scores, Journal of Autism and Developmental Disorders 44 (2014) 1996–2012.</mixed-citation>
      </ref>
      <ref id="ref15">
        <label>15</label>
        <mixed-citation>B. Egger, W. A. Smith, A. Tewari, S. Wuhrer, M. Zollhoefer, T. Beeler, F. Bernard, T. Bolkart, A. Kortylewski, S. Romdhani, et al., 3D morphable face models—past, present, and future, ACM Transactions on Graphics 39 (2020) 1–38.</mixed-citation>
      </ref>
      <ref id="ref16">
        <label>16</label>
        <mixed-citation>C. Cao, Y. Weng, S. Zhou, Y. Tong, K. Zhou, FaceWarehouse: A 3D facial expression database for visual computing, IEEE Transactions on Visualization and Computer Graphics 20 (2013) 413–425.</mixed-citation>
      </ref>
      <ref id="ref17">
        <label>17</label>
        <mixed-citation>E. Sariyanidi, H. Gunes, A. Cavallaro, Automatic analysis of facial affect: A survey of registration, representation, and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2015) 1113–1133. doi:10.1109/TPAMI.2014.2366127.</mixed-citation>
      </ref>
      <ref id="ref18">
        <label>18</label>
        <mixed-citation>E. Sariyanidi, C. J. Zampella, R. T. Schultz, B. Tunc, Inequality-constrained and robust 3D face model fitting, in: European Conference on Computer Vision, 2020, pp. 433–449.</mixed-citation>
      </ref>
      <ref id="ref19">
        <label>19</label>
        <mixed-citation>E. Sariyanidi, C. J. Zampella, R. T. Schultz, B. Tunc, Can facial pose and expression be separated with weak perspective camera?, in: IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 7173–7182.</mixed-citation>
      </ref>
      <ref id="ref20">
        <label>20</label>
        <mixed-citation>T. Baltrusaitis, A. Zadeh, Y. C. Lim, L.-P. Morency, OpenFace 2.0: Facial behavior analysis toolkit, in: 2018 13th IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG 2018), IEEE, 2018, pp. 59–66.</mixed-citation>
      </ref>
      <ref id="ref21">
        <label>21</label>
        <mixed-citation>A. Metallinou, R. B. Grossman, S. Narayanan, Quantifying atypicality in affective facial expressions of children with autism spectrum disorders, in: 2013 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2013, pp. 1–6.</mixed-citation>
      </ref>
      <ref id="ref22">
        <label>22</label>
        <mixed-citation>T. Guha, Z. Yang, A. Ramakrishna, R. B. Grossman, D. Hedley, S. Lee, S. S. Narayanan, On quantifying facial expression-related atypicality of children with autism spectrum disorder, in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, pp. 803–807.</mixed-citation>
      </ref>
      <ref id="ref23">
        <label>23</label>
        <mixed-citation>S. M. Boker, J. L. Rotondo, M. Xu, K. King, Windowed cross-correlation and peak picking for the analysis of variability in the association between behavioral time series, Psychological Methods 7 (2002) 338–355. doi:10.1037/1082-989X.7.3.338.</mixed-citation>
      </ref>
      <ref id="ref24">
        <label>24</label>
        <mixed-citation>A. Gupta, F. L. Strivens, B. Tag, K. Kunze, J. A. Ward, Blink as you sync: Uncovering eye and nod synchrony in conversation using wearable sensing, in: Proceedings of the 23rd International Symposium on Wearable Computers, 2019, pp. 66–71.</mixed-citation>
      </ref>
      <ref id="ref25">
        <label>25</label>
        <mixed-citation>T. Foulsham, M. Gejdosova, L. Caunt, Reading and misleading: Changes in head and eye movements reveal attentional orienting in a social context, Vision 3 (2019) 43.</mixed-citation>
      </ref>
      <ref id="ref26">
        <label>26</label>
        <mixed-citation>E. Sariyanidi, H. Gunes, A. Cavallaro, Learning bases of activity for facial expression recognition, IEEE Transactions on Image Processing 26 (2017) 1965–1978.</mixed-citation>
      </ref>
      <ref id="ref27">
        <label>27</label>
        <mixed-citation>A. Krogsager, N. Segato, M. Rehm, Backchannel head nods in Danish first meeting encounters with a humanoid robot: The role of physical embodiment, Lecture Notes in Computer Science 8511 (2014) 651–662. doi:10.1007/978-3-319-07230-2_62.</mixed-citation>
      </ref>
      <ref id="ref28">
        <label>28</label>
        <mixed-citation>K. B. Martin, Z. Hammal, G. Ren, J. F. Cohn, J. Cassell, M. Ogihara, J. C. Britton, A. Gutierrez, D. S. Messinger, Objective measurement of head movement differences in children with and without autism spectrum disorder, Molecular Autism 9 (2018) 14. doi:10.1186/s13229-018-0198-4.</mixed-citation>
      </ref>
      <ref id="ref29">
        <label>29</label>
        <mixed-citation>J. Hale, J. A. Ward, F. Buccheri, D. Oliver, A. F. C. Hamilton, Are You on My Wavelength? Interpersonal Coordination in Dyadic Conversations, Journal of Nonverbal Behavior 44 (2020) 63–83. doi:10.1007/s10919-019-00320-3.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>