Driving Road Safety Forward: Video Data Privacy Task at MediaEval 2021

Alex Liu¹, Andrew Boka¹, Asal Baragchizadeh², Chandini Muthukumar², Victoria Huang², Arjun Sarup¹, Regina Ferrell³, Gerald Friedland¹, Thomas P. Karnowski³, Meredith M. Lee¹, Alice J. O'Toole²
¹ Division of Computing, Data Science, and Society, University of California, Berkeley, USA
² School of Behavioral and Brain Sciences, The University of Texas at Dallas, USA
³ Oak Ridge National Laboratory, USA

ABSTRACT
This paper gives an overview of the Driving Road Safety Forward: Video Data Privacy Task organized as part of the Benchmarking Initiative for Multimedia Evaluation (MediaEval) 2021. The goal of this video data task is to explore methods for obscuring driver identity in driver-facing video recordings while preserving human behavioral information.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'21, December 13-15 2021, Online

1 INTRODUCTION
The lifetime odds of dying in a car crash are 1 in 107 [7]. Each year, vehicle crashes cost hundreds of billions of dollars [5]. Research shows that driver behavior is a primary factor in 2/3 of crashes and a contributing factor in 90% of crashes [6].

Video footage from driver-facing cameras presents a unique opportunity to study driver behavior. Indeed, in the United States, the Second Strategic Highway Research Program (SHRP2) worked with drivers across the country to collect more than 1 million hours of driver video [1, 2]. Moreover, the growth of both sensor technologies and computational capacity provides new avenues for exploration. However, video data analysis and interpretation related to identifiable human subjects bring forward a variety of multifaceted questions and concerns, spanning privacy, security, bias, and additional implications [9]. The goal of the Task is to develop identity-masking methods that effectively conceal the identity of the driver while simultaneously preserving facial actions that can be informative for understanding driver behaviors that contribute to accidents, as well as other driving actions that pose potential safety hazards. This Task aims to advance the state of the art in video de-identification, encouraging participants from all sectors to develop and demonstrate techniques that mask facial identity and preserve facial action using the provided data. Successful methods that balance driver privacy with fidelity of relevant information have the potential not only to broaden researcher access to existing data, but also to inform the trajectory of transportation safety research, policy, and education initiatives [3].

[Figure 1: A sample frame and masking methods]

2 DATA
The dataset consists of both high- and low-resolution driver video data prepared by Oak Ridge National Laboratory (ORNL) for this Driver Video Privacy Task. The videos were captured using the same data acquisition system as the larger SHRP2 dataset mentioned above, which currently has limited access in a secure enclave. For the data in this Task, drivers appear in choreographed situations designed to emulate different naturalistic driving environments. Actions include talking, coughing, singing, dancing, waving, eating, and various other actions that are typical among drivers [8]. Through this unique partnership, annotated data from ORNL will be available to registered participants, alongside experts from the data collection and processing team who will be available for mentoring and questions.

3 EVALUATION OVERVIEW
To assess the de-identification of faces and to measure the consistency in preserving driver actions and emotions, there will be a preliminary automated evaluation as well as a human evaluation. The scores from the automated and human evaluations will be combined for an overall assessment, prioritizing the human assessment of de-identification and action preservation. This Task is heavily reliant on human evaluation, and we encourage participants to include in their submission any ideas, methods, and results from their own evaluation approaches. This includes any available data from participants, descriptions of methodology, assumptions, and results. This information will be shared with reviewers and the project organizers for additional discussion and opportunities for seed funding for further research.

Although we encourage all Task participants to think creatively and holistically about how the expectations of privacy, the risk from potential attackers, and various threat models may evolve, our starting assumptions are that: (1) The drivers are not known to the potential attacker; there is no relationship between the attacker and the driver; the driver is not a public figure. (2) Information from the driver's surroundings will not influence the attacker's ability to identify the driver. (3) Access to the data is limited to registered users who have signed a Data Use Agreement specifying that they will not attempt to learn the identity of individuals in the videos. (4) Attackers have access to basic computational resources. (5) There is a low probability of attackers launching an effective crowd-sourcing strategy to re-identify the drivers, in part due to the Data Use Agreement and the context in which the data were collected.

Task participants will be given summary data on the overall accuracy of their submission, as well as complete data on their performance for the individual videos. These data should be helpful for troubleshooting and improving the performance of the masking algorithm. We will also make available summary data on the performance of the other participants, so that individual Task participants can determine how their algorithm performed relative to other submissions.

The computational face recognition evaluation consists of face recognition and face detection steps. In the face recognition step, masked faces from selected frames are compared with the gallery face of the driver. We log the number of instances where the matching metric between the gallery face and the masked face indicates a match. In the face detection step, we attempt to detect faces in the masked video and compute an intersection-over-union (IoU) score, accumulating the IoU across the tested frames.

4 DE-IDENTIFICATION TESTING
Human evaluation is adapted from the methodology described by Baragchizadeh et al. in Evaluation of Automated Identity Masking Method (AIM) in Naturalistic Driving Study (NDS) [4].

4.1 Human Participants
Undergraduate student volunteers will be recruited from The University of Texas at Dallas (UTD) to participate in the study in exchange for research credit. All procedures will be approved by the Institutional Review Board (IRB) of UTD. In all cases, a minimum of 10 students will evaluate each video for identity-masking success.

4.2 Procedure
Evaluations will be conducted using the masked videos submitted by the Task participants. For each submission¹, a subset of at least 116 masked videos will be evaluated by human participants using an identity-matching procedure. The selected video subset will be identical for all Task participants and will be chosen by the organizers of the evaluation. The particular set of videos to be used in the evaluation will not be revealed until all submissions have been processed.

On each trial, the participant will be asked to match the identity of the person shown in a masked video to one of 5 high-resolution static facial images presented simultaneously at the top of the screen. The participants will be offered responses to indicate which of the static images shows the person pictured in the video, or to indicate that the person pictured in the video does not appear in the set of static images. The participant will have the option of replaying the video as many times as they want before entering a response. The static face images will be matched demographically to the person in the video, so that gender, race, and age cannot provide cues to the identity of the person in the video. Each static face image will be cropped to show only the internal face, so that identification cannot be based on peripheral face cues such as hair style.

4.3 Results Analysis
Identification of the original (unmasked) videos from this dataset was assessed in a previous study at The University of Texas at Dallas, using the methods described here for the Task evaluation. The identification accuracy results for these unmasked videos will be used to assess the success of Task participants in masking the face identities. It is important to note that matching the identity of the faces between the unmasked videos and the face images is not perfect. This is due to differences in the image/appearance conditions between the static face images (high resolution, cropped to show only the internal face) and the videos (inside the car, high- and low-resolution, variable expression, etc.). Therefore, masking success will be measured for each trial as the difference between the identification of the unmasked video (from the previous evaluation at UTD) and the identification of the masked video (from the human evaluation to be conducted on the masked videos submitted by Task participants).

5 ACTION PRESERVATION TESTING
The human evaluation procedures and results analysis for action preservation will be similar to those described for de-identification testing, with the following changes. On each trial, a masked video will be presented alongside a list of possible actions. The participant will be asked to select the "most obvious" action they detect in the video. Again, the results will be compiled as the difference between the accuracy of identifying the action in the unmasked video (from the previous UTD evaluation) and in the masked video (from the Task participant's submission).

The automated approach to measuring action preservation will use a deep-learning-based gaze estimator [10]. Action preservation is estimated by extracting the predicted gaze vectors from both the original unfiltered video and the de-identified video and measuring the Euclidean distance between the two unit vectors. A score closer to 0 implies good action preservation, since the gaze estimate is relatively unchanged.

6 DISCUSSION AND OUTLOOK
With the increased availability, prominence, and applicability of data in our daily lives, multidisciplinary connections and engagement are critical to harnessing societal benefit from advances in technology and methodology. This focused video de-identification task serves as a key example of how data science collaborations designed to bridge research and practice can simultaneously help address a pragmatic need while sparking new lines of inquiry and research trajectories. The Driving Road Safety Forward: Video Data Privacy Task strives to raise awareness about transportation fatalities and how data might enable thoughtful discussion, analysis, and action for the betterment of community safety.

ACKNOWLEDGMENTS
Special thanks to our collaborators and advisors, including David Kuehn, Charles Fay, Daniel Morgan, Natalie Evans Harris, Lauren Smith, René Bastón, Mae Tanner, David E. Culler, the U.S. Department of Transportation (USDOT), and the National Science Foundation (NSF) Big Data Hubs network. This effort is made possible through community volunteers, NSF Grants 1916573, 1916481, and 1915774, and an inter-agency agreement between NSF and USDOT.

¹ Subject to the constraint that the algorithm is submitted by the deadline published on the MediaEval website.

REFERENCES
[1] 2021. About Safety Data: Strategic Highway Research Program 2. (2021). http://www.trb.org/StrategicHighwayResearchProgram2SHRP2/SHRP2DataSafetyAbout.aspx
[2] 2021. A Brief Look at the History of SHRP2. (2021). http://shrp2.transportation.org/pages/History-of-SHRP2.aspx
[3] 2021. Exploratory Advanced Research Program Video Analytics Research Projects. (2021). https://www.fhwa.dot.gov/publications/research/ear/15025/15025.pdf
[4] Asal Baragchizadeh, Thomas P. Karnowski, David S. Bolme, and Alice J. O'Toole. 2017. Evaluation of Automated Identity Masking Method (AIM) in Naturalistic Driving Study (NDS). In 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE, 378–385.
[5] Lawrence Blincoe, Ted R. Miller, Eduard Zaloshnja, and Bruce A. Lawrence. 2015. The Economic and Societal Impact of Motor Vehicle Crashes, 2010 (Revised). Technical Report.
[6] Thomas A. Dingus, Feng Guo, Suzie Lee, Jonathan F. Antin, Miguel Perez, Mindy Buchanan-King, and Jonathan Hankey. 2016. Driver crash risk factors and prevalence evaluation using naturalistic driving data. Proceedings of the National Academy of Sciences 113, 10 (2016), 2636–2641.
[7] NSC Safety Facts. 2021. Odds of Dying. (2021). https://injuryfacts.nsc.org/all-injuries/preventable-death-overview/odds-of-dying/
[8] R. Ferrell, D. Aykac, Thomas Karnowski, and N. Srinivas. 2021. A Publicly Available, Annotated Data Set for Naturalistic Driving Study and Computer Vision Algorithm Development. (2021). https://info.ornl.gov/sites/publications/Files/Pub122418.pdf
[9] K. Finch. 2016. A visual guide to practical data de-identification. (2016). https://fpf.org/blog/a-visual-guide-to-practical-data-de-identification
[10] Xucong Zhang, Yusuke Sugano, Mario Fritz, and Andreas Bulling. 2017. It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation. (2017). arXiv:cs.CV/1611.08860
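The face detection step of the automated evaluation can be illustrated with a short sketch. The box format, the per-frame pairing, and the function names below are illustrative assumptions, not part of the Task specification; the sketch only shows how an IoU score can be accumulated over tested frames.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative helper; the Task's actual box representation is an assumption.
    """
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def cumulative_iou(reference_boxes, detected_boxes):
    """Accumulate IoU over tested frames, pairing reference and detected boxes."""
    return sum(iou(r, d) for r, d in zip(reference_boxes, detected_boxes))
```

A masking method that leaves faces detectable in roughly the right place yields per-frame IoU values near 1, and the cumulative score grows with the number of well-localized frames.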
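The gaze-based action preservation score described above amounts to a Euclidean distance between unit gaze vectors. A minimal sketch, assuming per-frame gaze vectors are available as arrays (the averaging across frames and the function name are assumptions for illustration):

```python
import numpy as np

def gaze_preservation(gaze_original, gaze_masked):
    """Mean Euclidean distance between unit gaze vectors predicted on the
    original and the de-identified video. Values near 0 suggest the masking
    left gaze behavior largely unchanged."""
    a = np.asarray(gaze_original, dtype=float)
    b = np.asarray(gaze_masked, dtype=float)
    # Normalize to unit vectors before comparing, per the description above.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return float(np.linalg.norm(a - b, axis=-1).mean())
```

Identical gaze predictions give a score of 0; the maximum per-frame distance between unit vectors is 2 (exactly opposite directions).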
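The per-video masking-success measure from the results analysis — the drop from unmasked to masked identification accuracy — reduces to a simple difference. A sketch, where accuracies are assumed to be expressed as fractions in [0, 1] (names and representation are illustrative):

```python
def masking_success(unmasked_accuracy, masked_accuracy):
    """Per-video masking success: the drop in human identification accuracy
    from the unmasked baseline to the masked submission.
    Larger values indicate more effective identity masking."""
    return unmasked_accuracy - masked_accuracy


def mean_masking_success(unmasked_accuracies, masked_accuracies):
    """Average masking success over a set of evaluated videos."""
    pairs = list(zip(unmasked_accuracies, masked_accuracies))
    return sum(u - m for u, m in pairs) / len(pairs)
```

Using the difference rather than raw masked-video accuracy compensates for videos whose unmasked baseline identification was already imperfect.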