<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PAL: Privacy-preserving Audio, Visual, and Physiological Contexts for Wearable Context-aware Behavior Change Support</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mina Khan</string-name>
          <email>minakhan01@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Glenn Fernandes</string-name>
          <email>glennfer@mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pattie Maes</string-name>
          <email>pattie@media.mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College Station '21: Joint Proceedings of the ACM IUI 2021 Workshops</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MIT Media Lab</institution>
          ,
<addr-line>75 Amherst St, Cambridge, MA 02139</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Context tracking and context-aware interventions are key for behavior change. However, there is a lack of context-aware wearable systems for just-in-time interventions using privacy-preserving on-device deep learning. We created a wearable device with multimodal context sensing (using an egocentric camera, microphone, and optical pulse sensor), an on-device deep learning accelerator (for privacy-preserving context detection), and open-ear audio output (for seamless just-in-time interventions). We also added a mobile phone app for sensor control, intervention set-up, data visualization, and geolocation and physical activity sensing. Our system supports custom context tracking and custom context-based interventions. Our open-source system can be combined with different deep learning models for context tracking and context-based interventions in different behavioral, environmental, and physiological contexts using audio, visual, and heart-rate sensing. Our system combines multimodal context sensing, privacy-preserving on-device deep learning, and just-in-time interventions for a more convergent, holistic, and translational approach towards real-world context-aware behavior change support.</p>
      </abstract>
      <kwd-group>
<kwd>microphone</kwd>
        <kwd>heart-rate</kwd>
        <kwd>egocentric camera</kwd>
        <kwd>privacy-preserving</kwd>
        <kwd>just-in-time</kwd>
        <kwd>audio interventions</kwd>
        <kwd>context-based</kwd>
        <kwd>on-device deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction
data to the cloud for deep learning raises
privacy concerns.</p>
      <p>Context tracking and context-aware interven- We present a wearable system, called PAL,
tions are key for behavior change, and re- with multimodal context sensing (audio,
egosearch has used emotional, behavioral, and centric visual, and heart-rate), on-device deep
environmental contexts for user tracking, user learning (for privacy-preserving context
demodeling, and context-aware interventions [1, tection), and open-ear audio output (for
seam2]. Recent advances in deep learning allow
us to recognize diferent user contexts,
especially using computer vision and audio
processing. However, sending audio or visual
less just-in-time interventions). Even though
there are several multimodal context-sensing
systems, none combine audio, visual, and
physiological context sensing with
privacypreserving on-device deep learning and
justin-time interventions. PAL also includes a
mobile app for sensor control, intervention
setup, and geolocation and physical activity
sensing.</p>
      <p>We have made our wearable system
opensource, modular, and extensible so that
developers, researchers, and users may use it for
custom context tracking and behavior change put, heart rate, geolocation, and physical
acinterventions in the real world. Our modu- tivity. Egocentric visual context and audio
lar and open-source system facilitates a con- input can give key information about the
usvergent, holistic, and translational approach er’s behavioral and environmental contexts,
towards real-world behavior change to meet especially using deep learning. We also
intethe multiple, diverse, and customized behav- grated an optical pulse sensor as heart rate
ior change support needs of users and variability is a helpful sensor for detecting
researchers [3]. user’s physiological states [4]. Finally, we
added geolocation and physical activity
sensing as they are commonly used by existing
2. Related Work applications [4] and could be easily accessed
via a mobile phone.</p>
      <p>There are several self-tracking applications, We added an on-device deep learning
proe.g., for memory support and activity track- cessor so that sensitive user data, especially
ing [2] and mental health tracking [4]. Be- audio-visual data, could be processed on
dehavior change interventions are also common vice, not on cloud, in a privacy-preserving,
[5], including context-aware interventions [1]. ofline, and real-time manner.
Deep learning is also used, e.g., for activity We added open-ear audio output to
proand context recognition [6]. However, there vide real-time, minimally-disruptive, and
priare no deep learning systems for just-in-time vate interventions in real-world mobile
conbehavior change interventions using privacy- texts.
preserving on-device deep learning,
multimodal audio, visual, and physiological
context sensing, and open-ear interventions. We 4. Implementation
created PAL, an open-source wearable
system with multimodal context sensing, privacy- We created a wearable system, called PAL
(Fpreserving on-device deep learning, and open- igure 1). PAL’s wearable device supports
coear audio interventions, to enable convergent ntext-tracking and interventions, and PAL’s
and customizable deep learning-based context- mobile/ web app enables data visualizations,
aware behavior change support. sensor control, and intervention-setting. The
open-source implementation is available here:
https://github.com/minakhan01/PAL_Weara
3. Design ble_Opensource/.</p>
    <sec id="sec-3">
      <title>3. Design</title>
      <p>We aimed to leverage deep learning for personalized and privacy-preserving context-aware behavior change support. We chose multimodal context sensors, on-device deep learning, and open-ear audio interventions for context-aware support, and enable custom goal-setting, intervention contexts, and intervention messages using a customizable if-CONTEXT-then-INTERVENTION framework.</p>
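      <p>To make the framework concrete, the following minimal sketch shows one way such an if-CONTEXT-then-INTERVENTION rule could be represented in Python. All names here (ContextRule, check_rules, and the example rule) are illustrative assumptions, not PAL’s actual API.</p>
      <preformat>
# Minimal sketch of an if-CONTEXT-then-INTERVENTION rule engine.
# All names are illustrative, not PAL's actual code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ContextRule:
    name: str                          # e.g., "mindful-eating"
    condition: Callable[[dict], bool]  # predicate over detected contexts
    intervention: str                  # audio message to play when triggered

def check_rules(contexts: dict, rules: list) -> list:
    """Return intervention messages for every rule whose context matched."""
    return [r.intervention for r in rules if r.condition(contexts)]

# Example: intervene when the visual model detects food while the user is
# at home; context values would come from the system's on-device detectors.
rules = [
    ContextRule(
        name="mindful-eating",
        condition=lambda c: c.get("visual") == "food" and c.get("place") == "home",
        intervention="Remember your goal: pause before snacking.",
    ),
]

contexts = {"visual": "food", "place": "home", "heart_rate_bpm": 92}
for message in check_rules(contexts, rules):
    print(message)  # would be routed to the open-ear speaker
      </preformat>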
      <p>We decided on five types of contexts for sensing – egocentric camera view, audio input, heart rate, geolocation, and physical activity. Egocentric visual context and audio input can give key information about the user’s behavioral and environmental contexts, especially using deep learning. We also integrated an optical pulse sensor, as heart rate variability is a helpful signal for detecting the user’s physiological states [4]. Finally, we added geolocation and physical activity sensing as they are commonly used by existing applications [4] and could be easily accessed via a mobile phone.</p>
      <p>We added an on-device deep learning processor so that sensitive user data, especially audio-visual data, could be processed on-device, not in the cloud, in a privacy-preserving, offline, and real-time manner.</p>
      <p>We added open-ear audio output to provide real-time, minimally-disruptive, and private interventions in real-world mobile contexts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation</title>
      <p>We created a wearable system, called PAL (Figure 1). PAL’s wearable device supports context-tracking and interventions, and PAL’s mobile/web app enables data visualizations, sensor control, and intervention-setting. The open-source implementation is available here: https://github.com/minakhan01/PAL_Wearable_Opensource/.</p>
      <p>We used Google Coral for on-device deep learning. We placed a camera on the ear to capture similar contexts as the eyes, without being too prominent on the face. We added an optical heart rate sensor on the earlobe, as earlobe Pulse Photoplethysmographic (PPG) Heart Rate Variability (HRV) is comparable to Electrocardiographic (ECG) HRV [7]. We used a mini-speaker for open-ear audio output, a microphone for audio input, and Google’s Places and Activity Recognition APIs for geolocation and physical activity sensing.</p>
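      <p>For concreteness, the sketch below shows the standard pattern for running an Edge-TPU-compiled TensorFlow Lite model with the tflite_runtime package, which is how inference on a Google Coral accelerator is typically invoked. The model file name is a placeholder, and this is an illustrative sketch rather than PAL’s exact code.</p>
      <preformat>
# Illustrative on-device inference on a Coral Edge TPU (not PAL's exact code).
# Assumes tflite_runtime is installed and an Edge-TPU-compiled model exists.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# "context_model_edgetpu.tflite" is a placeholder file name.
interpreter = Interpreter(
    model_path="context_model_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

def classify(frame):
    """Run one camera frame through the model entirely on-device."""
    tensor = frame.astype(input_details["dtype"])[np.newaxis, ...]
    interpreter.set_tensor(input_details["index"], tensor)
    interpreter.invoke()
    return interpreter.get_tensor(output_details["index"])[0]
      </preformat>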
      <p>The on-ear camera captures ~70% of a visual context ~1 m away. We performed initial evaluations of PAL’s pulse rate sensor with 5 people (3 male, 2 female, mean age 23.4 years, SD 2.56 years). We compared PAL’s PPG-HRV to HRV from Series 4 Apple Watch ECG and Zephyr Biopatch ECG. The Apple Watch does 30 s recordings, and we did 10 such 30 s recordings per participant, with each participant simultaneously wearing PAL and the Apple Watch. For the Zephyr Biopatch, we had 30-minute recordings per participant, with each participant simultaneously wearing PAL, Empatica E4, and Zephyr. The results are in Table 1. PAL consumes ~0.3 A, and a 2500 mAh battery lasts ~5 hours.</p>
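      <p>As background for the HRV comparison above, time-domain HRV metrics such as RMSSD can be computed directly from a list of inter-beat intervals. The snippet below is an illustrative computation, not PAL’s evaluation code.</p>
      <preformat>
# Illustrative RMSSD computation from inter-beat intervals (IBIs), in ms.
# RMSSD is a standard time-domain HRV metric; this is not PAL's code.
import numpy as np

def rmssd(ibi_ms):
    """Root mean square of successive differences between inter-beat intervals."""
    diffs = np.diff(np.asarray(ibi_ms, dtype=float))  # successive IBI changes
    return float(np.sqrt(np.mean(diffs ** 2)))

# Example: IBIs (ms) as they might be detected from a short PPG recording.
print(rmssd([812.0, 795.0, 830.0, 818.0, 804.0]))
      </preformat>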
    </sec>
    <sec id="sec-5">
      <title>5. Applications</title>
      <p>We used PAL with low-shot, custom-trainable, and on-device deep learning models for visual context detection, and our findings show that interventions in personalized visual contexts improve real-world habit-support interventions [8]. PAL can be further used for more context tracking, e.g., food tracking using food detection models or mood tracking using speech processing, and for context-based behavior change support, e.g., using just-in-time adaptive interventions tailored to the user’s contexts [1]. More sensors can also be added to PAL’s wearable system or paired via Bluetooth, e.g., for continuous glucose monitoring along with visual food tracking.</p>
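      <p>As a sketch of how low-shot, custom-trainable visual context detection can work, the snippet below classifies an image embedding by its distance to per-context prototypes averaged from a few user-provided examples. The embedding values and context names are placeholders; this nearest-prototype approach is an assumption for illustration, not PAL’s published model.</p>
      <preformat>
# Illustrative low-shot context classifier: nearest class prototype over
# image embeddings. Placeholder data; not PAL's actual model code.
import numpy as np

def train_prototypes(examples):
    """Average a few example embeddings per context into one prototype each."""
    return {label: embs.mean(axis=0) for label, embs in examples.items()}

def classify(embedding, prototypes):
    """Return the context whose prototype is nearest to the query embedding."""
    return min(prototypes, key=lambda l: np.linalg.norm(embedding - prototypes[l]))

# Stand-in embeddings; a real system would get these from a frozen
# on-device vision backbone.
rng = np.random.default_rng(0)
examples = {"desk": rng.normal(0, 1, (5, 128)), "kitchen": rng.normal(3, 1, (5, 128))}
prototypes = train_prototypes(examples)
print(classify(rng.normal(3, 1, 128), prototypes))  # likely "kitchen"
      </preformat>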
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We combined previous research in context-aware behavior change support and recent advances in on-device deep learning to create a wearable system, called PAL, with multimodal context sensing, privacy-preserving on-device deep learning, and open-ear audio interventions. PAL supports customizable context tracking and context-aware interventions to enable diverse and personalized context-aware behavior change support. We used PAL to improve real-world habit-support interventions using personalized visual contexts [8], and we have open-sourced PAL so that other users, researchers, and developers can use PAL’s multimodal context sensing, privacy-preserving on-device deep learning, and open-ear audio interventions for diverse, convergent, and customizable context-aware behavior change support in wearable contexts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>I. Nahum-Shani, S. N. Smith, B. J. Spring, L. M. Collins, K. Witkiewitz, A. Tewari, S. A. Murphy, Just-in-Time adaptive interventions (JITAIs) in mobile health: Key components and design principles for ongoing health behavior support, Ann. Behav. Med. 52 (2018) 446–462.</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>C. Xia, P. Maes, The design of artifacts for augmenting intellect, in: Proceedings of the 4th Augmented Human International Conference, AH ’13, Association for Computing Machinery, New York, NY, USA, 2013, pp. 154–161. doi:10.1145/2459236.2459263.</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>M. Khan, G. Fernandes, P. Maes, Users want diverse, multiple, and personalized behavior change support: Need-finding survey, in: International Conference on Persuasive Technology, Springer, 2021.</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>E. Reinertsen, G. D. Clifford, A review of physiological and behavioral monitoring with digital sensors for neuropsychiatric illnesses, Physiol. Meas. 39 (2018) 05TR01.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>C. Pinder, J. Vermeulen, B. R. Cowan, R. Beale, Digital behaviour change interventions to break and form habits, ACM Trans. Comput.-Hum. Interact. 25 (2018) 15:1–15:66.</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>V. Radu, C. Tong, S. Bhattacharya, N. D. Lane, C. Mascolo, M. K. Marina, F. Kawsar, Multimodal Deep Learning for Activity and Context Recognition, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1 (2018) 157:1–157:27. doi:10.1145/3161174.</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>B. Vescio, M. Salsone, A. Gambardella, A. Quattrone, Comparison between electrocardiographic and earlobe pulse photoplethysmographic detection for evaluating heart rate variability in healthy subjects in short- and long-term recordings, Sensors 18 (2018).</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>M. Khan, G. Fernandes, A. Vaish, M. Manuja, A. Stibe, P. Maes, Improving context-aware habit-support interventions using egocentric visual contexts, in: International Conference on Persuasive Technology, Springer, 2021.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>