<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PAL: Privacy-preserving Audio, Visual, and Physiological Contexts for Wearable Context-aware Behavior Change Support</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mina Khan</string-name>
          <email>minakhan01@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Glenn Fernandes</string-name>
          <email>glennfer@mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pattie Maes</string-name>
          <email>pattie@media.mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College Station '21: Joint Proceedings of the ACM IUI 2021 Workshops</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MIT Media Lab</institution>
          ,
<addr-line>75 Amherst St, Cambridge, MA 02139</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Context tracking and context-aware interventions are key for behavior change. However, there is a lack of context-aware wearable systems for just-in-time interventions using privacy-preserving on-device deep learning. We created a wearable device with multimodal context sensing (using an egocentric camera, microphone, and optical pulse sensor), an on-device deep learning accelerator (for privacy-preserving context detection), and open-ear audio output (for seamless just-in-time interventions). We also added a mobile phone app for sensor control, intervention set-up, data visualization, and geolocation and physical activity sensing. Our system supports custom context tracking and custom context-based interventions. Our open-source system can be combined with different deep learning models for context tracking and context-based interventions in different behavioral, environmental, and physiological contexts using audio, visual, and heart-rate sensing. Our system combines multimodal context sensing, privacy-preserving on-device deep learning, and just-in-time interventions for a more convergent, holistic, and translational approach towards real-world context-aware behavior change support.</p>
      </abstract>
      <kwd-group>
<kwd>microphone</kwd>
        <kwd>heart-rate</kwd>
        <kwd>egocentric camera</kwd>
        <kwd>privacy-preserving</kwd>
        <kwd>just-in-time</kwd>
        <kwd>audio interventions</kwd>
        <kwd>context-based</kwd>
        <kwd>on-device deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction
data to the cloud for deep learning raises
privacy concerns.</p>
      <p>Context tracking and context-aware interven- We present a wearable system, called PAL,
tions are key for behavior change, and re- with multimodal context sensing (audio,
egosearch has used emotional, behavioral, and centric visual, and heart-rate), on-device deep
environmental contexts for user tracking, user learning (for privacy-preserving context
demodeling, and context-aware interventions [1, tection), and open-ear audio output (for
seam2]. Recent advances in deep learning allow
us to recognize diferent user contexts,
especially using computer vision and audio
processing. However, sending audio or visual
less just-in-time interventions). Even though
there are several multimodal context-sensing
systems, none combine audio, visual, and
physiological context sensing with
privacypreserving on-device deep learning and
justin-time interventions. PAL also includes a
mobile app for sensor control, intervention
setup, and geolocation and physical activity
sensing.</p>
      <p>We have made our wearable system
opensource, modular, and extensible so that
developers, researchers, and users may use it for
custom context tracking and behavior change put, heart rate, geolocation, and physical
acinterventions in the real world. Our modu- tivity. Egocentric visual context and audio
lar and open-source system facilitates a con- input can give key information about the
usvergent, holistic, and translational approach er’s behavioral and environmental contexts,
towards real-world behavior change to meet especially using deep learning. We also
intethe multiple, diverse, and customized behav- grated an optical pulse sensor as heart rate
ior change support needs of users and variability is a helpful sensor for detecting
researchers [3]. user’s physiological states [4]. Finally, we
added geolocation and physical activity
sensing as they are commonly used by existing
2. Related Work applications [4] and could be easily accessed
via a mobile phone.</p>
      <p>There are several self-tracking applications, We added an on-device deep learning
proe.g., for memory support and activity track- cessor so that sensitive user data, especially
ing [2] and mental health tracking [4]. Be- audio-visual data, could be processed on
dehavior change interventions are also common vice, not on cloud, in a privacy-preserving,
[5], including context-aware interventions [1]. ofline, and real-time manner.
Deep learning is also used, e.g., for activity We added open-ear audio output to
proand context recognition [6]. However, there vide real-time, minimally-disruptive, and
priare no deep learning systems for just-in-time vate interventions in real-world mobile
conbehavior change interventions using privacy- texts.
preserving on-device deep learning,
multimodal audio, visual, and physiological
context sensing, and open-ear interventions. We 4. Implementation
created PAL, an open-source wearable
system with multimodal context sensing, privacy- We created a wearable system, called PAL
(Fpreserving on-device deep learning, and open- igure 1). PAL’s wearable device supports
coear audio interventions, to enable convergent ntext-tracking and interventions, and PAL’s
and customizable deep learning-based context- mobile/ web app enables data visualizations,
aware behavior change support. sensor control, and intervention-setting. The
open-source implementation is available here:
https://github.com/minakhan01/PAL_Weara
3. Design ble_Opensource/.</p>
    <sec id="sec-3">
      <title>3. Design</title>
      <p>We aimed to leverage deep learning for personalized and privacy-preserving context-aware behavior change support. We chose multimodal context sensors, on-device deep learning, and open-ear audio interventions for context-aware support, and enable custom goal-setting, intervention contexts, and intervention messages using a customizable if-CONTEXT-then-INTERVENTION framework.</p>
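      <p>To make the framework concrete, the following minimal sketch shows one way such an if-CONTEXT-then-INTERVENTION rule could be represented in Python. All names here (ContextRule, check_rules, and the example rule) are illustrative assumptions, not PAL’s actual API.</p>
      <preformat>
# Minimal sketch of an if-CONTEXT-then-INTERVENTION rule engine.
# All names are illustrative, not PAL's actual code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ContextRule:
    name: str                          # e.g., "mindful-eating"
    condition: Callable[[dict], bool]  # predicate over detected contexts
    intervention: str                  # audio message to play when triggered

def check_rules(contexts: dict, rules: list) -> list:
    """Return intervention messages for every rule whose context matched."""
    return [r.intervention for r in rules if r.condition(contexts)]

# Example: intervene when the visual model detects food while the user is
# at home; context values would come from the system's on-device detectors.
rules = [
    ContextRule(
        name="mindful-eating",
        condition=lambda c: c.get("visual") == "food" and c.get("place") == "home",
        intervention="Remember your goal: pause before snacking.",
    ),
]

contexts = {"visual": "food", "place": "home", "heart_rate_bpm": 92}
for message in check_rules(contexts, rules):
    print(message)  # would be routed to the open-ear speaker
      </preformat>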
      <p>We decided on five types of contexts for sensing – egocentric camera view, audio input, heart rate, geolocation, and physical activity. Egocentric visual context and audio input can give key information about the user’s behavioral and environmental contexts, especially using deep learning. We also integrated an optical pulse sensor, as heart rate variability is a helpful signal for detecting the user’s physiological states [4]. Finally, we added geolocation and physical activity sensing as they are commonly used by existing applications [4] and could be easily accessed via a mobile phone.</p>
      <p>We added an on-device deep learning processor so that sensitive user data, especially audio-visual data, could be processed on-device, not in the cloud, in a privacy-preserving, offline, and real-time manner.</p>
      <p>We added open-ear audio output to provide real-time, minimally-disruptive, and private interventions in real-world mobile contexts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation</title>
      <p>We created a wearable system, called PAL (Figure 1). PAL’s wearable device supports context-tracking and interventions, and PAL’s mobile/web app enables data visualizations, sensor control, and intervention-setting. The open-source implementation is available here: https://github.com/minakhan01/PAL_Wearable_Opensource/.</p>
      <p>We used Google Coral for on-device deep learning. We placed a camera on the ear to capture similar contexts as the eyes, without being too prominent on the face. We added an optical heart rate sensor on the earlobe, as earlobe Pulse Photoplethysmographic (PPG) Heart Rate Variability (HRV) is comparable to Electrocardiographic (ECG) HRV [7]. We used a mini-speaker for open-ear audio output, a microphone for audio input, and Google’s Places and Activity Recognition APIs for geolocation and physical activity sensing.</p>
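      <p>For concreteness, the sketch below shows the standard pattern for running an Edge-TPU-compiled TensorFlow Lite model with the tflite_runtime package, which is how inference on a Google Coral accelerator is typically invoked. The model file name is a placeholder, and this is an illustrative sketch rather than PAL’s exact code.</p>
      <preformat>
# Illustrative on-device inference on a Coral Edge TPU (not PAL's exact code).
# Assumes tflite_runtime is installed and an Edge-TPU-compiled model exists.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# "context_model_edgetpu.tflite" is a placeholder file name.
interpreter = Interpreter(
    model_path="context_model_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

def classify(frame):
    """Run one camera frame through the model entirely on-device."""
    tensor = frame.astype(input_details["dtype"])[np.newaxis, ...]
    interpreter.set_tensor(input_details["index"], tensor)
    interpreter.invoke()
    return interpreter.get_tensor(output_details["index"])[0]
      </preformat>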
      <p>The on-ear camera captures ~70% of a visual context ~1 m away. We performed initial evaluations of PAL’s pulse rate sensor with 5 people (3 male, 2 female, mean age 23.4 years, SD 2.56 years). We compared PAL’s PPG-HRV to HRV from Series 4 Apple Watch ECG and Zephyr Biopatch ECG. The Apple Watch does 30 s recordings, and we did 10 such 30 s recordings per participant, with each participant simultaneously wearing PAL and the Apple Watch. For the Zephyr Biopatch, we had 30-minute recordings per participant, with each participant simultaneously wearing PAL, Empatica E4, and Zephyr. The results are in Table 1. PAL consumes ~0.3 A, and a 2500 mAh battery lasts ~5 hours.</p>
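      <p>As background for the HRV comparison above, time-domain HRV metrics such as RMSSD can be computed directly from a list of inter-beat intervals. The snippet below is an illustrative computation, not PAL’s evaluation code.</p>
      <preformat>
# Illustrative RMSSD computation from inter-beat intervals (IBIs), in ms.
# RMSSD is a standard time-domain HRV metric; this is not PAL's code.
import numpy as np

def rmssd(ibi_ms):
    """Root mean square of successive differences between inter-beat intervals."""
    diffs = np.diff(np.asarray(ibi_ms, dtype=float))  # successive IBI changes
    return float(np.sqrt(np.mean(diffs ** 2)))

# Example: IBIs (ms) as they might be detected from a short PPG recording.
print(rmssd([812.0, 795.0, 830.0, 818.0, 804.0]))
      </preformat>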
    </sec>
    <sec id="sec-5">
      <title>5. Applications</title>
      <p>We used PAL with low-shot, custom-trainable, and on-device deep learning models for visual context detection, and our findings show that interventions in personalized visual contexts improve real-world habit-support interventions [8]. PAL can be further used for more context tracking, e.g., food tracking using food detection models or mood tracking using speech processing, and for context-based behavior change support, e.g., using just-in-time adaptive interventions tailored to the user’s contexts [1]. More sensors can also be added to PAL’s wearable system or paired via Bluetooth, e.g., for continuous glucose monitoring along with visual food tracking.</p>
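      <p>As a sketch of how low-shot, custom-trainable visual context detection can work, the snippet below classifies an image embedding by its distance to per-context prototypes averaged from a few user-provided examples. The embedding values and context names are placeholders; this nearest-prototype approach is an assumption for illustration, not PAL’s published model.</p>
      <preformat>
# Illustrative low-shot context classifier: nearest class prototype over
# image embeddings. Placeholder data; not PAL's actual model code.
import numpy as np

def train_prototypes(examples):
    """Average a few example embeddings per context into one prototype each."""
    return {label: embs.mean(axis=0) for label, embs in examples.items()}

def classify(embedding, prototypes):
    """Return the context whose prototype is nearest to the query embedding."""
    return min(prototypes, key=lambda l: np.linalg.norm(embedding - prototypes[l]))

# Stand-in embeddings; a real system would get these from a frozen
# on-device vision backbone.
rng = np.random.default_rng(0)
examples = {"desk": rng.normal(0, 1, (5, 128)), "kitchen": rng.normal(3, 1, (5, 128))}
prototypes = train_prototypes(examples)
print(classify(rng.normal(3, 1, 128), prototypes))  # likely "kitchen"
      </preformat>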
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We combined previous research in context-aware behavior change support and recent advances in on-device deep learning to create a wearable system, called PAL, with multimodal context sensing, privacy-preserving on-device deep learning, and open-ear audio interventions. PAL supports customizable context tracking and context-aware interventions to enable diverse and personalized context-aware behavior change support. We used PAL to improve real-world habit-support interventions using personalized visual contexts [8], and we have open-sourced PAL so that other users, researchers, and developers can use PAL’s multimodal context sensing, privacy-preserving on-device deep learning, and open-ear audio interventions for diverse, convergent, and customizable context-aware behavior change support in wearable contexts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>I. Nahum-Shani, S. N. Smith, B. J. Spring, L. M. Collins, K. Witkiewitz, A. Tewari, S. A. Murphy, Just-in-Time adaptive interventions (JITAIs) in mobile health: Key components and design principles for ongoing health behavior support, Ann. Behav. Med. 52 (2018) 446–462.</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>C. Xia, P. Maes, The design of artifacts for augmenting intellect, in: Proceedings of the 4th Augmented Human International Conference, AH ’13, Association for Computing Machinery, New York, NY, USA, 2013, pp. 154–161. doi:10.1145/2459236.2459263.</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>M. Khan, G. Fernandes, P. Maes, Users want diverse, multiple, and personalized behavior change support: Need-finding survey, in: International Conference on Persuasive Technology, Springer, 2021.</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>E. Reinertsen, G. D. Clifford, A review of physiological and behavioral monitoring with digital sensors for neuropsychiatric illnesses, Physiol. Meas. 39 (2018) 05TR01.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>C. Pinder, J. Vermeulen, B. R. Cowan, R. Beale, Digital behaviour change interventions to break and form habits, ACM Trans. Comput.-Hum. Interact. 25 (2018) 15:1–15:66.</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>V. Radu, C. Tong, S. Bhattacharya, N. D. Lane, C. Mascolo, M. K. Marina, F. Kawsar, Multimodal Deep Learning for Activity and Context Recognition, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1 (2018) 157:1–157:27. doi:10.1145/3161174.</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>B. Vescio, M. Salsone, A. Gambardella, A. Quattrone, Comparison between electrocardiographic and earlobe pulse photoplethysmographic detection for evaluating heart rate variability in healthy subjects in short- and long-term recordings, Sensors 18 (2018).</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>M. Khan, G. Fernandes, A. Vaish, M. Manuja, A. Stibe, P. Maes, Improving context-aware habit-support interventions using egocentric visual contexts, in: International Conference on Persuasive Technology, Springer, 2021.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>