<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automated Pain Detection in Facial Videos of Children using Human-Assisted Transfer Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiaojing Xu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kenneth D. Craig</string-name>
          <email>kcraig@psych.ubc.ca</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Damaris Diaz</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthew S. Goodwin</string-name>
          <email>m.goodwin@northeastern.edu</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Murat Akcakaya</string-name>
          <email>akcakaya@pitt.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Busra Tugce Susam</string-name>
          <email>tugcebusraiu@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeannie S. Huang</string-name>
          <email>jshuang@ucsd.edu</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Virginia R. de Sa</string-name>
          <email>desa@ucsd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Cognitive Science, University of California San Diego</institution>
          ,
          <addr-line>La Jolla, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Electrical and Computer Engineering, University of California San Diego</institution>
          ,
          <addr-line>La Jolla, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Electrical and Computer Engineering, University of Pittsburgh</institution>
          ,
          <addr-line>Pittsburgh, PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Department of Health Sciences, Northeastern University</institution>
          ,
          <addr-line>Boston, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Department of Psychology, University of British Columbia</institution>
          ,
          <addr-line>Vancouver, BC</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Rady Children's Hospital and Department of Pediatrics, University of California San Diego</institution>
          ,
          <addr-line>San Diego, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Accurately determining pain levels in children is difficult, even for trained professionals and parents. Facial activity provides sensitive and specific information about pain, and computer vision algorithms have been developed to automatically detect Facial Action Units (AUs) defined by the Facial Action Coding System (FACS). Our prior work utilized information from computer vision, i.e., automatically detected facial AUs, to develop classifiers to distinguish between pain and no-pain conditions. However, application of pain/no-pain classifiers based on automated AU codings across different environmental domains resulted in diminished performance. In contrast, classifiers based on manually coded AUs demonstrated reduced environmentally-based variability in performance. To improve classification performance in the current work, we applied transfer learning by training another machine learning model to map automated AU codings to a subspace of manual AU codings to enable more robust pain recognition performance when only automatically coded AUs are available for the test data. With this transfer learning method, we improved the Area under the ROC Curve (AUC) on independent data (new participants) from our target data domain from 0.69 to 0.72.</p>
      </abstract>
      <kwd-group>
        <kwd>automated pain detection</kwd>
        <kwd>transfer learning</kwd>
        <kwd>facial action units</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the classic model of machine learning, scientists train models on training data
to accurately detect a desired outcome and apply learned models to new data
measured under identical circumstances to validate their performance. Given the
real world and its notable variation, it is tempting to apply learned models to
data measured under similar but not identical circumstances. However,
performance in such circumstances often deteriorates because of unmeasured factors
not accounted for between original and new datasets. Nevertheless, lessons can
be learned from similar scenarios. Transfer learning or inductive transfer in
machine learning focuses on storing knowledge gained while solving one problem
and applying it to a different but related problem [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We describe application of
transfer learning to the important clinical problem of pain detection in children.
      </p>
      <p>
        Accurate measurement of pain severity in children is difficult, even for trained
professionals and parents. This is a critical problem as over-medication can result
in adverse side-effects, including opioid addiction, and under-medication can lead
to unnecessary suffering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The current clinical gold standard and most widely employed method of
assessing clinical pain is patient self-report [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, this method is subjective
and vulnerable to bias. Consequently, clinicians often distrust pain self-reports,
and find them more useful for comparisons over time within individuals, rather
than comparisons between individuals [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Further, infants and older children
with communication/neurologic disabilities do not have the ability or capacity
to self-report pain levels [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. As a result, to evaluate pain in populations
with communication limitations, observational tools based on behavioral
nonverbal indicators associated with pain have been developed [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Of the various modalities of nonverbal expression (e.g., bodily movement,
vocalizations), facial activity can provide the most sensitive, specific, and
accessible information about the presence, nature, and severity of pain across the life
span, from infancy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] through to advanced age [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Observers largely consider
facial activity during painful events to be a relatively spontaneous reaction [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Evaluation of pain based on facial indicators requires two steps: (1)
Extraction of facial pain features and (2) pain recognition based on these features. For
step (1), researchers have searched for reliable facial indicators of pain, such as
the anatomically-based, objectively coded Facial Action Units (AUs) defined by
the Facial Action Coding System (FACS) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (visualizations of the facial
action units can be found at
https://imotions.com/blog/facial-action-coding-system/). However, identifying these AUs traditionally requires time-intensive
offline coding by trained human coders, limiting their application in real-time
clinical settings. Recently, algorithms to automatically detect AUs [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have been
developed and implemented in software such as iMotions (imotions.com),
allowing automatic output of AU probabilities in real-time based on direct recording
of face video. In step (2), linear models [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], SVM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and Neural Networks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
have been used to recognize pain based on facial features. In this paper, we first
combine iMotions and neural networks to build an automated pain recognition
model.
      </p>
      <p>
        Although a simple machine learning model based on features extracted by
a well-designed algorithm can perform well when training data and test data
have similar statistical properties, problems arise when the data follow
different distributions. We discovered this issue when training videos were recorded
in one environment/setting and test videos in another. One way to deal with
this problem is to use transfer learning, which discovers "common knowledge"
across domains and uses this knowledge to complete tasks in a new domain with
a model learned in the old domain [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In this paper, we show that features
extracted from human-coded (manual) AU codings are less sensitive to domain
changes than features extracted from iMotions (automated) AU codings, and
thus develop a simple method that learns a projection from automated features
onto a subspace of manual features. Once this mapping is learned, future
automatically coded data can be automatically transformed to a representation that
is more robust between domains.
      </p>
      <p>To summarize, our contributions in this paper include:
- Demonstration that environmental factors modulate the ability of
automatically coded AUs to recognize clinical pain in videos
- Demonstration that manually coded AUs (especially previously established
"pain-related" ones) can be used to successfully recognize pain in videos with
machine learning across different environmental domains
- Development of a transfer learning method to transfer automated features
to the manual feature space and thereby improve automatic recognition of clinical
pain across different environmental domains</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-0">
        <title>Participants</title>
        <p>
          143 pediatric research participants (94 males, 49 females) aged 12 [10, 15]
(median [25%, 75%]) years old and primarily Hispanic (78%), who had undergone
medically necessary laparoscopic appendectomy, were videotaped for facial
expressions during surgical recovery. Participating children had been hospitalized
following surgery for post-surgical recovery and were recruited for participation
within 24 hours of surgery at a pediatric tertiary care center. Exclusion
criteria included regular opioid use within the past 6 months, documented mental or
neurologic deficits preventing study protocol compliance, and any facial anomaly
that might alter computer vision facial expression analysis. Parents provided
written informed consent and youth gave written assent. The local institutional
review board approved the research protocol.
        </p>
      </sec>
      <sec id="sec-2-1">
        <title>Experimental Design and Data Collection</title>
        <p>
          Data were collected over 3 visits (V): V1 within 24 hours after appendectomy;
V2 within the calendar day after the first visit; and V3 at a follow-up visit 25
[19, 28] (median [25%, 75%]) days postoperatively when pain was expected to
have fully subsided. Data were collected in two environmental conditions: V1
and V2 in hospital and V3 in the outpatient lab. At every visit, two 10-second
videos (60 fps at 853x480 pixel resolution) of the face were recorded while manual
pressure was exerted at the surgical site for 10 seconds (equivalent of a clinical
examination). In the hospital visits (V1, V2), the participants were lying in
the hospital bed with the head of the bed raised. In V3, they were seated in a
reclined chair. Participants rated their pain level during the pressure using a 0-10
Numerical Rating Scale, where 0 = no-pain and 10 = worst pain ever. Following
convention for clinically significant pain [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], videos with pain ratings of 0-3 were
labeled as no-pain, and videos with pain ratings of 4-10 were labeled as pain, for
classification purposes. 324 pain videos were collected from V1/2, 195 no-pain
videos were collected from V1/2, and 235 no-pain videos were collected from
V3. Figure 1 demonstrates the distribution of pain and no-pain videos across
environmental domains.
        </p>
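        <p>As a minimal illustration of this labeling convention (a hypothetical helper, not from the original study code):</p>
        <preformat>
# Binarize 0-10 Numerical Rating Scale self-reports following the
# convention above: ratings 0-3 map to no-pain (0), 4-10 to pain (1).
def label_video(nrs_rating):
    return 1 if nrs_rating >= 4 else 0

labels = [label_video(r) for r in (0, 2, 4, 7, 10)]  # [0, 0, 1, 1, 1]
        </preformat>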
      </sec>
      <sec id="sec-2-2">
        <title>Feature Extraction</title>
        <p>For each 10-second video sample, we extracted AU codings per frame to obtain
a sequence of AUs. This was done both automatically by iMotions software
(www.imotions.com) and manually by a trained human coder for a limited subset of videos. We
then extracted features from the sequence of AUs.</p>
        <p>
          Automated Facial Action Unit Detection: The iMotions software
integrates Emotient's FACET technology (www.imotions.com/emotient), which was
formerly known as CERT [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. In the described work, the iMotions software was
used to process the videos to automatically extract 20 AUs (AU 1, 2, 4, 5, 6, 7,
9, 10, 12, 14, 15, 17, 18, 20, 23, 24, 25, 26, 28, 43) and three head pose
indicators (yaw, pitch and roll) from each frame. The values of these codings are the
estimated log probabilities of AUs, ranging from -4 to 4.
        </p>
        <p>Manual Facial Action Unit Detection: A trained human FACS AU coder
manually coded 64 AUs (AU1-64) for each frame of a subset of videos by labeling
the AU intensities (0-5, 0 = absence).</p>
        <p>Feature Dimension Reduction: The number of frames in our videos was
too large to use full sequences of frame-coded AUs. To reduce dimensionality, we
applied 11 statistics (mean, max, min, standard deviation, 95th, 85th, 75th, 50th,
25th percentiles, half-rectified mean, and max-min) to each AU over all frames
to obtain 11 × 23 features for automatically coded AUs, and 11 × 64 features
for manually coded AUs. We call these automated features and manual features,
respectively. The range of each feature was rescaled to [0, 1] to normalize features
over the training data.</p>
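        <p>A minimal NumPy sketch of this feature computation (function and array names are hypothetical):</p>
        <preformat>
import numpy as np

def video_features(au_seq):
    """au_seq: (n_frames, n_channels) per-frame AU codings for one video.
    Returns a vector of the 11 summary statistics per channel."""
    pcts = np.percentile(au_seq, [95, 85, 75, 50, 25], axis=0)
    stats = np.vstack([
        au_seq.mean(axis=0),
        au_seq.max(axis=0),
        au_seq.min(axis=0),
        au_seq.std(axis=0),
        pcts,                                      # 5 percentile rows
        np.maximum(au_seq, 0).mean(axis=0),        # half-rectified mean
        au_seq.max(axis=0) - au_seq.min(axis=0),   # max - min
    ])
    return stats.flatten()  # 11 statistics x n_channels features

def minmax_scale(train_feats, feats):
    # Rescale each feature to [0, 1] using training-set extremes.
    lo, hi = train_feats.min(0), train_feats.max(0)
    return (feats - lo) / (hi - lo + 1e-12)
        </preformat>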
        <p>Neural Network Model to Recognize Pain with Extracted Features:
A neural network with 1 hidden layer was used to recognize pain from the extracted
automated or manual features. The number of neurons in the hidden layer was
twice the number of neurons in the input layer, and the sigmoid activation
function σ(x) = 1/(1 + exp(−x)) was used with batch normalization for the
hidden layer. The output layer was trained with the cross-entropy loss.</p>
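        <p>A sketch of this classifier in PyTorch (the framework choice, the ordering of batch normalization before the sigmoid, and the two-logit cross-entropy head are our assumptions, consistent with the description above):</p>
        <preformat>
import torch.nn as nn

class PainClassifier(nn.Module):
    """One hidden layer of size 2 x input, sigmoid activation with
    batch normalization, trained with cross-entropy."""
    def __init__(self, n_in):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 2 * n_in),
            nn.BatchNorm1d(2 * n_in),
            nn.Sigmoid(),
            nn.Linear(2 * n_in, 2),  # pain / no-pain logits
        )

    def forward(self, x):
        return self.net(x)

model = PainClassifier(n_in=11 * 23)  # automated features
loss_fn = nn.CrossEntropyLoss()
        </preformat>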
        <p>Neural Network Model to Predict Manual Features from Automated
Features: A neural network with the same structure was used to predict manual
features from automated features, except that the output layer was linear and
mean squared error was used as the loss function.</p>
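        <p>A sketch of the regression variant under the same assumptions (only the output size and loss change):</p>
        <preformat>
import torch.nn as nn

class AUMapper(nn.Module):
    """Maps automated features to (a subspace of) manual features;
    same hidden layer as PainClassifier, linear output, MSE loss."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 2 * n_in),
            nn.BatchNorm1d(2 * n_in),
            nn.Sigmoid(),
            nn.Linear(2 * n_in, n_out),  # linear regression output
        )

    def forward(self, x):
        return self.net(x)

mapper = AUMapper(n_in=11 * 23, n_out=4)  # e.g., 4 manual-feature PCs
loss_fn = nn.MSELoss()
        </preformat>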
        <p>Model Training and Testing: Experiments were conducted in a
participant-based (each participant restricted to one fold) 10-fold cross-validation fashion.
Participants were divided into 10 folds, and each time 1 fold was used as the test
set, and the other 9 folds together were used as the training set. We balanced
classes for each participant in each training set by duplicating samples from
the under-represented class. 1/9 of the participants in the training set were picked
randomly as a nested validation set for early stopping in the neural network
training. A batch size of 1/8 of the size of the training set was used. We examined the
receiver operating characteristic (ROC) curve, which plots True Positive
Rate against False Positive Rate as the discrimination threshold is varied, and
used the Area under the Curve (AUC) to evaluate the performance of classifiers.
We considered data from 3 domains (D) as shown in Figure 1: (1) D1 with pain
and no-pain both from V1/2 in hospital, (2) D2 with pain from V1/2 in hospital
and no-pain from V3 from the outpatient lab, and (3) All data, i.e., pain from V1/2
and no-pain from V1/2/3. The clinical goal was to be able to discriminate pain
levels in the hospital; thus evaluation on D1 (where all samples were from the
hospital bed) was the most clinically relevant evaluation.</p>
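        <p>A sketch of this evaluation protocol (scikit-learn's GroupKFold keeps each participant's videos within a single fold; train_fn is a hypothetical helper that handles the nested validation split and early stopping, and the class balancing shown is global rather than per participant for brevity):</p>
        <preformat>
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.metrics import roc_auc_score

def participant_cv_auc(X, y, groups, train_fn, n_splits=10):
    """X: (n_videos, n_features); y: binary pain labels;
    groups: participant id per video."""
    aucs = []
    for tr, te in GroupKFold(n_splits=n_splits).split(X, y, groups):
        Xtr, ytr = X[tr], y[tr]
        # Balance classes by duplicating under-represented samples.
        pos, neg = np.where(ytr == 1)[0], np.where(ytr == 0)[0]
        minority = min((pos, neg), key=len)
        extra = np.random.choice(minority,
                                 abs(len(pos) - len(neg)), replace=True)
        idx = np.concatenate([np.arange(len(ytr)), extra])
        model = train_fn(Xtr[idx], ytr[idx])
        scores = model.predict_proba(X[te])[:, 1]
        aucs.append(roc_auc_score(y[te], scores))
    return np.mean(aucs), np.std(aucs)
        </preformat>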
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Analysis and Discussion</title>
      <p>Data from the 73 participants labeled by both the human coder and iMotions were used
for the analyses throughout this section, and data from the remaining 70 participants, having only
automated (iMotions) AU codings, were used for independent test set evaluation in
the Results section.</p>
      <p>Using automated features, we first combined all visit data and trained a classifier
to distinguish pain from no-pain. This classifier performed well in general (AUC
= 0.77 ± 0.011 on All data), but when we looked at different domains, the
performance on D1 (the most clinically relevant, in-hospital environment) was much
inferior to that on D2, as shown in data rows 1 and 4 under the "Automated"
column in Table 1.</p>
      <p>There were two main differences between D1 and D2, i.e., between V1/2 and
V3 no-pain samples. The first was that in V1/2, patients usually still had some
pain and their self-ratings were greater than 0, while in V3, no-pain ratings were
usually 0, reflecting a "purer" no-pain signal. The second difference was that
V1/2 happened in hospital with patients in beds, while V3 videos were recorded
in an outpatient lab with the patient sitting in a reclined chair. Since automated
recognition of AUs is known to be sensitive to facial pose and lighting
differences, we hypothesized that the added discrepancy in classification performance
between D1 and D2 was mainly due to the model classifying based on
environmental differences between V1/2 and V3. In other words, the classifier, when
trained and tested on D2, might be classifying "lying in hospital bed" vs. "more
upright in outpatient chair" as much as pain vs. no-pain. (This is similar to doing
well at recognizing cows by recognizing a green background.)</p>
      <p>In order to investigate this hypothesis and attempt to improve classification
on the clinically relevant D1, we trained a classifier using only videos from D1.
Within the "Automated" column, row 2 in Table 1 shows that performance on
automated D1 classification doesn't drop much when D2 samples are removed
from the training set. At the same time, training using only D2 data results in
the worst classification on D1 (row 3), but the best classification on D2 (last row),
as the network is able to exploit the environmental differences (no-pain + more
upright from V3, pain + lying-down from V1/2).</p>
      <p>Figure 2 (LEFT) shows ROC curves of within- and across-domain tests for
models trained on automated features in D2. The red dotted curve corresponds
to testing on D2 (within domain) and the blue solid curve corresponds to testing
on D1 (across domain). The model did well on within-domain classification, but
failed on the across-domain task.</p>
      <p>[Figure 2: ROC curves for models trained on D2: automated features (LEFT), manual features (MIDDLE), and manual "pain" features (RIGHT).]</p>
      <p>Classification Based on Manual AUs Is Less Sensitive to
Environmental Changes: We also trained a classifier on manual AUs labeled by a human coder.
Interestingly, results from the classifier trained on manual AUs showed less of a difference
in AUCs between the domains, with a higher AUC for D1 and a lower AUC for
D2 relative to those with the automated AUs (see Table 1, "Manual" and
"Automated" columns). The manual AUs appeared to be less sensitive to changes in
the environment, reflecting the ability of the human labeler to consistently code
AUs without being affected by lighting and pose variations.</p>
      <p>When we restricted training data from All to only D1 or only D2 data,
classification performance using manual AUs went down, likely due to the reduction in
training data, and training with D2 always gave better performance than
training with D1 on both D1 and D2 test data, which should be the case since D2
is higher in "pain" quality. These results appear consistent with our hypothesis
that human coding of AUs is not as sensitive as machine coding of AUs to the
environmental differences between V1/2 and V3.</p>
      <p>
        Figure 2 (MIDDLE) displays the ROC curves for manual features. As
discussed above, in contrast to the plot on the left for automated features, manual
coding performance outperformed automated coding performance in the
clinically relevant test on D1. The dotted red curve representing within-domain
performance is only slightly higher than the solid blue curve, likely due in part
to the quality difference in no-pain samples in V1/2 and V3 and also possibly
due to any small amount of environmental information that the human labeler
was affected by. Note that ignoring the correlated environmental information
in D2 (pain faces were more reclined and no-pain faces were more upright)
resulted in a lower numerical performance on D2 but does not likely reflect worse
classification of pain.</p>
      <p>In an attempt to reduce the influence of environmental conditions and further
improve accuracy on D1, we restricted the classifier to the eight AUs that have
been consistently associated with pain: 4 (Brow Lowerer), 6 (Cheek Raiser),
7 (Lid Tightener), 9 (Nose Wrinkler), 10 (Upper Lip Raiser), 12 (Lip Corner
Puller), 20 (Lip Stretcher), and 43 (Eyes Closed) [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ] to obtain 11 (statistics)
× 8 (AUs) features. Pain prediction results using these "pain" features are shown
in the last two columns of Table 1. Results show that using only pain-related
AUs improved the classification performance of manual features. However, it
did not seem to help as much for automated features.
      </p>
      <p>Similarly, Figure 2 (RIGHT) shows that limiting manual features to only
pain-related AUs further improved D1 performance when training with D2. We
also performed PCA on these pain-related features and found that performance
in the hospital environmental domain was similar when using 4 or more principal
components.</p>
      <p>Computer vision AU detection algorithms have been programmed/
trained on manual FACS data. However, we demonstrated differential
performance of AUs coded automatically versus manually. To understand the
relationship between automatically and manually coded AUs, we computed
correlations between automatically coded AUs and manually coded AUs at the
frame level, as depicted in Figure 3. If the two sets of AUs were identical, the
diagonal of the matrix (marked with small centered dots) should yield the highest
correlations, which was not the case. For example, manual AU 6 was highly
correlated with automated AUs 12 and 14, but had relatively low correlation with
automated AU 6.</p>
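      <p>The frame-level comparison can be computed as a cross-correlation matrix; a minimal NumPy sketch (array names are hypothetical):</p>
      <preformat>
import numpy as np

def cross_correlation(auto_aus, man_aus):
    """auto_aus: (n_frames, 20) automated AU codings; man_aus:
    (n_frames, 64) manual intensities for the same frames.
    Returns the (20, 64) Pearson correlation matrix."""
    a = (auto_aus - auto_aus.mean(0)) / (auto_aus.std(0) + 1e-12)
    m = (man_aus - man_aus.mean(0)) / (man_aus.std(0) + 1e-12)
    return (a.T @ m) / len(a)
      </preformat>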
      <p>The correlation matrix shows that not only are human coders less affected
by environmental changes, but the AUs they code are also not in agreement with the
automated AUs. (We separately had another trained human coder code a subset
of the videos and observed closer correlation between the two humans than between
each human and iMotions.) This likely explains the reduced improvement from
restricting the automated features model to "pain-related AUs," as these have
been determined based on human FACS-coded AUs.</p>
      <p>Transfer Learning via Mapping to Manual Features Improves
Performance: We have shown that manual codings are not as sensitive to domain change.
However, manual coding of AUs is very time-consuming and not amenable to an
automated real-time system. In an attempt to leverage manual coding to achieve
similar robustness with automatic AUs, we utilized transfer learning and mapped
automated features to the space of manual features. Specifically, we trained a
machine learning model to estimate manual features from automated features
using data coded by both iMotions and a human. Separate models were trained
to predict: manual features of all 64 AUs, manual features of the eight pain-related
AUs, and principal components (PCs) of the manual features of the eight pain-related
AUs. PCA dimensionality reduction was used due to insufficient data for learning
a mapping from all automated AUs to all manual AUs.</p>
      <p>Once the mapping network was trained, we used it to transform the
automated features and train a new network on these transformed data for the
pain/no-pain classification. The 10-fold cross-validation was done consistently,
so that the same training data were used to train the mapping network and the
pain-classification network.</p>
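      <p>Putting the pieces together, a sketch of the transfer pipeline under the assumptions of the earlier sketches (train() and the feature arrays are hypothetical placeholders; 4 principal components are shown, as in the best-performing configuration):</p>
      <preformat>
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# 1. Project manual pain-related features (11 stats x 8 AUs) onto 4 PCs,
#    fitting PCA on the training fold only.
pca = PCA(n_components=4).fit(manual_pain_feats_train)
targets = pca.transform(manual_pain_feats_train)

# 2. Learn the mapping from automated features (11 x 23) to the PCs.
mapper = train(AUMapper(n_in=11 * 23, n_out=4),
               auto_feats_train, targets, loss=nn.MSELoss())

# 3. Train the pain classifier on the transferred (predicted) features,
#    using the same training folds as the mapping network.
mapper.eval()  # fixed mapping; batch norm in inference mode
with torch.no_grad():
    transferred = mapper(torch.as_tensor(auto_feats_train,
                                         dtype=torch.float32))
clf = train(PainClassifier(n_in=4),
            transferred.numpy(), pain_labels_train,
            loss=nn.CrossEntropyLoss())

# 4. At test time only automated codings are required.
clf.eval()
with torch.no_grad():
    test_pcs = mapper(torch.as_tensor(auto_feats_test,
                                      dtype=torch.float32))
    pain_prob = torch.softmax(clf(test_pcs), dim=1)[:, 1]
      </preformat>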
      <p>In Table 2, we show the classification AUCs when the classification model was
trained and tested with outputs from the prediction network. We observed that
when using All data to train (which had the best performance), with the
transfer-learning prediction network, automated features performed much better in
classification on D1 (0.68-0.69 compared to 0.61-0.63 in Table 1). Predicting 4
principal components of manual pain-related features gave the best performance
on our data. Overall, the prediction network helped in domain adaptation of a
pain recognition model using automatically extracted AUs.</p>
      <p>[Figure 4: schematic of the feature pipelines: (1) automated features, (2) manual pain-related features (11x8), (3) PCA of manual features, (4) principal components, with regression networks (5), (6) mapping automated features to (4) and (2), each feeding the pain/no-pain classification network.]</p>
      <p>Figure 5 (LEFT) plots the ROC curves of the transfer-learning classifier
within and across domains, trained and tested using 4 predicted features. Compared
to Figure 2 (LEFT), the transferred automated features showed properties
more like manual features, with smaller differences between performance on the
two domains and a higher AUC on the clinically relevant D1. Table 2 shows
numerically how transfer learning helped automated features to ignore the environmental
information in D2, as human coders do, and learn pure pain information which can also
be used for classification on D1.</p>
      <p>Within-domain classification performance for D1 was also improved with
the prediction network. These results show that by mapping to the manual
feature space, automated features can be made to perform better in pain
classification.</p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>In the previous section, we showed (see Figure 4) that classification with
pain-related manual features (2) performed better than automated features (1) on D1,
which was the clinically relevant classification. We also found that applying
PCA to manual features (3-4) didn't change the performance on D1 much. Thus
we introduced a transfer learning model to map automated features first to
manual pain-related features (or the top few principal components of them), and
then used the transferred features for classification (6-2, or 5-4). We got similar
results to manual features on D1 with the transfer learning model (5-4) mapping to
4 principal components of manual features.</p>
      <p>In this section we report the results of testing our
transfer learning method on a new separate dataset (new participants),
which has only automated features. Table 1 shows that without our method,
training on all data and restricting to pain-related AUs resulted in the best
performance on D1, and the cross-validation results in Table 2 show that with our
method, predicting 4 PCs yielded the best performance on D1. With these
optimal choices of model structure and training domain, we trained two models
using all the data from the previous sections labeled by both iMotions and the human coder,
and tested the models on a new separate data set (new participants) labeled only
by iMotions (D1, D2). Our model with transfer learning (AUC = 0.72 ± 0.009)
performed better than the model without it (AUC = 0.69 ± 0.033) on D1, with
p-value = 5.4585e-04.</p>
      <p>[Figure 5: LEFT, ROC curves (True Positive Rate vs. False Positive Rate) for the model trained with D2 transferred features, tested on D1 and D2; RIGHT, output pain score on D1 vs. true pain level (0-10).]</p>
      <p>Figure 5 (RIGHT) plots the output pain scores of our model tested on D1 against
the 0-10 self-reported pain levels. The model's output pain score increases with the true
pain level, indicating that our model indeed reflects pain levels.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In the described work, we recognized differences in classifier model performance
(on pain vs. no-pain) across data domains that reflected environmental
differences as well as differences reflecting how the data were coded (automatically
vs. manually). We then introduced a transfer learning model to map automated
features first to manual pain-related features (or principal components of them),
and then used the transferred features for classification (6-2, or 5-4 in Figure 4).
This allowed us to leverage data from another domain to improve classifier
performance on the clinically relevant task of distinguishing pain levels in the hospital.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>
        This work was supported by National Institutes of Health National Institute of
Nursing Research grant R01 NR013500 and by IBM Research AI through the
AI Horizons Network. Many thanks to Ryley Unrau for manual FACS coding
and Karan Sikka for sharing his code and ideas used in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>West</surname>
          </string-name>
          , Dan Ventura, and
          <string-name>
            <given-names>Sean</given-names>
            <surname>Warnick</surname>
          </string-name>
          .
          <article-title>Spring research presentation: A theoretical foundation for inductive transfer</article-title>
          . Brigham Young University,
          <source>College of Physical and Mathematical Sciences</source>
          ,
          <volume>1</volume>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Brenna L Quinn</surname>
          </string-name>
          ,
          <string-name>
            <surname>Esther Seibold</surname>
            , and
            <given-names>Laura</given-names>
          </string-name>
          <string-name>
            <surname>Hayman</surname>
          </string-name>
          .
          <article-title>Pain assessment in children with special needs: A review of the literature</article-title>
          .
          <source>Exceptional Children</source>
          ,
          <volume>82</volume>
          (
          <issue>1</issue>
          ):
          <volume>44</volume>
          {
          <fpage>57</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Ghada</given-names>
            <surname>Zamzmi</surname>
          </string-name>
          ,
          <string-name>
            <surname>Chih-Yun</surname>
            <given-names>Pai</given-names>
          </string-name>
          , Dmitry Goldgof, Rangachar Kasturi,
          <string-name>
            <given-names>Yu</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Terri</given-names>
            <surname>Ashmeade</surname>
          </string-name>
          .
          <article-title>Machine-based multimodal pain assessment tool for infants: a review</article-title>
          .
          <source>preprint arXiv:1607.00331</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carl L Von Baeyer</surname>
          </string-name>
          .
          <article-title>Children's self-report of pain intensity: what we know, where we are headed</article-title>
          .
          <source>Pain Research and Management</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <volume>39</volume>
          {
          <fpage>45</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Karan</given-names>
            <surname>Sikka</surname>
          </string-name>
          ,
          <article-title>Alex A Ahmed, Damaris Diaz, Matthew S Goodwin, Kenneth D Craig, Marian S Bartlett, and Jeannie S Huang. Automated assessment of children's postoperative pain using computer vision</article-title>
          . Pediatrics,
          <volume>136</volume>
          (
          <issue>1</issue>
          ):e124{
          <fpage>e131</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Min</surname>
            <given-names>SH</given-names>
          </string-name>
          Aung, Sebastian Kaltwang, Bernardino Romera-Paredes, Brais Martinez, Aneesha Singh,
          <string-name>
            <surname>Matteo Cella</surname>
          </string-name>
          , Michel Valstar, Hongying Meng, Andrew Kemp,
          <article-title>Moshen Shafizadeh</article-title>
          , et al.
          <article-title>The automatic detection of chronic pain-related expression: requirements, challenges and the multimodal emopain dataset</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <volume>435</volume>
          {
          <fpage>451</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Kamal</given-names>
            <surname>Kaur</surname>
          </string-name>
          <string-name>
            <surname>Sekhon</surname>
          </string-name>
          , Samantha R Fashler, Judith Versloot,
          <string-name>
            <given-names>Spencer</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kenneth D</given-names>
            <surname>Craig</surname>
          </string-name>
          .
          <article-title>Children's behavioral pain cues: Implicit automaticity and control dimensions in observational measures</article-title>
          .
          <source>Pain Res Manag.</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ruth</surname>
            <given-names>VE</given-names>
          </string-name>
          <string-name>
            <surname>Grunau and Kenneth D Craig</surname>
          </string-name>
          .
          <article-title>Pain expression in neonates: facial action and cry</article-title>
          .
          <source>Pain</source>
          ,
          <volume>28</volume>
          (
          <issue>3</issue>
          ):
          <volume>395</volume>
          {
          <fpage>410</fpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Hadjistavropoulos</surname>
          </string-name>
          , Keela Herr,
          <string-name>
            <surname>Kenneth M Prkachin</surname>
            ,
            <given-names>Kenneth D</given-names>
          </string-name>
          <string-name>
            <surname>Craig</surname>
            , Stephen J Gibson,
            <given-names>Albert</given-names>
          </string-name>
          <string-name>
            <surname>Lukas</surname>
          </string-name>
          , and
          <string-name>
            <surname>Jonathan H Smith.</surname>
          </string-name>
          <article-title>Pain assessment in elderly adults with dementia</article-title>
          .
          <source>The Lancet Neurology</source>
          ,
          <volume>13</volume>
          (
          <issue>12</issue>
          ):
          <volume>1216</volume>
          {
          <fpage>1227</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Paul</given-names>
            <surname>Ekman</surname>
          </string-name>
          and Wallace V Friesen.
          <article-title>Measuring facial movement</article-title>
          .
          <source>Environmental psychology and nonverbal behavior</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <volume>56</volume>
          {
          <fpage>75</fpage>
          ,
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Brais</surname>
            <given-names>Martinez</given-names>
          </string-name>
          , Michel F Valstar,
          <string-name>
            <surname>Bihan Jiang</surname>
            , and
            <given-names>Maja</given-names>
          </string-name>
          <string-name>
            <surname>Pantic</surname>
          </string-name>
          .
          <article-title>Automatic analysis of facial actions: A survey</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Ahmed Bilal Ashraf, Simon Lucey, Jeffrey F Cohn, Tsuhan Chen, Zara Ambadar,
          <string-name>
            <surname>Kenneth M Prkachin</surname>
            , and
            <given-names>Patricia E</given-names>
          </string-name>
          <string-name>
            <surname>Solomon.</surname>
          </string-name>
          <article-title>The painful face - pain expression recognition using active appearance models</article-title>
          .
          <source>Image and vision computing</source>
          ,
          <volume>27</volume>
          (
          <issue>12</issue>
          ):
          <volume>1788</volume>
          {
          <fpage>1796</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Md Maruf Monwar and
          <string-name>
            <given-names>Siamak</given-names>
            <surname>Rezaei</surname>
          </string-name>
          .
          <article-title>Pain recognition using artificial neural network</article-title>
          .
          <source>In Signal Processing and Information Technology</source>
          ,
          <source>2006 IEEE International Symposium on</source>
          , pages
          <volume>28</volume>
          {
          <fpage>33</fpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sinno</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Pan</surname>
            and
            <given-names>Qiang</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>A survey on transfer learning</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>22</volume>
          (
          <issue>10</issue>
          ):
          <volume>1345</volume>
          {
          <fpage>1359</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <article-title>DL Hoffman, A Sadosky, EM Dukes, and</article-title>
          <string-name>
            <given-names>J.</given-names>
            <surname>Alvir</surname>
          </string-name>
          .
          <article-title>How do changes in pain severity levels correspond to changes in health status and function in patients with painful diabetic peripheral neuropathy</article-title>
          .
          <source>Pain</source>
          ,
          <volume>149</volume>
          (
          <issue>2</issue>
          ):
          <volume>194</volume>
          {
          <fpage>201</fpage>
          , May
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Gwen</surname>
            <given-names>Littlewort</given-names>
          </string-name>
          , Jacob Whitehill, Tingfan Wu, Ian Fasel, Mark Frank, Javier Movellan, and
          <string-name>
            <given-names>Marian</given-names>
            <surname>Bartlett</surname>
          </string-name>
          .
          <article-title>The computer expression recognition toolbox (cert)</article-title>
          .
          <source>In Automatic Face &amp; Gesture Recognition and Workshops (FG</source>
          <year>2011</year>
          ), 2011 IEEE International Conference on, pages
          <volume>298</volume>
          {
          <fpage>305</fpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kenneth</surname>
            <given-names>M Prkachin.</given-names>
          </string-name>
          <article-title>The consistency of facial expressions of pain: a comparison across modalities</article-title>
          .
          <source>Pain</source>
          ,
          <volume>51</volume>
          (
          <issue>3</issue>
          ):
          <volume>297</volume>
          {
          <fpage>306</fpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kenneth</surname>
            <given-names>M Prkachin.</given-names>
          </string-name>
          <article-title>Assessing pain by facial expression: facial expression as nexus</article-title>
          .
          <source>Pain Res Manag.</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <volume>53</volume>
          {
          <fpage>58</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>