<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adversarial and Cooperative Correlated Domain Adaptation based Multimodal Emotion Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jie-Lin Qiu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaoshi Chen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kai Hu</string-name>
          <email>sjtu_hukai@sjtu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Shanghai Jiao Tong University</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Southeast University</institution>
          ,
          <addr-line>Nanjing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper, we propose a new model, Adversarial and Cooperative Correlated Domain Adaptation (ACCDA), to perform multimodal emotion recognition. ACCDA unifies adversarial discriminative domain adaptation and cooperative generative domain adaptation with deep canonical correlation analysis to train highly correlated domains of multiple physiological data (EEG and eye movement signals), making use of their complementarity and relevance. In experiments on two real-world datasets, we find that our model significantly contributes to higher emotion classification accuracy when higher correlation is acquired. Our experimental results indicate that the ACCDA model performs better than the state-of-the-art methods, with a mean accuracy of 88.64% for four-class emotion classification on the SEED IV dataset. It also outperforms the state-of-the-art results on the DEAP dataset, with a mean accuracy of 86.15% for the two dichotomous classification tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Emotion Recognition</kwd>
        <kwd>EEG</kwd>
        <kwd>Eye Movement</kwd>
        <kwd>Domain Adaptation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Multimodal emotion recognition from electroencephalography (EEG) and eye
movement features has attracted increasing interest. Integrating this information with
fusion technologies is attractive for constructing robust emotion recognition models. The
combination of signals from the central nervous system, EEG, and external behaviors,
eye movement, has been reported to be a promising approach [
        <xref ref-type="bibr" rid="ref1 ref2 ref21 ref3 ref4 ref5 ref6 ref7 ref8 ref9">1,2,3,4,5,6,7,8,9,21</xref>
        ].
      </p>
      <p>
        Domain adaptation methods attempt to mitigate the harmful effects of domain shift.
Recent domain adaptation methods learn deep neural transformations that map both
domains into a common feature space. This is generally achieved by optimizing the
representation to minimize some measure of domain shift such as maximum mean
discrepancy [
        <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
        ] or correlation distances [
        <xref ref-type="bibr" rid="ref13 ref14">13,14</xref>
        ]. An alternative is to reconstruct
the target domain from the source representation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Adversarial adaptation methods
have become an increasingly popular incarnation of this type of approach which seeks
to minimize an approximate domain discrepancy distance through an adversarial
objective with respect to a domain discriminator. These methods are closely related to
generative adversarial learning [
        <xref ref-type="bibr" rid="ref16">16</xref>
], which pits two networks against each other: a generator
and a discriminator. The generator is trained to produce images in a way that confuses
the discriminator, which in turn tries to distinguish them from real image examples. In
domain adaptation, this principle has been employed to ensure that the network cannot
distinguish between the distributions of its training and test domain examples [
        <xref ref-type="bibr" rid="ref17 ref18">17,18</xref>
        ].
However, each algorithm makes different design choices such as whether to use a
generator, which loss function to employ, or whether to share weights across domains. For
example, [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] share weights and learn a symmetric mapping of both source and target
images to the shared feature space, while [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] decouple some layers thus learning a
partially asymmetric mapping. Tzeng et al. combined discriminative modeling, untied
weight sharing, and a GAN loss to form the ADDA model, which outperformed previous
unsupervised adaptation results [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>
However, since the gradient computation requires back propagation through the
generator's output, a GAN can only model the distribution of continuous variables,
making it inapplicable to discrete sequence generation. Researchers then proposed the
Sequence Generative Adversarial Network (SeqGAN) [
        <xref ref-type="bibr" rid="ref18">18</xref>
], which uses a model-free policy
gradient algorithm to optimize the original GAN objective. With SeqGAN, the
expected Jensen-Shannon divergence (JSD) between the current and target discrete data
distributions is minimized if the training is perfect. SeqGAN shows observable improvements in many tasks. Since then, many
variants of SeqGAN have been proposed to improve its performance. However,
SeqGAN is not an ideal algorithm for this problem, and current algorithms based on it
cannot show stable, reliable and observable improvements that cover all scenarios. So
Lu et al. proposed CoT for training generative models that measure a tractable density
function for target data [
        <xref ref-type="bibr" rid="ref20">20</xref>
]. For multimodal emotion recognition, Lu et al. used both EEG
signals and eye movement signals to recognize three types of emotions [
        <xref ref-type="bibr" rid="ref21">21</xref>
]. Liu et al.
further used a Bimodal Deep AutoEncoder to extract high-level representation
features [
        <xref ref-type="bibr" rid="ref22">22</xref>
]. Tang et al. adopted the Bimodal Deep Denoising AutoEncoder model, also taking
the Bimodal-LSTM model into account [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
<p>In this paper, we combine adversarial and cooperative networks to propose a new
domain adaptation framework, named Adversarial and Cooperative Correlated Domain
Adaptation (ACCDA), in which the correlation between different modalities is computed
in a high-dimensional space. This takes advantage of the complementarity of multiple
modalities and turns out to achieve remarkable results. Our results demonstrate the
complementarity of adversarial and cooperative networks, which indicates a new
direction for multimodal tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <sec id="sec-2-1">
        <title>Adversarial and Cooperative Correlated Domain Adaptation</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Domain adaptation</title>
      <p>
        DA is a branch of transfer learning (i.e., transductive learning within the same
feature space [
        <xref ref-type="bibr" rid="ref24">24</xref>
]). The source domain is denoted by $D_s = \{X_s, Y_s\}$, in which $X_s =
\{x_{s1}, x_{s2}, \ldots, x_{sn}\}$ is the input and $Y_s = \{y_{s1}, y_{s2}, \ldots, y_{sn}\}$ is the corresponding label
set. The values of $X_s$ and $Y_s$ are drawn from the joint distribution $P(X_s, Y_s)$.
Similarly, the target domain denoted by $D_t = \{X_t, Y_t\}$ corresponds to data and labels drawn
from the joint distribution $P(X_t, Y_t)$. In this paper, we consider unsupervised domain
adaptation, which means label information from the target domain is not required.
Typically, the marginal distributions of the input data are different between the source domain
and the target domain: $P(X_s) \neq P(X_t)$. This is usually referred to as domain shift and
is considered to be the key problem that leads to poor performance when a model is
trained and tested on data from different domains. To eliminate the influence of domain
shift, feature-based domain adaptation methods try to find a proper transformation
function $\phi(\cdot)$ that aligns the data into a new feature space where $P(\phi(X_s)) = P(\phi(X_t))$.
      </p>
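      <p>To make the notion of domain shift concrete, the following minimal sketch (our illustration, not part of the original method) estimates the squared maximum mean discrepancy between source and target feature samples with an RBF kernel; a transformation that aligns the two domains should drive this estimate toward zero. The sample sizes, feature dimension, and kernel bandwidth here are illustrative assumptions.</p>
      <preformat><![CDATA[
import numpy as np

def rbf_kernel(a, b, sigma):
    # Pairwise RBF kernel matrix between the rows of a and b.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(xs, xt, sigma=10.0):
    # Biased empirical estimate of the squared MMD between P(Xs) and P(Xt).
    return (rbf_kernel(xs, xs, sigma).mean()
            - 2 * rbf_kernel(xs, xt, sigma).mean()
            + rbf_kernel(xt, xt, sigma).mean())

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, (200, 310))  # source-domain features
xt = rng.normal(0.5, 1.0, (200, 310))  # mean-shifted target-domain features
print(mmd2(xs, xt))                    # clearly positive under domain shift
]]></preformat>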
    </sec>
    <sec id="sec-4">
      <title>Domain-Adversarial Neural Network</title>
      <p>
        DANN was first proposed in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and its properties and applications are then
further explored in [
        <xref ref-type="bibr" rid="ref19">19</xref>
]. The model can be divided into the following three parts: a feature
extractor $G_f$, a label predictor $G_y$, and a domain classifier $G_d$. There exists an adversarial
relationship between the feature extractor and the domain classifier. The feature
extractor, as the name implies, extracts new features from the input features: $f = G_f(x; \theta_f)$.
Here $x$ denotes the input feature vector and $f$ denotes the corresponding output feature
vector in a new feature space. The outputs are then fed into the label predictor and the
domain classifier. The label predictor provides predictions of the corresponding labels:
$\hat{y} = G_y(f; \theta_y)$. The domain classifier distinguishes which domain the input is from:
$\hat{d} = G_d(f; \theta_d)$. The three parts are updated simultaneously with the objective function:
<disp-formula><tex-math>E(\theta_f, \theta_y, \theta_d) = \sum_{i=1}^{N} L_y(\hat{y}_i, y_i) - \lambda \sum_{i=1}^{N} L_d(\hat{d}_i, d_i) \quad (1)</tex-math></disp-formula>
where the first term $L_y(\cdot, \cdot)$ is the loss for label prediction, and $L_d(\cdot, \cdot)$ corresponds to
the loss for domain classification. The update rule is designed as follows:
<disp-formula><tex-math>(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d), \qquad \hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d) \quad (2)</tex-math></disp-formula>
      </p>
      <p>
It can be observed that the label predictor and domain classifier are trained so that
the corresponding losses are minimized. The feature extractor is trained so that the
label prediction loss is minimized while the domain classification loss is maximized.
So the feature extractor tries to extract features that are good for label prediction
but from which the domain of origin is hard to distinguish. In this way, the feature
extractor learns domain-invariant features, so the domain shift can be eliminated.</p>
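      <p>A minimal PyTorch sketch of how the min-max update rule of Eq. (2) is commonly implemented with a gradient reversal layer; the layer sizes, $\lambda = 1$, and the 310-dimensional input are illustrative assumptions rather than settings from this paper.</p>
      <preformat><![CDATA[
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; multiplies the gradient by -lambda in the
    # backward pass, so minimizing L_d through this layer maximizes it w.r.t.
    # the feature extractor, matching the update rule of Eq. (2).
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

g_f = nn.Sequential(nn.Linear(310, 128), nn.ReLU())  # feature extractor G_f
g_y = nn.Linear(128, 4)                              # label predictor G_y
g_d = nn.Linear(128, 2)                              # domain classifier G_d

x = torch.randn(8, 310)        # a toy batch of 310-dim features
y = torch.randint(0, 4, (8,))  # emotion labels
d = torch.randint(0, 2, (8,))  # domain labels (source/target)
f = g_f(x)
loss = (F.cross_entropy(g_y(f), y)
        + F.cross_entropy(g_d(GradReverse.apply(f, 1.0)), d))
loss.backward()  # one simultaneous update of G_f, G_y, G_d per Eq. (1)-(2)
]]></preformat>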
    </sec>
    <sec id="sec-5">
      <title>Adversarial Discriminative Domain Adaptation</title>
      <p>
Similar to DANN, ADDA can also be divided into three parts, except that there are
two feature extractors, one for source domain data and another for target domain data
[
        <xref ref-type="bibr" rid="ref19">19</xref>
]. Let $G_{f0}$ and $G_{f1}$ be the corresponding feature extractors for the source domain and
the target domain, respectively. The training procedure is two-stage. In the first stage, $G_{f0}$
and the label predictor $G_y$ are trained with source domain data so that the prediction
loss is minimized. After this training, the parameters of $G_{f0}$ and $G_y$ are fixed during the
following process. In the second stage, $G_{f1}$ is initialized with the parameters of $G_{f0}$.
Then $G_{f1}$ and $G_d$ are trained adversarially: $G_d$ is trained to discriminate source domain
data from target domain data, while $G_{f1}$ is trained to fool $G_d$. So, after the training, the
feature extractor $G_{f1}$ aligns the distribution of the target domain data to that of the
source domain data.
      </p>
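      <p>The two-stage procedure can be sketched as follows; this is our simplified PyTorch reconstruction, where the network shapes, optimizers, and iteration counts are assumptions made for illustration.</p>
      <preformat><![CDATA[
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

g_f0 = nn.Sequential(nn.Linear(310, 128), nn.ReLU())  # source extractor G_f0
g_y = nn.Linear(128, 4)                               # label predictor G_y
xs, ys = torch.randn(64, 310), torch.randint(0, 4, (64,))
xt = torch.randn(64, 310)

# Stage 1: train G_f0 and G_y on labeled source data, then freeze them.
opt = torch.optim.Adam(list(g_f0.parameters()) + list(g_y.parameters()))
for _ in range(100):
    opt.zero_grad()
    F.cross_entropy(g_y(g_f0(xs)), ys).backward()
    opt.step()
for p in list(g_f0.parameters()) + list(g_y.parameters()):
    p.requires_grad_(False)

# Stage 2: initialize G_f1 from G_f0; train G_f1 and G_d adversarially.
g_f1 = copy.deepcopy(g_f0)
for p in g_f1.parameters():
    p.requires_grad_(True)
g_d = nn.Linear(128, 1)                # domain discriminator G_d
opt_d = torch.optim.Adam(g_d.parameters())
opt_t = torch.optim.Adam(g_f1.parameters())
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)
for _ in range(100):
    opt_d.zero_grad()                  # G_d: label source as 1, target as 0
    (F.binary_cross_entropy_with_logits(g_d(g_f0(xs)), ones)
     + F.binary_cross_entropy_with_logits(g_d(g_f1(xt).detach()), zeros)).backward()
    opt_d.step()
    opt_t.zero_grad()                  # G_f1 tries to fool G_d (flipped labels)
    F.binary_cross_entropy_with_logits(g_d(g_f1(xt)), ones).backward()
    opt_t.step()
]]></preformat>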
    </sec>
    <sec id="sec-6">
      <title>Canonical Correlation Analysis</title>
      <p>
Canonical correlation analysis (CCA) is an algorithm that learns linear transformations
of two random vectors in order to maximize the correlation between them
[
        <xref ref-type="bibr" rid="ref25">25</xref>
]. Let $(X_1, X_2) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ denote random vectors with covariances $(\Sigma_{11}, \Sigma_{22})$
and cross-covariance $\Sigma_{12}$. CCA finds maximally correlated pairs of linear projections of
the two views, $(\omega_1' X_1, \omega_2' X_2)$:
<disp-formula><tex-math>(\omega_1^*, \omega_2^*) = \arg\max_{\omega_1, \omega_2} \mathrm{corr}(\omega_1' X_1, \omega_2' X_2) = \arg\max_{\omega_1, \omega_2} \frac{\omega_1' \Sigma_{12} \omega_2}{\sqrt{\omega_1' \Sigma_{11} \omega_1 \; \omega_2' \Sigma_{22} \omega_2}} \quad (3)</tex-math></disp-formula>
When finding multiple pairs of vectors $(\omega_1^i, \omega_2^i)$, subsequent projections are also
constrained to be uncorrelated with previous ones, that is,
<disp-formula><tex-math>\omega_1^{i\,\prime} \Sigma_{11} \omega_1^j = \omega_2^{i\,\prime} \Sigma_{22} \omega_2^j = 0 \quad \text{for } i &lt; j \quad (4)</tex-math></disp-formula>
Assembling the top $k$ projection vectors $\omega_1^i$ into the columns of a matrix
$A_1 \in \mathbb{R}^{n_1 \times k}$, and similarly placing $\omega_2^i$ into $A_2 \in \mathbb{R}^{n_2 \times k}$, we obtain the following
formulation to identify the top $k \le \min(n_1, n_2)$ projections:
<disp-formula><tex-math>\text{maximize: } \mathrm{tr}(A_1' \Sigma_{12} A_2), \quad \text{subject to: } A_1' \Sigma_{11} A_1 = A_2' \Sigma_{22} A_2 = I \quad (5)</tex-math></disp-formula>
      </p>
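      <p>The constrained problem in Eq. (5) has a closed-form solution via an SVD of the whitened cross-covariance matrix; a minimal numpy sketch follows, where the small ridge term added for numerical stability is our assumption.</p>
      <preformat><![CDATA[
import numpy as np

def inv_sqrt(s):
    # Inverse matrix square root via eigendecomposition (s must be SPD).
    w, v = np.linalg.eigh(s)
    return v @ np.diag(w ** -0.5) @ v.T

def cca(x1, x2, k, reg=1e-6):
    # Returns projections A1, A2 and the top-k canonical correlations.
    x1, x2 = x1 - x1.mean(0), x2 - x2.mean(0)
    n = x1.shape[0] - 1
    s11 = x1.T @ x1 / n + reg * np.eye(x1.shape[1])
    s22 = x2.T @ x2 / n + reg * np.eye(x2.shape[1])
    s12 = x1.T @ x2 / n
    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)  # whitened cross-covariance
    u, sv, vt = np.linalg.svd(t)
    a1 = inv_sqrt(s11) @ u[:, :k]            # A1' S11 A1 = I by construction
    a2 = inv_sqrt(s22) @ vt.T[:, :k]         # A2' S22 A2 = I by construction
    return a1, a2, sv[:k]

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 3))                # shared latent signal
x1 = z @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(500, 10))
x2 = z @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(500, 8))
print(cca(x1, x2, k=3)[2])                   # three correlations close to 1
]]></preformat>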
    </sec>
    <sec id="sec-7">
      <title>Cooperative Generative Model</title>
      <p>
        Lu et al. proposed Cooperative Training (CoT) for training generative models that
measure a tractable density function for target data [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. CoT coordinately trains a
generator G and an auxiliary predictive mediator M . The training target of M is to estimate
a mixture density of the learned distribution G and the target distribution P , and that of
G is to minimize the Jensen-Shannon divergence estimated through M . CoT achieves
independent success without the necessity of pre-training via Maximum Likelihood
Estimation or involving high-variance algorithms like REINFORCE. This low-variance
algorithm is theoretically proved to be unbiased for both generative and predictive tasks.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Our Model</title>
<p>The overall architecture of the Adversarial and Cooperative Correlated Domain
Adaptation (ACCDA) model is shown in Figure 1. There are two views of networks, which
contain EEG signals and eye movement signals respectively. The model consists of three parts: an
adversarial discriminative domain adaptation (ADDA) network, a cooperative
generative domain adaptation (CGDA) network, and a deep canonical correlation analysis
(DCCA) network. The two domain adaptation networks are trained simultaneously and
independently, while the DCCA network is applied to extract more highly correlated source
and target maps of both views. We describe the details of the different components in the
following paragraphs.</p>
      <p>For adversarial discriminative domain adaptation network, we first pre-train a source
encoder network using labeled source examples. Next, we perform adversarial
adaptation by learning a target encoder network such that a discriminator that sees encoded
source and target examples cannot reliably predict their domain label. During testing,
target images are mapped with the target encoder to the shared feature space and
classified by the source classifier. The source network's pre-trained parameters are
fixed and transferred to the target network. In unsupervised adaptation, we assume access
to source images $X_s$ and labels $Y_s$ drawn from a source domain distribution $p_s(x, y)$,
as well as target images $X_t$ drawn from a target distribution $p_t(x, y)$, where there are
no label observations. Domain adaptation instead learns a source representation
mapping, $M_s$, along with a source classifier, $C_s$, and then learns to adapt that model for
use in the target domain. We regularize the learning of the source and target mappings,
$M_s$ and $M_t$, so as to minimize the distance between the empirical source and target
mapping distributions: $M_s(X_s)$ and $M_t(X_t)$. The source classification model is then
trained using the standard supervised loss:
<disp-formula><tex-math>\min_{M_s, C} \mathcal{L}_{cls}(X_s, Y_s) = -\mathbb{E}_{(x_s, y_s) \sim (X_s, Y_s)} \sum_{k=1}^{K} \mathbb{1}_{[k = y_s]} \log C(M_s(x_s)) \quad (6)</tex-math></disp-formula>
A domain discriminator $D$ is optimized according to a standard supervised loss:
<disp-formula><tex-math>\mathcal{L}_{advD}(X_s, X_t, M_s, M_t) = -\mathbb{E}_{x_s \sim X_s}[\log D(M_s(x_s))] - \mathbb{E}_{x_t \sim X_t}[\log(1 - D(M_t(x_t)))] \quad (7)</tex-math></disp-formula>
Once the mapping parameterization is determined for the source, the target mapping is set
so as to minimize the distance between the source and target domains under their
respective mappings, while crucially also maintaining a target mapping that is category
discriminative. Consider a layered representation where the parameters of each layer are
denoted $M_s^l$ or $M_t^l$, for a given set of equivalent layers $\{l_1, \ldots, l_n\}$. Then the space
of constraints explored in the literature can be described through layerwise equality
constraints as follows:</p>
      <p>
<disp-formula><tex-math>\psi(M_s, M_t) \triangleq \{\psi_{l_i}(M_s^{l_i}, M_t^{l_i})\}_{i \in \{1, \ldots, n\}} \quad (8)</tex-math></disp-formula>
where each individual layer can be constrained independently. A very common form of
constraint is source and target layerwise equality:
<disp-formula><tex-math>\psi_{l_i}(M_s^{l_i}, M_t^{l_i}) = (M_s^{l_i} = M_t^{l_i}) \quad (9)</tex-math></disp-formula>
      </p>
      <p>We choose to allow independent source and target mappings by untying the weights.
This is a more flexible learning paradigm as it allows more domain specific feature
extraction to be learned. However, note that the target domain has no label access, and
thus without weight sharing a target model may quickly learn a degenerate solution if
we do not take care with proper initialization and training procedures. Therefore, we use
the pre-trained source model as an initialization for the target representation space and
fix the source model during adversarial training. In doing so, we are effectively learning
an asymmetric mapping, in which we modify the target model so as to match the source
distribution. This is most similar to the original generative adversarial learning setting,
where a generated space is updated until it is indistinguishable from a fixed real space.
Therefore, we choose the inverted label GAN loss:</p>
      <p>
<disp-formula><tex-math>\mathcal{L}_{advM}(X_s, X_t, D) = -\mathbb{E}_{x_t \sim X_t}[\log D(M_t(x_t))] \quad (10)</tex-math></disp-formula>
      </p>
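      <p>For intuition, a small torch sketch contrasting the standard minimax mapping loss with the inverted-label loss of Eq. (10); here `d_logits` is a hypothetical stand-in for the discriminator's pre-sigmoid outputs on encoded target examples.</p>
      <preformat><![CDATA[
import torch
import torch.nn.functional as F

d_logits = torch.randn(64, 1)  # stand-in for D(M_t(x_t)), pre-sigmoid

# Minimax mapping loss: minimize E[log(1 - D(M_t(x_t)))]. This equals the
# negative BCE against label 0; its gradients saturate once the
# discriminator confidently rejects the encoded target examples.
loss_minimax = -F.binary_cross_entropy_with_logits(d_logits, torch.zeros(64, 1))

# Inverted-label loss of Eq. (10): minimize -E[log D(M_t(x_t))], i.e. BCE
# against label 1. Same fixed point, but stronger gradients early on.
loss_inverted = F.binary_cross_entropy_with_logits(d_logits, torch.ones(64, 1))
print(loss_minimax.item(), loss_inverted.item())
]]></preformat>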
      <p>
        As for cooperative generative domain adaptation network, inspired by Lu et al.’s
work [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], we set the CGDA with similar structure compared with adversarial
discriminative domain adaptation network, only replace discriminator with generator. So the
domain generator G loss is:
      </p>
      <p>
<disp-formula><tex-math>\mathcal{L}_{genG}(X_s, X_t, M_s, M_t) = \mathbb{E}_{s \sim p_{data}}[\log M(x_s)] + \mathbb{E}_{s \sim G}[\log M(x_s)] \quad (11)</tex-math></disp-formula>
where $M$ is the mediator, a predictive module that measures a mixture distribution
of the learned generative distribution $G$ and the target latent distribution $P = p_{data}$ as
$M^* = \frac{1}{2}(P + G)$.
      </p>
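      <p>A minimal sketch of the mediator update implied by Eq. (11), with the mediator modeled as a diagonal Gaussian density purely for illustration; in CoT itself the mediator is a learned neural predictive module. Maximizing the two expectations jointly fits the mediator to the balanced mixture of data and generator samples.</p>
      <preformat><![CDATA[
import torch

# Mediator M: a diagonal-Gaussian density with learnable parameters
# (a deliberately simple density model for illustration only).
mu = torch.zeros(2, requires_grad=True)
log_std = torch.zeros(2, requires_grad=True)

def log_m(x):
    # Diagonal-Gaussian log-density log M(x).
    const = 0.5 * torch.log(torch.tensor(2.0 * torch.pi))
    return (-0.5 * ((x - mu) / log_std.exp()) ** 2 - log_std - const).sum(-1)

x_data = torch.randn(128, 2)        # samples from the target distribution P
x_gen = torch.randn(128, 2) + 2.0   # samples from the current generator G

# Eq. (11): maximize E_{s~p_data}[log M(s)] + E_{s~G}[log M(s)], fitting M
# to the balanced mixture M* = 0.5 * (P + G).
loss = -(log_m(x_data).mean() + log_m(x_gen).mean())
loss.backward()
print(mu.grad, log_std.grad)
]]></preformat>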
      <p>
For deep canonical correlation analysis, we take advantage of the complementarity of
EEG and eye movement signals and train a DCCA network to extract highly
correlated domains of both views. DCCA, proposed by Galen Andrew et al., is a
non-linear version of CCA that uses neural networks as the mapping functions instead
of linear transformations [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. DCCA directly optimizes the correlation between the
two views' learned latent representations. Retrieval can be performed with the cosine
distance given the correlated embedding representations of the two views. We
regard the source and target domains of the two views as input respectively. The layer sizes of
both views are the same, including input layer $L_1$, hidden layers $L_2$, and output layer
$L_3$, with the nodes of adjacent layers fully connected. During training, we first use the deep
networks to extract features, then we calculate the correlation at the output layer with
canonical correlation analysis. The goal is to jointly learn the parameters $W$ and $b$ of both
views, where $W \in \mathbb{R}^{c_1 \times n_1}$ is a matrix of weights, $b \in \mathbb{R}^{c_1}$ is a vector of biases,
and $c_1$ is the number of units of each intermediate layer in the network for the first view, such
that $\mathrm{corr}(f_1(X_1), f_2(X_2))$ is as high as possible, where $f(\cdot)$ is the whole function of
each view's network. We define $H_1$ and $H_2$ as the matrices whose columns are the top-level
representations produced by the deep models on the two views in layer $L_3$; the total
correlation of $H_1$ and $H_2$ is the sum of the $k$ singular values of the
matrix:
      <p>
<disp-formula><tex-math>T = \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2}, \qquad l_{corr} = \mathrm{corr}(H_1, H_2) = \lVert T \rVert_{tr} = \mathrm{tr}(T'T)^{1/2} \quad (12)</tex-math></disp-formula>
The weights of the nodes are updated using back propagation: the DCCA parameters
$W_l^v$ and $b_l^v$ are trained with gradient-based optimization to maximize this quantity.
      </p>
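      <p>The total correlation of Eq. (12) can be computed directly from the singular values of $T$; a numpy sketch follows, where the regularization constant and the dimensions are our assumptions.</p>
      <preformat><![CDATA[
import numpy as np

def inv_sqrt(s):
    w, v = np.linalg.eigh(s)
    return v @ np.diag(w ** -0.5) @ v.T

def total_correlation(h1, h2, reg=1e-4):
    # h1, h2: (N, k) top-layer outputs of the two DCCA views.
    h1, h2 = h1 - h1.mean(0), h2 - h2.mean(0)
    n = h1.shape[0] - 1
    s11 = h1.T @ h1 / n + reg * np.eye(h1.shape[1])
    s22 = h2.T @ h2 / n + reg * np.eye(h2.shape[1])
    s12 = h1.T @ h2 / n
    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)          # T of Eq. (12)
    return np.linalg.svd(t, compute_uv=False).sum()  # sum of singular values

rng = np.random.default_rng(0)
h1, h2 = rng.normal(size=(256, 10)), rng.normal(size=(256, 10))
print(total_correlation(h1, h2))  # the quantity the DCCA networks maximize
]]></preformat>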
      <sec id="sec-8-1">
        <title>Experiment</title>
        <sec id="sec-8-1-1">
          <title>Dataset</title>
          <p>
            We evaluate the performance of the approaches on two real-world datasets: the SEED
IV dataset<sup>1</sup> and the DEAP dataset<sup>2</sup> [
          <xref ref-type="bibr" rid="ref27">27</xref>
]. The SEED IV dataset contains EEG and eye
movement signals for a total of four emotions [
          <xref ref-type="bibr" rid="ref28">28</xref>
]. There were 72 film clips in total for
the four emotions, and forty-five experiments were conducted in which participants assessed
their emotions while watching the film clips, using emotion keywords and ratings out of ten
points for two dimensions: valence and arousal. The valence scale ranges from sad to
happy. The arousal scale ranges from calm to excited. The EEG signals were recorded
with ESI NeuroScan System at a sampling rate of 1000 Hz with a 62-channel electrode
cap. The eye movement signals were recorded with SMI ETG eye tracking glasses.
The DEAP dataset contains EEG signals and peripheral physiological signals of 32
participants. Signals were collected while participants were watching one-minute-long
emotional music videos. We chose 5 as the threshold to divide the trials into two classes
according to the rated levels of arousal and valence. We used 5-fold cross validation to
compare with Liu et al. [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and Yin et al. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ].
(<sup>1</sup> http://bcmi.sjtu.edu.cn/~seed/; <sup>2</sup> http://www.eecs.qmul.ac.uk/mmv/datasets/deap/)
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Feature Extraction</title>
      <p>For SEED IV dataset, we extracted Differential Entropy (DE) features from each EEG
signal channel in five frequency bands: $\delta$ (1-4 Hz), $\theta$ (4-8 Hz), $\alpha$ (8-14 Hz), $\beta$ (14-31 Hz)
and $\gamma$ (31-50 Hz). The size of the Hanning window used when extracting EEG features was
4 s. At each time step, there were in total 310 (5 bands $\times$ 62 channels) dimensions for
EEG features. As for the eye movement data, the features used are shown in Fig. 2 and Table
2. There were in total 39 dimensions, including both Power Spectral Density (PSD) and
DE features of pupil diameters, at each time step. Before training the model, the features
were normalized to zero mean. One view contains EEG features and the other contains
eye movement features.</p>
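      <p>For a band-filtered signal segment that is approximately Gaussian, the differential entropy reduces to $\mathrm{DE} = \frac{1}{2}\log(2\pi e \sigma^2)$. A minimal sketch of the per-band, per-window DE computation follows; the 200 Hz sampling rate and the Butterworth filter design are our assumptions, while the 4 s window and five bands follow the setup above.</p>
      <preformat><![CDATA[
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200      # Hz; assumed post-preprocessing sampling rate
WIN = 4 * FS  # the 4 s window used for SEED IV above
BANDS = [(1, 4), (4, 8), (8, 14), (14, 31), (31, 50)]  # delta..gamma

def de_features(eeg):
    # eeg: (62, n_samples) -> (n_windows, 310) DE features (5 bands x 62 ch).
    hann = np.hanning(WIN)
    per_band = []
    for lo, hi in BANDS:
        b, a = butter(4, [lo / (FS / 2), hi / (FS / 2)], btype="band")
        x = filtfilt(b, a, eeg, axis=-1)
        n_win = x.shape[-1] // WIN
        segs = x[:, :n_win * WIN].reshape(62, n_win, WIN) * hann
        # DE of a Gaussian segment: 0.5 * log(2 * pi * e * variance)
        per_band.append(0.5 * np.log(2 * np.pi * np.e * segs.var(axis=-1)))
    return np.concatenate(per_band, axis=0).T

print(de_features(np.random.randn(62, 60 * FS)).shape)  # (15, 310)
]]></preformat>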
      <p>
For DEAP dataset, we extracted DE features from EEG signals in four frequency
bands: $\theta$ (4-8 Hz), $\alpha$ (8-14 Hz), $\beta$ (14-31 Hz) and $\gamma$ (31-50 Hz), since a bandpass
frequency filter from 4-45 Hz was applied during pre-processing. The size of the Hanning
window was 2 s. There were then in total 128 (4 bands $\times$ 32 channels)
dimensions of extracted 32-channel EEG features. As for the peripheral physiological signals, six
time-domain features were extracted to describe the signals from different perspectives,
including maximum value, minimum value, mean value, standard deviation, variance and
squared sum. So there were in total 48 (6 features $\times$ 8 channels) dimensions of extracted
peripheral physiological features.
      </p>
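      <p>The six time-domain statistics for the peripheral channels are straightforward; a short sketch producing the 48-dimensional vector described above:</p>
      <preformat><![CDATA[
import numpy as np

def peripheral_features(signals):
    # signals: (8, n_samples) peripheral channels -> (48,) feature vector of
    # max, min, mean, standard deviation, variance, and squared sum.
    stats = [signals.max(-1), signals.min(-1), signals.mean(-1),
             signals.std(-1), signals.var(-1), (signals ** 2).sum(-1)]
    return np.concatenate(stats)  # 6 features x 8 channels = 48 dimensions

print(peripheral_features(np.random.randn(8, 1000)).shape)  # (48,)
]]></preformat>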
      <p>
Our objective now is to perform domain adaptation between different subjects. The
leave-one-subject-out cross-validation protocol is applied: for each
domain adaptation method there are several runs, and in each run the data from one of
the subjects are regarded as the target domain while the data from the other subjects
serve as the source domain. Multi-layer perceptrons (MLPs) are used for the feature
extractors, the label predictors, and the domain classifiers in the adversarial domain
adaptation networks. The Adam optimizer [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] was adopted for training the networks to obtain faster
convergence. We performed randomized search over some predefined sets of
hyperparameter values. For each method, the hyperparameter settings were evaluated with
the leave-one-subject-out cross-validation protocol and the best setting was chosen to
generate the final results. For the DCCA networks, we use grid search to find the optimal
hyperparameters. The specific predefined value sets for some of the hyperparameters are
listed in Table 1. After extracting features by DCCA, we apply an SVM for classification.
      </p>
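      <p>The leave-one-subject-out protocol with the final SVM stage can be sketched as follows; this is an illustrative scikit-learn reconstruction on toy stand-in data, not the exact pipeline used in the experiments.</p>
      <preformat><![CDATA[
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Toy stand-ins for DCCA-correlated features plus per-sample subject ids.
rng = np.random.default_rng(0)
features = rng.normal(size=(600, 20))
labels = rng.integers(0, 4, 600)           # four SEED IV emotion classes
subjects = np.repeat(np.arange(15), 40)    # 15 subjects, 40 samples each

accs = []
for tr, te in LeaveOneGroupOut().split(features, labels, subjects):
    # Each run: one subject is the target domain, the rest are the source.
    clf = SVC(kernel="rbf").fit(features[tr], labels[tr])
    accs.append(clf.score(features[te], labels[te]))
print(np.mean(accs))  # mean leave-one-subject-out accuracy
]]></preformat>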
      <p>
For SEED IV dataset, we regard Zheng et al.'s multimodal deep learning results as our
baseline [
        <xref ref-type="bibr" rid="ref28">28</xref>
]. We compare several kinds of methods with our
model. Table 3 demonstrates that BDAE achieved better results than SVM-based feature
fusion. Compared with the CCA-based approach and the other methods, we conclude that
the ACCDA model, which coordinates the signals, achieved better results. Table 4 shows
comparison results of different methods on the DEAP dataset. For the two dichotomous
classifications, Liu et al.'s multimodal autoencoder model achieved results about 2% higher
than the AutoEncoder. Yin et al. used an ensemble of deep classifiers, making higher-level
abstractions of physiological features [
        <xref ref-type="bibr" rid="ref29">29</xref>
]. Then Tang et al. used Bimodal-LSTM and achieved the
state-of-the-art accuracy for the two dichotomous classifications [
        <xref ref-type="bibr" rid="ref23">23</xref>
]. As for our ACCDA method,
we learn correlations across multiple domain signals and achieve better results than the
state-of-the-art method, with mean accuracies of 85.86% and 86.45% for the arousal and
valence classification tasks.
      </p>
      <p>
In comparison with previous feature-level fusion and multimodal deep learning methods,
it is very difficult to relate the original features in one modality to features in another
modality, and such methods usually learn unimodal features [
        <xref ref-type="bibr" rid="ref31">31</xref>
]. Moreover, the
relations across the various modalities are deep instead of shallow. In our model, we can
learn a coordinated representation from high-level signals and make the two views of
signals more complementary, which in turn improves the classification performance of
the fused features. To examine the complementarity of adversarial and cooperative
domain adaptation, we evaluated the performance of multiple network combinations,
i.e., ADDA networks for both views and CGDA networks for both views, compared with
the mixed ADDA-CGDA and CGDA-ADDA configurations.
      </p>
      <p>
In this paper, we proposed a new method, Adversarial and Cooperative Correlated
Domain Adaptation (ACCDA), to perform multimodal emotion recognition on two
real-world datasets. The model learns correlations across high-level domains owing to the
complementarity and relevance of multiple signals. The experimental results have shown
that our model contributes to higher classification accuracy of emotion recognition when
high correlation is achieved.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Soleymani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pantic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pun</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Multimodal emotion recognition in response to videos</article-title>
          .
          <source>IEEE Trans. Affective Computing</source>
          ,
          <volume>3</volume>
          ,
          <fpage>211</fpage>
          -
          <lpage>223</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
<surname>D'Mello</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Westlund</surname>
            ,
            <given-names>J. K.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>A review and meta-analysis of multimodal affect detection systems</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
<volume>47</volume>
:
<issue>43</issue>
:
<fpage>1</fpage>
-
<lpage>36</lpage>
.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Picard</surname>
            ,
            <given-names>R. W.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>Affective computing</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bocharov</surname>
            ,
            <given-names>A. V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Knyazev</surname>
            ,
            <given-names>G. G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Savostyanov</surname>
            ,
            <given-names>A. N.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Depression and implicit emotion processing: An eeg study</article-title>
          .
          <source>Neurophysiologie clinique 47</source>
          <volume>3</volume>
          :
<fpage>225</fpage>-<lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Tzirakis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
;
<string-name>
  <surname>Trigeorgis</surname>
  ,
  <given-names>G.</given-names>
</string-name>
;
<string-name>
  <surname>Nicolaou</surname>
  ,
  <given-names>M. A.</given-names>
</string-name>
;
<string-name>
  <surname>Schuller</surname>
  ,
  <given-names>B. W.</given-names>
</string-name>
; and
<string-name>
  <surname>Zafeiriou</surname>
  ,
  <given-names>S.</given-names>
</string-name>
          <year>2017</year>
          .
<article-title>End-to-end multimodal emotion recognition using deep neural networks</article-title>
          .
          <source>IEEE Journal of Selected Topics in Signal Processing</source>
          <volume>11</volume>
          :
<fpage>1301</fpage>-<lpage>1309</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hassib</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schneega</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Eiglsperger,
          <string-name>
            <given-names>P.</given-names>
            ;
            <surname>Henze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ;
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ; and
            <surname>Alt</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <year>2017b</year>
          .
          <article-title>Engagemeter: A system for implicit audience engagement sensing using electroencephalography</article-title>
          .
          <source>In CHI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Wang</surname>
          </string-name>
          , X.-W.;
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>B.-L.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Emotional state classification from eeg data using machine learning ap- proach</article-title>
          .
          <source>Neurocomputing</source>
          <volume>129</volume>
          :
<fpage>94</fpage>-<lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hassib</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pfeiffer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schneega</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Rohs,
          <string-name>
            <given-names>M.</given-names>
            ; and
            <surname>Alt</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <year>2017a</year>
          .
          <article-title>Emotion actuator: Embodied emotional feedback through electroencephalography and electrical muscle stim- ulation</article-title>
          .
          <source>In CHI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zheng</surname>
          </string-name>
          , W.-L.;
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>B.-N.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>B.-L.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Multimodal emotion recognition using eeg and eye tracking data</article-title>
          .
          <source>2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society</source>
<fpage>5040</fpage>-<lpage>5043</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zheng</surname>
          </string-name>
, W.-L.;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
; and Lu,
          <string-name>
            <surname>B.-L.</surname>
          </string-name>
          <year>2015</year>
          .
          <article-title>Combining eye movements and eeg to enhance emotion recognition</article-title>
          .
          <source>In IJCAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Tzeng</surname>
          </string-name>
          , E.;
          <string-name>
            <surname>Hoffman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Zhang, N.;
          <string-name>
            <surname>Saenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and Darrell,
          <string-name>
            <surname>T.</surname>
          </string-name>
          <year>2014</year>
          .
          <article-title>Deep domain confusion: Maximizing for domain invariance</article-title>
          .
<source>CoRR abs/1412.3474</source>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M. I.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Learning transferable features with deep adaptation networks</article-title>
          .
          <source>In ICML.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Feng</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Saenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Return of frustratingly easy domain adaptation</article-title>
          .
          <source>In AAAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
, and Saenko,
          <string-name>
            <surname>K.</surname>
          </string-name>
          <year>2016</year>
          .
<article-title>Deep CORAL: Correlation alignment for deep domain adaptation</article-title>
          .
          <source>In ECCV Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ghifary</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kleijn</surname>
            ,
            <given-names>W. B.</given-names>
          </string-name>
          ; Zhang,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Balduzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ; and
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          <year>2016</year>
          .
          <article-title>Deep reconstructionclassification networks for unsupervised domain adaptation</article-title>
          .
          <source>In ECCV.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pouget-Abadie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Mirza,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ;
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ;
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ;
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. C.</surname>
          </string-name>
          ; and Bengio,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <year>2014</year>
          .
          <article-title>Generative adversarial nets</article-title>
          .
          <source>In NIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Ganin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lempitsky</surname>
            ,
            <given-names>V. S.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Unsupervised domain adaptation by backpropagation</article-title>
          .
          <source>In ICML.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; Zhang, W.;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.;</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Seqgan: Sequence generative adversarial nets with policy gradient</article-title>
          .
          <source>In AAAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Tzeng</surname>
          </string-name>
          , E.;
          <string-name>
            <surname>Hoffman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Saenko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and Darrell,
          <string-name>
            <surname>T.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Adversarial discriminative domain adaptation</article-title>
          .
<source>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source> 2962-2971.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; Zhang, W.; and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Cot: Cooperative training for generative modeling</article-title>
.
<source>CoRR abs/1804.03782</source>.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zheng</surname>
          </string-name>
, W.-L.;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
; and Lu,
          <string-name>
            <surname>B.-L.</surname>
          </string-name>
          <year>2015</year>
          .
          <article-title>Combining eye movements and eeg to enhance emotion recognition</article-title>
          .
          <source>In IJCAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Liu</surname>
          </string-name>
          , W.; Zheng, W.-L.; and
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>B.-L.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Emotion recognition using multimodal deep learning</article-title>
          .
          <source>In ICONIP.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Tang</surname>
            , H.; Liu,
            <given-names>W.</given-names>
          </string-name>
          ; Zheng, W.-L.; and
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>B.-L.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Multimodal emotion recognition using deep neural networks</article-title>
          .
          <source>In ICONIP.</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Qiao</surname>
            , R.; Qing,
            <given-names>C.</given-names>
          </string-name>
          ; Zhang, T.;
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A novel deep-learning based framework for multi-subject emotion recognition</article-title>
          .
<source>2017 4th International Conference on Information, Cybernetics and Computational Social Systems (ICCSS)</source> 181-185.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Hotelling</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>1936</year>
          .
          <article-title>Relations between two sets of variates</article-title>
.
<source>Biometrika</source>.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Andrew</surname>
            , G.; Arora,
            <given-names>R.</given-names>
          </string-name>
          ; Bilmes,
          <string-name>
            <given-names>J. A.</given-names>
            ; and
            <surname>Livescu</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          <year>2013</year>
          .
          <article-title>Deep canonical correlation analysis</article-title>
          .
          <source>In ICML.</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Koelstra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
<surname>Mühl</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Soleymani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.-S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yazdani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ebrahimi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pun</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nijholt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Patras</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <year>2012</year>
          .
<article-title>DEAP: A database for emotion analysis using physiological signals</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          <volume>3</volume>
          :
<fpage>18</fpage>-<lpage>31</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Zheng</surname>
          </string-name>
          , W.-L.;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
; Lu, Y.; Lu, B.-L.; and
<string-name>
  <surname>Cichocki</surname>
  ,
  <given-names>A.</given-names>
</string-name>
          <year>2018</year>
          .
          <article-title>Emotionmeter: A multimodal framework for recognizing human emotions</article-title>
          .
<source>IEEE Transactions on Cybernetics</source>.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.;</given-names>
          </string-name>
          and Zhang,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Recognition of emotions using multimodal physiological signals and an ensemble deep learning model</article-title>
          .
<source>Computer Methods and Programs in Biomedicine</source> 140:93-110.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
<source>CoRR abs/1412.6980</source>.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Ngiam</surname>
          </string-name>
          , J.;
          <string-name>
            <surname>Khosla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nam</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Multimodal deep learning</article-title>
          .
          <source>In ICML.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>