<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Behavior in Subspace: Dimensionality Reduction + Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yang Liu</string-name>
          <email>csygliu@comp.hkbu.edu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhonglei Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobey H. Ko</string-name>
          <email>tobeyko@hku.hk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Hong Kong Baptist University</institution>
          ,
          <addr-line>Hong Kong SAR</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong</institution>
          ,
          <country country="HK">Hong Kong</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>HKBU Institute of Research and Continuing Education</institution>
          ,
          <addr-line>Shenzhen</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>Automated detection of human behavior in a social setting has drawn considerable interest in recent years. In this working notes paper, we describe our system developed for human behavior analysis. The system is composed of two components: 1) a dimensionality reduction module that maps the original data to a subspace; and 2) a classifier module that classifies the test data based on the labels of training data in the learned subspace. The developed system is evaluated on</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Automated detection of human behavior in a social setting
has drawn considerable interest in recent years. Unlike
human behavior analysis that focuses on a single person,
detection of human behavior in a social setting places more
emphasis on the dynamics between different participants in a
social event. Indicators such as the participants’ speech
patterns, body language, and body movements can
be used to understand
how human behavior in a social setting contributes to
the personal and/or career progression of an individual.
Naturally, analyzing audio content recorded during the social
event would yield valuable information, such as
participants’ speech patterns, the pitch, tone, and pacing of
each individual’s speech, or even the content covered during
the discussion, all of which would help in identifying social
traits relevant to an individual’s personal and career development.
However, such audio content often contains
sensitive information, so recording and using it may raise
major security concerns. As a result, alternative
measures are being explored to analyze human behavior in
social settings in a less privacy-invasive way. In the MediaEval
2018 Human Behavior Analysis Task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], people’s body
movement, as recorded by a tri-axial accelerometer, along with
accompanying visual features, is provided to participants in
the hope of deriving effective alternative approaches to analyzing
human behavior in a social setting without the use of audio
content.
      </p>
      <p>Copyright held by the owner/author(s).</p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
      <p>
        In this section, we introduce our system designed for the
human behavior analysis task. The developed system is
composed of two components. The first component is a
dimensionality reduction module that maps the original data to a
subspace. The motivation for using dimensionality reduction
to learn the subspace is that the original high-dimensional
feature space often contains redundant or even noisy
information, which may affect the efficiency and accuracy of classification. In our
system, we choose principal component analysis (PCA) [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]
for dimensionality reduction as it is efficient and easy to
interpret. The second component is a classifier module that
classifies the test data based on the labels of training data in
the learned subspace. In our system, we choose the nearest
neighbor (NN) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] method for classification because of, again,
its efficiency and interpretability.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Dimensionality Reduction via Principal Component Analysis</title>
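      <p>Given the training data matrix <inline-formula><tex-math>\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]</tex-math></inline-formula>, where
<inline-formula><tex-math>\mathbf{x}_i \in \mathbb{R}^{D}</tex-math></inline-formula> denotes the feature vector of the <inline-formula><tex-math>i</tex-math></inline-formula>-th data sample,
PCA aims to learn a <inline-formula><tex-math>D \times d</tex-math></inline-formula> transformation matrix <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula>, which
maps the original data to the <inline-formula><tex-math>d</tex-math></inline-formula>-dimensional subspace, with
the data variance being maximally preserved. To achieve
this goal, PCA maximizes the following objective function:
<disp-formula><label>(1)</label><tex-math>\mathbf{W} = \arg\max_{\mathbf{W}} \operatorname{tr}\left( \mathbf{W}^{T} (\mathbf{X} - \bar{\mathbf{x}}\mathbf{1}^{T})(\mathbf{X} - \bar{\mathbf{x}}\mathbf{1}^{T})^{T} \mathbf{W} \right),</tex-math></disp-formula>
where <inline-formula><tex-math>\bar{\mathbf{x}} = \sum_{i=1}^{n} \mathbf{x}_i / n</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{1}</tex-math></inline-formula> denotes the <inline-formula><tex-math>n \times 1</tex-math></inline-formula> vector with
all entries being 1. By further introducing a scaling constraint
<inline-formula><tex-math>\mathbf{W}^{T}\mathbf{W} = \mathbf{I}</tex-math></inline-formula>, the optimal <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula> that maximizes Eq. (1) is
composed of the normalized eigenvectors corresponding to the
<inline-formula><tex-math>d</tex-math></inline-formula> largest eigenvalues of the following eigen-decomposition
problem:
<disp-formula><label>(2)</label><tex-math>(\mathbf{X} - \bar{\mathbf{x}}\mathbf{1}^{T})(\mathbf{X} - \bar{\mathbf{x}}\mathbf{1}^{T})^{T} \mathbf{w} = \lambda \mathbf{w}.</tex-math></disp-formula></p>
      <p>After obtaining the transformation matrix <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula>, we can
map the original high-dimensional data sample <inline-formula><tex-math>\mathbf{x}</tex-math></inline-formula> in both
training and test sets to the low-dimensional subspace by
<inline-formula><tex-math>\mathbf{y} = \mathbf{W}^{T}\mathbf{x}</tex-math></inline-formula>.</p>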
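      <p>As a rough illustration of the procedure described in this section, the following NumPy sketch forms the centered scatter matrix, takes the eigenvectors belonging to its largest eigenvalues as the transformation matrix, and projects the samples with it. The function names fit_pca and project, the random data, and the choice of 60 input and 10 output dimensions are assumptions made for this example; it is not the authors' implementation.</p>
      <preformat>
```python
import numpy as np


def fit_pca(X, d):
    """Learn a D x d projection W from a D x n data matrix X (one sample per column)."""
    mean = X.mean(axis=1, keepdims=True)         # sample mean, shape (D, 1)
    Xc = X - mean                                # centered data
    scatter = Xc @ Xc.T                          # D x D scatter matrix
    eigvals, eigvecs = np.linalg.eigh(scatter)   # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1][:d]        # indices of the d largest eigenvalues
    return eigvecs[:, order]


def project(W, X):
    """Map each column x of X to the subspace as y = W^T x."""
    return W.T @ X


rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 500))   # hypothetical stand-in for 60-D features, 500 samples
W = fit_pca(X_train, d=10)
Y_train = project(W, X_train)
print(W.shape, Y_train.shape)
```
      </preformat>
      <p>The sketch projects the raw samples without subtracting the mean, as in the mapping stated above. Centering first would shift every projected point by the same offset and would therefore leave Euclidean distances between projected samples unchanged.</p>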
    </sec>
    <sec id="sec-6">
      <title>Classification via Nearest Neighbor Method</title>
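      <p>For a given test data sample, NN assigns the class label of
the test sample’s nearest neighbor in the training set to the test
sample. Specifically, given the low-dimensional representation
of the training set, i.e., <inline-formula><tex-math>\{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n\}</tex-math></inline-formula>, the label of a test
data sample <inline-formula><tex-math>\mathbf{y}</tex-math></inline-formula> is decided by the following function:
<disp-formula><label>(3)</label><tex-math>l(\mathbf{y}) = l\left( \arg\min_{\mathbf{y}_i,\; i = 1, \ldots, n} \mathrm{dist}(\mathbf{y}, \mathbf{y}_i) \right),</tex-math></disp-formula>
where <inline-formula><tex-math>l(\mathbf{y})</tex-math></inline-formula> denotes the label of <inline-formula><tex-math>\mathbf{y}</tex-math></inline-formula>, and <inline-formula><tex-math>\mathrm{dist}(\mathbf{y}, \mathbf{y}_i)</tex-math></inline-formula> denotes the
distance between <inline-formula><tex-math>\mathbf{y}</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{y}_i</tex-math></inline-formula>. In this paper, we utilize the widely
used Euclidean distance as the distance metric.</p>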
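      <p>The decision rule above can be sketched in a few lines of Python. This is a minimal brute-force version using only the standard library; the function name nn_predict and the toy 2-D points and labels are hypothetical and serve only to show the rule at work.</p>
      <preformat>
```python
import math


def nn_predict(train_points, train_labels, y):
    """Return the label of the training point closest to y under Euclidean distance."""
    nearest = min(range(len(train_points)),
                  key=lambda i: math.dist(y, train_points[i]))
    return train_labels[nearest]


# Hypothetical low-dimensional training representations with binary labels.
train_points = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
train_labels = [0, 0, 1]

# The test point (3.5, 3.0) lies closest to (4.0, 4.0), so it receives label 1.
print(nn_predict(train_points, train_labels, (3.5, 3.0)))
```
      </preformat>
      <p>A linear scan costs one distance computation per training sample for every query, which is one reason a lower-dimensional subspace can make this classifier cheaper to run.</p>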
    </sec>
    <sec id="sec-7">
      <title>RESULTS AND ANALYSIS</title>
      <p>We evaluate the performance of our system on the MediaEval
2018 Human Behavior Analysis Task. The dataset is
composed of two parts: 1) the development set with 54 subjects.
The video for each subject is 22 minutes (i.e., 1,320 seconds)
long, so with one sample per second we have 54 × 1,320 = 71,280 training samples in
total; 2) the test set with 16 subjects. The video for each
subject is also 22 minutes (i.e., 1,320 seconds) long, so we
have 16 × 1,320 = 21,120 test samples in total.</p>
      <p>We use three types of features to construct our original
data representation: 1) Colorhist: we calculate the standard
deviation of the colorhist features of the 20 frames in each second as the representation of that
second, and the dimension is 128; 2) LBP: we calculate the
standard deviation of the LBP features of the 20 frames in each second as the representation
of that second, and the dimension is 256; 3) Accel: for each
frame, this feature is 3-dimensional, and we concatenate
the 3-D features of all 20 frames as the representation of
that second, so the dimension is 60. For Acceleration, Video,
and Fusion, we submit two runs for each of them.</p>
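      <p>The per-second feature construction described above might look as follows in NumPy. The function name second_features, the array layouts, the use of the population standard deviation, and the frame-by-frame ordering of the concatenated accelerometer values are all assumptions of this sketch; the text does not specify them.</p>
      <preformat>
```python
import numpy as np


def second_features(colorhist, lbp, accel):
    """Build the Accel, Video, and Fusion vectors for one second of data.

    colorhist: (20, 128) array, lbp: (20, 256) array, accel: (20, 3) array,
    each holding the 20 frames of that second (assumed layout).
    """
    colorhist_feat = colorhist.std(axis=0)   # 128-D: std over the 20 frames
    lbp_feat = lbp.std(axis=0)               # 256-D: std over the 20 frames
    accel_feat = accel.reshape(-1)           # 60-D: 20 frames x 3 axes, concatenated
    video_feat = np.concatenate([colorhist_feat, lbp_feat])   # 384-D
    fusion_feat = np.concatenate([video_feat, accel_feat])    # 444-D
    return accel_feat, video_feat, fusion_feat


rng = np.random.default_rng(0)   # random stand-in data, not the task's features
accel_feat, video_feat, fusion_feat = second_features(
    rng.random((20, 128)), rng.random((20, 256)), rng.random((20, 3)))
print(accel_feat.shape, video_feat.shape, fusion_feat.shape)
```
      </preformat>
      <p>The resulting dimensions, 60, 384, and 444, match the feature sizes used for the Acceleration, Video, and Fusion runs.</p>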
      <list list-type="bullet">
        <list-item><p>For Run 1 of Acceleration, we use the 60-D Accel feature and perform NN classification directly.</p></list-item>
        <list-item><p>For Run 2 of Acceleration, we use PCA to project the 60-D Accel feature to a 10-D subspace, and perform NN classification in the learned subspace.</p></list-item>
        <list-item><p>For Run 1 of Video, we use the 384-D feature (Colorhist + LBP) and perform NN classification directly.</p></list-item>
        <list-item><p>For Run 2 of Video, we use PCA to project the 384-D feature (Colorhist + LBP) to a 50-D subspace, and perform NN classification in the learned subspace.</p></list-item>
        <list-item><p>For Run 1 of Fusion, we use the 444-D feature (Colorhist + LBP + Accel) and perform NN classification directly.</p></list-item>
        <list-item><p>For Run 2 of Fusion, we use PCA to project the 444-D feature (Colorhist + LBP + Accel) to a 50-D subspace, and perform NN classification in the learned subspace.</p></list-item>
      </list>
      <p>Table: … 2 of Accel, Video, and Fusion on the MediaEval 2018 Human Behavior Analysis Task. Rows: subject IDs 2, 3, 15, 17, 26, 39, 40, 43, 51, 54, 59, 65, 67, 80, 83, and 85, followed by Mean and Std. Only the column header Accel survives; the cell values are not available.</p>
      <p>… However, they may lack the ability to extract sufficient
discriminative information from the original feature space for
classification. Second, the label provided by NN is binary,
whereas the evaluation criterion, ROC-AUC, expects continuous
scores such as probabilities. The inconsistency between them may further degrade
the performance. Third, the feature set we have used may
not be sufficient to capture all the discriminative information
embedded in the original videos. In addition to the
observation on overall performance, we also see that, compared with
the Video feature, the Accel feature plays a more important
role in classification, even though its dimension is lower than that of the Video
feature. Moreover, by comparing Run 1 and Run
2, we find that PCA does not really improve the performance,
which motivates us to seek more powerful dimensionality
reduction methods for the task in the future.</p>
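      <p>To make the remark about binary labels and ROC-AUC concrete, the following toy sketch computes ROC-AUC as the fraction of correctly ordered positive and negative pairs, once from continuous scores and once from the same scores collapsed to hard labels. The function name roc_auc, the scores, and the 0.5 threshold are invented for illustration and are unrelated to the submitted runs; hard labels need not lower the score in every case.</p>
      <preformat>
```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]          # hypothetical continuous scores
hard = [1 if s >= 0.5 else 0 for s in scores]    # same scores collapsed to binary labels

print(round(roc_auc(labels, scores), 4))   # 8 of 9 pairs ordered correctly
print(round(roc_auc(labels, hard), 4))     # ties introduced by the hard labels cost ranking information
```
      </preformat>
      <p>In this made-up case the continuous scores reach about 0.89 while the hard labels reach about 0.67, because thresholding turns many pairs into ties.</p>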
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION</title>
      <p>This working notes paper introduces our system design for
identifying human behavior and shows the results of our
system on the MediaEval 2018 Human Behavior Analysis
Task. The unsatisfactory results motivate us to use more
informative features and to seek more powerful dimensionality
reduction and classification methods (such as deep neural
networks) in the future.</p>
    </sec>
    <sec id="sec-9">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported in part by the National Natural
Science Foundation of China (NSFC) under Grant 61503317, in
part by the General Research Fund (GRF) from the Research
Grants Council (RGC) of Hong Kong SAR under Project
HKBU12202417, and in part by the SZSTI Grant with the
Project Code JCYJ20170307161544087.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cabrera-Quiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gedik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Hung</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>No-Audio Multimodal Speech Detection in Crowded Social Settings task at MediaEval 2018</article-title>
          . In MediaEval 2018 Workshop.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hotelling</surname>
          </string-name>
          .
          <year>1933</year>
          .
          <article-title>Analysis of a complex of statistical variables into principal components</article-title>
          .
          <source>Journal of Educational Psychology</source>
          <volume>24</volume>
          ,
          <issue>7</issue>
          (
          <year>1933</year>
          ),
          <fpage>498</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Jolliffe</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Principal component analysis</article-title>
          . Springer Verlag, New York.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Laaksonen</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Oja</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>Classification with learning k-nearest neighbors</article-title>
          .
          <source>In Proceedings of International Conference on Neural Networks (ICNN'96)</source>
          , Vol.
          <volume>3</volume>
          .
          <fpage>1480</fpage>
          -
          <lpage>1483</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>