
Behavior in Subspace: Dimensionality Reduction + Classification

Yang Liu

csygliu@comp.hkbu.edu.hk 0 2

Zhonglei Gu

Tobey H. Ko

tobeyko@hku.hk 1


0 Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, P.R. China
1 Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong, Hong Kong
2 HKBU Institute of Research and Continuing Education, Shenzhen, P.R. China

2018

29 31

Automated detection of human behavior in a social setting has drawn considerable interest in recent years. In this working notes paper, we describe our system developed for human behavior analysis. The system is composed of two components: 1) a dimensionality reduction module that maps the original data to a subspace; and 2) a classifier module that classifies the test data based on the labels of training data in the learned subspace. The developed system is evaluated on

INTRODUCTION

Automated detection of human behavior in a social setting has drawn considerable interest in recent years. Unlike human behavior analysis that focuses on a single person, detection of human behavior in a social setting places more emphasis on the dynamics between different participants in a social event, where indicators such as the participants’ speech patterns, body language, and body movements can be used to understand how behavior in a social setting contributes to the personal and/or career progression of an individual. Naturally, analyzing audio content recorded during the social event would yield valuable information, such as participants’ speech patterns, the pitch, tone, and pacing of each individual’s speech, or even the content covered during the discussion, which would help in identifying potential social traits in an individual’s personal and career development. However, such audio content very often contains sensitive information, so recording and using it may raise major security concerns. As a result, alternative measures are being explored to discover human behavior in a social setting in a less privacy-invasive way. In the MediaEval 2018 Human Behavior Analysis Task [1], people’s body movement, as recorded by a tri-axial accelerometer, is provided to participants along with accompanying visual features, in the hope of deriving effective alternative approaches to analyzing human behavior in a social setting without the use of audio content.

APPROACH

In this section, we introduce our system designed for the human behavior analysis task. The developed system is composed of two components. The first component is a dimensionality reduction module that maps the original data to a subspace. The motivation for using dimensionality reduction to learn the subspace is that the original high-dimensional feature space often contains redundant or even noisy information, which may affect both efficiency and accuracy. In our system, we choose principal component analysis (PCA) [2, 3] for dimensionality reduction as it is efficient and easy to interpret. The second component is a classifier module that classifies the test data based on the labels of training data in the learned subspace. In our system, we choose the nearest neighbor (NN) [4] method for classification because of, again, its efficiency and interpretability.

Dimensionality Reduction via Principal Component Analysis

Given the training data matrix $X = [x_1, x_2, \ldots, x_n]$, where $x_i \in \mathbb{R}^{D}$ denotes the feature vector of the $i$-th data sample, PCA aims to learn a $D \times d$ transformation matrix $W$, which maps the original data to the $d$-dimensional subspace, with the data variance being maximally preserved. To achieve this goal, PCA maximizes the following objective function:

$$W = \arg\max_{W} \mathrm{tr}\left( W^{T} (X - \bar{x}\mathbf{1}^{T})(X - \bar{x}\mathbf{1}^{T})^{T} W \right), \tag{1}$$

where $\bar{x} = \sum_{i=1}^{n} x_i / n$ and $\mathbf{1}$ denotes the $n \times 1$ vector with all entries being 1. By further introducing a scaling constraint $W^{T} W = I$, the optimal $W$ that maximizes Eq. (1) is composed of the normalized eigenvectors corresponding to the $d$ largest eigenvalues of the following eigen-decomposition problem:

$$(X - \bar{x}\mathbf{1}^{T})(X - \bar{x}\mathbf{1}^{T})^{T} w = \lambda w.$$

After obtaining the transformation matrix $W$, we can map the original high-dimensional data sample $x$ in both training and test sets to the low-dimensional subspace by:

$$y = W^{T} x. \tag{2}$$
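As an illustration only, the PCA step can be sketched in a few lines of NumPy. This is a minimal sketch, not the code used for the submitted runs: the function names `fit_pca` and `project` are hypothetical, the data are synthetic, and samples are assumed to be stored as columns of the data matrix. The projection follows Eq. (2) as written, i.e., without subtracting the mean.

```python
import numpy as np

def fit_pca(X, d):
    """Learn a projection matrix from a data matrix whose columns are samples.

    Returns the eigenvectors of the scatter matrix that belong to the
    d largest eigenvalues, stacked as columns.
    """
    x_bar = X.mean(axis=1, keepdims=True)   # sample mean as a column vector
    Xc = X - x_bar                          # centered data, X - x_bar * 1^T
    S = Xc @ Xc.T                           # scatter matrix (symmetric, PSD)
    eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]   # indices of the d largest eigenvalues
    return eigvecs[:, order]                # orthonormal columns, so W^T W = I

def project(W, X):
    """Map samples (columns of X) to the subspace, following y = W^T x."""
    return W.T @ X

# Synthetic stand-in for a 60-dimensional feature set (assumption, not real data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 500))
W = fit_pca(X_train, 10)
Y_train = project(W, X_train)
print(W.shape, Y_train.shape)
```

`np.linalg.eigh` is used because the scatter matrix is symmetric; it returns real eigenvalues and orthonormal eigenvectors, which matches the scaling constraint above.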

Classification via Nearest Neighbor Method

For a given test data sample, NN assigns the class label of the test sample’s nearest neighbor in the training set to the test sample. Specifically, given the low-dimensional representation of the training set, i.e., $\{y_1, y_2, \ldots, y_n\}$, the label of a test data sample $y$ is decided by the following function:

$$l(y) = l\left( \arg\min_{y_i,\, i=1,\ldots,n} \mathrm{dist}(y, y_i) \right), \tag{3}$$

where $l(y)$ denotes the label of $y$, and $\mathrm{dist}(y, y_i)$ denotes the distance between $y$ and $y_i$. In this paper, we utilize the widely used Euclidean distance as the distance metric.
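The nearest neighbor rule with Euclidean distance can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `nn_classify` is hypothetical, samples are stored as columns, and the toy data are made up. The broadcasting approach builds a full test-by-train distance table, so for data sets of the size used in this task it would need to be applied in batches.

```python
import numpy as np

def nn_classify(Y_train, labels_train, Y_test):
    """Assign each test column the label of its nearest training column."""
    labels_train = np.asarray(labels_train)
    # Pairwise differences, shape (num_test, num_train, dim).
    diff = Y_test.T[:, None, :] - Y_train.T[None, :, :]
    # Squared Euclidean distances; the square root does not change the argmin.
    dist2 = np.sum(diff ** 2, axis=2)
    nearest = np.argmin(dist2, axis=1)      # index of the closest training sample
    return labels_train[nearest]

# Toy 2-D example: three training samples (columns) and two test samples.
Y_train = np.array([[0.0, 1.0, 5.0],
                    [0.0, 1.0, 5.0]])
labels_train = np.array([0, 0, 1])
Y_test = np.array([[0.2, 4.0],
                   [0.1, 4.5]])
pred = nn_classify(Y_train, labels_train, Y_test)
print(pred)  # [0 1]
```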

RESULTS AND ANALYSIS

We evaluate the performance of our system on the MediaEval 2018 Human Behavior Analysis Task. The dataset is composed of two parts: 1) The development set with 54 subjects. The video for each subject is 22 minutes (i.e., 1,320 seconds) long, so we have 54 × 1,320 = 71,280 training samples in total; 2) The test set with 16 subjects. The video for each subject is also 22 minutes (i.e., 1,320 seconds) long, so we have 16 × 1,320 = 21,120 test samples in total.

We use three types of features to construct our original data representation: 1) Colorhist: we calculate the standard deviation of 20 frames’ colorhist as the representative of that second, and the dimension is 128; 2) LBP: we calculate the standard deviation of 20 frames’ LBP as the representative of that second, and the dimension is 256; 3) Accel: for each frame, this feature is 3-dimensional, and we concatenate the 3-D features of all 20 frames as the representative of that second, so the dimension is 60. For Acceleration, Video, and Fusion, we submit two runs for each of them.

∙ For Run 1 of Acceleration, we use 60-D Accel feature and perform NN classification directly.
∙ For Run 2 of Acceleration, we use PCA to project 60-D Accel feature to a 10-D subspace, and perform NN classification in the learned subspace.
∙ For Run 1 of Video, we use 384-D feature (Colorhist + LBP) and perform NN classification directly.
∙ For Run 2 of Video, we use PCA to project 384-D feature (Colorhist + LBP) to a 50-D subspace, and perform NN classification in the learned subspace.
∙ For Run 1 of Fusion, we use 444-D feature (Colorhist + LBP + Accel) and perform NN classification directly.
∙ For Run 2 of Fusion, we use PCA to project 444-D feature (Colorhist + LBP + Accel) to a 50-D subspace, and perform NN classification in the learned subspace.

[Table: "... 2 of Accel, Video, and Fusion on the MediaEval 2018 Human Behavior Analysis Task." Header: ID 2, 3, 15, 17, 26, 39, 40, 43, 51, 54, 59, 65, 67, 80, 83, 85, Mean, Std. First row label: Accel.]

[...] However, they may lack the ability to extract sufficient discriminative information from the original feature space for classification. Second, the label provided by NN is binary, whereas the evaluation criterion ROC-AUC requires probabilities. The inconsistency between them may further degrade the performance. Third, the feature set we have used may not be sufficient to capture all the discriminative information embedded in the original videos. In addition to the observation on overall performance, we also see that, compared with the Video feature, the Accel feature plays a more important role in classification, even though its dimension is lower than that of the Video feature. Moreover, by comparing Run 1 and Run 2, we find that PCA does not really improve the performance, which motivates us to seek more powerful dimensionality reduction methods for the task in the future.
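The mismatch between binary NN labels and ROC-AUC can be made concrete with a small sketch. The code below is illustrative only: `roc_auc` is a hypothetical helper that computes ROC-AUC as the fraction of positive/negative pairs ranked correctly (ties count one half), and the labels and predictions are made-up toy values, not task results. With hard 0/1 predictions the ROC curve has a single operating point, so the score reduces to the average of the true positive rate and the true negative rate.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy ground truth and hard (binary) predictions; values are assumptions.
labels = [1, 1, 1, 0, 0, 0, 0]
hard = [1, 1, 0, 0, 0, 0, 1]

# True positive rate and true negative rate of the hard predictions.
tpr = sum(1 for y, p in zip(labels, hard) if y == 1 and p == 1) / labels.count(1)
tnr = sum(1 for y, p in zip(labels, hard) if y == 0 and p == 0) / labels.count(0)

auc_hard = roc_auc(labels, hard)
balanced = (tpr + tnr) / 2
print(abs(auc_hard - balanced) < 1e-12)  # True
```

Because every positive that is predicted as 0 ties with every negative predicted as 0, a binary output cannot express how confident the classifier is, which is one way to read the inconsistency noted above.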

CONCLUSION

This working notes paper introduces our system design for identifying human behavior and shows the results of our system on the MediaEval 2018 Human Behavior Analysis Task. The unsatisfactory results motivate us to use more informative features and to seek more powerful dimensionality reduction and classification methods (such as deep neural networks) in the future.

ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61503317, in part by the General Research Fund (GRF) from the Research Grants Council (RGC) of Hong Kong SAR under Project HKBU12202417, and in part by the SZSTI Grant with the Project Code JCYJ20170307161544087.

[1] Cabrera-Quiros, Gedik, and Hung. 2018. No-Audio Multimodal Speech Detection in Crowded Social Settings task at MediaEval 2018. In MediaEval 2018 Workshop.

[2] Hotelling. 1933. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24, 7 (1933), 498-520.

[3] Jolliffe. 2002. Principal component analysis. Springer Verlag, New York.

[4] Laaksonen and Oja. 1996. Classification with learning k-nearest neighbors. In Proceedings of International Conference on Neural Networks (ICNN'96), Vol. 3. 1480-1483.