<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Efficient Framework for the Clustering of Human Activity Data using Kernelized Robust Covariance Descriptors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guntru Prasanth Kumar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. S. Subodh Raj</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sudhish N. George</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Technology Calicut</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In this paper, a new method for the efficient clustering of human activity data is proposed. Unlike the traditional human activity clustering approaches, our method relies on the skeletal data recorded with the help of motion capture (mocap) systems to achieve the goal. The proposed method is structured around the kernel-based robust covariance descriptor. By introducing a data re-framing technique that efficiently utilizes the temporal properties of the human activity data, we have alleviated the data redundancy and insufficiency issues associated with action sequences. The optimization model developed encompasses the combined benefits of low-rank representation and least square regression. The formulation is strengthened by incorporating the temporal dependency of the human activity sequences with the help of a temporal Laplacian regularizer. With the proposed algorithm, a representation matrix is learned from the raw data, which is then used to perform subspace clustering. Experiments conducted on multiple human activity datasets reveal the ability of the proposed method to achieve better clustering results compared to state-of-the-art counterparts.</p>
      </abstract>
      <kwd-group>
        <kwd>Human activity data</kwd>
        <kwd>Kernelized covariance descriptors</kwd>
        <kwd>Temporal Laplacian regularization</kwd>
        <kwd>Temporal subspace clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Human activity recognition (HAR) from action sequences remains a challenging research
topic in computer vision due to its multifaceted applications [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. HAR finds application in
visual surveillance, healthcare, human-machine interface, video retrieval, and entertainment
industry, to name a few [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Traditional approaches to HAR use RGB video sequences as the
input. Handcrafted features are later extracted from the video sequences for the purpose of
activity recognition [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Because of the high-dimensional nature of the video sequences, high
computational complexity is often associated with such HAR approaches [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Later, sensor-based
HAR gained popularity. The focus of such methods was on the data obtained from sensors
such as accelerometers and gyroscopes. In such cases, the subject must carry
the sensor so that the movements of the human body can be recorded. This, along with the
influence of noise, acts as a limiting factor in sensor-based HAR approaches [
        <xref ref-type="bibr" rid="ref1 ref5 ref6">5, 1, 6</xref>
        ]. With the
evolution of mocap systems, new modalities were introduced to represent the human activity
information. Such modalities include the motion depth maps and the skeletal representations
[
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. With skeletal representations, time series of 3D joint positions of the human body are
recorded by the mocap systems [9]. The spatio-temporal quality of such recordings is so superior
that they find application in multiple domains, including gait analysis, medical rehabilitation,
and computer animations [
        <xref ref-type="bibr" rid="ref3 ref7">7, 3</xref>
        ].
      </p>
      <p>The common approaches to HAR with mocap data involve supervised learning methods.
Though they guarantee good results, such methods are often susceptible to missing sample
issues. Further, the requirement of a huge clean dataset for initial learning of the system poses a
serious bottleneck to supervised HAR approaches [8, 10]. This paves a foundation for the need
of having unsupervised HAR strategies. Though in unsupervised methods the aforementioned
challenges faced by supervised methods are alleviated, they encounter other limitations. The
main challenge in unsupervised HAR is posed by the fact that the task needs to be performed
in a robust and accurate manner without any prior knowledge about the data samples [11].</p>
      <p>Recent studies showcase the ability of subspace clustering algorithms in dealing with
high-dimensional data clustering problems [12]. The key idea in the subspace clustering approach
is to identify multiple low-dimensional subspaces from which the data originates [13]. The
subspaces so identified each house a cluster of data. This approach is commonly termed the Union
of Subspaces (UoS) model [14]. The popularity of subspace clustering approach has increased
with the introduction of sparse subspace clustering approach in which a sparsity constraint
is imposed on the coefficients in order to learn a sparse representation of the raw data [14].
Low-rank representation learning (LRR) [15] is another technique used with subspace clustering
wherein the global structure of the data is considered while learning the coefficients. In LRR
based approaches, a given dictionary is utilized to learn a low-rank representation of the data
samples. The clustering results obtained with such low-rank representations are usually better
[16]. Least square regression (LSR) [17] based subspace clustering, in which the grouping of
data samples is performed with the help of the Frobenius norm operator, is another promising
approach in subspace clustering. The aforementioned approaches when used independently
will not be suitable for HAR as they do not consider the time series information associated with
the skeletal data. As a solution to this problem we have developed an approach called the time
series activity clustering (TSAC) which utilizes the combined advantages of LRR and LSR. We
have incorporated a kernel-based robust covariance descriptor to extract features out of the raw
input data with the aim of exploring the non-linearity present in the data. Further, the temporal
Laplacian regularizer is employed to capture the temporal dependency among the data samples.</p>
      <p>The following are the main contributions of this work:
1. An unsupervised optimization model with improved performance is formulated for the
clustering of human activity data. To effectively utilize the temporal dependencies of
human activity sequences, a temporal Laplacian regularizer is introduced in the proposed
model.
2. We have blended the LRR and LSR based subspace clustering approaches to achieve better
clustering results. A clean dictionary is learned along with the representation matrix in the proposed formulation.</p>
      <p>[Fig. 1: Workflow of the proposed method. Data pre-processing and feature matrix generation: input action sequences undergo data re-framing, kernelized covariance description, and vectorization and stacking to form the feature matrix. Time series activity clustering: a representation matrix and an affinity matrix are computed, followed by subspace clustering.]</p>
      <p>The rest of the work is outlined as follows. The proposed method, the problem formulation,
and the solutions obtained are presented in Section 2. Section 3 explains the experimental
validation done using the proposed method. Finally, conclusions are drawn in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed Method</title>
      <p>In this section, we give a detailed outline of the proposed TSAC approach. The workflow of the
proposed approach is shown in Fig. 1.</p>
      <sec id="sec-2-1">
        <title>2.1. Data Re-framing</title>
        <p>A sequence of human action is a collection of action timestamps evolved over a period of
time. Two important observations can be made with reference to a given collection of action
sequences as mentioned below:
1. The number of timestamps corresponding to each of the action sequences may not
necessarily be the same. This disparity often appears as a bottleneck in generalizing any
algorithm dealing with human activity recognition.
2. As the number of timestamps in an action sequence increases, it introduces additional
computation overhead. That will lead to a rise in demand for resource utilization.</p>
        <p>The aforementioned challenges can be addressed by standardizing the number of timestamps
for all the action sequences under consideration. If the number of timestamps is standardized
to be ‘N’, then data pruning methods need to be employed on action sequences having more
timestamps, and data augmentation needs to be performed on action sequences
having fewer timestamps. The temporal smoothness property of human action
sequences can be conveniently utilized to achieve this goal.</p>
        <p>Human action sequences are temporally highly correlated. As far as activity recognition is
concerned, this correlation leads to redundant information. Often, the complete set of frames of
a recorded action sequence is not essential to perform activity recognition. Thus, we introduce
a pruning technique, termed succession pruning, in which alternate frames of the action
sequence are removed to eliminate redundancy while maintaining the temporal properties of the
action sequence. This drastically reduces the amount of data to be processed and also
results in reduced computation overhead and resource utilization. For action sequences
experiencing an insufficiency of timestamps, we perform timestamp augmentation. In this process,
the trailing end of the action sequence is augmented with the terminal timestamps of the
same action sequence. This is in line with the temporal smoothness property of the human
action data.</p>
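        <p>The re-framing step described above can be sketched as follows. This is a minimal illustration: the function name and the handling of boundary cases (e.g. overshooting below N during pruning) are our own choices, not specified in the text.</p>

```python
import numpy as np

def reframe(seq, N):
    """Standardize an action sequence (T x d array) to N timestamps.

    Succession pruning: while the sequence is too long, drop alternate
    frames (temporal correlation makes neighbouring frames redundant).
    Timestamp augmentation: while too short, repeat the terminal frames
    at the trailing end (temporal smoothness).
    """
    seq = np.asarray(seq)
    while seq.shape[0] > N:
        seq = seq[::2]            # keep every other frame
        if seq.shape[0] < N:      # overshot; pad back up below
            break
    if seq.shape[0] > N:
        seq = seq[:N]
    while seq.shape[0] < N:
        pad = min(N - seq.shape[0], seq.shape[0])
        seq = np.vstack([seq, seq[-pad:]])   # repeat terminal frames
    return seq
```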
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Feature Matrix Generation using Kernel-based Robust Covariance Descriptor</title>
        <p>Human actions are represented in the form of skeletal structures with the help of modern motion
capture systems. Each timestamp of an action sequence can be represented as a collection of
‘n’ joints. Thus, the timestamp ‘t’ of an action sequence can be represented as a matrix E(t) ∈ R^{3×n}
of 3D positions {e_1(t), . . . , e_n(t)}. The 3D coordinates of the i-th joint of the skeletal structure
corresponding to the t-th timestamp can be denoted as e_i(t) = [x_i(t), y_i(t), z_i(t)]^⊤.</p>
        <p>Once the data re-framing process is completed, the raw action sequences with a fixed number
of timestamps are represented in the form of a feature matrix. This involves a two-step process.
In the first step, we use the concept of covariance to obtain the covariance descriptor of each
action sequence. The use of covariance helps us to capture the changes pertaining to each
joint of the skeletal structure [18]. If μ represents the temporal average of the timestamps of an
action sequence, then the corresponding action sequence can be represented in the form of a
covariance matrix as shown below.</p>
        <p>Ψ = (1/(N − 1)) Σ_{t=1}^{N} [e(t) − μ][e(t) − μ]^⊤   (1)</p>
        <p>This process is repeated for each action sequence, resulting in a unique covariance descriptor
for each input sequence.</p>
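        <p>Eq. (1) can be computed directly. The sketch below treats each timestamp as a flattened vector of joint coordinates; the helper name and array layout are assumptions.</p>

```python
import numpy as np

def covariance_descriptor(E):
    """Covariance descriptor of one action sequence (Eq. (1)).

    E: (N, m) array, one row per timestamp (the 3D joint coordinates
    of a timestamp flattened into a vector of length m).
    Returns the (m, m) matrix 1/(N-1) * sum_t (e_t - mu)(e_t - mu)^T.
    """
    E = np.asarray(E, dtype=float)
    mu = E.mean(axis=0)       # temporal average of the timestamps
    D = E - mu
    return D.T @ D / (E.shape[0] - 1)
```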
        <p>Although covariance descriptors find application in multiple domains, they cannot capture the
non-linearity present in the data. For making the feature matrix robust, different approaches are
adopted to incorporate additional statistical information along with the covariance descriptor.
These include the entropy-based approaches, the mutual information-based approaches, and
the kernel-based approaches. Among others, the kernel-based approaches have been used
to simulate more complex models. The use of kernels improves the descriptive power of the
covariance matrices. The work proposed by Cavazza et al. [18] showcases the benefits of
using kernels in works related to human activity recognition. Motivated by this observation,
we modify the expression given in Eq. (1) to incorporate the kernel function and obtain the
following robust covariance descriptor.</p>
        <p>Ψ = (1/(N − 1)) Σ_{t=1}^{N} [φ(e(t)) − μ_φ][φ(e(t)) − μ_φ]^⊤   (2)</p>
        <p>Here, φ(·) represents the kernel function and μ_φ is the temporal average of the kernel entries.
The choice of kernel function is application specific. We have used two kernel functions, namely
the polynomial kernel and the exponential kernel, of which the latter has produced
promising results.</p>
        <p>The exponential kernel is defined as:</p>
        <p>φ(e(t)) = exp{ e(t) / (σ + ε)² }   (3)</p>
        <p>where ε &gt; 0 and σ is the kernel bandwidth.</p>
        <p>After obtaining the robust covariance descriptors for each action sequence, in the second step
we generate the feature matrix. The covariance descriptor Ψ contains redundant information as
it is symmetric about the main diagonal. In order to reduce the amount of data to be processed,
we vectorize each covariance descriptor by retaining the upper triangular values alone. Later,
we stack (as columns) each of the vectors so obtained to form the feature matrix X ∈ R^{d×k}.
Here, ‘d’ is the length of the individual vectors and ‘k’ is the total number of action sequences.</p>
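        <p>The two-step feature-matrix generation might be sketched as below, assuming the exponential kernel of Eq. (3) is applied elementwise to the timestamp vectors; the names `sigma` and `eps` (the bandwidth and the positive offset) and the default values are assumptions.</p>

```python
import numpy as np

def robust_descriptor(E, sigma=1.0, eps=1e-3):
    """Kernelized covariance descriptor (Eq. (2)) with the exponential
    kernel of Eq. (3), applied elementwise to the timestamp vectors."""
    Phi = np.exp(np.asarray(E, dtype=float) / (sigma + eps) ** 2)
    mu = Phi.mean(axis=0)             # temporal average of kernel entries
    D = Phi - mu
    return D.T @ D / (Phi.shape[0] - 1)

def feature_matrix(sequences, **kw):
    """Vectorize the upper triangle of each descriptor (the matrix is
    symmetric, so the lower triangle is redundant) and stack the
    resulting vectors as the columns of X (d x k)."""
    cols = []
    for E in sequences:
        Psi = robust_descriptor(E, **kw)
        iu = np.triu_indices(Psi.shape[0])
        cols.append(Psi[iu])
    return np.stack(cols, axis=1)
```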
      </sec>
      <sec id="sec-2-4">
        <title>2.3. Temporal Subspace Clustering (TSC) with Laplacian Regularization</title>
        <p>In state-of-the-art approaches, subspace clustering is performed by generating a representation matrix
from the feature matrix. This is accomplished by using the self-representation property of
the feature matrix, resulting in the computation of a set of unique coefficients Y ∈ R^{k×k} for the
feature matrix X ∈ R^{d×k}. This can be mathematically expressed as X = XY. The limitation
of such an approach is that it tends to produce sub-optimal results if the sampling done is not
sufficient. Instead of using the self-representation property, if we learn an efficient dictionary
W ∈ R^{d×r}, we can overcome the aforementioned problem. Given a dictionary W, the set of
data samples X can be expressed as X ≈ WY, with Y ∈ R^{r×k}. The set of coefficients Y can be efficiently
obtained in an iterative manner by utilizing the underlying low-rank nature of Y. The low-rank
property of Y can be captured with the help of LRR [15]. Since the rank minimization problem is
inherently NP-hard, the nuclear norm can be used as a substitute for rank minimization [17]. It is
also important to obtain the intra-cluster correlation among the data samples to obtain better
clustering results. To this end, we can use the principle of LSR [17]. By combining the concepts of
LRR and LSR, we formulate an optimization problem as follows.</p>
        <p>min_{Y,W} (1/2)‖X − WY‖²_F + (λ₁/2)‖Y‖²_F + λ₂‖Y‖_*   s.t. Y ≥ 0, W ≥ 0   (4)</p>
        <p>Here, ‖X − WY‖²_F captures the reconstruction error, ‖·‖_F represents the Frobenius
norm operator, ‖·‖_* denotes the nuclear norm operator, and λ₁ and λ₂ are the balancing parameters.</p>
        <p>Manifold regularization methods are efficient in incorporating the temporal dependency
of data in the problem formulation [19]. Thus, by modifying the general Laplacian
regularizer, which captures the spatial dependency of data, we have developed a temporal Laplacian
regularizer ℒ(·), as our interest is in the temporal information of the action sequences.</p>
        <p>For a representation matrix Y, the temporal Laplacian regularization function can be defined as</p>
        <p>ℒ(Y) = tr(Y L_T Y^⊤)   (5)</p>
        <p>where L_T = W̃ − Z is a temporal Laplacian matrix, W̃ is a diagonal matrix with W̃_ii = Σ_{j=1}^{k} Z_ij, and Z is a weight matrix given by</p>
        <p>Z_ij = 1 for |i − j| ≤ τ/2, and Z_ij = 0 otherwise   (6)</p>
        <p>where τ denotes an empirically defined threshold value.</p>
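        <p>A small sketch of the temporal Laplacian construction and of the regularizer itself; the function names and the choice tau=4 are illustrative.</p>

```python
import numpy as np

def temporal_laplacian(k, tau=4):
    """Temporal Laplacian L_T = D - Z for k samples, with the binary
    weight matrix Z_ij = 1 iff |i - j| <= tau/2 (tau is the
    empirically chosen threshold)."""
    idx = np.arange(k)
    Z = (np.abs(idx[:, None] - idx[None, :]) <= tau / 2).astype(float)
    D = np.diag(Z.sum(axis=1))      # diagonal degree matrix
    return D - Z

def laplacian_reg(Y, L_T):
    """Temporal Laplacian regularizer  L(Y) = tr(Y L_T Y^T)."""
    return np.trace(Y @ L_T @ Y.T)
```

Because each row of L_T sums to zero, a representation whose columns are all identical incurs zero penalty, while representations that change abruptly between temporally adjacent samples are penalized.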
        <p>With the introduction of ℒ(Y), Eq. (4) is modified as,</p>
        <p>min_{Y,W} (1/2)‖X − WY‖²_F + (λ₁/2)‖Y‖²_F + λ₂‖Y‖_* + λ₃ℒ(Y)   s.t. Y ≥ 0, W ≥ 0   (7)</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.4. Solution</title>
        <p>The optimization problem given in Eq. (7) can be solved using the ADMM approach under
the ALM framework. ADMM finds the solution by splitting the problem into multiple
sub-problems. As a first step, we introduce three
auxiliary variables E, F, and G to decouple the terms present in the formulation. This results in
a formulation as shown below.</p>
        <p>min_{Y,W,E,F,G} (1/2)‖X − WY‖²_F + (λ₁/2)‖E‖²_F + λ₂‖F‖_* + λ₃ℒ(G)   s.t. Y = E, Y = F, Y = G, Y ≥ 0, W ≥ 0   (8)</p>
        <p>The Augmented Lagrangian corresponding to Eq. (8) is given as:</p>
        <p>L(E, F, G, Y, W) = (1/2)‖X − WY‖²_F + (λ₁/2)‖E‖²_F + λ₂‖F‖_* + λ₃tr(G L_T G^⊤) + ⟨Φ₁, Y − E⟩ + ⟨Φ₂, Y − F⟩ + ⟨Φ₃, Y − G⟩ + (μ/2)(‖Y − E‖²_F + ‖Y − F‖²_F + ‖Y − G‖²_F)   (9)</p>
        <p>2.4.1. Updating E: The update expression for E is obtained by solving the following sub-problem.</p>
        <p>E^{[k+1]} = argmin_E (λ₁/2)‖E‖²_F + ⟨Φ₁, Y − E⟩ + (μ/2)‖Y − E‖²_F   (10)</p>
        <p>By differentiating Eq. (10) with respect to E and equating it to zero, the E update is given as,</p>
        <p>E^{[k+1]} = (Φ₁^{[k]} + μ Y^{[k]}) / (λ₁ + μ)   (11)</p>
        <p>2.4.2. Updating F: The F sub-problem is given as follows.</p>
        <p>F^{[k+1]} = argmin_F λ₂‖F‖_* + ⟨Φ₂, Y − F⟩ + (μ/2)‖Y − F‖²_F   (12)</p>
        <p>The update expression for F is found using the singular value thresholding (SVT) operator as</p>
        <p>F^{[k+1]} = SVT_{λ₂/μ}(Y^{[k]} + Φ₂^{[k]}/μ)   (13)</p>
        <p>2.4.3. Updating G: The update expression for G is obtained by solving the following sub-problem.</p>
        <p>G^{[k+1]} = argmin_G λ₃tr(G L_T G^⊤) + ⟨Φ₃, Y − G⟩ + (μ/2)‖Y − G‖²_F   (14)</p>
        <p>By differentiating Eq. (14) with respect to G and equating it to zero, the G update is given as,</p>
        <p>G^{[k+1]} = (Φ₃^{[k]} + μ Y^{[k]})(λ₃(L_T + L_T^⊤) + μI)^{−1}   (15)</p>
        <p>2.4.4. Updating Y: By solving the following sub-problem, the update expression for Y can be obtained.</p>
        <p>Y^{[k+1]} = argmin_Y (1/2)‖X − WY‖²_F + ⟨Φ₁, Y − E⟩ + ⟨Φ₂, Y − F⟩ + ⟨Φ₃, Y − G⟩ + (μ/2)(‖Y − E‖²_F + ‖Y − F‖²_F + ‖Y − G‖²_F)   (16)</p>
        <p>By equating the gradient of Eq. (16) to zero, the update expression for Y is given as,</p>
        <p>Y^{[k+1]} = ((W^{[k]})^⊤ W^{[k]} + 3μI)^{−1} [ (W^{[k]})^⊤ X + μ(E^{[k+1]} + F^{[k+1]} + G^{[k+1]}) − (Φ₁^{[k]} + Φ₂^{[k]} + Φ₃^{[k]}) ]   (17)</p>
        <p>2.4.5. Updating W: The W sub-problem is given as follows.</p>
        <p>W^{[k+1]} = argmin_W (1/2)‖X − WY‖²_F   (18)</p>
        <p>Solution of the above equation can be found as,</p>
        <p>W^{[k+1]} = [X (Y^{[k+1]})^⊤][Y^{[k+1]} (Y^{[k+1]})^⊤]^{−1}   (19)</p>
        <p>Finally, the Lagrange multipliers are updated as follows:</p>
        <p>Φ₁^{[k+1]} = Φ₁^{[k]} + μ(Y^{[k+1]} − E^{[k+1]})   (20)</p>
        <p>Φ₂^{[k+1]} = Φ₂^{[k]} + μ(Y^{[k+1]} − F^{[k+1]})   (21)</p>
        <p>Φ₃^{[k+1]} = Φ₃^{[k]} + μ(Y^{[k+1]} − G^{[k+1]})   (22)</p>
        <p>The iterations are terminated when the following convergence criterion is satisfied:</p>
        <p>max{‖Y^{[k+1]} − E^{[k+1]}‖_∞, ‖Y^{[k+1]} − F^{[k+1]}‖_∞, ‖Y^{[k+1]} − G^{[k+1]}‖_∞, ‖E^{[k+1]} − E^{[k]}‖_∞, ‖F^{[k+1]} − F^{[k]}‖_∞, ‖G^{[k+1]} − G^{[k]}‖_∞} &lt; ε   (23)</p>
        <p>The overall process involved in the proposed time series activity clustering algorithm is
summarized in Algorithm 1.</p>
        <p>Once the representation matrix Y is obtained, an affinity matrix Q ∈ R^{k×k} is calculated.
The accuracy of clustering is highly dependent on the affinity matrix Q. A usual approach is to
build Q from the magnitudes of the representation coefficients relating each pair of samples [14, 15].</p>
        <p>However, the graph so constructed does not take into account the intrinsic relationships of the
within-cluster data points. For data containing temporal information, the within-cluster data points
are highly correlated. In order to take advantage of this information, an affinity matrix Q is
instead calculated from the normalized representation vectors, where ‖·‖₂ represents the ℓ2 norm operator.</p>
        <p>[Algorithm 1 (summary): initialize the variables; in each iteration, update E, F, G, Y, and W with Eqs. (11), (13), (15), (17), and (19), update the multipliers Φ₁, Φ₂, and Φ₃ with Eqs. (20)-(22), update μ^{[k+1]} = ρμ^{[k]}, set k = k + 1, and use Eq. (23) to check convergence; repeat until converged.]</p>
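        <p>Once Y is available, clustering proceeds on an affinity built from its columns. The sketch below uses a cosine-similarity affinity and a standard unnormalized spectral embedding; it is illustrative and does not reproduce the paper's exact ℓ2-based affinity.</p>

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster(Y, n_clusters):
    """Spectral clustering of the k columns of a representation
    matrix Y (r x k), using a cosine-similarity affinity."""
    Yn = Y / (np.linalg.norm(Y, axis=0, keepdims=True) + 1e-12)
    Q = np.abs(Yn.T @ Yn)                 # k x k affinity matrix
    L = np.diag(Q.sum(axis=1)) - Q        # graph Laplacian
    # embed each sample using the eigenvectors of the smallest eigenvalues
    _, V = np.linalg.eigh(L)
    emb = V[:, :n_clusters]
    _, labels = kmeans2(emb, n_clusters, minit='++')
    return labels
```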
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results and Analysis</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset and Parameter Settings</title>
        <p>To verify the performance of the proposed algorithm, it was tested on multiple human activity
datasets. The datasets considered include the Gaming 3D (G3D) [22], Florence 3D (F3D) [23],
UTKinect-Action 3D (UTK) [24], MSRC-Kinect12 (MSRC) [25], MSR Action 3D (MSRA) [26],
and HDM14 [27] datasets. By empirical observation, the parameters μ, λ₁, λ₂, and τ are set to
5.2, 0.03, 18, and 0.7, respectively. The program development was done on a system with an Intel
Core i7 processor (2.90 GHz clock frequency) and 16 GB of RAM, running a 64-bit Windows
operating system.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experimental Results</title>
        <p>Experimental validation was done on the following methods.</p>
        <p>Subspace clustering approaches based on self-representation: In this method, clustering
using state-of-the-art approaches is performed on the generated affinity matrix.
The clustering methods considered include Spectral Clustering (SC) [28], Orthogonal Matching
Pursuit (OMP) [29], K-means (Km) [28], SSC [14], and Elastic Net Subspace Clustering (EnSC)
[29]. The results obtained using SSC are found to be superior to those of the counterparts [14].</p>
        <p>SSC approaches with data pruning: In this method, the input skeletal data is pruned
with strategies including min Φ [29], Temporal SSC [29], Threshold Temporal SSC [29], and
Percentage Temporal SSC [29]. Later, a feature matrix is generated from the pruned data
sequences, followed by the application of the SSC [14] approach. This method converges quickly.
Among these, the Percentage Temporal SSC approach gives better results.</p>
        <p>TSC-Cov: This is a method that we had developed in one of our previous works. In this
method, data re-framing was not performed and the kernel-based features were not incorporated
in the covariance descriptor. But, we had used a new clustering approach named TSC-Cov for
subspace clustering.</p>
        <p>The performance of the proposed algorithm was evaluated against the above mentioned
methods. The metrics used for quantitative evaluation include the accuracy, the Normalized
Mutual Information (NMI), and the Adjusted Rand Index (ARI). The results obtained are tabulated
in Tables (1-3).</p>
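        <p>For reference, the clustering accuracy metric requires matching predicted cluster ids to ground-truth labels before scoring; a common implementation uses the Hungarian algorithm, as sketched below (NMI and ARI are available as `normalized_mutual_info_score` and `adjusted_rand_score` in scikit-learn).</p>

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true, pred):
    """Best-match clustering accuracy: permute predicted cluster ids
    with the Hungarian algorithm so they line up with the ground-truth
    labels, then score as ordinary accuracy."""
    true = np.asarray(true)
    pred = np.asarray(pred)
    n = max(true.max(), pred.max()) + 1
    cost = np.zeros((n, n), dtype=int)
    for t, p in zip(true, pred):
        cost[p, t] += 1                   # co-occurrence counts
    row, col = linear_sum_assignment(-cost)   # maximize total matches
    return cost[row, col].sum() / true.size
```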
        <p>For qualitative evaluation, the affinity matrix and clustering results obtained using the
proposed method and the state-of-the-art methods are presented. Although the experiments
were conducted on all the six datasets mentioned earlier, for the purpose of illustration, the
results obtained with the UTK dataset [24] are given in this paper. The affinity matrices obtained with the
proposed method have a much denser block diagonal structure. This is an indication of the quality of
the clustering process.</p>
        <p>Fig. 3 shows the clustering results obtained on the UTK dataset. For visual analysis, each
cluster is assigned a unique color. The figure shows a comparison between SSC [14], Percentage
Temporal SSC [29], Threshold Temporal SSC [29], TSC-Cov, and the proposed method with
reference to the true labels. From Fig. 3 we can observe that clustering results obtained with
the proposed method are comparatively better than the other methods.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>The paper proposes a new method for clustering human activity sequences in an efficient
way. The proposed method involves the extraction of features from raw input data with the
help of a kernel-based robust covariance descriptor. The optimization model developed uses the
combined advantages of LRR and LSR based subspace clustering approaches. The concept of
temporal Laplacian regularized dictionary learning is introduced in order to learn an effective
representation matrix from the extracted data features. With the help of the ADMM approach,
the solution for the optimization problem is obtained. Performance of the proposed approach
is compared with that of the state-of-the-art approaches in terms of accuracy, NMI, and ARI.
Experimental results validate the superiority of the proposed method in obtaining better clustering
results as compared to those of the counterparts. Motion capture data often suffers from
corruptions in the recorded information. To address this problem, robust human activity clustering
algorithms can be developed in the future. Also, mocap information can be combined with
other modalities of human activity data to achieve improved clustering results in challenging
scenarios.
[8] S. Park, J. Park, M. Al-Masni, M. Al-Antari, M. Z. Uddin, T.-S. Kim, A depth camera-based
human activity recognition via deep learning recurrent neural network for health and
social care services, Procedia Computer Science 100 (2016) 78–84.
[9] J. K. Aggarwal, L. Xia, Human activity recognition from 3d data: A review, Pattern</p>
      <p>Recognition Letters 48 (2014) 70–80.
[10] M. Li, S. Chen, X. Chen, Y. Zhang, Y. Wang, Q. Tian, Symbiotic graph neural networks for
3d skeleton-based human action recognition and motion prediction, IEEE Tran. on PAMI
(2021).
[11] A. Bagnall, J. Lines, A. Bostrom, J. Large, E. Keogh, The great time series classification bake
off: a review and experimental evaluation of recent algorithmic advances, Data Mining
and Knowledge Discovery 31 (2017) 606–660.
[12] L. Parsons, E. Haque, H. Liu, Subspace clustering for high dimensional data: a review,</p>
      <p>Acm sigkdd explorations newsletter 6 (2004) 90–105.
[13] R. Vidal, P. Favaro, Low rank subspace clustering (lrsc), Pattern Recognition Letters 43
(2014) 47–61.
[14] E. Elhamifar, R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications,</p>
      <p>IEEE Tran. on PAMI 35 (2013) 2765–2781.
[15] J. Xue, Y.-Q. Zhao, Y. Bu, W. Liao, J. C.-W. Chan, W. Philips, Spatial-spectral structured
sparse low-rank representation for hyperspectral image super-resolution, IEEE Tran. on
Image Processing 30 (2021) 3084–3097.
[16] J. Francis, A. Johnson, B. Madathil, S. N. George, A joint sparse and correlation induced
subspace clustering method for segmentation of natural images, in: 2020 IEEE 17th India
Council Int. Conf. (INDICON), 2020, pp. 1–7.
[17] Z. Wu, M. Yin, Y. Zhou, X. Fang, S. Xie, Robust spectral subspace clustering based on least
square regression, Neural Processing Letters 48 (2018) 1359–1372.
[18] J. Cavazza, A. Zunino, M. S. Biagio, V. Murino, Kernelized covariance for action recognition,
in: 2016 23rd Int. Conf. on Pattern Recognition (ICPR), 2016, pp. 408–413.
[19] Z. Zhang, K. Zhao, Low-rank matrix approximation with manifold regularization, IEEE</p>
      <p>Tran. on PAMI 35 (2013) 1717–1729.
[20] W. Liu, X. Ma, Y. Zhou, D. Tao, J. Cheng, -laplacian regularization for scene recognition,</p>
      <p>IEEE Tran. on Cybernetics 49 (2019) 2927–2940.
[21] G. Casalino, N. D. Buono, C. Mencar, Part-based data analysis with masked non-negative
matrix factorization, in: Int. Conf. on Computational Science and Its Applications, Springer,
2014, pp. 440–454.
[22] V. Bloom, V. Argyriou, D. Makris, Hierarchical transfer learning for online recognition of
compound actions, Computer Vision and Image Understanding 144 (2015).
[23] L. Seidenari, V. Varano, S. Berretti, A. Del Bimbo, P. Pala, Recognizing actions from
depth cameras as weakly aligned multi-part bag-of-poses, in: 2013 IEEE Conf. on CVPR
Workshops, 2013, pp. 479–485.
[24] L. Xia, C.-C. Chen, J. K. Aggarwal, View invariant human action recognition using
histograms of 3d joints, in: 2012 IEEE Computer Society Conf.on CVPR - Workshops, 2012,
pp. 20–27.
[25] S. Fothergill, H. M. Mentis, P. Kohli, S. Nowozin, Instructing people for training gestural
interactive systems, in: CHI ’12 Proc. of the SIGCHI Conf. on Human Factors in Computing
Systems, ACM, 2012, pp. 1737–1746.
[26] W. Li, Z. Zhang, Z. Liu, Action recognition based on a bag of 3d points, in: 2010 IEEE</p>
      <p>Computer Society Conf. on CVPR - Workshops, 2010, pp. 9–14.
[27] M. Müller, T. Röder, M. Clausen, B. Eberhardt, B. Krüger, A. Weber, Documentation Mocap</p>
      <p>Database HDM05, Technical Report CG-2007-2, Universität Bonn, 2007.
[28] Y. Lee, S. Choi, Minimum entropy, k-means, spectral clustering, in: 2004 IEEE Int. Joint</p>
      <p>Conf. on Neural Networks (IEEE Cat. No.04CH37541), volume 1, 2004, pp. 117–122.
[29] G. Paoletti, J. Cavazza, C. Beyan, A. Del Bue, Subspace clustering for action recognition
with covariance representations and temporal pruning, in: 2020 25th Int. Conf. on Pattern
Recognition (ICPR), 2021, pp. 6035–6042.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pareek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          ,
          <article-title>A survey on video-based human action recognition: recent updates, datasets, challenges, and applications</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>2259</fpage>
          -
          <lpage>2322</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.-B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-X.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of vision-based human action recognition methods</article-title>
          ,
          <source>Sensors</source>
          <volume>19</volume>
          (
          <year>2019</year>
          )
          <fpage>1005</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jobanputra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bavishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Doshi</surname>
          </string-name>
          ,
          <article-title>Human activity recognition: A survey</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>155</volume>
          (
          <year>2019</year>
          )
          <fpage>698</fpage>
          -
          <lpage>703</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>Human action recognition and prediction: A survey</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          <volume>130</volume>
          (
          <year>2022</year>
          )
          <fpage>1366</fpage>
          -
          <lpage>1401</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Piran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Moon</surname>
          </string-name>
          ,
          <article-title>Sensor-based and vision-based human activity recognition: A comprehensive survey</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>108</volume>
          (
          <year>2020</year>
          )
          <fpage>107561</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Akbar</surname>
          </string-name>
          ,
          <article-title>Skeleton-based human activity recognition using ConvLSTM and guided feature learning</article-title>
          ,
          <source>Soft Computing</source>
          <volume>26</volume>
          (
          <year>2022</year>
          )
          <fpage>877</fpage>
          -
          <lpage>890</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barnachon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bouakaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Boufama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Guillou</surname>
          </string-name>
          ,
          <article-title>Ongoing human action recognition with motion capture</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>47</volume>
          (
          <year>2014</year>
          )
          <fpage>238</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>