<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">An Efficient Framework for the Clustering of Human Activity Data using Kernelized Robust Covariance Descriptors</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Guntru</forename><forename type="middle">Prasanth</forename><surname>Kumar</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Technology Calicut</orgName>
								<address>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><roleName>M. S. Subodh Raj</roleName><forename type="first">Sudhish</forename><forename type="middle">N</forename><surname>George</surname></persName>
							<email>sudhish@nitc.ac.in</email>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Technology Calicut</orgName>
								<address>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">An Efficient Framework for the Clustering of Human Activity Data using Kernelized Robust Covariance Descriptors</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">62D02F59B6E761F954BE05643235759A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T19:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Human activity data</term>
					<term>Kernelized covariance descriptors</term>
					<term>Temporal Laplacian regularization</term>
					<term>Temporal subspace clustering</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, a new method for the efficient clustering of human activity data is proposed. Unlike the traditional human activity clustering approaches, our method relies on the skeletal data recorded with the help of motion capture (mocap) systems to achieve the goal. The proposed method is structured around the kernel-based robust covariance descriptor. By introducing a data re-framing technique that efficiently utilizes the temporal properties of the human activity data, we have alleviated the data redundancy and insufficiency issues associated with action sequences. The optimization model developed encompasses the combined benefits of low-rank representation and least square regression. The formulation is strengthened by incorporating the temporal dependency of the human activity sequences with the help of a temporal Laplacian regularizer. With the proposed algorithm, a representation matrix is learned from the raw data, which is then used to perform subspace clustering. Experiments conducted on multiple human activity datasets reveal the ability of the proposed method to achieve better clustering results compared to state-of-the-art counterparts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Human activity recognition (HAR) from action sequences remains a challenging research topic in computer vision due to its multifaceted applications <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. HAR finds application in visual surveillance, healthcare, human-machine interfaces, video retrieval, and the entertainment industry, to name a few <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. Traditional approaches to HAR use RGB video sequences as the input, from which handcrafted features are later extracted for the purpose of activity recognition <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. Because of the high-dimensional nature of video sequences, such HAR approaches often incur high computational complexity <ref type="bibr" target="#b0">[1]</ref>. Later, sensor-based HAR gained popularity. The focus of such methods was on the data obtained from sensors such as accelerometers and gyroscopes. In such cases, the subject must carry the sensor so that the movements of the human body can be recorded. This, along with the influence of noise, acts as a limiting factor in sensor-based HAR approaches <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b0">1,</ref><ref type="bibr" target="#b5">6]</ref>. With the evolution of mocap systems, new modalities were introduced to represent human activity information, including motion depth maps and skeletal representations <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. With skeletal representations, time series of 3D joint positions of the human body are recorded by the mocap systems <ref type="bibr" target="#b8">[9]</ref>. 
The spatio-temporal quality of such recordings is so high that they find application in multiple domains, including gait analysis, medical rehabilitation, and computer animation <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b2">3]</ref>.</p><p>The common approaches to HAR with mocap data involve supervised learning methods. Though they guarantee good results, such methods are often susceptible to missing-sample issues. Further, the requirement of a huge clean dataset for the initial learning of the system poses a serious bottleneck to supervised HAR approaches <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b9">10]</ref>. This motivates the need for unsupervised HAR strategies. Though unsupervised methods alleviate the aforementioned challenges, they face limitations of their own. The main challenge in unsupervised HAR is that the task must be performed in a robust and accurate manner without any prior knowledge about the data samples <ref type="bibr" target="#b10">[11]</ref>.</p><p>Recent studies showcase the ability of subspace clustering algorithms to deal with high-dimensional data clustering problems <ref type="bibr" target="#b11">[12]</ref>. The key idea in subspace clustering is to identify multiple low-dimensional subspaces from which the data originates <ref type="bibr" target="#b12">[13]</ref>. Each subspace so identified houses a cluster of data. This approach is commonly termed the Union of Subspaces (UoS) model <ref type="bibr" target="#b13">[14]</ref>. The popularity of subspace clustering has increased with the introduction of sparse subspace clustering (SSC), in which a sparsity constraint is imposed on the coefficients in order to learn a sparse representation of the raw data <ref type="bibr" target="#b13">[14]</ref>. 
Low-rank representation learning (LRR) <ref type="bibr" target="#b14">[15]</ref> is another technique used with subspace clustering, wherein the global structure of the data is considered while learning the coefficients. In LRR-based approaches, a given dictionary is utilized to learn a low-rank representation of the data samples, and the clustering results obtained with such low-rank representations are usually better <ref type="bibr" target="#b15">[16]</ref>. Least square regression (LSR) <ref type="bibr" target="#b16">[17]</ref> based subspace clustering, in which the grouping of data samples is performed with the help of the Frobenius norm operator, is another promising approach. The aforementioned approaches, when used independently, are not suitable for HAR as they do not consider the time series information associated with the skeletal data. As a solution to this problem, we have developed an approach called time series activity clustering (TSAC), which utilizes the combined advantages of LRR and LSR. We have incorporated a kernel-based robust covariance descriptor to extract features from the raw input data, with the aim of exploiting the non-linearity present in the data. Further, a temporal Laplacian regularizer is employed to capture the temporal dependency among the data samples.</p><p>The following are the main contributions of this work:</p><p>1. An unsupervised optimization model with improved performance is formulated for the clustering of human activity data. To effectively utilize the temporal dependencies of human activity sequences, a temporal Laplacian regularizer is introduced in the proposed model. 2. We have blended the LRR and LSR based subspace clustering approaches to achieve better clustering results. A clean dictionary is learned along with the representation matrix in the proposed optimization framework.</p><p>The rest of the work is outlined as follows. The proposed method, the problem formulation, and the solutions obtained are presented in Section 2. 
Section 3 presents the experimental validation of the proposed method. Finally, conclusions are drawn in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Proposed Method</head><p>In this section, we give a detailed outline of the proposed TSAC approach. The workflow of the proposed approach is shown in Fig. <ref type="figure" target="#fig_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Data Re-framing</head><p>A human action sequence is a collection of action timestamps evolving over a period of time. Two important observations can be made about a given collection of action sequences:</p><p>1. The number of timestamps corresponding to each of the action sequences may not necessarily be the same. This disparity often appears as a bottleneck in generalizing any algorithm dealing with human activity recognition. 2. As the number of timestamps in an action sequence increases, the computational overhead grows, leading to increased resource utilization.</p><p>The aforementioned challenges can be addressed by standardizing the number of timestamps for all the action sequences under consideration. If the number of timestamps is standardized to 'N', then data pruning needs to be employed on action sequences having more than N timestamps and data augmentation needs to be performed on action sequences having fewer than N timestamps. The temporal smoothness property of human action sequences can be conveniently utilized to achieve this goal. Human action sequences are highly correlated in time. As far as activity recognition is concerned, this correlation leads to redundant information. Often, the complete set of frames of a recorded action sequence is not essential to perform activity recognition. Thus, we introduce a pruning technique, termed succession pruning, in which alternate frames of the action sequence are removed to eliminate redundancy while maintaining the temporal properties of the action sequence. This drastically reduces the amount of data to be processed, lowering both the computational overhead and the resource utilization. For action sequences with an insufficient number of timestamps, we instead perform timestamp augmentation. 
In this process, the trailing end of the action sequence is augmented with copies of its own terminal timestamps, which is in line with the temporal smoothness property of human action data.</p></div>
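The data re-framing step above (succession pruning plus timestamp augmentation) can be sketched in a few lines. This is an illustrative implementation under our own assumptions: the function name `reframe` and the array layout (timestamps × 3 × joints) are ours, not part of the original work.

```python
import numpy as np

def reframe(sequence, n_target):
    """Standardize an action sequence of shape (T, 3, n_joints) to n_target timestamps.

    Succession pruning drops alternate frames while the sequence is too long;
    timestamp augmentation repeats the terminal frame while it is too short.
    (Sketch of the paper's data re-framing step; names are our own.)
    """
    seq = np.asarray(sequence)
    # Succession pruning: keep every other frame until short enough.
    while seq.shape[0] > n_target:
        seq = seq[::2]
    # Timestamp augmentation: pad the trailing end with the terminal timestamp.
    if seq.shape[0] < n_target:
        pad = np.repeat(seq[-1:], n_target - seq.shape[0], axis=0)
        seq = np.concatenate([seq, pad], axis=0)
    return seq[:n_target]
```

Note that halving may overshoot below the target length, in which case the terminal-frame padding restores the standardized length, consistent with the temporal smoothness argument in the text.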
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Feature Matrix Generation using Kernel-based Robust Covariance Descriptor</head><p>Human actions are represented in the form of skeletal structures with the help of modern motion capture systems. Each timestamp of an action sequence can be represented as a collection of 'n' joints. Thus, timestamp 't' of an action sequence can be represented as a matrix e(𝑡) ∈ R 3×𝑛 of 3D positions {e 1 (𝑡), . . . , e 𝑛 (𝑡)}. The 3D coordinates of the i 𝑡ℎ joint of the skeletal structure corresponding to the t 𝑡ℎ timestamp can be denoted as e 𝑖 (𝑡) = [𝑥 𝑖 (𝑡), 𝑦 𝑖 (𝑡), 𝑧 𝑖 (𝑡)] ⊤ .</p><p>Once the data re-framing process is completed, the raw action sequences with a fixed number of timestamps are represented in the form of a feature matrix. This involves a two-step process. In the first step, we use the concept of covariance to obtain the covariance descriptor of each action sequence. The use of covariance helps capture the changes pertaining to each joint of the skeletal structure <ref type="bibr" target="#b17">[18]</ref>. If 𝜇 represents the temporal average of the timestamps of an action sequence, then the corresponding action sequence can be represented in the form of a covariance matrix as shown below.</p><formula xml:id="formula_0">Ψ = 1 𝑁 − 1 𝑁 ∑︁ 𝑡=1 [e(𝑡) − 𝜇][e(𝑡) − 𝜇] ⊤<label>(1)</label></formula><p>This process is repeated for each action sequence, resulting in a unique covariance descriptor for each input sequence.</p><p>Although covariance descriptors find application in multiple domains, they cannot capture the non-linearity present in the data. To make the feature matrix robust, different approaches have been adopted to incorporate additional statistical information into the covariance descriptor. These include entropy-based, mutual information-based, and kernel-based approaches. Among these, kernel-based approaches have been used to model more complex structures in the data. 
The use of kernels improves the descriptive power of the covariance matrices. The work proposed by Cavazza et al. <ref type="bibr" target="#b17">[18]</ref> showcases the benefits of using kernels in human activity recognition. Motivated by this observation, we modify the expression given in Eq. ( <ref type="formula" target="#formula_0">1</ref>) to incorporate the kernel function and obtain the following robust covariance descriptor.</p><formula xml:id="formula_1">Ψ = 1 𝑁 − 1 𝑁 ∑︁ 𝑡=1 [︀ 𝒦(e(𝑡)) − 𝜇 𝜅 ]︀[︀ 𝒦(e(𝑡)) − 𝜇 𝜅 ]︀ ⊤<label>(2)</label></formula><p>Here, 𝒦(.) represents the kernel function and 𝜇 𝜅 is the temporal average of the kernel entries. The choice of kernel function is application-specific. We have used two kernel functions, namely the polynomial kernel and the exponential kernel, of which the latter has produced more promising results. The exponential kernel is defined as:</p><formula xml:id="formula_2">𝒦(e(𝑡)) = exp {︁ e(𝑡) / (𝜎 + 𝑏) 2 }︁<label>(3)</label></formula><p>where 𝑏 &gt; 0 and 𝜎 is the kernel bandwidth.</p><p>After obtaining the robust covariance descriptor for each action sequence, in the second step we generate the feature matrix. The covariance descriptor Ψ contains redundant information as it is symmetric about the main diagonal. In order to reduce the amount of data to be processed, we vectorize each covariance descriptor by retaining the upper triangular values alone. We then stack the resulting vectors as columns to form the feature matrix X ∈ R 𝑝×𝑘 , where '𝑝' is the length of the individual vectors and '𝑘' is the total number of action sequences.</p></div>
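The two-step feature matrix generation can be sketched as follows. The flattening of each 3 × n frame into a single 3n-vector before applying Eq. (2), and the default kernel parameters, are our assumptions for illustration; the paper only fixes the form of Eqs. (2)-(3), so this is a sketch rather than the authors' implementation.

```python
import numpy as np

def exp_kernel(e, sigma=1.0, b=0.5):
    # Exponential kernel in the spirit of Eq. (3): elementwise
    # exp(e / (sigma + b)^2). sigma and b > 0 are illustrative values.
    return np.exp(e / (sigma + b) ** 2)

def robust_cov_descriptor(seq, sigma=1.0, b=0.5):
    """Kernelized covariance descriptor following Eq. (2).

    seq: (N, 3, n) array of N timestamps; each frame is flattened to a
    3n-vector (our simplification), giving a symmetric 3n x 3n descriptor.
    """
    N = seq.shape[0]
    frames = exp_kernel(seq.reshape(N, -1), sigma, b)  # kernelized frames
    mu = frames.mean(axis=0)                           # temporal average mu_kappa
    centered = frames - mu
    return centered.T @ centered / (N - 1)

def feature_matrix(sequences, **kw):
    # Vectorize the upper triangle of each (symmetric) descriptor and
    # stack the vectors as columns of X in R^{p x k}.
    cols = []
    for seq in sequences:
        psi = robust_cov_descriptor(seq, **kw)
        iu = np.triu_indices(psi.shape[0])
        cols.append(psi[iu])
    return np.stack(cols, axis=1)
```

For n = 4 joints, each descriptor is 12 × 12 and the upper triangle yields p = 78 entries per sequence.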
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Temporal Subspace Clustering (TSC) with Laplacian Regularization</head><p>In state-of-the-art approaches, subspace clustering is performed by generating a representation matrix from the feature matrix. This is accomplished by using the self-representation property of the feature matrix, resulting in the computation of a set of unique coefficients Y ∈ R 𝑘×𝑘 for the feature matrix X ∈ R 𝑝×𝑘 . This can be mathematically expressed as X = XY. The limitation of such an approach is that it tends to produce sub-optimal results if the data sampling is insufficient. Instead of using the self-representation property, if we learn an efficient dictionary W ∈ R 𝑝×𝑘 , we can overcome the aforementioned problem. Given a dictionary W, the set of data samples X can be expressed as X ≈ WY. The set of coefficients Y can be efficiently obtained in an iterative manner by utilizing the underlying low-rank nature of Y. The low-rank property of Y can be captured with the help of LRR <ref type="bibr" target="#b14">[15]</ref>. Since the rank minimization problem is inherently NP-hard, the nuclear norm can be used as a surrogate for the rank <ref type="bibr" target="#b16">[17]</ref>. It is also important to capture the intra-cluster correlation among the data samples to obtain better clustering results. To this end, we can use the principle of LSR <ref type="bibr" target="#b16">[17]</ref>. By combining the concepts of LRR and LSR, we formulate an optimization problem as follows.</p><formula xml:id="formula_4">min Y,W 1 2 ‖X − WY‖ 2 F + 𝜆 1 2 ‖Y‖ 2 F + 𝜆 2 ‖Y‖ * s.t. Y ≥ 0, W ≥ 0 (4)</formula><p>where the term 1 2 ‖X − WY‖ 2 F captures the reconstruction error, ||•|| 𝐹 represents the Frobenius norm operator, || • || * denotes the nuclear norm operator, and 𝜆 1 and 𝜆 2 are the balancing parameters.</p><p>Manifold regularization methods are efficient in incorporating the temporal dependency of data in the problem formulation <ref type="bibr" target="#b18">[19]</ref>. 
Thus, by modifying the general Laplacian regularizer, which captures the spatial dependency of data, we have developed a temporal Laplacian regularizer ℒ(•), since our interest is in the temporal information of the action sequences.</p><p>For a representation matrix Y, the temporal Laplacian regularization function can be defined as <ref type="bibr" target="#b19">[20]</ref>:</p><formula xml:id="formula_5">ℒ(Y) = 1 2 ∑︁ 𝑖 ∑︁ 𝑗 𝑧 𝑖𝑗 ‖𝑦 𝑖 − 𝑦 𝑗 ‖ 2 2 = tr (YL T Y ⊤ ),<label>(5)</label></formula><p>where</p><formula xml:id="formula_6">L T = ̃︁ W − Z is a temporal Laplacian matrix, ̃︁ W ii = ∑︀ 𝑚 𝑗=1 𝑧 𝑖𝑗 ,</formula><p>and Z is a weight matrix that captures the successive similarities in X. Each element of Z is found as <ref type="bibr" target="#b20">[21]</ref>,</p><formula xml:id="formula_7">𝑧 𝑖𝑗 = {︃ 1 for |𝑖 − 𝑗| ≤ 𝛾 2 0 otherwise<label>(6)</label></formula><p>where 𝛾 denotes an empirically defined threshold value.</p><p>With the introduction of ℒ(Y), Eq. ( <ref type="formula">4</ref>) is modified as,</p><formula xml:id="formula_9">min Y,W 1 2 ‖X − WY‖ 2 F + 𝜆 1 2 ‖Y‖ 2 F + 𝜆 2 ‖Y‖ * + 𝜆 3 ℒ(Y) s.t. Y ≥ 0, W ≥ 0 (7)</formula></div>
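The temporal Laplacian of Eqs. (5)-(6) is straightforward to construct: Z is a banded matrix linking samples whose temporal indices differ by at most 𝛾/2, and L_T is its Laplacian. The following sketch (helper names are ours) also verifies the trace identity of Eq. (5).

```python
import numpy as np

def temporal_laplacian(k, gamma):
    """Build the temporal Laplacian L_T = W~ - Z of Eqs. (5)-(6).

    Z links samples whose temporal indices differ by at most gamma/2;
    W~ is the diagonal degree matrix of Z.
    """
    idx = np.arange(k)
    Z = (np.abs(idx[:, None] - idx[None, :]) <= gamma / 2).astype(float)
    D = np.diag(Z.sum(axis=1))  # W~ in the paper's notation
    return D - Z, Z

def laplacian_penalty(Y, L_T):
    # L(Y) = tr(Y L_T Y^T): the temporal smoothness penalty of Eq. (5),
    # with y_i taken as the columns of Y.
    return np.trace(Y @ L_T @ Y.T)
```

Because Z is symmetric, tr(Y L_T Yᵀ) equals the pairwise sum ½ Σᵢⱼ zᵢⱼ‖yᵢ − yⱼ‖², so minimizing it pulls temporally adjacent representations toward each other.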
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Solution</head><p>The optimization problem given in Eq. ( <ref type="formula">7</ref>) can be solved using the ADMM approach under the ALM framework. ADMM solves the problem by splitting it into multiple simpler sub-problems. As a first step, we introduce three auxiliary variables E, F, and G to decouple the terms present in the formulation. This results in the formulation shown below.</p><formula xml:id="formula_10">min Y,W,E,F,G 1 2 ‖X − WY‖ 2 F + 𝜆 1 2 ‖E‖ 2 F + 𝜆 2 ‖F‖ * + 𝜆 3 ℒ(G) s.t. Y = E, Y = F, Y = G, Y ≥ 0, W ≥ 0<label>(8)</label></formula><p>The Augmented Lagrangian corresponding to Eq. ( <ref type="formula" target="#formula_10">8</ref>) is given as:</p><formula xml:id="formula_11">L(E, F, G, Y, W) = 1 2 ‖X − WY‖ 2 F + 𝜆 1 2 ‖E‖ 2 F + 𝜆 2 ‖F‖ * + 𝜆 3 tr(GL T G ⊤ ) + ⟨Φ 1 , Y − E⟩ + ⟨Φ 2 , Y − F⟩ + ⟨Φ 3 , Y − G⟩ + 𝛽 2 (‖Y − E‖ 2 F + ‖Y − F‖ 2 F + ‖Y − G‖ 2 F )<label>(9)</label></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.1.">Updating E:</head><p>The update expression for E is obtained by solving the following sub-problem.</p><formula xml:id="formula_12">E [𝑙+1] = argmin E 𝜆 1 2 ‖E‖ 2 F + ⟨Φ 1 , Y − E⟩ + 𝛽 2 ‖Y − E‖ 2 F (10)</formula><p>By differentiating Eq. ( <ref type="formula">10</ref>) with respect to E and equating it to zero, the E update is given as,</p><formula xml:id="formula_13">E [𝑙+1] = 1 𝜆 1 + 𝛽 (︀ Φ [𝑙] 1 + 𝛽Y [𝑙] )︀<label>(11)</label></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.2.">Updating F:</head><p>The F sub-problem is given as,</p><formula xml:id="formula_14">F [𝑙+1] = argmin F 𝜆 2 ‖F‖ * + ⟨Φ 2 , Y − F⟩ + 𝛽 2 ‖Y − F‖ 2 F<label>(12)</label></formula><p>The update expression for F is found using the singular value thresholding (SVT) operator as follows <ref type="bibr" target="#b15">[16]</ref>,</p><formula xml:id="formula_16">F [𝑙+1] = SVT 𝜆 2 𝛽 [︃ Y [𝑙] + Φ [𝑙] 2 𝛽 ]︃<label>(13)</label></formula></div>
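The SVT operator used in the F update of Eq. (13) is the proximal operator of the nuclear norm and admits a short numpy sketch (the function name is ours):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink each singular value of M by
    tau and discard those that fall to zero. This is the proximal
    operator of the nuclear norm used in the F update of Eq. (13)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt
```

In the ADMM iteration, F is obtained by applying `svt` with threshold 𝜆₂/𝛽 to the matrix Y + Φ₂/𝛽.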
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.3.">Updating G:</head><p>The update expression for G is obtained by solving the following sub-problem.</p><formula xml:id="formula_17">G [𝑙+1] = argmin G 𝜆 3 tr(GL T G ⊤ ) + ⟨Φ 3 , Y − G⟩ + 𝛽 2 ‖Y − G‖ 2 F (<label>14</label></formula><formula xml:id="formula_18">)</formula><p>By differentiating Eq. ( <ref type="formula" target="#formula_17">14</ref>) with respect to G and equating it to zero, the G update is given as,</p><formula xml:id="formula_19">G [𝑙+1] = (︀ Φ [𝑙] 3 + 𝛽Y [𝑙] )︀(︀ 𝜆 3 (L T + L T ⊤ ) + 𝛽I )︀ −1<label>(15)</label></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.4.">Updating Y:</head><p>By solving the following sub-problem, the update expression for Y can be obtained.</p><formula xml:id="formula_20">Y [𝑙+1] = argmin Y 1 2 ‖X − WY‖ 2 F + ⟨Φ 1 , Y − E⟩ + ⟨Φ 2 , Y − F⟩ + ⟨Φ 3 , Y − G⟩ + 𝛽 2 (︀ ‖Y − E‖ 2 F + ‖Y − F‖ 2 F + ‖Y − G‖ 2 F )︀<label>(16)</label></formula><p>By equating the gradient of Eq. ( <ref type="formula" target="#formula_20">16</ref>) to zero, the update expression for Y is given as,</p><formula xml:id="formula_21">Y [𝑙+1] = [︁ (︀ W [𝑙] )︀ ⊤ W [𝑙] + 3𝛽I ]︁ −1 [︁ (︀ W [𝑙] )︀ ⊤ X [𝑙] + 𝛽 (︀ E [𝑙+1] + F [𝑙+1] + G [𝑙+1] )︀ − (︀ Φ [𝑙] 1 + Φ [𝑙] 2 + Φ [𝑙] 3 )︀ ]︁<label>(17)</label></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.5.">Updating W:</head><p>The W sub-problem is given as follows.</p><formula xml:id="formula_22">W [𝑙+1] = argmin W 1 2 ‖X − WY‖ 2 F (18)</formula><p>The solution of the above equation is found as,</p><formula xml:id="formula_23">W [𝑙+1] = [︁ X [𝑙] (︀ Y [𝑙+1] )︀ ⊤ ]︁[︁ Y [𝑙+1] (︀ Y [𝑙+1] )︀ ⊤ ]︁ −1<label>(19)</label></formula><p>Finally, the Lagrange multipliers are updated as follows:</p><formula xml:id="formula_24">Φ [𝑙+1] 1 = Φ [𝑙] 1 + 𝛽 (︀ Y [𝑙+1] − E [𝑙+1] )︀<label>(20)</label></formula><formula xml:id="formula_25">Φ [𝑙+1] 2 = Φ [𝑙] 2 + 𝛽 (︀ Y [𝑙+1] − F [𝑙+1] )︀<label>(21)</label></formula><formula xml:id="formula_26">Φ [𝑙+1] 3 = Φ [𝑙] 3 + 𝛽 (︀ Y [𝑙+1] − G [𝑙+1] )︀<label>(22)</label></formula><p>Convergence of the algorithm is ensured if,</p><formula xml:id="formula_27">max ⎧ ⎨ ⎩ ⃦ ⃦ Y [𝑙+1] − E [𝑙] ⃦ ⃦ ∞ , ⃦ ⃦ E [𝑙+1] − E [𝑙] ⃦ ⃦ ∞ ⃦ ⃦ Y [𝑙+1] − F [𝑙] ⃦ ⃦ ∞ , ⃦ ⃦ F [𝑙+1] − F [𝑙] ⃦ ⃦ ∞ ⃦ ⃦ Y [𝑙+1] − G [𝑙] ⃦ ⃦ ∞ , ⃦ ⃦ G [𝑙+1] − G [𝑙] ⃦ ⃦ ∞ ⎫ ⎬ ⎭ &lt; 𝜖<label>(23)</label></formula><p>The overall process involved in the proposed time series activity clustering algorithm is summarized in Algorithm 1.</p><p>Once the representation matrix Y is obtained, an affinity matrix Q ∈ R 𝑘×𝑘 is calculated. The accuracy of clustering is highly dependent on the affinity matrix Q. A usual approach to obtaining the affinity matrix is as shown below <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>.</p><formula xml:id="formula_28">Q = |Y| + |Y ⊤ | 2<label>(24)</label></formula><p>However, the graph so constructed does not take into account the intrinsic relationships of the within-cluster data points. For data containing temporal information, the within-cluster data points are highly correlated. 
In order to take advantage of this information, the affinity matrix Q is instead calculated as follows.</p><formula xml:id="formula_29">Q(𝑖, 𝑗) = 𝑦 ⊤ 𝑖 𝑦 𝑗 ‖𝑦 𝑖 ‖ 2 ‖𝑦 𝑗 ‖ 2<label>(25)</label></formula><p>where ‖.‖ 2 represents the ℓ 2 norm operator.</p></div>
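The affinity of Eq. (25) is the cosine similarity between columns of the learned representation Y, which rewards the high within-cluster correlation of temporal data. A minimal sketch (the `eps` guard against zero columns is our addition):

```python
import numpy as np

def cosine_affinity(Y, eps=1e-12):
    """Affinity matrix of Eq. (25): Q(i, j) is the cosine similarity
    between columns y_i and y_j of the representation matrix Y."""
    norms = np.linalg.norm(Y, axis=0) + eps  # eps guards all-zero columns
    Yn = Y / norms                           # column-normalized Y
    return Yn.T @ Yn                         # Gram matrix of unit columns
```

The resulting symmetric Q can then be fed to any spectral clustering routine that accepts a precomputed affinity (e.g. scikit-learn's `SpectralClustering` with `affinity='precomputed'`).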
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experimental Results and Analysis</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset and Parameter Settings</head><p>To verify the performance of the proposed algorithm, it was tested on multiple human activity datasets. The datasets considered include the Gaming 3D (G3D) <ref type="bibr" target="#b21">[22]</ref>, Florence 3D (F3D) <ref type="bibr" target="#b22">[23]</ref>, UTKinect-Action 3D (UTK) <ref type="bibr" target="#b23">[24]</ref>, MSRC-Kinect12 (MSRC) <ref type="bibr" target="#b24">[25]</ref>, MSR Action 3D (MSRA) <ref type="bibr" target="#b25">[26]</ref>, and HDM14 <ref type="bibr" target="#b26">[27]</ref> datasets. The parameters 𝛾, 𝜆 1 , 𝜆 2 , and 𝜂 were set empirically to 5.2, 0.03, 18, and 0.7, respectively. The experiments were run on a system with an Intel Core i7 processor clocked at 2.90 GHz, 16 GB of RAM, and a 64-bit Windows operating system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Experimental Results</head><p>Experimental validation was done against the following methods.</p><p>Subspace clustering approaches based on self-representation: In this method, clustering with state-of-the-art approaches is performed on the generated affinity matrix. The clustering methods considered include Spectral Clustering (SC) <ref type="bibr" target="#b27">[28]</ref>, Orthogonal Matching Pursuit (OMP) <ref type="bibr" target="#b28">[29]</ref>, K-means (Km) <ref type="bibr" target="#b27">[28]</ref>, SSC <ref type="bibr" target="#b13">[14]</ref>, and Elastic Net Subspace Clustering (EnSC) <ref type="bibr" target="#b28">[29]</ref>. The results obtained using SSC are found to be superior to those of its counterparts <ref type="bibr" target="#b13">[14]</ref>.</p><p>SSC approaches with Data Pruning: In this method, the input skeletal data is pruned with strategies including min Φ <ref type="bibr" target="#b28">[29]</ref>, Temporal SSC <ref type="bibr" target="#b28">[29]</ref>, Threshold Temporal SSC <ref type="bibr" target="#b28">[29]</ref>, and Percentage Temporal SSC <ref type="bibr" target="#b28">[29]</ref>. Later, a feature matrix is generated from the pruned data sequences, followed by the application of the SSC <ref type="bibr" target="#b13">[14]</ref> approach. This method converges quickly. Among these variants, Percentage Temporal SSC gives the best results.</p><p>TSC-Cov: This is a method that we developed in a previous work. In it, data re-framing was not performed and kernel-based features were not incorporated in the covariance descriptor; instead, a clustering approach named TSC-Cov was used for subspace clustering.</p><p>The performance of the proposed algorithm was evaluated against the above-mentioned methods. The metrics used for quantitative evaluation include the accuracy, the Normalized Mutual Information (NMI), and the Adjusted Rand Index (ARI). 
The results obtained are tabulated in Tables 1-3.</p><p>For qualitative evaluation, the affinity matrices and clustering results obtained using the proposed method and the state-of-the-art methods are presented. Although the experiments were conducted on all six datasets mentioned earlier, for the purpose of illustration, the results obtained with the UTK dataset <ref type="bibr" target="#b23">[24]</ref> are given in this paper.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the proposed time series activity clustering</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Affinity graphs of SSC, Percentage Temporal SSC, Threshold Temporal SSC, TSC-Cov, and Kernel and LRR imposed TSC (Proposed) approaches on UTK Dataset.</figDesc><graphic coords="9,121.96,572.43,348.86,65.86" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Clustering Results on UTK Dataset</figDesc><graphic coords="10,153.55,84.19,285.70,254.59" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Algorithm 1</head><label>1</label><figDesc>Time Series Activity Clustering Require: Skeletal data and parameters 𝜆 1 , 𝜆 2 , 𝜆 3 , 𝜂, 𝛾 and 𝛽</figDesc><table><row><cell cols="2">Ensure: Y ∈ R 𝑘×𝑘</cell></row><row><cell>1:</cell><cell>Find Ψ using Eq. (1)</cell></row><row><cell>2:</cell><cell>Find X using Ψ</cell></row><row><cell>3:</cell><cell>Generate matrices Z, ̃︁ W, and L T</cell></row><row><cell>4:</cell><cell>while 𝑛𝑜𝑡 𝑐𝑜𝑛𝑣𝑒𝑟𝑔𝑒𝑑 do</cell></row><row><cell>5:</cell><cell>Update E [𝑙+1] with Eq. (11)</cell></row><row><cell>6:</cell><cell>Update F [𝑙+1] with Eq. (13)</cell></row><row><cell>7:</cell><cell>Update G [𝑙+1] with Eq. (15)</cell></row><row><cell>8:</cell><cell>Update Y [𝑙+1] with Eq. (17)</cell></row><row><cell>9:</cell><cell>Update W [𝑙+1] with Eq. (19)</cell></row><row><cell>10:</cell><cell>Update Φ [𝑙+1] 1 with Eq. (20)</cell></row><row><cell>11:</cell><cell>Update Φ [𝑙+1] 2 with Eq. (21)</cell></row><row><cell>12:</cell><cell>Update Φ [𝑙+1] 3 with Eq. (22)</cell></row><row><cell>13:</cell><cell>Update 𝛽 [𝑙+1] = 𝜂𝛽 [𝑙]</cell></row><row><cell>14:</cell><cell>𝑙 = 𝑙 + 1</cell></row><row><cell>15:</cell><cell>Check convergence using Eq. (23)</cell></row><row><cell>16:</cell><cell>end while</cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Comparison of Clustering accuracy (%)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dataset</head><p>SSC <ref type="bibr" target="#b13">[14]</ref> Percentage Temporal SSC <ref type="bibr" target="#b28">[29]</ref> TSC-Cov TSAC (Proposed)
G3D <ref type="bibr" target="#b21">[22]</ref> 65.16 66.04 90.04 92.65
F3D <ref type="bibr" target="#b22">[23]</ref> 61.86 63.72 81.39 81.58
UTK <ref type="bibr" target="#b23">[24]</ref> 74.37 72.36 89.44 90.01
MSRC <ref type="bibr" target="#b24">[25]</ref> 73.42 80.60 81.09 82.00
MSRA <ref type="bibr" target="#b25">[26]</ref> 58</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Comparison of NMI</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dataset</head><p>SSC <ref type="bibr" target="#b13">[14]</ref> Percentage Temporal SSC <ref type="bibr" target="#b28">[29]</ref> TSC-Cov TSAC (Proposed)
G3D <ref type="bibr" target="#b21">[22]</ref> 0.719 0.708 0.953 0.961
F3D <ref type="bibr" target="#b22">[23]</ref> 0.716 0.709 0.872 0.875
UTK <ref type="bibr" target="#b23">[24]</ref> 0.709 0.672 0.899 0.911
MSRC <ref type="bibr" target="#b24">[25]</ref> 0.720 0.762 0.887 0.890
MSRA <ref type="bibr" target="#b25">[26]</ref> 0.700 0.720 0.958 0.972
HDM14 <ref type="bibr" target="#b26">[27]</ref> 0.754 0.771 0.893 0.901</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Comparison of ARI
Dataset SSC <ref type="bibr" target="#b13">[14]</ref> Percentage Temporal SSC <ref type="bibr" target="#b28">[29]</ref> TSC-Cov TSAC (Proposed)
G3D <ref type="bibr" target="#b21">[22]</ref> 0.499 0.479 0.847 0.851
F3D <ref type="bibr" target="#b22">[23]</ref> 0.548 0.539 0.781 0.787
UTK <ref type="bibr" target="#b23">[24]</ref> 0.547 0.499 0.804 0.819
MSRC <ref type="bibr" target="#b24">[25]</ref> 0.551 0.714 0.773 0.781
MSRA <ref type="bibr" target="#b25">[26]</ref> 0.435 0.456 0.881 0.895
HDM14 <ref type="bibr" target="#b26">[27]</ref> 0.439 0.484 0.753 0.772</p><p>Fig. <ref type="figure">2</ref> visualizes the affinity matrices generated using SSC <ref type="bibr" target="#b13">[14]</ref>, Percentage Temporal SSC <ref type="bibr" target="#b28">[29]</ref>, Threshold Temporal SSC <ref type="bibr" target="#b28">[29]</ref>, TSC-Cov, and the proposed method on the UTK dataset. We can observe that, among all the methods, the affinity matrix generated using the proposed method has a much denser block-diagonal structure, which is an indication of the quality of the clustering process.</p><p>Fig. <ref type="figure">3</ref> shows the clustering results obtained on the UTK dataset. For visual analysis, each cluster is assigned a unique color. The figure shows a comparison between SSC <ref type="bibr" target="#b13">[14]</ref>, Percentage Temporal SSC <ref type="bibr" target="#b28">[29]</ref>, Threshold Temporal SSC <ref type="bibr" target="#b28">[29]</ref>, TSC-Cov, and the proposed method with reference to the true labels. From Fig. <ref type="figure">3</ref> we can observe that the clustering results obtained with the proposed method are comparatively better than those of the other methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>This paper proposes an efficient method for clustering human activity sequences. The proposed method extracts features from the raw input data with the help of a kernel-based robust covariance descriptor. The developed optimization model combines the advantages of LRR- and LSR-based subspace clustering approaches, and the concept of temporal Laplacian regularized dictionary learning is introduced in order to learn an effective representation matrix from the extracted features. The solution to the optimization problem is obtained with the ADMM approach. The performance of the proposed approach is compared with that of state-of-the-art approaches in terms of accuracy, NMI, and ARI, and the experimental results validate the superiority of the proposed method in obtaining better clustering results than its counterparts. Motion capture data often suffers from corruptions in the recorded information; robust human activity clustering algorithms can be developed in the future to address this problem. Mocap information can also be combined with other modalities of human activity data to achieve improved clustering results in challenging scenarios.</p></div>			</div>
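<div xmlns="http://www.tei-c.org/ns/1.0"><p>The final step of subspace clustering pipelines of the kind summarized above is typically to symmetrize the learned representation matrix into an affinity matrix and apply spectral clustering. The sketch below illustrates only this generic step on a toy block-structured matrix; it is not the paper's TSAC implementation, and it assumes numpy and scikit-learn are available.</p><p>
```python
# Hedged sketch: from a representation matrix Z to spectral clustering.
# Z here is random toy data with two planted subspaces, not a learned matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n = 12
# Toy block-structured representation: two groups of 6 points each.
Z = np.zeros((n, n))
Z[:6, :6] = rng.random((6, 6))
Z[6:, 6:] = rng.random((6, 6))

# Symmetrize into an affinity matrix, as is standard after LRR/LSR-style
# models; the small constant keeps the similarity graph connected.
W = np.abs(Z) + np.abs(Z.T) + 0.01
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(labels)
```
</p><p>With a strongly block-diagonal affinity matrix, as in the figures discussed earlier, spectral clustering recovers the two planted groups; a denser block-diagonal structure therefore translates directly into cleaner cluster assignments.</p></div>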
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A survey on video-based human action recognition: recent updates, datasets, challenges, and applications</title>
		<author>
			<persName><forename type="first">P</forename><surname>Pareek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thakkar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence Review</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="2259" to="2322" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A comprehensive survey of vision-based human action recognition methods</title>
		<author>
			<persName><forename type="first">H.-B</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Lei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-X</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D.-S</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sensors</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page">1005</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Human activity recognition: A survey</title>
		<author>
			<persName><forename type="first">C</forename><surname>Jobanputra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bavishi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Doshi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">155</biblScope>
			<biblScope unit="page" from="698" to="703" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Human action recognition and prediction: A survey</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">130</biblScope>
			<biblScope unit="page" from="1366" to="1401" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Sensor-based and vision-based human activity recognition: A comprehensive survey</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Dang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Piran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Moon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">108</biblScope>
			<biblScope unit="page">107561</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Skeleton-based human activity recognition using ConvLSTM and guided feature learning</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Yadav</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tiwari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Pandey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Akbar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soft Computing</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="877" to="890" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Ongoing human action recognition with motion capture</title>
		<author>
			<persName><forename type="first">M</forename><surname>Barnachon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bouakaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Boufama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Guillou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page" from="238" to="247" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A depth camera-based human activity recognition via deep learning recurrent neural network for health and social care services</title>
		<author>
			<persName><forename type="first">S</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Al-Masni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Al-Antari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Z</forename><surname>Uddin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Kim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">100</biblScope>
			<biblScope unit="page" from="78" to="84" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Human activity recognition from 3d data: A review</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Aggarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Xia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="70" to="80" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Symbiotic graph neural networks for 3d skeleton-based human action recognition and motion prediction</title>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Tian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Tran. on PAMI</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bagnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lines</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bostrom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Large</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Keogh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="606" to="660" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Subspace clustering for high dimensional data: a review</title>
		<author>
			<persName><forename type="first">L</forename><surname>Parsons</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Haque</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM SIGKDD Explorations Newsletter</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="90" to="105" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Low rank subspace clustering (LRSC)</title>
		<author>
			<persName><forename type="first">R</forename><surname>Vidal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Favaro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="47" to="61" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Sparse subspace clustering: Algorithm, theory, and applications</title>
		<author>
			<persName><forename type="first">E</forename><surname>Elhamifar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vidal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Tran. on PAMI</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="2765" to="2781" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Spatial-spectral structured sparse low-rank representation for hyperspectral image super-resolution</title>
		<author>
			<persName><forename type="first">J</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-Q</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C.-W</forename><surname>Chan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Philips</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Tran. on Image Processing</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="3084" to="3097" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A joint sparse and correlation induced subspace clustering method for segmentation of natural images</title>
		<author>
			<persName><forename type="first">J</forename><surname>Francis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Madathil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">N</forename><surname>George</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 17th India Council Int. Conf. (INDICON)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="7" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Robust spectral subspace clustering based on least square regression</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Xie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Processing Letters</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="1359" to="1372" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Kernelized covariance for action recognition</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cavazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zunino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Biagio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Murino</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2016 23rd Int. Conf. on Pattern Recognition (ICPR)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="408" to="413" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Low-rank matrix approximation with manifold regularization</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Tran. on PAMI</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="1717" to="1729" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">𝑝-Laplacian regularization for scene recognition</title>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Tran. on Cybernetics</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="2927" to="2940" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Part-based data analysis with masked non-negative matrix factorization</title>
		<author>
			<persName><forename type="first">G</forename><surname>Casalino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Del Buono</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mencar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. on Computational Science and Its Applications</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="440" to="454" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Hierarchical transfer learning for online recognition of compound actions</title>
		<author>
			<persName><forename type="first">V</forename><surname>Bloom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Argyriou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Makris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Vision and Image Understanding</title>
		<imprint>
			<biblScope unit="volume">144</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses</title>
		<author>
			<persName><forename type="first">L</forename><surname>Seidenari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Varano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Berretti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Del Bimbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pala</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2013 IEEE Conf. on CVPR - Workshops</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="479" to="485" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">View invariant human action recognition using histograms of 3d joints</title>
		<author>
			<persName><forename type="first">L</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Aggarwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Computer Society Conf. on CVPR - Workshops</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="20" to="27" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Instructing people for training gestural interactive systems</title>
		<author>
			<persName><forename type="first">S</forename><surname>Fothergill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Mentis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kohli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nowozin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CHI &apos;12 Proc. of the SIGCHI Conf. on Human Factors in Computing Systems</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1737" to="1746" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Action recognition based on a bag of 3d points</title>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Computer Society Conf. on CVPR - Workshops</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="9" to="14" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">Documentation Mocap Database HDM05</title>
		<author>
			<persName><forename type="first">M</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Röder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Clausen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Eberhardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Krüger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Weber</surname></persName>
		</author>
		<idno>CG-2007-2</idno>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
		<respStmt>
			<orgName>Universität Bonn</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Minimum entropy, k-means, spectral clustering</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Choi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Int. Joint Conf. on Neural Networks (IEEE Cat. No.04CH37541)</title>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="117" to="122" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Subspace clustering for action recognition with covariance representations and temporal pruning</title>
		<author>
			<persName><forename type="first">G</forename><surname>Paoletti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cavazza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Beyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Del Bue</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">25th Int. Conf. on Pattern Recognition (ICPR)</title>
				<imprint>
			<date type="published" when="2020">2020. 2021</date>
			<biblScope unit="page" from="6035" to="6042" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
