<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Human Activity Recognition Using Pose Estimation and Machine Learning Algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Abhay Gupta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kuldeep Gupta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kshama Gupta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kapil Gupta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Technology</institution>
          ,
          <addr-line>Kurukshetra, Haryana</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <fpage>323</fpage>
      <lpage>330</lpage>
      <abstract>
        <p>Human Activity Recognition has become a popular field of research over the last two decades. Understanding human behavior in images provides useful information for a large number of computer vision problems and has many applications, such as scene recognition and pose estimation. Various methods exist for activity recognition, each with its own advantages and disadvantages. Despite a large body of research, recognizing activity is still a complex and challenging task. In this work, we propose an approach for human activity recognition and classification using a person's pose skeleton in images. The work is divided into two parts: single-person pose estimation and activity classification using the estimated pose. Pose estimation consists of recognizing the locations of 18 body keypoints and joints; we use the OpenPose library for this task. The activity classification task is then performed using multiple logistic regression. We also compare the accuracy of various other regression and classification algorithms on our dataset. We prepared our own dataset and divided it into two parts: one used to train the model and the other used to validate the proposed model's performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Human Activity Recognition</kwd>
        <kwd>Pose Estimation</kwd>
        <kwd>Body Keypoints</kwd>
        <kwd>Logistic Regression</kwd>
        <kwd>OpenPose</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The goal of a Human Activity Recognition (HAR) system is to predict the label of a person's action from an image or video. This interesting topic is inspired by many useful real-world applications, such as simulation, visual surveillance, and understanding human behavior. Action recognition from videos is a well-known and established research problem. In contrast, image-based action recognition is a comparatively less explored problem, although it has gained the community's attention in recent years. Because motion cannot be estimated from a still image, recognizing actions from images remains a tedious and challenging problem. It requires considerable work, as the methods that have been applied to video-based systems cannot be applied directly here. The approach, however, is not the only difficulty in this task. There are many other challenges too, especially changes in clothing and body shape that affect the appearance of body parts, varying illumination, the difficulty of estimating the pose when the person is not facing the camera, and the definition and diversity of the activities themselves.</p>
      <p>
        Activity recognition through smartphones and wearable sensors is very common, and various benchmarks are available. However, these systems rely on collecting data from sensors installed on the devices, and the user needs to wear these devices, which is uncomfortable in practice. Vision-based systems are a better alternative for this kind of problem because the user does not need to carry or wear any device. Instead, tools such as cameras are installed in the surrounding environment to capture data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. One of the popular vision-based HAR systems uses pose information. Poses have had remarkable success in human activity recognition, and researchers now use them widely for this problem. Poses provide useful information about human behavior, and the concept is beneficial in various tasks such as HAR, content extraction, and semantic understanding. Pose estimation uses convolutional neural networks (CNNs) because they are very efficient at dealing with images. They are similar to traditional neural networks in that they consist of neurons with biases and learnable weights. In this study, we propose a pose-based HAR system that overcomes the issues discussed above for the smartphone and wearable-sensor approach. We extract the human pose (the locations of 18 body keypoints in the two-dimensional plane) from images using the OpenPose library, which internally uses CNNs. Finally, the activity is classified from the pose information using a supervised machine learning algorithm.
      </p>
      <p>The rest of the paper is structured as follows: Section 2 presents a literature survey of selected research papers in the area. Section 3 contains the methodology and architecture of the proposed approach. A brief description of the dataset and the evaluation metrics (precision, recall, and F1-score) used in this work is given in Sections 4 and 5, respectively. Section 6 contains the experiments and results of the various classification algorithms applied in this work. Section 7 concludes the work with some future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Research has recently begun to recognize human behavior from images. Compared to video-based action classification, the number of research papers and journals is smaller. We outline some techniques used for HAR. Four types of approaches address the classification of actions: image structure-based methods, pose-based systems, model-based approaches, and example-based methods. The pose-based method trains each pose using an annotated 3D image [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The model-based method uses a
known parametric body model to match posture
variables [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The example-based model uses
classical machine learning algorithms to find
actions in some image properties [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the image structure-based method, the representation of the posture is used as features for classifying the action [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] detected activities of daily living by preprocessing the data collected from the Microsoft Kinect motion-sensing device to minimize the error produced by the system and the subject. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a new approach to activity recognition by simultaneously extracting features from the objects used to perform the activity and from the human posture. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] applied OpenPose and a Kalman filter to track the target body, and then used a one-dimensional fully convolutional network for activity classification.
      </p>
      <p>
        Moreover, a single person's activity can also be recognized using smartphone sensors and wearable sensors; the smartphone-based approach uses sensors built into the device, such as the accelerometer and gyroscope, to identify the activity, whereas the wearable sensor-based approach requires sensors to be attached to the subject's body to collect action information. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] used several machine learning algorithms (SVM, KNN, and Bagging) on data collected from smartphones' accelerometer and gyroscope sensors and detected six different activities. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] recognized human activity using accelerometer and gyroscope sensors mounted on the human body, applying various machine learning algorithms such as KNN, Random Forest, and Naïve Bayes to detect three different activities. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] collected data from a smartphone and a smartwatch and used a five-fold cross-validation technique to detect five upper-limb motions. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] used wearable and smartphone-embedded sensors to detect six dynamic and six static activities using machine learning algorithms. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] applied deep learning and convolutional neural networks to recognize the body's actions from data retrieved from smartphone sensors.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Approach</title>
      <p>Our approach to activity recognition and classification consists of two sequential tasks: pose estimation from images, followed by classification of the activities using the extracted pose keypoints as input to classification algorithms such as logistic regression, support vector machine, and decision tree. Figure 1 shows the architecture of the proposed approach.</p>
      <p>Human Pose Estimation is the task of extracting the body's skeletal keypoints and joint locations corresponding to the human body parts. These keypoints and joints are then used to assemble the two-dimensional structure of the human body. In this work, we have used the OpenPose framework for estimating the pose from the input image.</p>
      <p>In OpenPose, the image is first passed through a baseline convolutional network to extract feature maps from the input. The feature maps are then processed by sequential multi-stage CNN layers to generate Part Affinity Fields (PAFs) and Confidence Maps. The part affinity fields and confidence maps generated in this way are passed through a bipartite graph matching algorithm to obtain the human posture in the image. Figure 2 shows the OpenPose pipeline.</p>
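      <p>For illustration, the sketch below (not the code used in our experiments) shows how keypoints can be extracted through the OpenPose Python bindings (pyopenpose); the model folder, image name, and exact wrapper calls are assumptions that vary with the OpenPose version.</p>
      <preformat>
import cv2
import pyopenpose as op  # built from the OpenPose repository

# Hypothetical paths; the COCO model yields 18 keypoints per person.
params = {"model_folder": "openpose/models/", "model_pose": "COCO"}
wrapper = op.WrapperPython()
wrapper.configure(params)
wrapper.start()

datum = op.Datum()
datum.cvInputData = cv2.imread("person.jpg")  # placeholder image
wrapper.emplaceAndPop(op.VectorDatum([datum]))

# poseKeypoints has shape (num_people, 18, 3): x, y, confidence.
print(datum.poseKeypoints)
      </preformat>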
    </sec>
    <sec id="sec-4">
      <title>3.1.1. Part Affinity Field Maps (L)</title>
      <p>It contains two-dimensional vector fields that encode the positions and orientations of body parts in an image. It encodes the data in the form of pairwise links (limbs) between body parts.</p>
      <p>L = (L1, L2, ..., LC) (1)</p>
      <p>Lc ∈ R^(w×h×2), c ∈ {1, ..., C}, where C is the total number of limbs, R denotes the set of real numbers, L is the set of part affinity field maps, and w × h is the dimension of each map in the set L.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1.2. Confidence Map</title>
      <p>It is a two-dimensional representation of the belief that a particular body part is located at a given pixel.</p>
      <p>S = (S1, S2, ..., SJ) (2)</p>
      <p>Sj ∈ R^(w×h), j ∈ {1, ..., J}, where J is the total number of body parts, R denotes the set of real numbers, and S is the set of confidence maps.</p>
      <p>The number of keypoints detected by OpenPose depends on the dataset on which it has been trained. In this work, the COCO model with 18 different body keypoints (see Figure 3) is used: R_Ankle, R_Knee, R_Wrist, L_Wrist, R_Shoulder, L_Shoulder, L_Ankle, L_Ear, R_Ear, R_Elbow, L_Elbow, L_Knee, L_Eye, R_Eye, R_Hip, L_Hip, Nose, and Neck.</p>
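      <p>As an illustrative sketch, a detected pose can be flattened into the feature vector used for classification; the (18, 3) array layout of (x, y, confidence) per keypoint is assumed from the OpenPose COCO output.</p>
      <preformat>
import numpy as np

def pose_to_features(pose_keypoints):
    # pose_keypoints: (18, 3) array of (x, y, confidence) for one person,
    # in the COCO keypoint order listed above.
    pose = np.asarray(pose_keypoints, dtype=float)
    # Keep only the x and y coordinates: 36 values (x1, y1, x2, y2, ...).
    return pose[:, :2].flatten()
      </preformat>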
    </sec>
    <sec id="sec-6">
      <title>3.2. Activity Classification</title>
      <p>We formulate the activity classification problem as a multiclass classification problem, which can be modeled using various machine learning regression and classification algorithms. The classification algorithm takes the 18 body keypoints (the x- and y-coordinates of each point) as input for our model's training and testing. We used a supervised learning approach, as our dataset contains body keypoints paired with an activity label. Among all the algorithms, multiple logistic regression and random forest provide significantly greater accuracy.</p>
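      <p>A minimal sketch of this setup is given below; the random placeholder data stands in for our labelled pose vectors, and the 90:10 split follows Section 4.</p>
      <preformat>
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the ~1000 labelled pose vectors:
# 36 features per sample (x and y of the 18 keypoints) and one label.
rng = np.random.default_rng(0)
X = rng.random((1000, 36))
y = rng.choice(["sitting", "standing", "running", "dancing", "laying"], 1000)

# 90:10 train/test split, as described in Section 4.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y)
      </preformat>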
    </sec>
    <sec id="sec-7">
      <title>4. Dataset</title>
      <p>
        OpenPose uses the COCO keypoint detection dataset for the pose estimation task, which contains more than 200K images labeled with keypoints [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. We collected images from Google for the classification task, and some photos were taken with a smartphone camera. We prepared a dataset of approximately 1000 images in five activity categories, namely sitting, standing, running, dancing, and laying. Each activity category has more than 170 images. We divided the dataset into training and testing sets in the ratio 90:10. The data collected from different sources contain images of unequal width and height, while our model requires a fixed size, so we resized all images to 432x368 pixels before extracting the keypoints.
      </p>
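      <p>A minimal resizing sketch with OpenCV, assuming a placeholder filename, is shown below; note that cv2.resize takes the target size as (width, height).</p>
      <preformat>
import cv2

# Resize a collected image to the fixed 432x368 input size
# before keypoint extraction.
image = cv2.imread("activity_image.jpg")  # placeholder filename
resized = cv2.resize(image, (432, 368))
cv2.imwrite("activity_image_432x368.jpg", resized)
      </preformat>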
    </sec>
    <sec id="sec-8">
      <title>5. Evaluation Metrics</title>
      <p>For performance evaluation, Recall, Precision, and F1-score are used in this experiment. We have also shown the Confusion Matrix of some classifiers.</p>
    </sec>
    <sec id="sec-9">
      <title>5.1. Precision</title>
      <p>Precision (P) is the ratio of the number of true positives (Tp) to the sum of true positives and false positives (Fp). It can also be interpreted as the fraction of images classified into this class that actually belong to it.</p>
      <p>P = Tp / (Tp + Fp) (3)</p>
    </sec>
    <sec id="sec-10">
      <title>5.2. Recall</title>
      <p>Recall (R) is the ratio of the number of true positives (Tp) to the sum of true positives and false negatives (Fn). It can also be interpreted as the fraction of images belonging to this class that are correctly classified into it.</p>
      <p>R = Tp / (Tp + Fn) (4)</p>
    </sec>
    <sec id="sec-11">
      <title>5.3. F1-Score</title>
      <p>F1-Score is calculated as the harmonic mean of recall and precision, as given in Eq. (5).</p>
      <p>F1-Score = 2 × (P × R) / (P + R) (5)</p>
    </sec>
    <sec id="sec-12">
      <title>5.4. Confusion Matrix</title>
      <p>It is a two-dimensional matrix used to measure the overall performance of a machine learning classification algorithm. In the matrix, each row is associated with the predicted activity class, and each column with the actual activity class. The matrix compares the target activity with the activity predicted by the model, which gives a better idea of what types of errors the classifier has made.</p>
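      <p>These four metrics can be computed with scikit-learn as sketched below, assuming the train/test split prepared earlier; note that scikit-learn's confusion matrix places actual classes on rows and predicted classes on columns, the transpose of the convention described above.</p>
      <preformat>
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Fit any classifier on the training split, then report per-class
# precision, recall, and F1-score on the held-out test split.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
# scikit-learn convention: rows = actual classes, columns = predicted.
print(confusion_matrix(y_test, y_pred))
      </preformat>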
    </sec>
    <sec id="sec-13">
      <title>6. Experiments and Results</title>
      <p>The following five activities are considered for pose estimation and activity recognition and classification: sitting, standing, dancing, laying, and running. The experiments were conducted with scikit-learn (0.23.1) and Python (3.6.6) on a Windows 10 system with an Intel i5 processor at 3.40 GHz and 8 GB RAM, using five classification algorithms for activity classification. These algorithms are described below with their confusion matrices. The performance results are provided in Table 1, which shows the recall, precision, and F1-score of the various classifiers used in the proposed approach.</p>
    </sec>
    <sec id="sec-14">
      <title>6.1. Classification Algorithms</title>
    </sec>
    <sec id="sec-15">
      <title>6.1.1. Logistic Regression</title>
      <p>This algorithm is based on supervised learning and is used for classification problems. In this work, multiple logistic regression is used for classifying activities. The 'sag' solver is used because it supports only L2 regularization with the primal formulation (or no regularization), and dummy variables are used to represent the categorical outcome.</p>
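      <p>A minimal sketch of this configuration, assuming the split prepared earlier, is given below.</p>
      <preformat>
from sklearn.linear_model import LogisticRegression

# Multinomial logistic regression with the 'sag' solver; 'sag' supports
# only L2 (or no) regularization and needs enough iterations to converge.
log_reg = LogisticRegression(solver="sag", max_iter=5000)
log_reg.fit(X_train, y_train)
print("Test accuracy:", log_reg.score(X_test, y_test))
      </preformat>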
    </sec>
    <sec id="sec-16">
      <title>6.1.2. K-Nearest Neighbors</title>
      <p>K-nearest neighbors (KNN) is a supervised machine learning algorithm used for classification; it is non-parametric and lazy. Despite this simplicity, we obtained very competitive results, which is one reason for using this algorithm in our work. We tried different values of k and obtained the highest accuracy at k = 5. The distance function d used in this algorithm is given in Eq. (6), and the confusion matrix is shown in Figure 5.</p>
      <p>d(p, q) = √(Σi (qi - pi)²) (6)
where p and q are vectors containing the keypoints of two different images and i = 1, ..., n.</p>
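      <p>The distance of Eq. (6) and a k = 5 classifier can be sketched as follows; scikit-learn's default Minkowski metric with p = 2 is exactly this Euclidean distance.</p>
      <preformat>
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Eq. (6): Euclidean distance between the keypoint vectors of two images.
def euclidean(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((q - p) ** 2))

# k = 5 gave the highest accuracy in our runs.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
      </preformat>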
    </sec>
    <sec id="sec-17">
      <title>6.1.3. Support Vector Machine</title>
      <p>It also comes under supervised learning algorithms and is mainly used in classification and regression problems. Each sample is plotted as a point in the feature space, and classification is performed by finding the hyperplane that best separates the two classes. The confusion matrix is provided in Figure 6.</p>
    </sec>
    <sec id="sec-18">
      <title>6.1.4. Decision Tree</title>
      <p>The decision tree comes under supervised learning. It is a powerful and widely accepted tool for prediction and classification. The algorithm learns from previously trained data to predict the activity of a target pose. Predictions for activities start from the root of the tree: the record's attribute value is compared with the root node's attribute, and the matching branch is followed down to a leaf. The confusion matrix is given in Figure 7.</p>
    </sec>
    <sec id="sec-19">
      <title>6.1.5. Random Forest</title>
      <p>Random decision forest is a supervised learning algorithm and an ensemble learning method for classification and regression. It is also one of the most used and popular algorithms because it gives good results without hyper-parameter tuning. It builds multiple decision trees and selects the best prediction by voting. We use random forest because it predicts activity with good accuracy and runs efficiently even on large datasets. The confusion matrix is shown in Figure 8.</p>
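      <p>The five classifiers can be compared on the same split as sketched below, mirroring the comparison reported in Table 1; the hyper-parameter choices repeat those described above, and everything else is a library default.</p>
      <preformat>
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Train each classifier on the same split and compare held-out accuracy.
models = {
    "Logistic Regression": LogisticRegression(solver="sag", max_iter=5000),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.4f}")
      </preformat>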
    </sec>
    <sec id="sec-20">
      <title>7. Conclusion</title>
      <p>In this study, we proposed an approach for human activity recognition from still images by extracting the skeletal coordinate information (pose) using the OpenPose API and then using this pose information to classify the activity with a supervised machine learning algorithm. We prepared our own dataset for this work, containing five different activities, viz. sitting, standing, laying, dancing, and running. We used five algorithms (Logistic Regression, SVM, KNN, Random Forest, and Decision Tree) to find the best results for our model. From our experimental results, we observed that Multiple Logistic Regression, SVM, and Random Forest show the highest accuracies of 80.72%, 80.43%, and 80.75%, respectively, while the other two algorithms, KNN and Decision Tree, underperform. Accuracies of some recent research works on HAR are shown in Table 2.</p>
      <p>Although much research has already been done to deal with the activity recognition problem to a certain extent, more convincing progress is still needed. In practice, there are many different activities that humans perform in everyday life. Detecting all of them is not an easy task, because it requires a very large dataset to train the model. The dataset is not the only problem: the definition and diversity of activities also make the task more complicated for machines to understand. More activities can be added in the future to extend the scope and usefulness of this work. Besides adding activities, data preprocessing techniques can be applied to handle missing body keypoints. We can also experiment with other machine learning algorithms that may provide better results.</p>
      <sec id="sec-20-1">
        <title>Authors and</title>
      </sec>
      <sec id="sec-20-2">
        <title>Year</title>
      </sec>
      <sec id="sec-20-3">
        <title>Nandy et al., 2019 [12]</title>
      </sec>
      <sec id="sec-20-4">
        <title>Ghazal et al., 2018 [15]</title>
      </sec>
      <sec id="sec-20-5">
        <title>Gatt et al., 2019 [16]</title>
      </sec>
      <sec id="sec-20-6">
        <title>Dataset</title>
      </sec>
      <sec id="sec-20-7">
        <title>Activities</title>
      </sec>
      <sec id="sec-20-8">
        <title>Model Used</title>
      </sec>
      <sec id="sec-20-9">
        <title>Acceleromete r and heart rate sensor</title>
      </sec>
      <sec id="sec-20-10">
        <title>Walking, climbing stairs, sitting, running</title>
      </sec>
      <sec id="sec-20-11">
        <title>Images from the internet</title>
      </sec>
      <sec id="sec-20-12">
        <title>COCO keypoints</title>
      </sec>
      <sec id="sec-20-13">
        <title>Sitting on the chair or ground</title>
      </sec>
      <sec id="sec-20-14">
        <title>Abnormal activity such as fall detection</title>
      </sec>
      <sec id="sec-20-15">
        <title>Multilayer Perceptron</title>
      </sec>
      <sec id="sec-20-16">
        <title>Linear Regression</title>
      </sec>
      <sec id="sec-20-17">
        <title>Gaussian Naïve Bayes</title>
      </sec>
      <sec id="sec-20-18">
        <title>Decision Tree</title>
      </sec>
      <sec id="sec-20-19">
        <title>Decision-making</title>
        <p>algorithm with
feedforward CNN</p>
      </sec>
      <sec id="sec-20-20">
        <title>Used pre-trained models of PoseNet and</title>
      </sec>
      <sec id="sec-20-21">
        <title>OpenPose</title>
        <p>Accuracy
(%)
77.0</p>
      </sec>
    </sec>
    <sec id="sec-21">
      <title>8. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>"A Survey on Human Activity Recognition and Classification,"</article-title>
          <source>2020 International Conference on Communication and Signal Processing (ICCSP)</source>
          , Chennai, India,
          <year>2020</year>
          , pp.
          <fpage>0915</fpage>
          -
          <lpage>0919</lpage>
          , doi: 10.1109/ICCSP48568.
          <year>2020</year>
          .
          <volume>9182416</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jurie</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <article-title>"Expanded Parts Model for Human Attribute and Action Recognition in Still Images,"</article-title>
          <source>2013 IEEE Conference on Computer Vision</source>
          and Pattern Recognition, Portland,
          <string-name>
            <surname>OR</surname>
          </string-name>
          ,
          <year>2013</year>
          , pp.
          <fpage>652</fpage>
          -
          <lpage>659</lpage>
          , doi: 10.1109/CVPR.
          <year>2013</year>
          .
          <volume>90</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kembhavi</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <article-title>"Observing Human Object Interactions: Using Spatial and Functional Compatibility for Recognition,"</article-title>
          <source>in IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>31</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>1775</fpage>
          -
          <lpage>1789</lpage>
          , Oct.
          <year>2009</year>
          , doi: 10.1109/TPAMI.
          <year>2009</year>
          .
          <volume>83</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Yang</surname>
            <given-names>Wang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hao Jiang</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          <string-name>
            <surname>Drew</surname>
            ,
            <given-names>ZeNian</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            and
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Mori</surname>
          </string-name>
          ,
          <article-title>"</article-title>
          <source>Unsupervised Discovery of Action Classes," 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)</source>
          , New York, NY, USA,
          <year>2006</year>
          , pp.
          <fpage>1654</fpage>
          -
          <lpage>1661</lpage>
          , doi: 10.1109/CVPR.
          <year>2006</year>
          .
          <volume>321</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shotton</surname>
          </string-name>
          et al.,
          <article-title>"Realtime human pose recognition in parts from single depth images,"</article-title>
          <source>CVPR</source>
          <year>2011</year>
          , Providence, RI,
          <year>2011</year>
          , pp.
          <fpage>1297</fpage>
          -
          <lpage>1304</lpage>
          , doi: 10.1109/CVPR.
          <year>2011</year>
          .
          <volume>5995316</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. M. V.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gandolfi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Beltrami and M. Schmid</surname>
          </string-name>
          ,
          <article-title>"Skeleton data preprocessing for human pose recognition using Neural Network*,"</article-title>
          <source>2020 42nd Annual International Conference of the IEEE Engineering in Medicine &amp; Biology Society (EMBC)</source>
          , Montreal, QC, Canada,
          <year>2020</year>
          , pp.
          <fpage>4265</fpage>
          -
          <lpage>4268</lpage>
          , doi: 10.1109/EMBC44109.
          <year>2020</year>
          .
          <volume>9175588</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Reily</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Reardon</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>"Simultaneous Learning from Human Pose and Object Cues for RealTime Activity Recognition,"</article-title>
          <source>2020 IEEE International Conference on Robotics and Automation (ICRA)</source>
          , Paris, France,
          <year>2020</year>
          , pp.
          <fpage>8006</fpage>
          -
          <lpage>8012</lpage>
          , doi: 10.1109/ICRA40945.
          <year>2020</year>
          .
          <volume>9196632</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Zhengyuan</surname>
          </string-name>
          ,
          <article-title>"Real-Time Continuous Human Rehabilitation Action Recognition using OpenPose and FCN," 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Shenzhen</article-title>
          , China,
          <year>2020</year>
          , pp.
          <fpage>239</fpage>
          -
          <lpage>242</lpage>
          , doi: 10.1109/AEMCSE50948.
          <year>2020</year>
          .
          <volume>00058</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bulbul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cetin</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Dogru</surname>
          </string-name>
          ,
          <article-title>"Human Activity Recognition Using Smartphones," 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technolo-gies (ISMSIT</article-title>
          ), Ankara,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          , doi: 10.1109/ISMSIT.
          <year>2018</year>
          .
          <volume>8567275</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>"Research on human activity recognition based on active learning,"</article-title>
          <source>2010 International Conference on Machine Learning and Cybernetics</source>
          , Qingdao,
          <year>2010</year>
          , pp.
          <fpage>285</fpage>
          -
          <lpage>290</lpage>
          , doi: 10.1109/ICMLC.
          <year>2010</year>
          .
          <volume>5581050</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>K. -S. Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Chae and H. -S. Park</surname>
          </string-name>
          ,
          <article-title>"Optimal Time-Window Derivation for Human-Activity Recognition Based on Convolutional Neural Networks of Repeated Rehabilitation Motions,"</article-title>
          <source>2019 IEEE 16th International Conference on Rehabilitation Robotics (ICORR)</source>
          , Toronto, ON, Canada,
          <year>2019</year>
          , pp.
          <fpage>583</fpage>
          -
          <lpage>586</lpage>
          , doi: 10.1109/ICORR.
          <year>2019</year>
          .
          <volume>8779475</volume>
          ..
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nandy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chowdhury and K. P. D. Singh</surname>
          </string-name>
          ,
          <article-title>"Detailed Human Activity Recognition using Wearable Sensor and Smartphones,"</article-title>
          <source>2019 International Conference on Opto-Electronics and Applied Optics (Optronix)</source>
          , Kolkata, India,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          , doi: 10.1109/OPTRONIX.
          <year>2019</year>
          .
          <volume>8862427</volume>
          ..
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Saini</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Maan</surname>
          </string-name>
          ,
          <article-title>"Human Activity and Gesture Recognition: A Review," 2020 International Conference on Emerging Trends in Communication, Control and Computing (ICONC3), Lakshmangarh</article-title>
          , Sikar, India,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          , doi: 10.1109/ICONC345789.
          <year>2020</year>
          .
          <volume>9117535</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wei</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheikh</surname>
          </string-name>
          ,
          <article-title>"Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu</article-title>
          ,
          <string-name>
            <surname>HI</surname>
          </string-name>
          ,
          <year>2017</year>
          , pp.
          <fpage>1302</fpage>
          -
          <lpage>1310</lpage>
          , doi: 10.1109/CVPR.
          <year>2017</year>
          .
          <volume>143</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>