<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>RTA-CSIT 2021: 4th International Conference Recent Trends and Applications In Computer Science And Information Technology</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Italian sign language alphabet recognition from surface EMG and IMU sensors with a deep neural network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paolo Sernani</string-name>
          <email>p.sernani@univpm.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iacopo Pacifici</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Falcionelli</string-name>
          <email>n.falcionelli@pm.univpm.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Selene Tomassini</string-name>
          <email>s.tomassini@pm.univpm.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aldo Franco Dragoni</string-name>
          <email>a.f.dragoni@univpm.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Engineering Department, Università Politecnica delle Marche</institution>
          ,
          <addr-line>Via Brecce Bianche 12, 60131 Ancona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>RTA-CSIT 2021: 4th International Conference Recent Trends and Applications In Computer Science And Information Technology</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1746</volume>
      <fpage>105</fpage>
      <lpage>114</lpage>
      <abstract>
        <p>The use of surface electromyography (EMG) and Inertial Measurement Unit (IMU) data emerged as a possible alternative to computer vision-based gesture recognition. As a consequence, the convenience of using such data in the automatic recognition of sign languages, a natural application of gesture recognition, has been investigated in scientific literature. Most of the methodologies and evaluations are based on traditional machine learning techniques, such as SVMs, relying on selected handcrafted features. Instead, leveraging on the findings about deep Long Short Term Memory (LSTM) architectures to process time series, we propose a deep LSTM-based neural network for the recognition of the Italian Sign Language alphabet with surface EMG and IMU data. To preliminarily validate our methodology, we collected a dataset recording gesture samples with the Myo Gesture Control Armband. We obtained a 97% accuracy on the proposed dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Sign Language Recognition</kwd>
        <kwd>Bidirectional LSTM</kwd>
        <kwd>Long Short Term Memory</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Surface Electromyography</kwd>
        <kwd>EMG</kwd>
        <kwd>Inertial Measurement Unit</kwd>
        <kwd>IMU</kwd>
        <kwd>Italian Sign Language</kwd>
        <kwd>LIS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>mode of non-verbal communication with
computer interfaces [2]. The possible applications
In the last three decades, automatic gesture are countless, including touchless interaction
recognition has been investigated in many with smart objects [3], rehabilitation and
perapplications domains. In fact, hand gestures sonal health systems [4, 5], human-robot
colare recognized as a natural, ubiquitous and laboration [6], interaction with smart home
meaningful part of communicating [1]. There- reasoning systems [7, 8], and many others.
fore, extensive research has been devoted to
making hand gestures a natural and efective</p>
      <sec id="sec-1-1">
        <title>Obviously, the automatic recognition of</title>
        <p>sign language gestures is an eminent
application field for the advancements in gesture
recognition. To this end, the earliest researches
in computer vision [9] evolved with the use of
depth sensors, such as those of the Microsoft
Kinect [10] and Leap Motion [11]. An
alternative methodology is emerging in recent years:
the use of wearable devices with surface
electromyography (EMG) and Inertial
Measurement Unit (IMU) sensors [12]. Using EMG
and IMU sensors has the disadvantage of
forcing a user to wear the device (on both hands,
for complex gestures). However, it does not work, explaining the setup of the experiments
require a fixed camera which might be vul- and presenting the results. Finally, Section 5
nerable to varying lighting conditions, in ad- draws the conclusions of this research work.
dition to having a limited range of vision and
causing privacy issues.</p>
        <p>In this regard, we present a deep learning 2. Related Works
methodology for the recognition of the Italian
Sign Language (LIS) alphabet using EMG and
IMU data. Specifically, this paper adds the
following contributions to the state of the art
about sign language gesture recognition:
The use of EMG and IMU data for the
recognition of sign language gestures has been
validated by several studies. For example, Savur
and Sahin [15] got 91% accuracy on the
American Sign Language (ASL) alphabet, using a
• we propose a deep neural network ar- Support Vector Machine (SVM) classifier. Wu
chitecture to classify the EMG and IMU et al. [12] proposed the design of a wearable
data corresponding to the 26 letters of device and a feature selection method to
colthe LIS alphabet. We based our network lect EMG and IMU data for the recognition
on the bidirectional Long Short Term of gestures. They validated their proposal on
Memory (LSTM) architecture, as it has the ASL gestures, getting a top accuracy of
been already proven useful to process 96% with a comparison of traditional machine
time series, e.g. in speech [13] and ges- learning approaches (Nearest Neighbor, Naive
ture recognition [14]; Bayes, Decision Tree, and SVM). In [16] Abreu
et al. evaluated the use of the Myo Armband
• we propose a dataset with 30 gesture for the Brazilian Sign Language alphabet by
samples for each letter of the LIS alpha- defining 20 SVM binary classifiers to
recogbet, collected to preliminary evaluate nize 20 letters, in a one-vs-all strategy.
Simour approach. Each sample includes ilarly to these works, we use EMG and IMU
the data from the 8 EMG sensors and data (from the Myo Armband) to recognize
the IMU of the Myo Gesture Control the letters of the LIS alphabet. However,
inArmband, a commercial wearable de- stead of relying on traditional machine
learnvice designed to collect EMG signals ing methods and feature selection, we propose
and IMU data when moving the hand a deep neural network, leveraging on a deep
and the arm. architecture to learn the gesture
representaTo guarantee the reproducibility of our ap- tion which allows the classification.
proach, as well as encourage further develop- Recurrent Neural Networks, in particular
ments of the research in this field, the experi- those based on the LSTM and bidirectional
ments and the dataset are publicly available LSTM architectures, have been validated for
in two dedicated GitHub repositories. representing and classifying complex
sequen</p>
        <p>The rest of the paper is structured as fol- tial data simultaneously, such as in
modellows. Section 2 lists some studies related to ing human gesture structure and temporal
the presented research. Section 3 explains the dynamics [14]. Some research works are
preproposed approach, with the necessary back- senting LSTM-based architectures for sign
ground about the LSTM architecture, and de- language recognition. For example, Liu et
scribes the dataset collected to evaluate our al. [17] propose to use the LSTM
architecmethod. Section 4 discusses a preliminary ture to perform recognition by analyzing the
experimental evaluation of our neural net- trajectory of skeleton joints provided by the</p>
      </sec>
      <sec id="sec-1-2">
        <title>Microsoft Kinect; Guo et al. [18] combine a</title>
        <p>3D Convolutional Neural Network with the
LSTM to classify gestures from videos, in a
transfer-learning approach; Mittal et al.
design a LSTM-based architecture to recognize
words and sentences of the Indian Sign
Language from Leap Motion data [19]. Similarly
to these works, we also based our system on
the LSTM architecture, but we rely on EMG
and IMU data, instead of visual data. In the
need of data to train our method, we
synthetically augmented our dataset to preliminary
validate our method, using data
augmentation also to add intra-class variation in our
samples and prevent overfitting.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Materials and Methods</title>
Recurrent Neural Networks (RNN) use recurrent connections to model the flow of time in a sequence of data [20], and are therefore particularly suited to work with time series.</p>
      <p>[Figure 1: An LSTM unit, with the input/output of the memory cell regulated by the input, output, and forget gates.]</p>
      <p>LSTMs are a type of RNN capable of learning long-time dependencies in the data. As we want to recognize gestures from a sequence of time-ordered EMG and IMU data, our system is based on the LSTM architecture. Moreover, we also collected a dataset to test the accuracy of the proposed system in the recognition of the LIS gestures.</p>
      <sec id="sec-2-1">
        <title>3.1. LSTM and Bidirectional LSTM</title>
        <p>LSTM is a well-known RNN architecture, proposed by Hochreiter and Schmidhuber [21]. As shown in Figure 1, the basic hidden unit of a LSTM network is composed of a self-recurrent cell, called memory cell, whose input/output is regulated by three multiplicative gates, i.e. the input gate, the output gate, and the forget gate. A LSTM layer is composed by a series of such units, and the network interacts with the memory cells only by using the gates.</p>
        <p>As pointed out in [13], the output $h_t$ at time point $t$ of an LSTM hidden unit is regulated by the following equations:
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$$
$$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$$
$$h_t = o_t \tanh(c_t)$$
where $i_t$, $f_t$, $o_t$, and $c_t$ are the activation vectors of the input gate, forget gate, output gate, and memory cell at time point $t$, $\sigma$ is the sigmoid function, $b$ denotes the bias of each gate/cell, and the weight matrices connecting the memory cell to the gates ($W_{ci}$, $W_{cf}$, $W_{co}$) are diagonal. The output vector $y_t$ at time point $t$ of a hidden layer is therefore given by:
$$y_t = W_{hy} h_t + b_y$$
where $W_{hy}$ is the weight matrix and $b_y$ the bias vector.</p>
        <p>Traditional LSTMs, as RNNs in general, process input data in ascending temporal order. Therefore, their outputs are mostly based on previous context. However, when data is processed at once, as it might happen with the classification of gestures, the recognition of a pattern might be more effective with the use of future context as well. To this end, Bidirectional RNNs [22] and, specifically, Bidirectional LSTMs [20] have been proposed. The basic idea of such models is to present the training sequences both forwards and backwards, using two separate recurrent nets, which are connected to the same output layer.</p>
        <p>Therefore, we based our deep neural network on the Bidirectional LSTM architecture, as the gestures are processed once completed, taking advantage of both previous and future context.</p>
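        <p>As a concrete illustration, the following NumPy sketch implements one time step of such an LSTM unit, directly following the equations above (with the diagonal peephole weights implemented as element-wise products). It is a minimal sketch for illustration only: the weight initialization, the dimensions, and the dummy input sequence are hypothetical, not the values used by our network.</p>
        <preformat>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One LSTM step as in Section 3.1: input (i), forget (f), output (o)
    # gates and memory cell (c); W holds the input (x*), recurrent (h*),
    # and diagonal peephole (c*) weights, b the biases.
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] * c_prev + b["i"])
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + W["cf"] * c_prev + b["f"])
    c_t = f_t * c_prev + i_t * np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] * c_t + b["o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_h = 14, 64  # hypothetical sizes, e.g. our 14-dimensional input vectors
W = {k: 0.1 * rng.standard_normal((n_h, n_in)) for k in ("xi", "xf", "xc", "xo")}
W.update({k: 0.1 * rng.standard_normal((n_h, n_h)) for k in ("hi", "hf", "hc", "ho")})
W.update({k: 0.1 * rng.standard_normal(n_h) for k in ("ci", "cf", "co")})
b = {k: np.zeros(n_h) for k in ("i", "f", "c", "o")}
h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.standard_normal((400, n_in)):  # a dummy 400-step sequence
    h, c = lstm_step(x_t, h, c, W, b)
</preformat>
        <p>A bidirectional layer simply runs two such recurrences, one over the sequence in ascending order and one in descending order, and concatenates the two resulting hidden state sequences before the output layer.</p>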
      </sec>
      <sec id="sec-2-3">
        <title>3.2. Proposed Dataset</title>
        <p>To evaluate the proposed architecture, we developed a dataset including all the 26 gestures of the LIS alphabet. Most of the letters of the alphabet are represented with static gestures, while the “G”, “H”, and “Z” are performed by moving the hand as well. We recorded 30 samples for each letter, building a dataset composed of 780 samples. The dataset is publicly available as a GitHub repository (https://github.com/airtlab/An-EMG-and-IMU-Dataset-for-the-Italian-Sign-Language-Alphabet).</p>
        <p>All the collected gestures were performed by the same person (male, 24 years old) wearing a Myo Gesture Control Armband (https://web.archive.org/web/20200528111822/https://support.getmyo.com/hc/en-us/articles/202648103-Myo-Gesture-Control-Armband-tech-specs) on his right arm, always in the same position. In fact, each sample of the dataset is composed of the raw data produced by the 8 EMG sensors and the IMU of the Myo Armband. The time window for the acquisition of each sample was 2 seconds, sampling both the EMG and IMU data at 200 Hz. The subject was required to self-collect the samples with a desktop application that we developed specifically for the gesture acquisition.</p>
        <p>Each data sample for each gesture representing a letter is included in a json file containing both the EMG and the IMU data. The EMG data is organized into an emg object including the following fields:</p>
        <p>• frequency, i.e. the sampling frequency (in Hz) of the values from the EMG sensors. This value is 200 for all the samples;</p>
        <p>• data, a 400 x 8 integer matrix. Each row is an 8-dimensional array including the values from the 8 EMG sensors of the Myo Armband. Therefore, data is the time series of the values from the EMG sensors during the acquisition of the gesture.</p>
        <p>Similarly, the IMU data of the sample is organized into an imu object with the following fields:</p>
        <p>• frequency, i.e. the sampling frequency (in Hz) of the values from the IMU. This value is 200 for all the samples;</p>
        <p>• data, a 400-element object array. Each object has three fields, namely gyroscope (an array composed by 3 floating point values), acceleration (an array composed by 3 floating point values), and rotation (an array composed by 4 floating point values).</p>
        <p>In addition, each json file includes a timestamp, representing the date and time of the gesture acquisition, and the duration of each acquisition, which is 2000 for all the samples. The information about the acquisition duration and the sampling frequency is redundant in the current version of the dataset, as it is the same for all the gestures. However, this information might be useful in the future, when we might add samples varying the acquisition time window or the sampling frequency. The complete dataset specification is available in a dedicated open-access data paper [23].</p>
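        <p>As an example of how a sample can be consumed, the following sketch loads one json file into NumPy arrays according to the structure described above. The file name is hypothetical, and the concatenation order of the 14 input features is an assumption of this sketch (see Section 3.3 for the network input).</p>
        <preformat>
import json
import numpy as np

def load_sample(path):
    # Load one gesture sample of the dataset.
    with open(path) as f:
        sample = json.load(f)
    emg = np.array(sample["emg"]["data"], dtype=np.int32)               # (400, 8)
    imu = sample["imu"]["data"]
    acc = np.array([r["acceleration"] for r in imu], dtype=np.float64)  # (400, 3)
    gyro = np.array([r["gyroscope"] for r in imu], dtype=np.float64)    # (400, 3)
    rot = np.array([r["rotation"] for r in imu], dtype=np.float64)      # (400, 4)
    # The network of Section 3.3 uses EMG, accelerometer, and gyroscope,
    # i.e. a 400 x 14 matrix per sample (the rotation quaternion is unused).
    x = np.concatenate([emg, acc, gyro], axis=1)                        # (400, 14)
    return x

x = load_sample("a/sample_01.json")  # hypothetical file name
print(x.shape)  # (400, 14)
</preformat>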
      </sec>
      <sec id="sec-2-4">
        <title>3.3. System Architecture</title>
        <p>Figure 2 depicts the architecture of the proposed gesture recognition system, used to identify the gestures of the LIS alphabet. The user performs the gesture wearing the Myo Armband; the data from the EMG sensors and the IMU are the input for our deep neural network, based on the Bidirectional LSTM architecture. The system labels the input data with one of the 26 letters of the alphabet, identifying the gesture made by the user. As explained in Section 4, to evaluate our system, we synthetically augmented the data in our dataset during the training process, trying to use more samples and reduce overfitting.</p>
        <p>[Figure 2: The architecture of the proposed gesture recognition system.]</p>
        <p>Table 1 lists all the layers included in our deep neural network. Among the available data, we used the 8 series with the values from the 8 EMG sensors of the Myo Armband. Concerning the IMU, we took the two 3-dimensional vectors with values from the accelerometer and the gyroscope. Therefore, each sample is fed into the network as a 400 x 14 matrix, i.e. there are 400 14-dimensional vectors for each sample. The first network layer is a bidirectional LSTM. It processes the input with 64 hidden units, returning in output 128 hidden state values (64 for the forward sequence, 64 for the backward sequence) for each of the 400 vectors in a sample. In fact, each hidden unit is configured to output a value for each vector in the sample matrix, as proposed by Graves et al. [24] to stack multiple LSTM layers. Thus, the second layer is also a bidirectional LSTM. However, being the last recurrent layer, each of the 32 hidden units returns a single value for the entire sample. Therefore, the output of the second layer is composed of 64 values (32 for the forward sequence, 32 for the backward one). A 50% dropout performs the dilution of the LSTM output, to prevent overfitting. The output is then processed by two fully connected layers: the first includes 64 hidden units, using the rectifier as the activation function. After another 50% dropout for regularization, the output is processed by the 26 units of the second fully connected layer. The softmax activation function of each unit computes the probability distribution over the 26 classes, i.e. the letters of the LIS alphabet.</p>
        <p>Table 1: The deep neural network model used for the gesture recognition. The total number of trainable parameters is 87,514.
Layer            Output Shape   Param #
Bi-LSTM          (400, 128)     40,448
Bi-LSTM          (64)           41,216
Dropout (0.5)    (64)           0
Fc1 (ReLU)       (64)           4,160
Dropout (0.5)    (64)           0
Fc2 (Softmax)    (26)           1,690</p>
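        <p>For reference, the following Keras sketch reproduces Table 1 layer by layer; it is an illustration of the architecture, not necessarily the exact code of our repository. With an input shape of (400, 14), model.summary() reports the 87,514 trainable parameters of Table 1.</p>
        <preformat>
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Dropout

model = Sequential([
    # First recurrent layer: 64 units per direction, one output per time step
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(400, 14)),
    # Last recurrent layer: 32 units per direction, one output per sample
    Bidirectional(LSTM(32)),
    Dropout(0.5),
    Dense(64, activation="relu"),     # Fc1
    Dropout(0.5),
    Dense(26, activation="softmax"),  # Fc2, one unit per LIS letter
])
model.summary()
</preformat>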
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experimental Evaluation</title>
      <sec id="sec-3-1">
        <title>We evaluate our model by collecting prelim</title>
        <p>inary results on the proposed dataset. We
actually want understand to which extent our
deep neural network is a viable solution to
recognize the LIS gestures based on EMG and
IMU sensor data. As the collected dataset
includes only 780 samples, which might be too
rotated by multiplying with a rotation matrix:
⎡1
⎢
⎢
⎣
0
0</p>
        <p>0
cos( )
− sin( )</p>
        <p>0 ⎤
sin( )⎥
cos( )⎥⎦
Such transformation rotates the coordinate
system of  degrees, counterclockwise, around
the x-axis. Ohashi et al also propose the
following formulation to apply the same rotation
to the data of 8 EMG sensors:</p>
        <p>( )


( )</p>
        <p>=
 = ⌊ / ⌋
 = 360/
 =  / − 
( )
 −</p>
        <p>+  (1 −  )
 ( ) +  (1 −  )
( )
 − −1
jective of testing with more data and prevent
overfitting. Even if data augmentation has
few for a deep learning approach, we also ap- Here, 
plied data augmentation, with the twofold ob- sensor when rotating the armband of 
de( ) is the reading of the ℎ</p>
        <p>EMG
grees; 
sor in the original data; 
( ) is the reading of the  -th
senis the number of
some threats to validity.
stage, and therefore inevitably sufer from
the experiments should be considered as early
been proven useful to get general results [25], available EMG sensors;  ( ) is the polynomial
function  ( ) =  2. Intuitively, if the rotation
places the
 -th sensor between the original</p>
        <sec id="sec-3-1-1">
          <title>4.1. Data Augmentation</title>
          <p>To augment the proposed dataset, we apply
the technique presented in [26]. Ohashi et
al. point out that, during the gesture
recognipositions of the  -th and ( + 1)-th sensors, the
reading of the  -th sensor in the rotated data
is computed as the interpolation of the
readdistance from those sensors.
ings of the  -th and ( + 1)-th sensors and the</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Therefore, we apply such rotation technique</title>
        <p>tion with a wearable device such as the Myo to our data, given that, with this approach,
Armband, the user is supposed to wear the de- Oshahi et al. got better performance than
vice always with the same configuration (i.e.
augmenting data with gaussian noise, with
identical placement and rotation). In this way, rotating data around all the three axis, and
the sensors would be attached to the user’s
arm in the same positions every time the
dewith linear interpolation. As in their work,
we rotate the data with the angles in the
folvice is used. However, a displacement is very lowing set:
likely to happen when detaching and
attaching the device again. Therefore, samples with
various rotation angles are desirable in the
training data of a gesture recognition model. By rotating the data, we get 780 samples for
{−30◦, −22.5◦, −15◦, −7.5◦, 7.5◦, 15◦, 22.5◦, 30◦}
each angle, adding the 6,240 synthetic samples
to the 780 originally collected with the Myo
Armband.</p>
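        <p>The sketch below implements this augmentation as reconstructed above: the IMU vectors are rotated with the matrix R_x(θ), while the EMG channels are interpolated between neighboring sensors. The indexing convention (modulo the number of sensors) is an assumption of this sketch, as the exact convention of [26] may differ.</p>
        <preformat>
import numpy as np

def rotate_imu(vectors, theta_deg):
    # Rotate 3D accelerometer/gyroscope readings by theta degrees,
    # counterclockwise, around the x-axis.
    t = np.radians(theta_deg)
    r = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(t), np.sin(t)],
                  [0.0, -np.sin(t), np.cos(t)]])
    return vectors @ r.T

def rotate_emg(emg, theta_deg, p=lambda x: x ** 2):
    # Simulate rotating the armband by theta degrees: each rotated channel
    # interpolates the two original sensors it falls between, weighted by
    # the polynomial p(x) = x^2 of the normalized distances.
    n = emg.shape[1]                    # number of EMG sensors (8 on the Myo)
    phi = 360.0 / n                     # angle between adjacent sensors
    k = int(np.floor(theta_deg / phi))  # whole-sensor shift
    alpha = theta_deg / phi - k         # fractional shift in [0, 1)
    out = np.empty_like(emg, dtype=float)
    for i in range(n):
        j = (i - k) % n                 # sensor indices wrap around the arm
        out[:, i] = p(1 - alpha) * emg[:, j] + p(alpha) * emg[:, (j - 1) % n]
    return out

# e.g. one of the eight angles used for the augmentation
emg_aug = rotate_emg(np.zeros((400, 8)), 22.5)
acc_aug = rotate_imu(np.zeros((400, 3)), 22.5)
</preformat>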
        <sec id="sec-3-2-1">
          <title>4.2. Experimental Setup</title>
        <p>We tested the proposed deep neural network on the original dataset, as well as on the augmented dataset. We applied a stratified shuffle split cross-validation scheme to validate the accuracy of our model. To this end, we firstly repeated a randomized 80-20 split 5 times, using the 80% of the data as the training set and the 20% as the test set, preserving the percentage of samples from each class in each split. The 12.5% of the training data, i.e. the 10% of the entire dataset, was used as validation data for the training of the neural network. Then, we repeated the same randomized split 30 times on each dataset, to collect more general results.</p>
        <p>We used the Root Mean Square Propagation (RMSProp) optimizer to minimize the Categorical Cross-Entropy loss function during the training of the neural network. The number of training epochs varied for each split, as we early stopped the training after 5 epochs without an improvement on the minimum validation loss, restoring the weights corresponding to the best validation loss. Table 2 shows the number of training epochs in each split, in the 5 split experiments. For the 30 split experiments, the mean number of training epochs was 42.77 (± 9.01) for the original dataset, and 37.67 (± 7.80) on the augmented dataset. The batch size was 32 samples in each split of each experiment.</p>
        <p>[Table 2: Number of training epochs in each split of the 5-split experiments, with and without Data Augmentation (DA).]</p>
        <p>A Jupyter notebook with the described experiments is available in a public GitHub repository (https://github.com/airtlab/italian-sign-language-recognition/), in order to guarantee the reproducibility of the tests. The tests ran on Google Colab with the GPU runtime, using Keras 2.4.3, TensorFlow 2.4.1, and scikit-learn 0.22.2.post1.</p>
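        <p>For clarity, the following sketch outlines the validation scheme and training configuration described above. The variable names and the epoch limit are hypothetical, and build_model() is assumed to return the network of Table 1; the exact code is in the linked notebook.</p>
        <preformat>
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from tensorflow.keras.callbacks import EarlyStopping

# X: (n_samples, 400, 14) inputs, y: one-hot labels, y_cls: integer labels;
# build_model() returns the network of Table 1 (all assumed to be available).
splitter = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
accuracies = []
for train_idx, test_idx in splitter.split(X, y_cls):
    model = build_model()
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    early_stop = EarlyStopping(monitor="val_loss", patience=5,
                               restore_best_weights=True)
    model.fit(X[train_idx], y[train_idx],
              validation_split=0.125,  # 12.5% of training = 10% of all data
              batch_size=32, epochs=200, callbacks=[early_stop])
    accuracies.append(model.evaluate(X[test_idx], y[test_idx])[1])
print(np.mean(accuracies), np.std(accuracies))
</preformat>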
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Results</title>
        <p>Table 3 shows the prediction accuracy on the test set obtained by repeating 5 times the stratified shuffle split of the dataset. With the 780 samples of the original dataset, the mean accuracy is 57.44%, with a standard deviation of 5.46% over the 5 splits of the experiment. In other words, around half of the test samples gets misclassified. In fact, using only 546 samples for the network training (with 78 samples used as validation data) results in a poor performance of our model.</p>
        <p>Instead, with the 7,020 samples of the augmented dataset, the mean accuracy increases to 97.36%, and the standard deviation decreases to 0.62% over the 5 splits. Using 4,914 samples for training (with 702 samples used for validation) significantly improves the performance of our model. The lower standard deviation shows that the model trained on the augmented dataset exhibits a better generalization. Intuitively, most of the misclassification errors occur with gestures which look similar. For example, in the first split, the “V” is erroneously identified as the “U” 9 times and as the “F” one time, while the other 44 samples are correctly identified. Similarly, 3 “U” samples are wrongly identified as “V”. In the same split, the “W” is misclassified only one time, being identified as the “V”.</p>
        <p>The results are similar when repeating the tests on 30 random stratified shuffle splits of the dataset, as showed in Table 4. The mean value of accuracy is 58.69% (± 4.37%) for the original dataset and 97.07% (± 1.32%) on the augmented dataset. Therefore, both in the experiments with 5 splits and 30 splits, the training on augmented data is more stable than with the original data, resulting in a lower standard deviation on the test accuracy.</p>
        <p>Table 4: Mean number of training epochs and mean accuracy on 30 random stratified shuffle splits, with and without Data Augmentation (DA).
            Epoch #        Accuracy
without DA  42.77 ± 9.01   58.69 ± 4.37%
with DA     37.67 ± 7.80   97.07 ± 1.32%</p>
        <p>Moreover, the tests did not highlight any significant difference in the recognition of static gestures (most of the letters) with respect to the dynamic ones (“G”, “H”, and “Z”), scoring similar class-wise precision and recall values.</p>
        <p>These preliminary results encourage the use of wearable devices equipped with EMG and IMU sensors to execute the recognition of the LIS with deep neural networks. Most of the samples get correctly identified by our LSTM-based model. As expected, the data augmentation improves the performance, and our model gets better results with more data, highlighting the need of expanding the collected dataset.</p>
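        <p>The per-letter confusions reported above can be inspected with a standard confusion matrix; a minimal sketch, assuming the integer labels y_true and y_pred of one test split are available:</p>
        <preformat>
import numpy as np
from sklearn.metrics import confusion_matrix

letters = [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # the 26 classes
cm = confusion_matrix(y_true, y_pred, labels=list(range(26)))
off_diag = cm - np.diag(np.diag(cm))  # keep only the misclassifications
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
print("most frequent confusion:", letters[i], "predicted as", letters[j],
      off_diag[i, j], "times")
</preformat>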
      </sec>
      <sec id="sec-3-3">
        <title>3https://github.com/airtlab/italian-sign-language</title>
        <p>recognition/
Table 4 the gesture acquisition to 2 seconds. Such
Mean number of training epochs and mean accu- time window is worth of further research, as
racy on 30 random stratified shufle splits, with this time might vary from person to person
and without Data Augmentation (DA). and also for more complex gestures.</p>
        <p>Epoch # Accuracy Concerning the presented results, we built
without DA 42.77 ± 9.01 58.69 ± 4.37% our model on the results of existing literature
with DA 37.67 ± 7.80 97.07 ± 1.32% about LSTMs to process time series, especially
in speech and gesture recognition. However, a
augmented dataset. Therefore, both in the systematic study on alternative models as well
experiments with 5 splits and 30 splits, the as a comparison on more datasets should be
training on augmented data is more stable performed to get more results, and therefore
than with the original data, resulting in a validate our method.
lower standard deviation on the test accuracy.</p>
        <p>Moreover, the tests did not highlight any sig- 5. Conclusions
nificant diference in the recognition of static
gestures (most of the letters) with respect to We presented a deep learning approach for
the dynamic ones (“G”, “H”, and “Z”), scoring the recognition of the LIS alphabet, based on
similar class-wise precision and recall values. surface EMG and IMU data. Specifically, we</p>
        <p>These preliminary results encourage the developed a deep neural network based on the
use of wearable devices equipped with EMG bidirectional LSTM architecture. To validate
and IMU sensors to execute the recognition our method, we built a dataset including 30
of the LIS with deep neural networks. Most gesture samples for each letter of the alphabet.
of the samples gets correctly identified by our The gestures were recorded from the 8 EMG
LSTM-based model. As expected, the data sensors and the IMU of the Myo Armband.
augmentation improves the performance, and To ensure the proper training of our model,
our model gets better results with more data, with enough samples, we used data
augmentahighlighting the need of expanding the col- tion, simulating the rotation of the armband.
lected dataset. The results are preliminary, but promising:
on the augmented dataset, our model got 97%
4.4. Threats to validity accuracy, showing few classification errors
Being in early stage, the presented research on very similar gestures. The source code of
inevitably sufers from some threats to valid- the experiments and the dataset are available
ity. Concerning the collected dataset, all the as public GitHub repositories, to guarantee
gesture samples were performed by the same the reproducibility of the tests. Moreover, the
subject. Samples from more subjects are nec- public dataset is available for further tests.
essary to get more general conclusions. More- The presented research is in early stage,
over, we arbitrary fixed the time window for since a systematic study of alternative deep</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>