<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Polite Robot: Visual Handshake Recognition Using Deep Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Butkus</surname>
            <given-names>Liutauras</given-names>
          </name>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Lukoševičius</surname>
            <given-names>Mantas</given-names>
          </name>
        </contrib>
        <aff id="aff0">
          <institution>Faculty of Informatics, Kaunas University of Technology</institution>
          , Kaunas,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <fpage>78</fpage>
      <lpage>83</lpage>
      <abstract>
        <p>Our project was to create a demo system in which a small humanoid robot accepts an offered handshake when it sees one. The visual handshake recognition, which is the main part of the system, proved not to be an easy task. Here we describe how, and how well, we solved it using deep learning. In contrast to most gesture recognition research, we did not use depth information or videos, but worked on static images: we wanted to use a simple camera, and our gesture is rather static. We have collected a special dataset for this task. Different configurations and learning algorithms of convolutional neural networks were tried. However, the biggest breakthrough came when we eliminated the background and made the model concentrate on the person in front. In addition to our experiment results, we can also share our dataset.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>The goal of this project is to create a robot that can visually
recognize an offered handshake and accept it. When the robot
sees a person offering a handshake, it responds by stretching out
its arm too. This serves as a visual and interactive demonstration,
which should get students more interested in machine learning
and robotics.</p>
      <p>For this purpose we used a small humanoid robot, a simple
camera mounted on it, and deep convolutional neural networks
for image recognition. The recognition, as well as training of
it, were done on a PC and the command to raise the arm was
sent back to the robot.</p>
      <p>This article mainly shares our experience in developing
and training the visual handshake recognition system, which
proved not to be trivial. In particular, we discuss how the
images were collected and preprocessed, what architecture of
convolutional neural network was used, how it was trained
and tested, and what gave good and what not so good results.</p>
      <p>This document is divided into several sections. Section II
reviews existing solutions to similar problems. Section III
introduces our method. Section IV describes
the dataset used in this study. Section V emphasizes the
importance of data preprocessing before training. Section VI
describes the robot interface. Sections VII and VIII provide an
analysis of the results and the conclusions.</p>
      <p>Copyright held by the author(s).</p>
    </sec>
    <sec id="sec-2">
      <title>II. RELATED WORK</title>
      <p>
        Virtually all vision-based hand gesture recognition systems
described in literature use (a) image sequences (videos) with
(b) depth information in them, see [1] for a good recent
survey. Microsoft Kinect [2] and Leap Motion [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ] are two
examples of popular sensors specifically designed for gesture
and posture 3D tracking. While clearly both temporal (a) and
depth (b) aspects are helpful in recognizing hand gestures, our
system uses neither of the two. We (b) used an inexpensive
camera for simple RGB image acquisition, to make the system
more accessible and the algorithms more widely applicable,
e.g., on smartphones or in natural lighting. We also (a) used single
frames to recognize the extended hand for the handshake,
since the gesture is rather static – just holding the extended
hand still – and could perhaps be called a “posture”. This makes
the recognition problem considerably harder.
      </p>
      <p>
        A project somewhat similar to ours, called “Gesture Recognition
System using Deep Learning”, was presented at the PyData
Warsaw 2017 conference [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ]. The author introduced a
Python-based deep learning gesture recognition model that is
deployed on an embedded system, works in real time and can
recognize 25 different hand gestures from a simple webcam
stream. The development of the system included: a large-scale
crowd-sourcing operation to collect over 150,000 short video
clips, a process to decide which deep learning framework to
use, the development of a network architecture that allows for
classification of video clips solely from RGB input frames,
the iterations necessary to make the neural network run in
real time on embedded devices, and lastly, the discovery and
development of playful gesture-based applications. Their
approach still differs from ours in that they used
video samples as their input (several frames at a time) and
tried to recognize moving gestures.
      </p>
      <p>
        There is considerable literature similar to our approach in
both (a) and (b) for recognizing sign language hand gestures
(or rather postures) from RGB images, including using deep
learning [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ]. These approaches, however, usually work with
images of a single hand on a uniform background where the
hand can be cropped from the image using thresholding [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ],
skin color [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ], or relying on the subject wearing a
brightly-colored glove [7].
      </p>
    </sec>
    <sec id="sec-2b">
      <title>III. OUR METHOD</title>
      <p>The system consists of several parts, shown in
Figures 1 and 2, including the camera, camera image
preprocessing, convolutional neural network training using deep
learning, the graphical user interface, the robot interface, and
the robot itself.</p>
      <p>At first, the camera was used to collect the image dataset. This
is described in section IV. After later research, which is
described in section VI, the images in the dataset had to be
pre-processed to be able to train the model, which is the next
part of our system. Using the Keras library the model was created,
compiled and finally trained on the images (this process is
explained in section VI). The final part is to run the model to
recognize new live images. For this, the camera interface
was programmed to take photos every 0.5 seconds; the model
gets those images as input and returns the probability of seeing
an offered handshake as the output. If this result is above
a certain threshold, the robot interface sends a command to the
robot to perform the corresponding action. This part is described
in more detail in section VI.</p>
      <sec id="sec-2-1">
        <title>A. Choice of using deep learning libraries</title>
        <p>
          Deep learning [
          <xref ref-type="bibr" rid="ref7">8</xref>
          ] (also known as deep structured learning
or hierarchical learning) is part of a broader family of machine
learning methods based on learning data representations, as
opposed to task-specific algorithms. Learning can be
supervised, semi-supervised or unsupervised.
        </p>
        <p>Deep learning models are loosely related to information
processing and communication patterns in a biological nervous
system, such as neural coding that attempts to define a
relationship between various stimuli and associated neuronal
responses in the brain.</p>
        <p>
          Deep learning architectures such as deep neural networks,
deep belief networks and recurrent neural networks [
          <xref ref-type="bibr" rid="ref8">9</xref>
          ] have
been applied to fields including computer vision, speech
recognition, natural language processing, audio recognition,
social network filtering, machine translation, bioinformatics
and drug design, where they have produced results comparable
to and in some cases superior to human experts.
        </p>
        <p>
          Convolutional networks [
          <xref ref-type="bibr" rid="ref7">8</xref>
          ], also known as convolutional
neural networks, or CNNs, are a specialized kind of neural
network for processing data that has a known grid-like
topology. Examples include time-series data, which can be
thought of as 1-D grid taking samples at regular time intervals,
and image data, which can be thought of as a 2-D grid of
pixels. Convolutional networks have been tremendously
successful in practical applications. The name “convolutional
neural network” indicates that the network employs a
mathematical operation called convolution. Convolution is a
specialized kind of linear operation. Convolutional networks
are simply neural networks that use convolution in place of
general matrix multiplication in at least one of their layers.
        </p>
        <p>
          Keras [
          <xref ref-type="bibr" rid="ref9">10</xref>
          ] is a high-level deep learning library written in
Python and capable of running on top of either TensorFlow or
Theano deep learning libraries. It was developed with a focus
on enabling fast experimentation. Being able to go from idea
to result with the least possible delay is key to doing good
research. Keras deep learning library allows for easy and fast
prototyping (through total modularity, minimalism, and
extensibility). It supports both convolutional networks (which we
used in our solution) and recurrent networks, as well as
combinations of the two. Keras also supports arbitrary
connectivity schemes (including multi-input and multi-output
training) and runs seamlessly on CPU and GPU. The core data
structure of Keras is a model, a way to organize layers. The
main type of model is the Sequential model, a linear stack of
layers. Keras's guiding principles include modularity: a model
is understood as a sequence or a graph of standalone,
fully-configurable modules that can be plugged together with as
few restrictions as possible. In particular, neural layers, cost
functions, optimizers, initialization schemes, activation
functions and regularization schemes are all standalone modules
that users can combine to create new models. Each module
is kept short and simple. The ability to easily create
new modules allows for total expressiveness, making Keras
suitable for advanced research.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>B. Our convolutional neural network model</title>
        <p>The convolutional neural network model that we used is
specified in Figure 3.</p>
        <p>It takes 64x40 resolution images as input, consists of
three convolutional layers, each followed by pooling, and has
a single output node. We use rectified linear units in all layers
except for the output node, which is sigmoid.</p>
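        <p>As an illustration, the model above can be written in a few lines of Keras. The input size, the three convolution–pooling stages, the rectified linear units and the sigmoid output node follow the description above; the filter counts and kernel sizes are our assumptions for this sketch, as they are not restated here.</p>

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the network in Figure 3: 64x40 RGB input, three
# convolutional layers each followed by max-pooling, ReLU activations,
# and a single sigmoid output node. Filter counts and kernel sizes
# below are illustrative assumptions.
model = keras.Sequential([
    keras.Input(shape=(40, 64, 3)),            # height x width x RGB
    layers.Conv2D(16, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),     # probability of a handshake
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```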
      </sec>
      <sec id="sec-2-3">
        <title>C. Training process</title>
        <p>Before starting to train the model, several parameters
describing the training details must be set. The first parameter
is the epoch count. An epoch itself is an arbitrary milestone,
generally defined as “one pass over the entire dataset”, used to
separate training into distinct phases, which is useful for
logging and periodic evaluation. In general it means how
many times the process will go through the training set.</p>
        <p>The second parameter is the batch size, the number
of samples that are propagated through the network at a time.
For instance, suppose there are 200 training samples and we set
the batch size to 30. The algorithm takes the first 30 samples
from the training dataset and trains the network on them. Next it
takes the second 30 samples and trains the network again, and so
on until all samples have been propagated through the network.
A problem usually arises with the last set of samples: in this
example, the last 20 samples, since 200 is not divisible by 30
without remainder. The simplest solution is to just take the
final 20 samples and train the network on them.</p>
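        <p>The batching procedure above can be sketched in a few lines of Python; the 200-samples, batch-size-30 numbers repeat the example in the text.</p>

```python
# Split sample indices into consecutive batches, as described above:
# the last batch may be smaller when the sample count is not
# divisible by the batch size.
def batch_indices(n_samples, batch_size):
    return [range(i, min(i + batch_size, n_samples))
            for i in range(0, n_samples, batch_size)]

# 200 samples with batch size 30: six full batches and a final one of 20
sizes = [len(b) for b in batch_indices(200, 30)]
print(sizes)  # [30, 30, 30, 30, 30, 30, 20]
```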
        <p>We have tried different loss functions and training
optimization methods. The ones that worked reasonably well
in the end are reported in Section VII.</p>
        <p>Training accuracy and training loss are calculated on the go,
during training. The figures in Section VII show how well our
network is doing on the data it is being trained on. Training
accuracy usually keeps increasing throughout training.</p>
      </sec>
      <sec id="sec-2-4">
        <title>D. Validation process</title>
        <p>To validate the model we need a new dataset of
images that have not been used in the training process.
Validation is usually carried out together with training. After
every epoch, the model is tested against a validation set, and
validation loss and accuracy are calculated. These numbers tell
us how good the model is at predicting outputs for inputs it
has never seen before. Validation accuracy increases initially
and then drops as the model overfits. Overfitting happens when
the model fits the training set too well. It then becomes
difficult for the model to generalize to new examples that were
not in the training set: the model recognizes the specific images
in the training set instead of general patterns, and the training
accuracy becomes higher than the accuracy on the
validation/test set.</p>
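        <p>In Keras, this per-epoch validation is a single argument to the training call. A minimal sketch, with a toy stand-in model and random data in place of our real network and images:</p>

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy stand-in model; only the training/validation mechanism matters here.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

rng = np.random.default_rng(0)
x_train, y_train = rng.random((200, 8)), rng.integers(0, 2, 200)
x_val, y_val = rng.random((60, 8)), rng.integers(0, 2, 60)

# After every epoch Keras evaluates the model on the validation data,
# producing the validation loss/accuracy curves discussed above.
history = model.fit(x_train, y_train, epochs=3, batch_size=30,
                    validation_data=(x_val, y_val), verbose=0)
```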
      </sec>
      <sec id="sec-2-5">
        <title>E. Testing process</title>
        <p>To test the model we need yet another new dataset. Testing
is usually run manually by giving an image from this dataset to
the trained model and getting a result. The result is a
percentage that gives the probability of each output option.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>IV. DATA COLLECTION AND PREPARATION</title>
      <p>As we mentioned in the previous section, a collection of
image data was needed to implement this project. As the
system only recognizes the greeting, only two outcomes are
possible: the greeting is recognized or not. During the
development of the whole project, more than 4,000 different
images were collected for the training of the neural network,
approximately 2,000 for each category. The single-image
resolution is 318x198.</p>
      <p>As we can see in Figure 4, each image usually contains one
person, either with the hand stretched out or not. We also tried
to capture images in as many different environments as possible.
The clothing of the subjects was also varied, to capture colors
as diverse as possible. This is important in order to ensure that
recognition is not restricted to one particular situation.</p>
      <p>
        The pictures were divided into three sets: training,
validation and testing. The neural network is taught with
training data. It is then validated with validation data to verify
that a well-trained neural network performs recognition with
new examples. The test data is intended to validate the final
neural network's capability to obtain the final true recognition.
In addition, data augmentation [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ] was used during training,
in which various small transformations were made to the
images before training on them (rotation, translation,
color shift, up-/down-scaling).
      </p>
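      <p>A minimal sketch of such augmentation in plain NumPy (Keras also offers this built in, e.g. via its ImageDataGenerator); the transformation ranges below are illustrative assumptions, and rotation and re-scaling are omitted for brevity:</p>

```python
import numpy as np

def augment(img, rng):
    """Apply small random transformations of the kinds used during
    training: translation and a color (channel) shift."""
    # random translation by a few pixels (wrap-around for simplicity)
    img = np.roll(img, rng.integers(-4, 5), axis=0)
    img = np.roll(img, rng.integers(-4, 5), axis=1)
    # color shift: add a small random offset to each RGB channel
    shift = rng.integers(-20, 21, size=(1, 1, 3))
    img = np.clip(img.astype(np.int16) + shift, 0, 255)
    return img.astype(np.uint8)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(40, 64, 3), dtype=np.uint8)
out = augment(image, rng)
```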
      <p>We are happy to share the dataset with anyone interested in
this task.</p>
    </sec>
    <sec id="sec-4">
      <title>V. BACKGROUND REMOVAL</title>
      <p>Initially, we tried to train the neural network on the data
obtained directly from the camera, without preprocessing it.
However, the best attempt reached only 78 percent training
accuracy and about 64 percent validation accuracy, followed by
overfitting, during which the error rate increased significantly.
For this reason, we had to look for ways to avoid the overfitting
and increase the validation accuracy of the model. To achieve
this, we tried changing the model's parameters, but this did not
improve the results as much as expected. We then decided to
preprocess the data itself. From the previous experiments we got
the impression that the overfitting appears due to the excessive
color variation in the images. For this reason, we decided to try
removing the image backgrounds and training the neural network
on pictures without background. This, however, raises a new
problem: how to detect what is the background and what is the
object (in this case a human)? We decided to take the first
image, without a human, and declare it the background; all other
images are objects with backgrounds. In this case the camera had
to be in a fixed position. We could then subtract the two images
and get an image without the background. Some noise usually
remained in the images after the subtraction. To reduce it, we
set a permissible error for the pixel RGB values. You can see
those images in Figure 5.</p>
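      <p>The background subtraction step described above can be sketched in NumPy as follows; the tolerance value (the "permissible error") is an illustrative assumption:</p>

```python
import numpy as np

def remove_background(image, background, tol=30):
    """Keep only pixels that differ from the stored background image by
    more than `tol` in some RGB channel; everything else is zeroed out.
    `tol` suppresses the subtraction noise mentioned above."""
    diff = np.abs(image.astype(np.int16) - background.astype(np.int16))
    foreground = (diff > tol).any(axis=2)   # True where a pixel changed
    result = np.zeros_like(image)
    result[foreground] = image[foreground]
    return result

# toy 2x2 example: only the bottom-right pixel differs from the background
bg = np.full((2, 2, 3), 100, dtype=np.uint8)
img = bg.copy()
img[1, 1] = (200, 50, 100)
out = remove_background(img, bg)
```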
      <p>The result was clear: it is possible to achieve 91 percent
accuracy with a smooth natural background. This means that the
model is precise enough to recognize the extended hand when the
background behind the human is uniform, and the background then
does not even need to be removed.</p>
      <p>In order to obtain this result, we first needed to draw up a
test plan that would make clear which way of training is the most
appropriate. We identified three training methods (regular
training, training with removed backgrounds, and training with
replaced backgrounds) and five types of validation data (a
background of a specific color, a smooth natural background, a
static colorful background, a changing background, and a
background with a few bystanders). All test results are presented
in Section VII.</p>
    </sec>
    <sec id="sec-5">
      <title>VI. INTERFACING THE ROBOT</title>
      <p>The robot interface is used when the model starts predicting
on new images. The probability the CNN returns for an image is
sent to the robot interface. The robot interface reads this input
value and, if it indicates a handshake, runs a command for the
robot to raise its hand.</p>
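      <p>The overall prediction loop can be sketched as follows; the function names and the threshold value are placeholders for the camera, model, and robot APIs, while the 0.5-second interval comes from Section III:</p>

```python
import time

THRESHOLD = 0.5  # decision threshold; the text only says "a certain threshold"

def run_interface(capture, predict, raise_hand, interval=0.5, steps=10):
    """Main loop of the system: grab a frame every `interval` seconds,
    let the CNN estimate the probability of an offered handshake, and
    command the robot when it exceeds the threshold. `capture`,
    `predict` and `raise_hand` are placeholders for the real APIs."""
    for _ in range(steps):
        frame = capture()          # grab the next camera image
        p = predict(frame)         # probability of an offered handshake
        if p > THRESHOLD:
            raise_hand()           # send the command to the robot
        time.sleep(interval)
```

In the real system, `predict` would preprocess the frame and query the trained model, and `raise_hand` would invoke the robot's built-in action software.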
    </sec>
    <sec id="sec-6">
      <title>VII. EXPERIMENT RESULTS AND ANALYSIS</title>
      <p>In this section we explain in detail what experiments
were done and what results were achieved.</p>
      <p>
        For this project the HR-OS1 Humanoid Endoskeleton robot
[
        <xref ref-type="bibr" rid="ref11">12</xref>
        ] was used. It is shown in Figure 6. It has an integrated
on-board Linux computer with an Intel Atom processor, which
provides all the processing power to run the robot. The HR-OS1 is a
hackable, modular, humanoid robot development platform
designed from the ground up with customization and
modification in mind. It has built-in software that invokes
robot actions. The robot's software interface is shown in Figure 7.
      </p>
      <p>As we mentioned in the previous section, the first
experiments were carried out using the dataset with
non-removed-background images for training and validation. The
other parameters were:
 Image width: 64
 Image height: 40
 Training dataset samples: 421
 Validation dataset samples: 122
 Epochs: 30</p>
      <p>After training we got the results, which are shown in
Figure 8. After the final iteration they were:
 Training accuracy: 78%</p>
      <p>Fig. 8. Results graph of training without removing the
background from the images.</p>
      <p>The results show that the model validates new images with 64%
accuracy. However, after manually testing this model on images
from very different environments, the results were even worse.</p>
      <p>The next experiment was training the model with
removed-background images. The model parameters were:
 Image width: 64
 Image height: 40</p>
      <p>After training we got the results, which are shown in
Figure 9. After the final iteration they were:</p>
      <p> Training accuracy: 91%</p>
      <p>This time the results show that the model validates new
images with 82% accuracy, and after manually testing this model
on images from very different environments, the results showed
about the same accuracy.</p>
      <p>All our tests and their results are shown in the table below.
Some of the more sophisticated training experiments were not
completed, as performing them was not immediately meaningful
given the poor results of the simpler training.</p>
      <p>[Table, 30 epochs. Validation data types (rows): 1. background
of a specific color; 2. a smooth natural background; 3. static
(colorful) background; 4. changing background; 5. background with
a few bystanders. Training methods (columns): A, B, C. Recovered
values for column A: 81/77, 52/54, 40/50.]</p>
      <p>The columns of the table represent how the model was trained,
the rows – how it was validated:</p>
    </sec>
    <sec id="sec-13">
      <p>A – simple training on original images;
B – trained with removed backgrounds;
C – trained with background replacements;
x/y – training accuracy / validation accuracy.</p>
      <p>The results showed that removing the background
significantly improves the recognition accuracy of the model.
The best results came from experiments where the model was
trained on removed-background pictures and validated on
removed-background or smooth-background images.</p>
    </sec>
    <sec id="sec-15">
      <title>VIII. DISCUSSION AND FUTURE WORK</title>
      <p>In this work several different training experiments were
performed, observing and studying the accuracy of the trained
models. The experiments showed that the results depended
not so much on the model used and its parameters as on the
transformation of the images. Image preprocessing played a key
role in achieving the best results. The best result was reached
by removing the background before training.</p>
      <p>Our interpretation of the results is that removing the
background reduces the variation in the data and makes the
machine learning model focus on the person in the image.
Without the background removal the models are prone to
overfitting, probably basing their decisions on the wrong
features of the image. It might be that similar accuracy can be
achieved without background removal, but with much more data,
more training, and probably more powerful models. In that case
the models would have to infer on their own that the person in
the foreground is the most important object in the images and
learn how to distinguish it. Motion or depth information, which
is used in many gesture recognition systems, would also make
separating the person in front from the background easier, and
an explicit separation would then likely not be necessary.</p>
      <p>This is consistent with results discussed in related work
(Section II) where other authors either use motion and/or depth
information, or also crop the foreground from the background
in some way.</p>
      <p>In our approach it is not necessary to remove the
background during testing/validation. The best validation
results are on data with smooth natural backgrounds. The
accuracy on this validation data reached 92%. Reasonable
future work would be to attempt to create a model that can
better recognize offered handshakes in a wider range of
environments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Maryam</given-names>
            <surname>Asadi-Aghbolaghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Albert</given-names>
            <surname>Clapes</surname>
          </string-name>
          , Marco Bellantonio, Hugo Jair Escalante,
          <article-title>“A survey on deep learning based approaches for action and gesture recognition in image sequences”</article-title>
          ,
          <source>12th IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG 2017)</source>
          ,
          <year>2017</year>
          . http://sunai.uoc.edu/~vponcel/doc/survey-deep-learning_fg2017.pdf
        </mixed-citation>
      </ref>
      <ref id="ref1b">
        <mixed-citation>
          [2] Microsoft Robotics, “Kinect Sensor”, accessed on 05-2018. https://msdn.microsoft.com/en-us/library/hh438998.aspx
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Leap</given-names>
            <surname>Motion</surname>
          </string-name>
          , “Leap Motion - Developer”, accessed on 05 - 2018 https://developer.leapmotion.com/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Joanna</given-names>
            <surname>Materzynska</surname>
          </string-name>
          , “
          <article-title>Building a Gesture Recognition System using Deep Learning”</article-title>
          ,
          <source>PyData Warsaw</source>
          <year>2017</year>
          , https://medium.com/twentybn/building
          <article-title>-a-gesture-recognition-systemusing-deep-learning-video-d24f13053a1</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Oyedotun</surname>
            ,
            <given-names>O.K.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Khashman</surname>
          </string-name>
          , “
          <article-title>Deep learning in vision-based static hand gesture recognition“</article-title>
          ,
          <source>Neural Computing and Applications</source>
          (
          <year>2017</year>
          )
          <volume>28</volume>
          :
          <fpage>3941</fpage>
          . https://doi.org/10.1007/s00521-016-2294-8
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Dennis</given-names>
            <surname>Núñez</surname>
          </string-name>
          <string-name>
            <surname>Fernández</surname>
          </string-name>
          , Bogdan Kwolek, “
          <article-title>Hand Posture Recognition Using Convolutional Neural Network” Progress in Pattern Recognition, Image Analysis</article-title>
          ,
          <source>Computer Vision</source>
          , and Applications.
          <source>CIARP 2017. Lecture Notes in Computer Science</source>
          , vol
          <volume>10657</volume>
          . Springer, Cham ,
          <year>2018</year>
          http://home.agh.edu.pl/~bkw/research/pdf/2017/FernandezKwolek_CIA RP2017.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Rosalina</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Yusnita</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Hadisukmana</surname>
            ,
            <given-names>R. B.</given-names>
          </string-name>
          <string-name>
            <surname>Wahyu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Roestam</surname>
            and
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Wahyu</surname>
          </string-name>
          ,
          <article-title>"Implementation of real-time static hand gesture recognition using artificial neural network,"</article-title>
          <source>2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT)</source>
          ,
          <source>Kuta Bali</source>
          ,
          <year>2017</year>
          http://journal.binus.ac.id/index.php/commit/article/viewFile/2282/3245
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Yoshua Bengio, Aaron Courville, “Deep Learning”, MIT Press,
          <year>2016</year>
          . http://www.deeplearningbook.org/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Denny</given-names>
            <surname>Britz</surname>
          </string-name>
          ,
          <article-title>“Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs”</article-title>
          , accessed on 05-2018. http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part1-introduction-to-rnns/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10] Keras documentation, “Why use Keras?”, accessed on 05-2018. https://keras.io/why-use-keras/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Prasad</given-names>
            <surname>Pai</surname>
          </string-name>
          ,
          <article-title>“Data Augmentation Techniques in CNN using Tensorflow”</article-title>
          , accessed on 05-2018. https://medium.com/ymedialabs-innovation/data-augmentation-techniques-in-cnn-using-tensorflow-371ae43d5be9
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12] Trossen Robotics,
          <article-title>“HR-OS1 Humanoid Endoskeleton specifications”</article-title>
          , accessed on 05-2018. http://www.trossenrobotics.com/HR-OS1
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Połap</surname>
            , Dawid, Marcin Woźniak, Christian Napoli, Emiliano Tramontana, and
            <given-names>Robertas</given-names>
          </string-name>
          <string-name>
            <surname>Damaševičius</surname>
          </string-name>
          .
          <article-title>"Is the colony of ants able to recognize graphic objects?."</article-title>
          <source>In International Conference on Information and Software Technologies</source>
          , pp.
          <fpage>376</fpage>
          -
          <lpage>387</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Woźniak</surname>
            , Marcin, Dawid Połap,
            <given-names>Christian</given-names>
          </string-name>
          <string-name>
            <surname>Napoli</surname>
            , and
            <given-names>Emiliano</given-names>
          </string-name>
          <string-name>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>"Graphic object feature extraction system based on cuckoo search algorithm." Expert Systems with Applications</article-title>
          , vol.
          <volume>66</volume>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>31</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>