<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Investigation of optimal configurations of a convolutional neural network for the identification of objects in real time</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M A Isayev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D A Savelyev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Image Processing Systems Institute of RAS - Branch of the FSRC "Crystallography and Photonics" RAS</institution>
          ,
          <addr-line>Molodogvardejskaya street 151, Samara, Russia, 443001</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Moskovskoe Shosse 34А, Samara, Russia, 443086</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>417</fpage>
      <lpage>423</lpage>
      <abstract>
        <p>The paper considers a comparison of the convolutional neural networks at the core of the most widely used solutions in the computer vision area. The study benchmarks these state-of-the-art solutions by criteria such as mAP (mean average precision) and FPS (frames per second) to assess their suitability for real-time use. A conclusion is drawn on the best convolutional neural network model and on the deep learning methods used in each particular solution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        At present, the field of computer vision is developing actively, spurred by the emergence of
convolutional neural networks (CNN) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and unmanned devices [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Another integral part of the
computer vision field is the detection of objects and their classification based on specific object features [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Object detection is successfully used in vehicle tracking, positioning, and surveillance [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5-7</xref>
        ]. The
difference between classification and detection algorithms is that detection algorithms also determine
the boundaries of the region of interest (the object) within the image. It is also worth noting the
substantial difference between the concepts of classification and clustering: in a classification problem
the groups of objects are known in advance, whereas in a clustering problem the clusters are only
determined in the course of solving it. Obviously, a regular neural network with a fully connected
layer at the end should not be used for object detection tasks. This is because the required length of
the output layer is dynamic, since the number of objects appearing in an image is not fixed.
      </p>
      <p>
        One approach to this problem is to extract different areas (regions of interest) from an image and
use convolutional neural networks to determine the presence of an object within each area. This
approach does not account for objects appearing at different locations and with different aspect
ratios; consequently, a huge number of such areas must be processed, which is very expensive in
terms of computational power. An alternative is special algorithms developed specifically for the
problem of detecting objects in real time [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
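      <p>To make the cost of this brute-force strategy concrete, the following minimal Python sketch (our
own illustration, with hypothetical names; not part of any of the tested solutions) enumerates
fixed-size windows at a single scale and classifies each one. Handling multiple object scales and
aspect ratios multiplies the number of classifier evaluations accordingly.</p>
      <preformat>
# Naive region-based detection: crop many candidate windows and classify
# each independently. The window size, stride, and the classify() callback
# are illustrative assumptions.
def sliding_windows(image, size=64, stride=32):
    """Yield square crops covering the image at a single scale."""
    h, w = image.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield (x, y, size, size), image[y:y + size, x:x + size]

def detect_naive(image, classify):
    """classify(crop) returns (label, score); keep confident windows."""
    detections = []
    for box, crop in sliding_windows(image):
        label, score = classify(crop)
        if score > 0.5:
            detections.append((box, label, score))
    return detections
      </preformat>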
      <p>
        Solutions in the field of real-time image recognition are divided into two main families: region
proposal methods (frame regions are proposed and then classified) and single-shot methods (all
objects are detected on the image in a single pass). The first family includes neural networks such as
R-CNN, Fast R-CNN and Faster R-CNN [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9-11</xref>
        ]. The second family includes YOLO, SSD [
        <xref ref-type="bibr" rid="ref12 ref5">5, 12</xref>
        ]. Neural
networks that recognize by region achieve high-quality detection of objects, but at a rather slow
recognition speed.
      </p>
      <p>In this paper, we determine the optimal solution for the problem of detecting objects in real
time by testing the Faster R-CNN, R-FCN, SSD (VGG-16) and YOLOv3 (Darknet-53) solutions
against metrics such as mAP and FPS.</p>
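      <p>For reference, mAP averages per-class detection precision over recall levels, and a detection is
matched to the ground truth via intersection over union (IoU); under the PASCAL VOC convention a
detection counts as correct at an IoU of at least 0.5. A minimal Python sketch of this IoU test follows
(the box layout and function name are our own illustration).</p>
      <preformat>
# IoU of two axis-aligned boxes given as (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.1428... = 1/7: not a match
      </preformat>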
    </sec>
    <sec id="sec-2">
      <title>2. A convolutional neural network used to identify objects</title>
      <p>
        The following parameters were used for the study: the dataset was PASCAL VOC 2012 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The Faster R-CNN,
R-FCN, SSD (VGG-16) and YOLOv3 (Darknet-53) solutions were tested. Faster R-CNN uses an RPN
(Region Proposal Network) instead of the slow selective search algorithm. The RPN is a complete
replacement for selective search and works as follows: a 3x3 sliding window passes over the feature
map at the last level of the base CNN, reducing its dimension, and for each sliding-window position
the RPN generates a set of candidate regions based on k reference boxes of a possible object. R-FCN,
or Region-based Fully Convolutional Network, is a fully convolutional network that addresses one of
the main trade-offs in the design of detection networks. On the one hand, when classifying an object,
the model must be trained for location invariance: wherever the object appears in the image, it must
be identified as the same object. On the other hand, the trained model must localize the boundaries of
the object at the place in the image where it appears (location variance). The compromise between
location variance and location invariance is the use of position-sensitive score maps. The input image
is processed by a CNN to which a fully convolutional layer is appended, creating a bank of
position-sensitive score maps over which regions of interest (RoIs) are evaluated. Then, for each
region, the score bank is checked to decide whether the region matches the corresponding position of
some object.
      </p>
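      <p>The anchor generation step of the RPN can be sketched as follows (here k = 9 reference boxes
per sliding-window position; the stride, scales and aspect ratios are typical illustrative values, not
taken from this study):</p>
      <preformat>
# For every feature-map position, lay out k = 9 anchors (3 scales x 3
# aspect ratios) in input-image coordinates; the RPN then scores and
# refines each anchor.
import itertools
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    anchors = []
    for cy, cx in itertools.product(range(feat_h), range(feat_w)):
        y, x = (cy + 0.5) * stride, (cx + 0.5) * stride
        for s, r in itertools.product(scales, ratios):
            h, w = s * np.sqrt(r), s / np.sqrt(r)
            anchors.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return np.array(anchors)  # shape: (feat_h * feat_w * 9, 4)
      </preformat>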
      <p>SSD, in contrast to Faster R-CNN, which uses separate algorithms for generating region
proposals and classifying them, determines the frame of the object together with its class in a single
pass over the image. SSD sends the image through a series of convolutional layers, obtaining several
sets of feature maps; for each position in each of these feature maps, a 3x3 convolutional filter is
applied to a set of reference bounding boxes, producing for each box the coordinate offsets and the
probability that an object lies within those boundaries.</p>
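      <p>A minimal sketch of one such prediction head, assuming Keras (the layer sizes and the number of
default boxes are illustrative assumptions, not the exact SSD configuration):</p>
      <preformat>
# One SSD-style head: a single 3x3 convolution over a feature map emits,
# for each of k default boxes at every location, 4 box offsets plus
# per-class scores.
import tensorflow as tf

num_classes, k = 21, 6        # PASCAL VOC: 20 classes + background
head = tf.keras.layers.Conv2D(k * (num_classes + 4), 3, padding="same")

feature_map = tf.random.normal((1, 38, 38, 512))   # one of several scales
out = head(feature_map)                            # (1, 38, 38, k*(C+4))
out = tf.reshape(out, (1, -1, num_classes + 4))    # 38*38*6 = 8664 boxes
      </preformat>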
      <p>YOLOv3, like SSD, belongs to the single-shot family; instead of softmax it uses independent
logistic classifiers to estimate how well the input matches each particular class. Instead of using
MSE (mean squared error) to compute the classification error, YOLOv3 uses binary cross-entropy for
each class label. To determine the coordinates of the object boundaries, YOLOv3 uses the k-means
clustering algorithm.</p>
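      <p>This anchor selection step can be sketched as follows: the (width, height) pairs of the training
boxes are clustered with k-means using 1 - IoU as the distance, so the resulting anchors match the
most common object shapes (a simplified illustration of the technique, not YOLOv3's exact code):</p>
      <preformat>
# k-means over box shapes with an IoU-based distance.
import numpy as np

def wh_iou(wh, centers):
    """IoU between (width, height) pairs, as if boxes shared a corner."""
    inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centers[None, :, 1])
    areas = (wh[:, 0] * wh[:, 1])[:, None] + \
            (centers[:, 0] * centers[:, 1])[None, :]
    return inter / (areas - inter)

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(wh_iou(wh, centers), axis=1)  # max IoU cluster
        for j in range(k):
            if np.any(assign == j):
                centers[j] = wh[assign == j].mean(axis=0)
    return centers  # k anchor (width, height) pairs
      </preformat>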
    </sec>
    <sec id="sec-3">
      <title>3. An anthropometric model-based method for extracting specified facial features to improve classification</title>
      <p>
        Detection of objects in images and videos using neural networks, including the identification of faces in real time [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ,
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is an important task today. In particular, V. S. Gorbatsevich et al. propose an
original iterative proposal clustering (IPC) algorithm for aggregating the output face proposals formed
by a CNN, together with a 2-level “weak pyramid” that provides better detection quality on test sets
containing both small and huge images [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>To extract important features from a person's face, at the first stage it is necessary to mark the
key points of the face, which determine the relative position of its main elements, and to take the
necessary measurements, which at the next stage are fed to the classifier. Figure 1 shows the 68 face
landmarks on the left and the main anthropometric parameters on the right. The key points of a
person's face are used to mark the protruding regions of the face in the image, such as the eyes,
eyebrows, nose, mouth and jaw line. At present, they are actively used in applications for aligning
faces in an image, modeling the pose of a human head, blink detection, etc.</p>
      <p>
        The detection of such points is a subtask of shape prediction: given an image as input, the
predictor tries to localize the key points that define the shape of the object [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
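      <p>A common implementation of such a predictor is dlib's 68-point shape predictor; a minimal
sketch of its use follows (the model and image file names here are assumptions; the pretrained model
is distributed separately by dlib):</p>
      <preformat>
# Detect faces, then localize the 68 landmarks within each face rectangle.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("face.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    shape = predictor(gray, face)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
      </preformat>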
      <p>
        Once the necessary key points and their coordinates relative to the whole image are known, the
required proportions are calculated based on the anthropometric model [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Facial
measurements are presented in figure 2.
      </p>
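      <p>A sketch of turning the landmark coordinates into such proportions follows; the particular
ratios are our own illustration (the measurements actually used are those of figure 2), and the
indices refer to the standard 68-point landmark scheme:</p>
      <preformat>
# Example anthropometric ratios computed from 68 (x, y) landmarks.
import numpy as np

def dist(points, i, j):
    return float(np.linalg.norm(np.subtract(points[i], points[j])))

def facial_ratios(points):
    face_width = dist(points, 0, 16)          # jaw corner to jaw corner
    return {
        "eye_gap/face_width": dist(points, 39, 42) / face_width,
        "nose_len/face_width": dist(points, 27, 33) / face_width,
        "mouth_w/face_width": dist(points, 48, 54) / face_width,
    }
      </preformat>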
    </sec>
    <sec id="sec-4">
      <title>4. A hybrid Hessian filter-based method for extracting specified facial features to improve classification</title>
      <p>The Hessian filter is a tool for detecting wrinkles in the input image. After all, it is no secret
that the appearance of wrinkles is among the most expected changes in a person's face and skin as
his age increases.</p>
      <p>
        The algorithm is based on the use of the Hessian matrix and the directional gradient [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The hybrid
Hessian filter detects wrinkles by computing the Hessian matrix for each pixel of the input image.
Large eigenvalues of the Hessian matrix indicate that a particular pixel belongs to the outline of a
wrinkle, regardless of its position. The eigenvalues in this context are independent measurements,
along the eigenvector directions, of the second derivatives at each point. A small eigenvalue
indicates a slow rate of change of the facial surface in the corresponding eigenvector direction, and
vice versa.
      </p>
      <p>First of all, the two-dimensional image is converted into a single-channel (grayscale) image,
which we denote I(x, y). From this image the gradient (G_x, G_y) and the second derivatives are
computed, giving the Hessian matrix at each pixel:
\[ H(x, y) = \begin{pmatrix} H_a &amp; H_b \\ H_b &amp; H_c \end{pmatrix} \qquad (1) \]
where H_a, H_b and H_c denote the second derivatives of I(x, y) with respect to x, to x and y, and
to y, respectively.</p>
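      <p>A sketch of this per-pixel computation follows, assuming the entries H_a, H_b and H_c of
equation (1) are the second derivatives of the grayscale image; for a symmetric 2x2 matrix the
eigenvalues are available in closed form:</p>
      <preformat>
# Per-pixel Hessian eigenvalues of a grayscale image I(x, y); pixels with
# large eigenvalue magnitudes are candidate wrinkle-outline pixels.
import numpy as np

def hessian_eigenvalues(I):
    Iy, Ix = np.gradient(I.astype(float))     # first derivatives
    Ha = np.gradient(Ix, axis=1)              # d2I/dx2
    Hb = np.gradient(Ix, axis=0)              # d2I/dxdy
    Hc = np.gradient(Iy, axis=0)              # d2I/dy2
    trace, det = Ha + Hc, Ha * Hc - Hb ** 2
    disc = np.sqrt(np.maximum(trace ** 2 / 4 - det, 0))
    return trace / 2 + disc, trace / 2 - disc  # eigenvalues per pixel
      </preformat>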
      <p>Once the feature arrays have been obtained, these age features are fed to the input of the
classifier for its subsequent training. In this work, classifiers based on the random forest algorithm
were used.</p>
      <p>
        Random forest is an ensemble decision tree algorithm: in a regression problem the final
prediction is the average of the predictions of the individual decision trees, while in classification it
is the most frequent prediction (majority vote) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Thus, the algorithm aggregates many
decision trees to arrive at a final prediction, as shown in figure 4.
      </p>
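      <p>A minimal sketch of this final stage with scikit-learn's random forest follows; the placeholder
feature matrix, labels and hyperparameters are assumptions, not the configuration of this study:</p>
      <preformat>
# Train a random forest on extracted facial feature vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 12)           # placeholder feature vectors
y = np.random.randint(0, 5, 500)      # placeholder age-group labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
      </preformat>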
      <p>As part of this work, the following convolutional neural network architectures, trained to solve
the object identification problem, were used: InceptionV3 and ResNet50.</p>
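      <p>Both architectures are available off the shelf; a sketch of instantiating them with Keras follows
(starting from ImageNet weights is a common fine-tuning setup and an assumption here, not the exact
training configuration of this study):</p>
      <preformat>
# Instantiate the two architectures used in this work.
from tensorflow.keras.applications import InceptionV3, ResNet50

inception = InceptionV3(weights="imagenet")   # expects 299x299x3 input
resnet = ResNet50(weights="imagenet")         # expects 224x224x3 input
      </preformat>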
      <p>
        It is quite obvious that configuring and training convolutional neural network models is very
resource-intensive, even on modern computers. In view of this, many researchers have proposed and
developed modifications of convolutional neural networks (such as residual networks and inception
blocks) aimed at reducing these resource requirements [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>Comparison of the results of the age estimation task across the considered methods showed the
following: the accuracy of the RF + HHF method is 81.1%, of Inception v3 is 80.8%, and of
ResNet50 is 85.7%.</p>
      <p>The generalized result of the object identification research is given in table 1.</p>
      <p>Figure 5 shows the number of labeled samples per class, and figure 6 shows the average
detection accuracy for some object classes when using the YOLOv3 solution.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We compared various convolutional neural networks in this paper. The study compares computer
vision solutions against criteria such as mAP and FPS, i.e. their suitability for real-time use. Based
on the study, the most suitable solution for the real-time object identification task is YOLOv3.
Although it does not have the highest mAP rating, YOLOv3 processes the video stream at high
speed. Therefore, YOLOv3 has great prospects as a tool for tracking and detecting objects in a video
stream.</p>
      <p>As a result of the study, it is shown that convolutional neural networks successfully cope with
the task of automatically determining a person's biological age from a face image: the accuracy of
the RF + HHF method is 81.1%, of Inception v3 is 80.8%, and of ResNet50 is 85.7%.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was financially supported by the Ministry of Science and Higher Education of the Russian
Federation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caballero</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huszar</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Totz</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aitken</surname>
            <given-names>A P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bishop</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rueckert</surname>
            <given-names>D</given-names>
          </string-name>
          and
          <article-title>Wang Z 2016 Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network</article-title>
          <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          <year>1874</year>
          -1883
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Szegedy</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ioffe</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            <given-names>J</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wojna Z 2016 Rethinking</surname>
          </string-name>
          <article-title>the inception architecture for computer vision Proceedings of the IEEE conference on computer vision and pattern recognition 2818-2826</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Pestana</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez-Lopez J L</surname>
          </string-name>
          ,
          <string-name>
            <surname>Saripalli</surname>
            <given-names>S</given-names>
          </string-name>
          and
          <string-name>
            <surname>Campoy</surname>
            <given-names>P 2014</given-names>
          </string-name>
          <article-title>Computer vision based general object following for gps-denied multirotor unmanned vehicles American Control Conference (ACC)</article-title>
          <year>1886</year>
          -
          <fpage>1891</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Magdeev</surname>
            ,
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tashlinskii</surname>
            <given-names>A G</given-names>
          </string-name>
          <year>2019</year>
          <article-title>Efficiency of object identification for binary images</article-title>
          <source>Computer Optics</source>
          <volume>43</volume>
          (
          <issue>2</issue>
          )
          <fpage>277</fpage>
          -
          <lpage>281</lpage>
          DOI: 10.18287/
          <fpage>2412</fpage>
          -6179-2019-43-2-
          <fpage>277</fpage>
          -281
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Redmon</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Divvala</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            <given-names>R</given-names>
          </string-name>
          and
          <article-title>Farhadi A 2016 You only look once: Unified, real-time object detection Proceedings of the IEEE conference on computer vision and pattern recognition 779-788</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Protsenko</surname>
            <given-names>V I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazanskiy N L and Serafimovich P G 2015</surname>
          </string-name>
          <article-title>Real-time analysis of parameters of multiple object detection systems</article-title>
          <source>Computer Optics</source>
          <volume>39</volume>
          (
          <issue>4</issue>
          )
          <fpage>582</fpage>
          -
          <lpage>591</lpage>
          DOI: 10.18287/
          <fpage>0134</fpage>
          -2452- 2015-39-4-
          <fpage>582</fpage>
          -591
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Kazanskiy</surname>
            <given-names>N L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Protsenko</surname>
            <given-names>V I</given-names>
          </string-name>
          and
          <string-name>
            <surname>Serafimovich P G 2017</surname>
          </string-name>
          <article-title>Perfomance analisys of real-time face detection system baced on stream data mining frameworks</article-title>
          <source>Procedia Engineering</source>
          <volume>201</volume>
          <fpage>806</fpage>
          -
          <lpage>816</lpage>
          DOI: 10.1016/j.proeng.
          <year>2017</year>
          .
          <volume>09</volume>
          .602
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Girshick</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donahue</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            <given-names>T</given-names>
          </string-name>
          and
          <string-name>
            <surname>Malik</surname>
            <given-names>J 2014</given-names>
          </string-name>
          <article-title>Rich feature hierarchies for accurate object detection and semantic segmentation</article-title>
          <source>Proceedings of the IEEE conference on computer vision and pattern recognition 580-587</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Dai</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun J 2016 R-fcn</surname>
          </string-name>
          :
          <article-title>Object detection via region-based fully convolutional networks</article-title>
          <source>Advances in neural information processing systems 379-387</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Girshick R 2015 Fast</surname>
          </string-name>
          r-cnn
          <source>Proceedings of the IEEE international conference on computer vision 1440-1448</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Ren</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            <given-names>R</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sun J 2015 Faster</surname>
          </string-name>
          r
          <article-title>-cnn: Towards real-time object detection with region proposal networks</article-title>
          <source>Advances in neural information processing systems 91-99</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Liu</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anguelov</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erhan</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szegedy</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            <given-names>C Y</given-names>
          </string-name>
          and
          <string-name>
            <surname>Berg A C 2016</surname>
          </string-name>
          <article-title>Ssd: Single shot multibox detector European conference on computer vision</article-title>
          (Springer, Cham)
          <fpage>21</fpage>
          -
          <lpage>37</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Nemirovskiy</surname>
            <given-names>V B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            <given-names>A K</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Clustering face</article-title>
          images
          <source>Computer Optics</source>
          <volume>41</volume>
          (
          <issue>1</issue>
          )
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          DOI: 10.18287/
          <fpage>2412</fpage>
          -6179-2017-41-1-
          <fpage>59</fpage>
          -66
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Vizilter Yu</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorbatsevich</surname>
            <given-names>V S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vorotnikov</surname>
            <given-names>A V</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kostromov</surname>
            <given-names>N A</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Real-time face identification via CNN and boosted hashing forest</article-title>
          <source>Computer Optics</source>
          <volume>41</volume>
          (
          <issue>2</issue>
          )
          <fpage>254</fpage>
          -
          <lpage>265</lpage>
          DOI: 10.18287/
          <fpage>2412</fpage>
          -6179-2017-41-2-
          <fpage>254</fpage>
          -265
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Gorbatsevich</surname>
            <given-names>V S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moiseenko</surname>
            <given-names>A S</given-names>
          </string-name>
          and
          <string-name>
            <surname>Vizilter Y V 2019 FaceDetectNet:</surname>
          </string-name>
          <article-title>Face detection via fully-convolutional network</article-title>
          <source>Computer Optics</source>
          <volume>43</volume>
          (
          <issue>1</issue>
          )
          <fpage>63</fpage>
          -
          <lpage>71</lpage>
          DOI: 10.18287/
          <fpage>2412</fpage>
          -6179-2019-43- 1-
          <fpage>63</fpage>
          -71
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Krizhevsky</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            <given-names>G andSutskever</given-names>
          </string-name>
          <article-title>I 2012 ImageNet Classification with Deep Convolutional Neural Networks NIPS'12</article-title>
          <source>Proceedings of the 25th International Conference on Neural Information Processing Systems</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          )
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Karthikeyan</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balakrishnan</surname>
            <given-names>G 2018</given-names>
          </string-name>
          <article-title>A comprehensive age estimation on face images using hybrid filter-based feature extraction Biomedical Research Medical Diagnosis and Study of Biomedical Imaging Systems</article-title>
          and Applications 472-480
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Bosch</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            <given-names>A</given-names>
          </string-name>
          and
          <string-name>
            <surname>Munoz X 2007 Image</surname>
          </string-name>
          <article-title>Classification using Random Forests</article-title>
          and
          <source>Ferns 11th International Conference on Computer Vision 1-8</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Szegedy</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ioffe</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            <given-names>V</given-names>
          </string-name>
          and
          <article-title>Alemi A 2017 Inception - v4, Inception-ResNet and the Impact of Residual Connections on Learning Thirty-</article-title>
          <source>First AAAI Conference on Artificial Intelligence</source>
          <volume>4278</volume>
          -4284
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>