<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Learning for UI Element Detection: DrawnUI 2020</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Naveen Narayanan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Nitin Nikamanth Appiah Balaji</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kavya Jaganathan</string-name>
          <email>kavya17074g@cse.ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sri Sivasubramaniya Nadar College Of Engineering</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>As the field of software development grows rapidly, it becomes vital for the application development sector to keep pace. Constantly tweaking the front end of an application is cumbersome for developers. Detection tools that help design UIs from hand-drawn sketches can aid developers and non-developers alike in building applications with ease. Recent advances in deep learning techniques and the availability of computational power make it feasible to train efficient models for object detection. Two such models, YOLOv4 and Cascade RCNN, are implemented for detecting UI elements from hand-drawn sketches and are detailed in this paper. These models are observed to achieve an average mAP@IoU 0.5 improvement of 31.85% over the baseline Faster RCNN.</p>
      </abstract>
      <kwd-group>
        <kwd>DrawnUI</kwd>
        <kwd>YOLOv4</kwd>
        <kwd>Cascade RCNN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>With the software development industry's growing need for tools that increase
the efficiency of application development, aids such as recommendation systems and
generative models help reduce the time taken to build quality systems. In
iterative software engineering setups, it becomes cumbersome
for UI designers and front-end developers to make changes and fix issues. An
interactive tool that designers can work with, for generating
previews or even a completely finished front-end application, therefore becomes
highly convenient.</p>
<p>As the fields of computer vision and deep learning have matured,
various network architectures have been built for numerous downstream
applications such as object detection, instance segmentation, semantic segmentation and
panoptic segmentation. The rise of CNNs and their huge success in image
classification tasks have encouraged further research on CNN-based models for object
detection.</p>
<p>In this paper we discuss various object detection models and compare their
performance for UI element detection. The rest of the paper is divided
into five sections. Section 2 outlines the evolution of object detection models
and the motivation behind the chosen networks. Section 3 gives an overview
of the dataset used. Section 4 explains the chosen architectures, Section 5
compares the performance of the proposed methods, and Section 6 draws the
conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Object detection using deep neural networks (DNNs) has been an active area of
research for over a decade. R-CNN, proposed by Ross Girshick et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], had a
two-stage architecture that combined a proposal detector with a region-wise
classifier, and became predominant in the recent past due to its success.
Subsequently, SPP-net, proposed by Kaiming He et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], improved over R-CNN in
estimation accuracy and detection efficiency at test time, as analysed by [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Regardless of the region proposal generation used in the above networks, the training of
all network layers can be processed in a single stage with a multi-task loss. This
was implemented in Fast RCNN [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which saved the additional expense of
storage space and improved both accuracy and efficiency with more reasonable
training schemes. It was later found that Faster RCNN [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] can overcome the
region proposal computational cost by introducing an RPN module, making it a
significant improvement over Fast RCNN. Cascade RCNN [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a multi-stage
extension of Faster RCNN, achieves high-quality object detection by
effectively rejecting close false positives. This is achieved by combining cascaded
bounding box regression and cascaded detection, which simultaneously improve
both the quality of the hypotheses and of the detector.
      </p>
      <p>
        YOLO, developed by Joseph Redmon et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], took a different approach
compared to R-CNN: a single network predicts both the bounding
boxes and the class probabilities for those boxes. YOLOv1 divided the input
image into an S x S grid of cells, with a fixed number of bounding boxes
predicted in each cell. Bounding boxes are then kept if their class probabilities
exceed a particular threshold value. However, YOLOv1 had difficulties in detecting small
objects that appear in groups and objects with unusual aspect
ratios. Combating these issues, YOLOv2, as described in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], used batch
normalisation, a high-resolution classifier, anchor boxes and dimension clusters to improve
performance. It also bounds the box location using a logistic activation,
overcoming YOLOv1's instability in early iterations. YOLOv2 also uses Darknet-19,
obtaining a good balance between accuracy and model complexity. The
subsequent YOLOv3 made a few incremental improvements on YOLOv2: it had
a better feature extractor, Darknet-53 with shortcut connections, as well as a
better object detector with feature map upsampling and concatenation, as
mentioned in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The latest YOLOv4 has built-in data augmentation for more
robust training as part of its 'Bag of Freebies', and improves the accuracy of
object detection using the 'Bag of Specials', as explained in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], on the backbone and
detector.
      </p>
      <p>In our work, we thus implement Cascade RCNN and YOLOv4 for detecting
UI elements because of their significant advantages and state-of-the-art
performance.</p>
    </sec>
    <sec id="sec-3">
      <title>Dataset Description</title>
      <p>
        The dataset provided by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], consists of hand-drawn images of internet
websites and mobile application interfaces, drawn from 1000 different templates. The dataset
contains 21 classes, covering a variety of small elements like check
boxes and buttons all the way to large elements such as images and containers.
In order to increase the number of sample images and to rectify the class imbalance, data
augmentation techniques such as random scaling and flipping were implemented; a minimal
sketch of such augmentation follows.
      </p>
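      <p>For illustration, a minimal sketch of such an augmentation step (horizontal
flipping and nearest-neighbour scaling of an image together with its bounding boxes)
is shown below; the array layout and the (x_min, y_min, x_max, y_max) box format are
our assumptions for the example, not specifications of the dataset.</p>
      <preformat>
import numpy as np

def flip_and_scale(image, boxes, scale=1.2, flip=True):
    """Sketch: augment an image and its boxes together.

    image: HxWxC array; boxes: Nx4 array of (x_min, y_min, x_max, y_max)
    pixel coordinates (an assumed format, for illustration only).
    """
    h, w = image.shape[:2]
    if flip:
        image = image[:, ::-1]                   # horizontal flip
        boxes = boxes.copy()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]  # mirror x coordinates
    # Nearest-neighbour rescale via index arrays, keeping the sketch
    # dependency-free.
    ys = (np.arange(int(h * scale)) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(int(w * scale)) / scale).astype(int).clip(0, w - 1)
    image = image[ys][:, xs]
    boxes = boxes * scale                        # boxes scale linearly
    return image, boxes
      </preformat>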
    </sec>
    <sec id="sec-4">
      <title>Models</title>
      <sec id="sec-4-1">
        <title>Cascade RCNN</title>
        <p>
          Cascade RCNN [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is a multi-stage extension of the two-stage architecture of
Faster RCNN. The proposal sub-network is the first stage of the Faster RCNN
architecture, in which the entire image is processed by a backbone network, after
which preliminary detection hypotheses, also called object proposals, are produced
by applying a proposal head ("H0"). The second stage processes the
hypotheses with a region-of-interest detection sub-network ("H1"), denoted
the detection head. Every hypothesis is then assigned a classification score ("C")
and a bounding box ("B"). This architecture also combines cascaded bounding
box regression with cascaded detection, which simultaneously increases both the
quality of the hypotheses and of the detector.
        </p>
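        <p>The cascade can be summarised in a few lines: each stage's head re-scores and
refines the boxes handed on by the previous stage, so later stages see progressively
higher-quality hypotheses. The sketch below assumes heads are callables returning
(scores, refined boxes); the toy head stands in for a learned regressor.</p>
        <preformat>
import numpy as np

def cascade_refine(boxes, heads):
    """Sketch of Cascade RCNN inference. boxes: Nx4 proposals; heads:
    list of callables boxes -> (scores, refined_boxes). During training,
    each stage uses a higher IoU threshold than the previous one
    (e.g. 0.5/0.6/0.7) to define its positives.
    """
    scores = None
    for head in heads:
        scores, boxes = head(boxes)  # classify and refine once per stage
    return scores, boxes

# Toy head: shrinks boxes slightly toward their centres, standing in
# for a learned per-stage regressor.
def toy_head(boxes):
    centres = (boxes[:, :2] + boxes[:, 2:]) / 2
    half = (boxes[:, 2:] - boxes[:, :2]) / 2 * 0.9
    return np.ones(len(boxes)), np.hstack([centres - half, centres + half])

scores, refined = cascade_refine(np.array([[0., 0., 100., 100.]]),
                                 [toy_head, toy_head, toy_head])
        </preformat>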
      <p>
        The backbone used is ResNeSt rather than the traditional ResNet. This is because,
according to [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], architectures originally designed for image classification, like
ResNet, might not be suitable for downstream applications like object
detection because of their limited receptive-field size and lack of cross-channel interaction.
ResNeSt is made of a number of Split-Attention blocks stacked in
ResNet style. Each Split-Attention block divides the feature maps into several groups and
finer-grained subgroups, or splits, where the feature representation of each group
is determined via a weighted combination of the representations of its splits; a
minimal sketch follows. A Feature Pyramid Network (FPN) is used for feature
extraction (referred to as pool in Fig. 1), while a Region Proposal Network (RPN)
is used as the proposal head.
      </p>
      <fig id="fig-1">
        <caption><p>(a) Faster RCNN; (b) Cascade RCNN.</p></caption>
      </fig>
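        <p>A minimal sketch of the Split-Attention combination is shown below. In ResNeSt
the split weights come from small dense layers over pooled features; here the pooled
descriptor is fed to the softmax directly, purely to illustrate how each group's output
is a weighted combination of its splits.</p>
        <preformat>
import numpy as np

def split_attention(x, groups=2, splits=2):
    """Sketch: x has shape (C, H, W) with C divisible by groups*splits.
    Channels are divided into groups, each group into splits, and each
    group's output is a softmax-weighted sum of its splits.
    """
    C, H, W = x.shape
    x = x.reshape(groups, splits, C // (groups * splits), H, W)
    gap = x.mean(axis=(2, 3, 4))                  # one descriptor per split
    w = np.exp(gap) / np.exp(gap).sum(axis=1, keepdims=True)  # softmax over splits
    out = (w[:, :, None, None, None] * x).sum(axis=1)         # weighted sum
    return out.reshape(C // splits, H, W)

y = split_attention(np.random.rand(16, 8, 8))     # output shape (8, 8, 8)
        </preformat>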
      </sec>
      <sec id="sec-4-2">
        <title>YOLO</title>
        <p>
          Unlike algorithms that re-purpose classifiers for object detection, YOLO takes
a different approach. Instead of repeated forward passes, as in the case of sliding-window
or region-proposal models, the image is divided into S x S grid cells, and a
single convolutional network pass outputs the results for all S x S cells. The
output of the network is of shape S x S x (B * 5 + C), where B is the number
of anchor boxes and C is the number of object classes [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. It generates B possible detections for each grid cell. So instead of separately
regressing for region proposals, YOLO treats detection as an end-to-end regression
task; a decoding sketch is given below.
        </p>
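        <p>As a concrete illustration, the sketch below decodes such an output tensor for
the 21 UI classes of this task; the tensor layout and the 0.25 score threshold are
assumptions made for the example.</p>
        <preformat>
import numpy as np

S, B, C = 7, 2, 21        # grid size, boxes per cell, number of classes

def decode(output, threshold=0.25):
    """Sketch: output has shape S x S x (B*5 + C). Each cell predicts B
    boxes (x, y, w, h, conf) plus C shared class probabilities; a box is
    kept when conf * p(class) clears the threshold.
    """
    detections = []
    for i in range(S):
        for j in range(S):
            cell = output[i, j]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                scores = conf * class_probs
                k = scores.argmax()
                if scores[k] > threshold:
                    detections.append((i, j, x, y, w, h, int(k), scores[k]))
    return detections

dets = decode(np.random.rand(S, S, B * 5 + C))    # random tensor, just to run
        </preformat>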
      <p>
        The architecture, as described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], consists of a CSPDarknet53 backbone, an
SPP additional module, a PANet path-aggregation neck, and a YOLOv3 head.
YOLOv4 also uses the 'Bag of Freebies' and 'Bag of Specials' techniques
to introduce data augmentation, Mish activation, cross-stage partial connections,
and multi-input weighted residual connections into the network.
      </p>
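      <p>Of these components, the Mish activation has a particularly compact definition,
x * tanh(softplus(x)), sketched below.</p>
      <preformat>
import numpy as np

def mish(x):
    """Mish activation used in YOLOv4: x * tanh(ln(1 + exp(x)))."""
    return x * np.tanh(np.log1p(np.exp(x)))
      </preformat>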
      <p>With YOLOv4, augmentation techniques such as CutMix, Mosaic, class label
smoothing and Self-Adversarial Training were trialed to improve performance.
In CutMix, part of an image is replaced by a patch from another image, with
annotations kept from both image parts. Similarly, in Mosaic augmentation, four
different images are combined in various ratios, which improves the recognition of
objects at different scales. To reduce model overfitting on the classification output,
the confidence targets are intentionally softened by the label smoothing technique.
Finally, in Self-Adversarial Training (SAT), images are passed through the forward
step normally; instead of backpropagating, the loss is used to distort the image
in a way that harms the model, and these distorted images are then used to train
the model as usual. With these additional improvements, YOLOv4 performs more
accurately than its earlier versions. Minimal sketches of CutMix and label
smoothing follow.</p>
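      <p>The sketches below illustrate two of these techniques under stated assumptions:
label smoothing with a typical eps of 0.1 (the value is our assumption), and the
CutMix paste operation for two same-sized images.</p>
      <preformat>
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: soften one-hot targets so the classifier is never
    pushed to full confidence. eps=0.1 is a typical, assumed value."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / k

def cutmix(img_a, img_b, box):
    """CutMix sketch: paste the region box=(y0, y1, x0, x1) of img_b into
    img_a; annotations from both parts would be kept for training."""
    y0, y1, x0, x1 = box
    out = img_a.copy()
    out[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]
    return out
      </preformat>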
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>In the ImageCLEF DrawnUI 2020 challenge, our team CudaMemError1 achieved
a top submission rank of 2, with an Overall Precision of 0.9504. Two additional
metrics, mAP@IoU 0.5 and recall@IoU 0.5, were further used to evaluate
the models. The submitted runs are listed below.</p>
      <table-wrap id="table-1">
        <caption><p>Submitted runs.</p></caption>
        <table>
          <thead>
            <tr><th>Run ID</th><th>Model</th></tr>
          </thead>
          <tbody>
            <tr><td>67413</td><td>baseline Faster RCNN</td></tr>
            <tr><td>67833</td><td>Cascade RCNN</td></tr>
            <tr><td>67710</td><td>Cascade RCNN</td></tr>
            <tr><td>67722</td><td>Cascade RCNN</td></tr>
            <tr><td>67829</td><td>YOLOv4</td></tr>
            <tr><td>67707</td><td>YOLOv4</td></tr>
            <tr><td>67831</td><td>YOLOv4</td></tr>
            <tr><td>67972</td><td>Cascade RCNN</td></tr>
            <tr><td>67706</td><td>YOLOv4</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We can observe that Cascade RCNN (run 67972) scored better than
YOLOv4 (run 67706) by 1.7% in terms of Overall Precision, but YOLOv4
outperformed Cascade RCNN by 10.9% and 7.5% in mAP@IoU 0.5 and recall@IoU
0.5 respectively. Compared with the baseline Faster RCNN model, a 38.7%
improvement in mAP@IoU 0.5 and a 48.3% improvement in recall@IoU 0.5 were
achieved with the YOLOv4 model, and a 25% improvement in mAP@IoU 0.5 and
a 37.9% improvement in recall@IoU 0.5 with the Cascade RCNN model.</p>
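      <p>For reference, the IoU criterion underlying both mAP@IoU 0.5 and recall@IoU 0.5
is the standard intersection-over-union of predicted and ground-truth boxes, as
sketched below; a detection counts as correct at IoU 0.5 when the overlap reaches
at least half of the union.</p>
      <preformat>
def iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

assert iou((0, 0, 10, 10), (0, 0, 10, 5)) == 0.5  # half overlap
      </preformat>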
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Two different architectures, YOLOv4 and Cascade RCNN, were
implemented for detecting UI elements from hand-drawn sketches. YOLOv4 performed
significantly better than the baseline and Cascade RCNN in terms of mAP@IoU
0.5 and recall@IoU 0.5, with scores of 79.36 and 59.8 respectively, while Cascade
RCNN had an Overall Precision score of 0.9504.</p>
      <p>Front-end development can be streamlined and made easier with such
detection systems. Automatic code generation can further be added to this pipeline
to make the whole process of front-end development simple and
convenient. These detection systems will be an asset for developers trying to keep
up with the ever-growing software engineering world, and also for non-developers
looking to enter the digital space without learning how to code.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bochkovskiy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>H.Y.M.</given-names>
          </string-name>
          :
          <article-title>Yolov4: Optimal speed and accuracy of object detection (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasconcelos</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Cascade r-cnn: High quality object detection and instance segmentation (</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fichou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berari</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dogariu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stefan</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEFdrawnUI 2020: The Detection and Recognition of Hand Drawn Website UIs Task</article-title>
          .
          <source>In: CLEF2020 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Thessaloniki, Greece (September 22-25
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Fast r-cnn (</article-title>
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donahue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malik</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Rich feature hierarchies for accurate object detection and semantic segmentation (</article-title>
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Spatial pyramid pooling in deep convolutional networks for visual recognition (</article-title>
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Peteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlovski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fichou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berari</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dogariu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stefan</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2020: Multimedia retrieval in lifelogging, medical, nature, and internet applications</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020), vol. 12260, LNCS Lecture Notes in Computer Science</source>
          , Springer, Thessaloniki, Greece (September 22-25
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Redmon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Divvala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>You only look once: Unified, real-time object detection (</article-title>
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Redmon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Yolo9000: Better, faster, stronger (</article-title>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Redmon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Yolov3: An incremental improvement (</article-title>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Faster r-cnn: Towards real-time object detection with region proposal networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp. 91-99 (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manmatha</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Resnest: Split-attention networks (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Z.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Object detection with deep learning: A review (</article-title>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>