<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the 2020 ImageCLEFdrawnUI Task: Detection and Recognition of Hand Drawn Website UIs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dimitri Fichou</string-name>
          <email>dimitri.fichou@teleporthq.io</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raul Berari</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Brie</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihai Dogariu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liviu Daniel Ştefan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihai Gabriel Constantin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bogdan Ionescu</string-name>
          <email>bogdan.ionescu@upb.ro</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University Politehnica of Bucharest</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>teleportHQ</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Nowadays, companies' online presence and user interfaces are critical for their success. However, building such user interfaces involves multiple actors. Some of them, like project managers, designers or developers, are hard to recruit and train, making the process slow and prone to errors. There is a need for new tools to facilitate the creation of user interfaces. In this context, the detection and recognition of hand drawn website UIs task was run in its first edition with ImageCLEF 2020. The task challenged participants to provide automatic solutions for annotating different user interface elements, e.g., buttons, paragraphs and checkboxes, starting from their hand drawn wireframes. Three teams submitted a total of 18 runs using different object detection techniques, and all teams obtained better scores than the recommended baseline. The best run in terms of mAP at 0.5 IoU obtained a score of 0.793 compared to the baseline score of 0.572. The leading overall precision score was 0.970, compared to the baseline score of 0.947. In this overview working notes paper, we present the task and these results in detail.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In recent years, there has been a growing interest in data-driven approaches
to help user interface (UI) professionals. For instance, Deka et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] collected
the RICO data set containing 72,219 layouts mined from 9,722 mobile
applications. Its usefulness was proven by implementing an auto-encoder which queried
UIs and retrieved similar layouts. This data set was further used to create the
SWIRE data set [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], consisting of 3,802 hand drawn UIs. Here, the authors
demonstrated the use of nearest neighbour search to retrieve similar UIs based
on sketch queries. End-to-end approaches were proposed with
pix2code [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and UI2code [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], in both cases the authors used a Convolutional
Neural Network (CNN) to encode pixels of a screenshot and a Recurrent Neural
Network (RNN) to decode it into a domain specific language translatable into UI
code. Online recognition of hand drawn gestures and strokes was also explored
by different teams [
        <xref ref-type="bibr" rid="ref12">12,13,15</xref>
        ] to provide support for both mouse and touchpad
inputs.
      </p>
      <p>
        We proceed in this direction by organizing the first edition of the Detection
and Recognition of Hand Drawn Website UIs task, ImageCLEFdrawnUI, with
the ImageCLEF (https://www.imageclef.org/2020/drawnui) benchmarking campaign [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], itself part of CLEF (Cross Language Evaluation Forum,
http://www.clef-campaign.org/). Following a similar format to the 2019
ImageCLEFcoral annotation task [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the task requires participants to perform automatic
UI element annotation and localisation on hand drawn wireframes of websites
and applications (Figure 1). In this overview working notes paper, we review the
task and the submitted runs.
      </p>
      <p>The rest of the paper is organized as follows: Section 2 presents the data set
and how it was collected. Section 3 describes the evaluation methodology. The
task results are reported in Section 4. Finally, Section 5 discusses conclusions
and future work in this field.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Data set</title>
      <p>The data set consists of a collection of paper-drawn website drawings, i.e.,
wireframes. Each data point represents an abstracted and simplified version of a real
website or mobile application. Figure 1 is an example of such wireframes,
corresponding to the desktop version of a website. The data set is split into 2,363
images used for training and 587 used for testing. The average number of UI
elements per image is 28, with a minimum of 4 and a maximum of 131 elements.</p>
      <p>Creating and labeling such a large number of wireframes would not have
been possible in the absence of a common standard. To tackle this, a convention
covering 21 of the most common UI elements was provided (Figure 2), ensuring
that both annotators and drawers followed a single source of truth.</p>
      <sec id="sec-2-1">
        <p>The 21 elements were selected after several discussions with developers and designers,
the objective being to cover most use cases.</p>
        <p>The convention was designed to minimize the level of ambiguity for both the
annotators and the machine learning models. However, wireframes require simple
representations, and any such standard will necessarily produce uncertainty
in some edge cases.</p>
        <p>In a previous iteration of the data set, a series of assumptions were made
about the model's capacity to distinguish between elements based only on
their position and size. For example, headers and paragraphs were represented
using the same squiggly line, with the former being bigger and placed at the top of
the page, while the latter was smaller and disseminated throughout different
sections of a layout. In practice, those assumptions were ambiguous for both
the annotators and the model, and a visual distinction between the two was
established by placing a hash at the start of a header element.</p>
        <p>As a result, the current version of the standard offers a clear representation
for each element, while still leaving space for further improvements (e.g., more
complicated UI features).</p>
        <p>The following is a set of short descriptions of the 21 UI element classes that
can be found throughout the data set:
– Paragraph: One or multiple lines of handwritten text or horizontal
squiggles.
– Label: A single line of text (or a squiggly line), with the added constraint of
being in the vicinity of an input (checkbox, radio button, toggle, text input,
date picker, stepper input or slider).
– Heading: One or multiple hashes (#) succeeded by a text or squiggly line.
– Link: A text or squiggly line enclosed in a pair of square brackets.
– Button: A small rectangular shape with a single line of text or a squiggly
line centered inside its area.
– Checkbox: A small square shape, with an X or a tick drawn inside its area.
– Radio Button: A small circle shape, optionally with a dot inside its area.
– Toggle: A small rectangle with one half of its area shaded. The rectangle
can also possess rounded corners.
– Rating: A collection of five star shapes aligned horizontally.
– Dropdown: A horizontal rectangle, with a V shape drawn in its right-most
side. Optionally, the empty space on the left can contain text or a squiggly
line.
– Text Input: An empty horizontal rectangle.
– Date Picker: A horizontal rectangle where a dot shape is present in the
right-most side and the space on the left is filled with a date text, which has
to provide the slash character as a delimiter. For example, valid date texts
include 02/03/04 or 20/12/20.
– Stepper Input: A horizontal rectangle where the right-most area consists of
two small rectangles distributed vertically, with the one on the top containing
a caret (^) and the one on the bottom including a V-like shape. These
shapes represent the control over the input, increasing or decreasing it by a
predefined step.
– Slider: A horizontal line with a small marker (such as a circle) between its
ends.
– Text Area: An empty rectangle with a small triangle shaded in its
bottomright corner.
– Image: A rectangle or a circle with an X spanning its whole area.
– Video: A rectangle with a small, right-pointing triangle centered inside its
area.
– Table: A rectangle subdivided into rectangular areas in a grid-like manner.
– Container: A rectangle shape which contains other classes of UI elements.
– List: A collection of multiple lines (text or squiggly), each being preceded
by either a bullet point or a number.
– Line Break: A long, horizontal line, usually spanning the whole length of
a section.</p>
        <p>Once the convention had been specified, the task of creating the data set could
be split between two external teams of drawers and annotators, supplemented
by overall supervision of the whole process by our team.</p>
        <sec id="sec-2-1-1">
          <title>2.1 Acquisition</title>
          <p>
            The process was externalized to a team of three professional wireframe drawers.
Drawing a wireframe requires recreating an already existing mobile or desktop
layout. For the former, the drawers used data points selected from the RICO
data set [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], which include automatically-generated screenshots of Android
applications. For the desktop layouts, a collection of screenshots was compiled by
automatically processing a list of popular sites, using an in-house web parser.
          </p>
          <p>The drawers followed the convention for representing UI elements when
creating the wireframes. Omitted elements include the ones which did not fit into
any description and those which would have cluttered the general layout. On top
of this, the instructions encouraged drawing each element clearly, to ensure that
annotation would proceed without difficulty. Drawers were asked to draw only the
visible elements of a page. Consequently, UI elements which are concerned with
providing the layout (such as containers or line breaks) were represented only
when they also provided a clear, visual indication of their presence.</p>
          <p>To diversify the data set, the drawers were asked to use different colors
when drawing the wireframes. After the drawing was done, each wireframe was
captured in three different conditions: bright, dark and scanned
(Figure 4). Bright and dark versions of the wireframes were photographed
using a Xiaomi Poco F1 smartphone. The scanned versions were obtained using a
Canon MF240 scanner.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2 Annotation</title>
          <p>Image annotation was provided by a different team of three data annotators,
following the same guidelines and using the desktop application VoTT
(https://github.com/microsoft/VoTT/) as an interface (Figure 3). Each element is
annotated using a rectangle which covers the object in its entirety, regardless of
potential overlap with other elements. Creating an annotation requires two clicks
for drawing the bounding box (one in the top-left corner and one in the
bottom-right corner), followed by clicking on the corresponding UI element class.</p>
          <p>In terms of quality control, the annotations were thoroughly verified and
corrected by a single member of our team to ensure consistency. The process
consisted of checking the alignment of the bounding boxes and rectifying
erroneous labels. A small number of inconsistencies were removed as a result.</p>
          <p>As shown in Figures 5 and 6, the number of certain types of elements drawn
in wireframes reflects the different use cases solved by user interfaces. Firstly,
buttons and links represent the most common ways of navigating throughout
different pages of an interface. Then, elements such as paragraphs, headers or
images indicate the content-based nature of UIs. Lastly, elements which describe
the layout of the page (such as containers and line breaks) represent 15% of the
total number of elements found inside the data set. It must be noted that the
distribution of those elements is difficult to compare with the distribution found
in real websites and mobile applications, as there are multiple ways to program
what appears in the screenshots.</p>
          <p>Consequently, a single wireframe will contain information about website
layouts (and implicitly, hierarchies), navigation and content by using a simple,
abstracted representation.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Evaluation methodology</title>
      <p>
        Evaluation was performed using three methods, the overall precision, mean
average precision (mAP) and recall [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For all these metrics, each detection is
required to have an IoU higher than 0.5 with the ground truth in order to register
as a true positive. For mAP and recall, an adapted version of the cocoAPI
(https://github.com/philferriere/cocoapi/) was used.
      </p>
      <p>– Overall Precision: The number of true positive predictions out
of the total sum of true positives and false positives.
– mAP@0.5 IoU: The localised mean average precision for each submission.
– Recall@0.5 IoU: The localised mean recall for each submission.</p>
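The IoU-based matching behind all three metrics can be sketched as follows (a minimal illustration, not the organizers' evaluation code; the box format `(x1, y1, x2, y2)` and function names are assumptions):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def overall_precision(predictions, ground_truth, threshold=0.5):
    """Fraction of predictions that match a still-unmatched ground-truth
    box with IoU above the threshold (true positives over all predictions)."""
    matched, tp = set(), 0
    for pred in predictions:
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(pred, gt) > threshold:
                matched.add(i)
                tp += 1
                break
    return tp / float(len(predictions)) if predictions else 0.0
```

In the actual evaluation, mAP and recall were computed per class with the adapted cocoAPI; this sketch only shows the 0.5 IoU criterion shared by all three metrics.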
      <p>Throughout the contest, overall precision was used as the sole metric for
creating the leaderboard, to be consistent with other ImageCLEF challenges
such as ImageCLEF Coral. However, several discussions with the participants
during the competition showed that recall and mean average precision are also
necessary to properly judge the results, so they have been added afterwards.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Results</title>
      <p>Three teams submitted a total of 18 runs. The task had a submission limit of
10 runs per team. Table 4 displays the overall precision, mean average precision
and recall at 0.5 IoU for each run.</p>
      <p>
        The baseline score was obtained by training a Faster R-CNN [14] model
(with a resnet101 backbone), using Tensorflow’s object detection API [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] on an
Amazon Web Services EC2 instance. The instance was equipped with an NVIDIA
K80 GPU, CUDA 10.0 and Python 3.6. In terms of hyperparameters, the batch
size was set to 1, the number of steps to 100,000 and the resizing range was set
between 600 and 800 pixels. Data augmentation was provided by the built-in
techniques of random cropping, horizontal flips, color distortion, and conversion
to gray-scale.
      </p>
      <p>
        All the participants made use of deep neural networks specifically tailored for
object detection, such as Mask R-CNN [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Cascade R-CNN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or YOLOv4 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
The baseline overall precision was surpassed by two of the teams. Furthermore,
each team had at least one run which outperformed the baseline mAP and recall
scores. This confirms that this type of network remains the preferred standard
for object detection and could be used for larger and more difficult projects in
the UI domain.
      </p>
      <p>With regard to model settings, team OG_SouL improved their results by
implementing a novel multi-pass inference technique for detecting smaller
elements. After running the image through Mask R-CNN, the detected bounding
boxes were filled with white and the image was passed once again to the
model, essentially forcing it to detect some of the remaining elements.
This method improved their mAP score from 0.573 to 0.641.</p>
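The idea can be sketched as follows (a simplified illustration, not the team's code; `model` is a hypothetical stand-in for their Mask R-CNN detector returning `(x1, y1, x2, y2)` boxes):

```python
import numpy as np

def multi_pass_detect(image, model, passes=2):
    """Run detection repeatedly, whiting out already-detected regions so the
    model can focus on the remaining (often smaller) elements."""
    work = image.copy()
    detections = []
    for _ in range(passes):
        boxes = model(work)                # list of (x1, y1, x2, y2) boxes
        detections.extend(boxes)
        for x1, y1, x2, y2 in boxes:
            work[y1:y2, x1:x2] = 255       # fill the detected area with white
    return detections
```

Because the already-detected regions are erased between passes, the second pass cannot re-detect them and is pushed towards elements missed the first time.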
      <p>Data analysis played an important role in improving the accuracy of the
results. The winning team, zip, performed a distribution analysis on the data set
and split it according to an 11:1 ratio between train and validation, ensuring that
the least common elements were not omitted.</p>
      <sec id="sec-4-1">
        <p>[Table 4: Overall precision, mAP@0.5 IoU and R@0.5 IoU
for each run; the baselines and best values for each metric are in bold.]
The team OG_SouL computed a similarity score between pictures of the same
UI layout found throughout the data set and removed them from training to
prevent over-fitting.</p>
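A rare-class-aware split along these lines could look like this (an illustrative sketch, not the team's actual code; the data structure, rarity threshold and function name are assumptions):

```python
import random
from collections import Counter

def split_keep_rare(images, ratio=12, seed=0):
    """Split annotated images into train/validation at roughly 11:1, keeping
    every image that contains a rare class in the training set.
    `images` maps an image id to its list of UI element labels."""
    counts = Counter(label for labels in images.values() for label in labels)
    rare = {label for label, n in counts.items() if n < 20}  # assumed threshold
    train, val = {}, {}
    rng = random.Random(seed)
    for image_id, labels in images.items():
        # one in `ratio` images goes to validation, unless a rare class is present
        if rare & set(labels) or rng.randrange(ratio):
            train[image_id] = labels
        else:
            val[image_id] = labels
    return train, val
```

Forcing rare-class images into the training set trades a slightly less representative validation split for the guarantee that the model sees every class during training.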
        <p>Data set augmentation was also provided through a variety of methods. Team
CudaMemError1 used YOLOv4 techniques such as CutMix or Mosaic, where
different areas of the pictures are cropped and combined with others to produce
synthetic data points. Team zip generated 500 new images containing the least
common UI elements by cropping them from the original data set, applying affine
transformations and pasting them onto randomly sized paper backgrounds in a
light color.</p>
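A minimal version of this cut-and-paste generation might look as follows (a sketch, not team zip's code; flips and 90-degree rotations stand in for general affine warps, and all names and sizes are assumptions):

```python
import random
import numpy as np

def synthesize(patches, canvas_size=(600, 800), n_patches=5, seed=0):
    """Paste randomly transformed crops of rare UI elements onto a light
    'paper' canvas to produce a synthetic grayscale training image."""
    rng = random.Random(seed)
    h, w = canvas_size
    canvas = np.full((h, w), 245, dtype=np.uint8)   # light paper background
    for _ in range(n_patches):
        patch = rng.choice(patches)
        # stand-ins for affine transforms: random flip and 90-degree rotation
        if rng.random() < 0.5:
            patch = np.fliplr(patch)
        patch = np.rot90(patch, rng.randrange(4))
        ph, pw = patch.shape
        y = rng.randrange(h - ph)
        x = rng.randrange(w - pw)
        canvas[y:y + ph, x:x + pw] = patch
    return canvas
```

A full pipeline would also record the pasted boxes as new ground-truth annotations and apply true affine warps (e.g., shear and scale) rather than only flips and rotations.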
        <p>Finally, two of the teams used conversion to black and white or grayscale,
taking advantage of the fact that drawn UI elements are agnostic to the color
of the instrument used for representing them on paper. OG_SouL showcased
an OpenCV RGB-to-BW pipeline, applying brightness refinement, erosion, Otsu's
binarization algorithm, Gaussian thresholding and noise removal.</p>
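The Otsu step of such a pipeline can be sketched in a few lines (a grayscale-only illustration; the team's actual pipeline used OpenCV and included the additional refinement steps listed above):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance of a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    grand_sum = (np.arange(256) * hist).sum()
    best_t, best_var = 0, -1.0
    cum_n = 0.0    # number of pixels at or below the candidate threshold
    cum_sum = 0.0  # intensity mass at or below the candidate threshold
    for t in range(256):
        cum_n += hist[t]
        cum_sum += t * hist[t]
        n_fg = total - cum_n
        if cum_n == 0 or n_fg == 0:
            continue
        mu_bg = cum_sum / cum_n
        mu_fg = (grand_sum - cum_sum) / n_fg
        var = cum_n * n_fg * (mu_bg - mu_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray):
    """Black-and-white version: pixels above Otsu's threshold become white."""
    return np.where(gray > otsu_threshold(gray), 255, 0).astype(np.uint8)
```

Since a wireframe is dark strokes on light paper, the histogram is strongly bimodal and Otsu's method separates ink from background regardless of the pen color.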
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Discussion and conclusions</title>
      <p>An overall precision as high as 0.97 may suggest that the task has already
been solved, but this metric does not properly account for a high number of
false negatives or for poor results on the rarer elements in the data set. Mean
average precision and recall are better ways of representing the performance
of a model in this case. Accordingly, the best run achieved only 0.79 on the
mAP@0.5 IoU measure, indicating that the techniques could still be further
improved.</p>
      <p>Since the drawn wireframes have been modeled after real applications and
websites, the data set was skewed from the start towards certain classes of UI
elements. This meant that the teams went to great lengths to compensate for
the lack of uncommon elements. As a result, synthetic data was created either
through YOLOv4's 'cut-and-paste' techniques or by cropping the least common
elements and applying affine transformations to them. Conversion to grayscale
or black and white proved to be an efficient method for reducing the data set
file size and improving the detection score.</p>
      <p>The results and techniques presented by the teams are encouraging and show
the untapped potential of combining machine learning with user
interfaces. For the next editions of the task, we plan to tackle two more difficult
problems regarding UI element detection and processing.</p>
      <p>The first problem consists of predicting the nested structure of a UI based
on a wireframe drawing. While the current challenge assumed that UI elements'
locations are identified by their absolute positioning, without any hierarchical
relationship between them, real-life scenarios presuppose relative positioning and
a hierarchy built out of different classes of elements. This particularly challenging
task may be solved through a mix of natural language processing and computer
vision.</p>
      <p>The second problem would necessitate performing object detection on real
screenshots instead of wireframes. In practice, designers often attach screenshots
of similar layouts along with the wireframes, with the intention of giving the
developer a more refined idea of the task at hand. A data set similar to the
current one can be generated by parsing a number of websites, analysing their
respective DOM trees and then screen-capturing the visible area. However, due
to the nature of most websites found throughout the web, cleaning the whole
data set can become a laborious and time-consuming problem. Consequently,
manual cleaning will be provided only for the test set, while the training set will
be offered in its raw form. The task will require the participants to filter through
the data set on their own.</p>
      <p>13. Mohian, S., Csallner, C.: Doodle2App: Native app code by freehand UI sketching.
In: Proc. 7th IEEE/ACM International Conference on Mobile Software Engineering
and Systems (MOBILESoft), Tool Demos and Mobile Apps Track. ACM (May
2020), to appear.</p>
      <p>14. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards Real-Time Object
Detection with Region Proposal Networks. In: Advances in Neural Information
Processing Systems (2015)</p>
      <p>15. Vanderdonckt, J., Roselli, P., Pérez-Medina, J.L.: !FTL, an Articulation-Invariant
Stroke Gesture Recognizer with Controllable Position, Scale, and Rotation
Invariances. 20th ACM International Conference on Multimodal Interaction pp. 125-134
(2018). https://doi.org/10.1145/3242969.3243032</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Beltramelli</surname>
          </string-name>
          , T.:
          <article-title>pix2code : Generating Code from a Graphical User Interface Screenshot</article-title>
          .
          <source>Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems</source>
          pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bochkovskiy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>H.Y.M.:</given-names>
          </string-name>
          <article-title>Yolov4: Optimal speed and accuracy of object detection</article-title>
          . arXiv abs/2004.10934 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasconcelos</surname>
          </string-name>
          , N.: Cascade R-CNN:
          <article-title>Delving into High Quality Object Detection</article-title>
          .
          <source>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          pp.
          <fpage>6154</fpage>
          -
          <lpage>6162</lpage>
          (
          <year>2018</year>
          ). https://doi.org/10.1109/CVPR.2018.00644
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clift</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seco De Herrera</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEFcoral 2019 task</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <volume>2380</volume>
          ,
          <fpage>9</fpage>
          -
          <lpage>12</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>From UI Design Image to GUI Skeleton : A Neural Machine Translator to Bootstrap Mobile GUI Implementation</article-title>
          .
          <source>International Conference on Software Engineering</source>
          <volume>6</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Deka</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franzen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hibschman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Afergan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nichols</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          , R.:
          <article-title>Rico: A mobile app dataset for building data-driven design applications</article-title>
          .
          <source>In: UIST 2017 - Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology</source>
          . pp.
          <fpage>845</fpage>
          -
          <lpage>854</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1145/3126594.3126651
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Everingham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gool</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>C.K.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The PASCAL Visual Object Classes ( VOC ) Challenge</article-title>
          .
          <source>Int J Comput Vis</source>
          <volume>88</volume>
          ,
          <fpage>303</fpage>
          -
          <lpage>338</lpage>
          (
          <year>2010</year>
          ). https://doi.org/10.1007/s11263-009-0275-4
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkioxari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
          </string-name>
          , R.: Mask R-CNN.
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>42</volume>
          (
          <issue>2</issue>
          ),
          <fpage>386</fpage>
          -
          <lpage>397</lpage>
          (
          <year>2020</year>
          ). https://doi.org/10.1109/TPAMI.2018.2844175
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canny</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nichols</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Swire: Sketch-based User Interface Retrieval</article-title>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . CHI '19, Association for Computing Machinery, New York, NY, USA (
          <year>2019</year>
          ). https://doi.org/10.1145/3290605.3300334
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rathod</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korattikara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fathi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wojna</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guadarrama</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Speed/accuracy trade-offs for modern convolutional object detectors</article-title>
          . In:
          <source>Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017</source>
          . vol. 2017-January, pp.
          <fpage>3296</fpage>
          -
          <lpage>3305</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1109/CVPR.2017.351
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Péteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlovski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fichou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berari</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dogariu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ştefan</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2020: Multimedia retrieval in medical, lifelogging, nature, and internet applications</article-title>
          . In:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020)</source>
          , vol.
          <volume>12260</volume>
          . LNCS Lecture Notes in Computer Science, Springer, Thessaloniki, Greece (September 22-25
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kiefer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coyette</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderdonckt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>User interface design by sketching: A Complexity Analysis of Widget Representations</article-title>
          . EICS p.
          <fpage>57</fpage>
          (
          <year>2010</year>
          ). https://doi.org/10.1145/1822018.1822029
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>