<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Assessing Completeness in Training Data for Image-Based Analysis of Web User Interfaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maxim Bakaev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Heil</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Gaedke</string-name>
          <email>martin.gaedke@informatik.tu-chemnitz.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Novosibirsk State Technical University</institution>
          ,
          <addr-line>Novosibirsk, Russia 0000-0002-1889-0692</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Technische Universität Chemnitz</institution>
          ,
          <addr-line>Chemnitz, Germany 0000-0002-6729-2912</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Analysis of user interfaces (UIs) based on their visual representation (screenshots) is gaining increasing popularity, institutionalizing the HCI vision field. Perceiving the same visual appearance of a UI as a human user does provides the advantage of taking into account layouts, whitespace, graphical content, etc., independent of the concrete platform and framework used. However, visual analysis requires significant amounts of training data, particularly for the classifiers that identify UI elements and their types. In this paper, we demonstrate how data completeness can be assessed in training datasets produced by crowdworkers, without the need to duplicate the extensive work. In the experimental session, 11 annotators labeled more than 42000 UI elements in nearly 500 web UI screenshots using the LabelImg tool with a pre-defined set of classes corresponding to visually identifiable web page element types. We identify metrics that can be automatically extracted from UI screenshots and construct regression models predicting the expected number of labeled elements in a screenshot. The results can be used in outlier analysis of crowdworkers on any existing microtasking platform.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Crowdworking</kwd>
        <kwd>Image Recognition</kwd>
        <kwd>Human-Computer Vision</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Though futurologists have long feared that the development of AI would take jobs away from humans,
it has actually caused a boom in the demand for microworking services. These involve the use of general human
intelligence to complete tasks for which no algorithm is known or satisfactorily efficient. The outcome is employed
in solving some practical problem, providing an online service, or improving an AI model based on machine
learning. Popular examples include labeling images, audio and video, moderating online content, sentiment
analysis, translating short texts into other languages, etc. These tasks by and large involve unskilled and tedious
work on data gathering and processing, so requestors of microworking services can rarely get enough motivated
volunteers and tend to rely on low-paid microworkers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For instance, there are already disturbing reports
about involuntary microservitude: prisoners in Finland have been assigned data tagging jobs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Aside
from concerns about what an AI taught by socially dubious teachers will be like, the consequence is a growing need
to check the quality of the output produced by such uninterested workers.
      </p>
      <p>
        Quality is also the current focus for crowdwork done via the Internet (see the review in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), and the number of
specialized platforms has been growing lately: MTurk (2005), microworkers.com (2009), Yandex.Toloka (2014),
Google's AutoML (2018), etc. Controlling the completeness and accuracy of data used for training is particularly
important, since these quality dimensions are linked to recall and precision in the resulting AI models. Today's
trend is not just the implementation of output data quality assessment tools in the platforms themselves (e.g. Yandex.Toloka
allows specification of control rules for performance time, accuracy vs. the ground truth, majority consensus,
etc.); there is also a growing number of related meta-tools: CDAS [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], CrowdTruth [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], iCrowd [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], DOCS for
MTurk [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and more. Generally, they are concerned with online quality control and optimal assignment of tasks,
and they mostly rely on performance in completed work, which is evaluated using ground truth and majority
consensus approaches. Nearly universally, these require that the same or very similar work be done by several
workers, so that an accuracy measure can be calculated.
      </p>
      <p>
        This work duplication is undesirable in some domains, where the tasks are labor-intensive and have no strictly
correct outcome. In particular, in our work we focus on user interface (UI) labeling: the specification of UI elements'
positions and types in a UI's visual representation. We propose assessing the completeness of the output data based on the
expected number of objects, which we predict for a UI from certain metrics that can be automatically calculated
for the UI image. Most existing research, even in general image labeling, focuses on image complexity, in which the
number of objects is only one of the dimensions. In particular, image compression metrics, such as those based on the popular
JPEG or PNG formats, are known to be well correlated with image complexity, but their application to the assessment of UIs
is specific and relatively novel (see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). The potential advantage of the approach is increased efficiency
in producing training data, achieved by removing the need to perform redundant work to ensure the data quality.
      </p>
      <p>The remainder of the paper is organized as follows. In Section 2, we detail the UI image-based (visual)
analysis approach and describe the related software tool that we previously built. Further, we describe the experimental
UI labeling session, in which 11 workers processed about 500 screenshots of university website homepages. In
Section 3, we analyze the collected data, present the characteristics of the dataset, and construct regression
models for predicting the expected number of UI elements from JPEG, PNG, and entropy metrics combined
with edge detection-based recognition. In the final section, we discuss the results, provide conclusions and
outline directions for further research. So far, the greatest limitation of our work is that it has not been tested to see whether
crowdworkers undermining data completeness can be identified in real conditions.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>Web UI Visual Analysis</title>
        <p>Image-based analysis of UIs is gaining in popularity, as it allows perceiving the same interface as the user does, which
is particularly important for web UIs. The drawback of this approach is that a considerable amount of training
data is needed (particularly for the classifiers that identify UI elements and their types), and this data is mostly produced
through human UI labeling.</p>
        <p>
          It is already widely noted that when data is annotated through crowdworking platforms, controlling its quality
is of foremost importance [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Particularly for UI labeling tasks, data completeness can suffer if unfaithful
crowdworkers optimize their task performance for a better revenue/effort ratio. We believe that such outlier workers
can be identified without resorting to duplicate labeling, on which ground truth and majority consensus are
essentially founded. Objective characteristics of the material (that is, the UIs being labeled) can provide meaningful
clues about how regular a worker's performance is.
        </p>
        <p>
          Previously, we developed a prototype visual analyzer tool capable of extracting several metrics from a UI's
visual representation [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. It exhibited acceptable recognition of UI elements, which is mostly based on
edge detection and identification of vertical and horizontal lines, rectangular forms, etc. (see Fig. 1). However,
the performance of its trained classifiers responsible for detecting the types of web UI elements was found to be
inadequate. As we develop the enhanced visual analyzer, we are concerned with efficient collection of
training data via web UI labeling and with assessment of its quality.
Data completeness is an important attribute of overall data quality, which indicates the comprehensiveness of
available data with respect to a specific informational requirement. General crowdworking is arguably more often
concerned with data accuracy, since tasks are rarely compound enough to be completed only partially. In UI
labeling, completeness can be undermined if too few UI elements are identified, which would in turn lead to
decreased recall of the automated visual analysis tool.
        </p>
        <p>To assess data completeness in this domain, the ground truth and majority consensus approaches could well be
used. That is, if a crowdworker repeatedly under-identifies UI elements, his or her output could be considered
invalid, and no further tasks would be assigned. However, web UIs are currently very diverse, and the total number
of UI elements can vary dramatically. Processing a single UI takes considerable time, so each worker would only
label a couple of dozen of them, which rules out a statistically meaningful comparison of per-worker averages when
the workers label different UIs. This means that for the ground truth and majority consensus approaches to be
effective, several workers would have to process the same UI. That is, the extensive labeling effort would have to be
duplicated, without contributing much to the training data.</p>
        <p>
          Instead, the expected number of elements in a web UI could be predicted without the involvement of human
workers, based on metrics extracted from its image (screenshot), as demonstrated e.g. in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Subsequently,
a kind of outlier analysis [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] could be employed to identify workers whose performance consistently does not
correspond to the expected values. To explore whether the prediction-based approach holds for a
trusted dataset and to identify the significant metrics, we collected labeling data in an experimental session.
        </p>
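        <p>The outlier analysis outlined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the helper name flag_incomplete_workers, the use of the median ratio between labeled and predicted element counts, and the 0.5 threshold are all hypothetical choices, and the records are made up.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from statistics import median


def flag_incomplete_workers(records, threshold=0.5):
    """Return workers whose labeled counts fall consistently below prediction.

    records: iterable of (worker_id, n_labeled, n_predicted) tuples, one per UI.
    threshold: hypothetical cut-off for the median labeled/predicted ratio.
    """
    ratios = defaultdict(list)
    for worker, n_labeled, n_predicted in records:
        if n_predicted > 0:  # skip UIs without a usable prediction
            ratios[worker].append(n_labeled / n_predicted)
    # The median keeps one unusual screenshot from dominating the decision.
    return sorted(w for w, r in ratios.items() if median(r) < threshold)


# Illustrative, made-up records: (worker, labeled elements, predicted elements).
records = [
    ("w1", 80, 85.0), ("w1", 95, 90.0), ("w1", 60, 70.0),
    ("w2", 20, 85.0), ("w2", 30, 90.0), ("w2", 65, 70.0),
]
flagged = flag_incomplete_workers(records)
print(flagged)
```
]]></preformat>
        <p>Judging each worker over all of his or her screenshots, rather than per screenshot, reflects the requirement that the deviation from the expected values be consistent.</p>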
      </sec>
      <sec id="sec-2-2">
        <title>The Experiment Description</title>
      </sec>
      <sec id="sec-2-3">
        <title>Participants</title>
        <p>The workers in our study were student members of the crowd-intelligence lab of Novosibirsk State Technical University (Russia),
who volunteered to work on the project. In total, there were 11 of them (6 male, 5 female), with
ages ranging from 20 to 24 (mean = 20.5, SD = 0.74), all Bachelor students majoring in Applied Informatics. All
the workers had normal or corrected-to-normal vision and reasonable experience with web UIs and IT.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Material</title>
        <p>The material consisted of screenshots of homepages (UIs) of higher education institutions' websites. Initially, 10639
screenshots were collected automatically by a dedicated Python script crawling through URLs that we took from
various catalogues (DBpedia, etc.). The screenshots were made of full web pages as they were rendered, not
just of the part above the fold or of a fixed size. Then we hand-picked 497 screenshots from the population,
using the following criteria:</p>
        <list list-type="order">
          <list-item><p>University or college corporate website with reasonably robust functionality;</p></list-item>
          <list-item><p>Not overly famous university;</p></list-item>
          <list-item><p>Website content in English and reasonably diverse (i.e. no photos-only websites);</p></list-item>
          <list-item><p>Reasonable diversity in website designs (colors, page layouts, etc.).</p></list-item>
        </list>
      </sec>
      <sec id="sec-2-5">
        <title>Design</title>
        <p>The experiment used a between-subjects design: each UI screenshot was processed by only one worker, so there
was no duplication of work. The independent and derived independent variables were:</p>
        <list list-type="order">
          <list-item><p>The size of the UI screenshot file in PNG-24 format, in MB: PNG_filesize;</p></list-item>
          <list-item><p>The file size of the same screenshot in JPEG-100 format, in MB: JPEG_filesize;</p></list-item>
          <list-item><p>The number-of-elements metric automatically produced by the visual analyzer for a UI screenshot: VA_Elements;</p></list-item>
          <list-item><p>The entropy value obtained for the .png file through MATLAB's entropy(I) function: M_Entropy.</p></list-item>
        </list>
        <p>The dependent variable in our study was the number of UI elements labeled in a UI screenshot by a worker:
N_Elements.</p>
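        <p>The file-size and entropy variables above can be obtained with a few lines of code. The sketch below is an illustration in Python under stated assumptions, not the tooling used in the study: file_size_mb and shannon_entropy are hypothetical helper names, a megabyte is taken to be 10^6 bytes, and the entropy follows the documented definition of MATLAB's entropy(I), that is, -sum(p.*log2(p)) over the normalized histogram of an 8-bit grayscale image. Decoding a screenshot into grayscale pixel values is left to an imaging library.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import os
from collections import Counter


def file_size_mb(path):
    """File size in megabytes (assumed here to mean 10**6 bytes)."""
    return os.path.getsize(path) / 1_000_000


def shannon_entropy(gray_pixels):
    """Entropy in bits of 8-bit grayscale values: sum of -p * log2(p)."""
    counts = Counter(gray_pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)


# A flat image has zero entropy; two equally frequent gray levels give 1 bit.
flat = [128] * 16
two_level = [0] * 8 + [255] * 8
print(shannon_entropy(flat), shannon_entropy(two_level))
```
]]></preformat>
        <p>Only gray levels that actually occur enter the sum, which matches the convention of ignoring empty histogram bins.</p>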
      </sec>
      <sec id="sec-2-6">
        <title>Procedure</title>
        <p>For labeling the UIs, the workers used the LabelImg tool. It allows drawing a bounding rectangle around an image
element, specifying a label for it (choosing from the set of pre-defined classes or adding a custom class), and
saving the results as XML files in PASCAL VOC format. The workers were provided with instructions on using
the tool and were given the set of pre-defined classes specific to web UIs (see Table 1).</p>
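        <p>The saved annotation files can be turned into per-screenshot element counts with the Python standard library. The sketch below assumes the usual PASCAL VOC layout, in which every labeled element is an object element with a name child; the function name count_labeled_elements and the sample annotation are illustrative only.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_labeled_elements(voc_xml):
    """Count labeled objects per class in one PASCAL VOC annotation string."""
    root = ET.fromstring(voc_xml)
    names = (obj.findtext("name", default="").strip() for obj in root.iter("object"))
    return Counter(names)


# Made-up annotation with three labeled UI elements.
sample = """
<annotation>
  <filename>example.png</filename>
  <object><name>button</name>
    <bndbox><xmin>10</xmin><ymin>10</ymin><xmax>90</xmax><ymax>40</ymax></bndbox>
  </object>
  <object><name>link</name>
    <bndbox><xmin>10</xmin><ymin>60</ymin><xmax>90</xmax><ymax>80</ymax></bndbox>
  </object>
  <object><name>link</name>
    <bndbox><xmin>10</xmin><ymin>90</ymin><xmax>90</xmax><ymax>110</ymax></bndbox>
  </object>
</annotation>
"""
per_class = count_labeled_elements(sample)
n_elements = sum(per_class.values())  # number of labeled elements in this screenshot
print(n_elements, dict(per_class))
```
]]></preformat>
        <p>Keeping the per-class counts, rather than only the total, also makes it easy to spot misspelled custom classes.</p>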
        <p>The 497 UI screenshots were distributed among the student workers nearly equally, based on their
alphabetical order (no random assignment). Each worker used his or her own judgment in deciding which UI elements
to label, but all were asked to achieve maximum completeness in each UI. In total, it took the workers 6 days
to complete their assignment.</p>
        <p>The workers labeled 495 UI screenshots in total (2 erroneous ones were removed). This resulted in 42716 labeled
UI elements, of which 39803 (93.2%) belonged to the pre-defined classes (shown in Table 1). An example of a
screenshot being labeled with the LabelImg tool is provided in Fig. 2.</p>
      </sec>
      <sec id="sec-2-7">
        <table-wrap id="tab1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Class name</th><th>Class description</th></tr>
            </thead>
            <tbody>
              <tr><td colspan="2">Graphical content elements:</td></tr>
              <tr><td>image</td><td>foreground images that the web page displays</td></tr>
              <tr><td>background image</td><td>images that are used as background, i.e. other UI elements are placed on top of them and they have no semantic meaning</td></tr>
              <tr><td>panel</td><td>an area that is visually separated from its surroundings by borders, shadows, and/or background color and contains at least one other UI element</td></tr>
              <tr><td colspan="2">Textual content elements:</td></tr>
              <tr><td>list</td><td>any list (numbered or unnumbered) that uses bullet points, numberings, borders, background color etc. to display a set of similar items</td></tr>
              <tr><td>table</td><td>any visually recognizable table (using alignment, lines or background color to represent rows and columns)</td></tr>
              <tr><td>paragraph</td><td>a portion of text consisting of one or more lines of text that are not visually separated by white space and/or indentation from other text</td></tr>
              <tr><td>textblock</td><td>two or more subsequent paragraphs of text</td></tr>
              <tr><td>text</td><td>any other portion of text that is neither a label nor a paragraph or textblock</td></tr>
              <tr><td>symbol</td><td>any graphical symbol, can appear on buttons, tabs, links, in texts etc. or separately</td></tr>
              <tr><td colspan="2">Interface elements:</td></tr>
              <tr><td>checkbox</td><td>must be labeled one-by-one, without the accompanying text (which must be marked as label)</td></tr>
              <tr><td>radiobutton</td><td>must be labeled one-by-one, without the accompanying text (which must be marked as label)</td></tr>
              <tr><td>selectbox</td><td>options box that would expand when clicked, displaying several options; a list which can be selected or multi-selected</td></tr>
              <tr><td>textinput</td><td>single line (including password field, date/calendar, etc.)</td></tr>
              <tr><td>textarea</td><td>multi line</td></tr>
              <tr><td>button</td><td>if the button displays text on it, please additionally label the text of the button as type "label" (see below)</td></tr>
              <tr><td>label</td><td>a small portion of text, typically one word or only a few words, that are used together with another UI control like a radiobutton</td></tr>
              <tr><td>tabs</td><td>intra-page tabs created using HTML/CSS/JS, not browser tabs; please place the rectangle around the tab handle</td></tr>
              <tr><td>scrollbar</td><td>both intra-page, e.g. inside textareas, and the main scrollbar of the entire page if displayed</td></tr>
              <tr><td>pagination</td><td>should span the entire pagination controls area, typically the next and previous buttons and page links</td></tr>
              <tr><td>link</td><td>can be inside text (hyperlink), in navigation, etc.</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-2-8">
        <p>(D<sub>495</sub> = 0.047, p = 0.01).</p>
        <p>Between the workers, the average N_Elements per UI ranged from 44.5 to 121.6 (mean = 86.4, SD = 22.3).
Detailed statistics are provided in Table 2 (workers' names are shortened to initials). When counting the classes,
obviously erroneous ones (e.g. butto) were excluded from consideration. Notably, the relative standard
deviations (RSD), bar one outlier worker (SMl with RSD = 86.26%), fell in a rather narrow interval of
24.24-49.05%. The Shapiro-Wilk test suggested that the normality hypothesis could not be rejected (W<sub>11</sub> = 0.972,
p = 0.903).</p>
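        <p>The relative standard deviation used above is the standard deviation expressed as a percentage of the mean. A minimal sketch follows; the helper name relative_sd and the element counts are made up for illustration, and the sample (n - 1) form of the standard deviation is an assumption, since the text does not state which form was used.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean, stdev


def relative_sd(values):
    """Relative standard deviation in percent: 100 * SD / mean (sample SD)."""
    return 100 * stdev(values) / mean(values)


# Made-up per-screenshot element counts for two hypothetical workers.
steady = [80, 90, 100, 110, 120]
erratic = [10, 40, 100, 150, 200]
print(round(relative_sd(steady), 1), round(relative_sd(erratic), 1))
```
]]></preformat>
        <p>Because the RSD is scaled by the mean, it allows comparing the regularity of workers whose average numbers of labeled elements differ considerably.</p>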
      </sec>
      <sec id="sec-2-9">
        <title>Analyzing and Predicting the Number of UI Elements in Screenshots</title>
        <p>Running the 495 UI screenshots processed by the workers through our visual analyzer software, we were able
to obtain the number-of-elements metric (VA_Elements) for 440 of them (the other 55, or 11.1%, encountered
technical problems). The resulting VA_Elements ranged from 4 to 278 (mean = 65.4, SD = 32.7, RSD = 50.1%).
Hence, on average the human workers recognized 1.32 times as many UI elements as the automated tool. The
Kolmogorov-Smirnov test suggested that the normality hypothesis had to be rejected for VA_Elements (D<sub>440</sub> = 0.105,
p &lt; 0.001).</p>
        <p>We found that the Pearson correlation between N_Elements and VA_Elements per UI was highly significant
(r<sub>440</sub> = 0.381, p &lt; 0.001). The correlations of N_Elements with JPEG_filesize (r<sub>495</sub> = 0.278, p &lt; 0.001), PNG_filesize (r<sub>495</sub> = 0.174,
p &lt; 0.001), and M_Entropy (r<sub>492</sub> = -0.125, p = 0.006) were also significant, but somewhat weaker.</p>
        <p>Further, we constructed a regression model for N_Elements with the 4 factors, which was found to be highly
significant (F<sub>4,432</sub> = 26.0, p &lt; 0.001), although it had a rather mediocre R<sup>2</sup> = 0.194. Its Akaike Information Criterion
(AIC) value was equal to 3100.</p>
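        <p>
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math>\mathit{N\_Elements} = 70.5 + 24.9 \cdot \mathit{JPEG\_filesize} - 12.0 \cdot \mathit{PNG\_filesize} + 0.253 \cdot \mathit{VA\_Elements} - 5.4 \cdot \mathit{M\_Entropy}</tex-math>
          </disp-formula>
        </p>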
        <p>Since in some cases the visual analyzer failed to produce the metrics, we tested whether the model could be
constructed without the VA_Elements factor. This regression was found to be highly significant too (F<sub>3,488</sub> = 36.1,
p &lt; 0.001), although it had a somewhat lower R<sup>2</sup> = 0.182 and a poorer AIC = 3498.</p>
        <p>
          <disp-formula id="eq2">
            <label>(2)</label>
            <tex-math>\mathit{N\_Elements} = 87.4 + 38.5 \cdot \mathit{JPEG\_filesize} - 24.1 \cdot \mathit{PNG\_filesize} - 6.2 \cdot \mathit{M\_Entropy}</tex-math>
          </disp-formula>
        </p>
        <p>We used model (2) to obtain the predicted numbers of elements for the 55 screenshots that the visual analyzer
failed to process. The Pearson correlation between the predicted values and the actual number of labeled elements
(N_Elements) was found to be highly significant (r<sub>55</sub> = 0.427, p = 0.001).</p>
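        <p>Applying the two models amounts to evaluating a linear expression, with model (2) serving as the fallback when the visual analyzer yields no VA_Elements value. The sketch below uses the coefficients reported above; the function names are hypothetical, the input values are made up, and the negative signs of the PNG_filesize and M_Entropy terms are an assumption. The Pearson correlation helper mirrors the check of predicted against actual element counts.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def predict_model_1(jpeg_mb, png_mb, va_elements, entropy):
    """Model (1): includes the visual analyzer's element count."""
    return 70.5 + 24.9 * jpeg_mb - 12.0 * png_mb + 0.253 * va_elements - 5.4 * entropy


def predict_model_2(jpeg_mb, png_mb, entropy):
    """Model (2): file sizes and entropy only."""
    return 87.4 + 38.5 * jpeg_mb - 24.1 * png_mb - 6.2 * entropy


def predict_n_elements(jpeg_mb, png_mb, entropy, va_elements=None):
    """Prefer model (1); fall back to model (2) if VA_Elements is missing."""
    if va_elements is None:
        return predict_model_2(jpeg_mb, png_mb, entropy)
    return predict_model_1(jpeg_mb, png_mb, va_elements, entropy)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up metric values for one screenshot (file sizes in MB).
with_va = predict_n_elements(2.0, 1.0, 4.0, va_elements=60)
without_va = predict_n_elements(2.0, 1.0, 4.0)
print(round(with_va, 2), round(without_va, 2))
```
]]></preformat>
        <p>Selecting the model per screenshot keeps the better-fitting model (1) wherever VA_Elements is available, while still giving every screenshot an expected value.</p>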
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>Image-based analysis of user interfaces has recognized advantages, as it allows taking into account layouts,
whitespace, graphical content, etc., independent of the concrete platform and framework used. Visual analysis (HCI
vision) software tools generally require large amounts of training data, particularly for detecting the types of elements in
today's manifold web UIs. Relying on internet-based crowdworkers who perform UI labeling is a popular
approach to collecting such training data, but controlling the output quality currently involves considerable work
overhead. In our work, we proposed assessing data completeness, which in UI labeling equates to the number of
identified UI elements, by predicting this expected number from metrics automatically calculated for the input
image.</p>
      <p>To this end, we constructed two regression models: (1) relies on the number of UI elements assessed by our
dedicated visual analysis tool based on edge detection, while (2) uses only JPEG, PNG and entropy metrics as
the factors. The quality of (1) was somewhat better, as its R<sup>2</sup> = 0.194 was 6.59% higher, despite a sample 1.13 times
smaller. However, with (2) one can predict the expected number of UI elements in a UI image
without the need to rely on external tools.</p>
      <p>We see another contribution of the work in the set of pre-defined classes that we devised for web UI labeling,
which covered 93.2% of all labeled UI elements in our study. The classes are presented and described in
Table 1 and can be used by researchers working on similar problems.</p>
      <p>Undoubtedly, the main limitation of our study is that the models have not been tested in real crowdworking to identify
workers who undermine completeness in UI labeling tasks. Our future research prospects include collecting the
training data for the enhanced visual analyzer through a crowdwork platform and using the models together
with outlier analysis to identify negligent workers.</p>
      <p>Another limitation is the relatively low R<sup>2</sup> coefficients of the models, even though (2) made it possible to predict
values that had a reasonably strong correlation of r = 0.427 with the actual number of labeled UI elements. We
plan to work on refining the set of factors, probably drawing on the metrics of visual complexity, which is
currently studied extensively in HCI.</p>
      <p>Our further research prospects also include assessing other dimensions of training data quality, particularly
its accuracy, again based on the characteristics of the input and output datasets and without the need for extra work
effort. To that end, we plan to study the distribution of UI elements' classes and produce the characteristics of
the trusted dataset, so that each worker's output can be related to them.</p>
      <sec id="sec-3-1">
        <title>Acknowledgements</title>
        <p>The reported study was funded by the Russian Ministry of Education and Science, according to the research project
No. 2.2327.2017/4.6.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Semuels</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Internet Is Enabling a New Kind of Poorly Paid Hell</article-title>
          .
          <source>The Atlantic</source>
          , Next Economy, 23 Jan
          <year>2018</year>
          . Accessed 20 May 2019 at https://www.theatlantic.com/business/archive/2018/01/amazon-mechanical-turk/551192/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Inmates in Finland are training AI as part of prison labor</article-title>
          .
          <source>The Verge</source>
          , 28 Mar
          <year>2019</year>
          . Accessed 20 May 2019 at https://www.theverge.com/2019/3/28/18285572/prison-labor-finland-artificial-intelligence-data-tagging-vainu
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Daniel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          et al.:
          <article-title>Quality control in crowdsourcing: A survey of quality attributes, assessment techniques, and assurance actions</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>51</volume>
          (
          <issue>1</issue>
          ), article
          <elocation-id>7</elocation-id>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          et al.:
          <article-title>CDAS: a crowdsourcing data analytics system</article-title>
          .
          <source>In Proc. of the VLDB Endowment</source>
          ,
          <volume>5</volume>
          (
          <issue>10</issue>
          ), pp.
          <fpage>1040</fpage>
          -
          <lpage>1051</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Inel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          et al.:
          <article-title>Crowdtruth: Machine-human computation framework for harnessing disagreement in gathering annotated data</article-title>
          .
          <source>In Proc. International Semantic Web Conference</source>
          , pp.
          <fpage>486</fpage>
          -
          <lpage>504</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.:
          <article-title>iCrowd: An adaptive crowdsourcing framework</article-title>
          .
          <source>In ACM SIGMOD International Conference on Management of Data</source>
          , pp.
          <fpage>1015</fpage>
          -
          <lpage>1030</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          et al.:
          <article-title>QASCA: quality-aware task assignment system for crowdsourcing applications</article-title>
          .
          <source>In ACM SIGMOD International Conference on Management of Data</source>
          , pp.
          <fpage>1031</fpage>
          -
          <lpage>1046</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Boychuk</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakaev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Entropy and Compression Based Analysis of Web User Interfaces</article-title>
          .
          <source>Lecture Notes in Computer Science (International Conference on Web Engineering)</source>
          ,
          <volume>11496</volume>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>261</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Bakaev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.:
          <article-title>Auto-extraction and integration of metrics for web user interfaces</article-title>
          .
          <source>Journal of Web Engineering</source>
          ,
          <volume>17</volume>
          (
          <issue>6</issue>
          &amp;7),
          <fpage>561</fpage>
          -
          <lpage>590</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>