<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Workshop on Data Mining and Knowledge Engineering, October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Manual and Automated Labeling of Web User Interfaces for User Behavior Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Stepanova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maxim Bakaev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Novosibirsk State Technical University</institution>
          ,
          <addr-line>Novosibirsk, 630073</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>1</volume>
      <fpage>5</fpage>
      <lpage>16</lpage>
      <abstract>
        <p>The article contrasts manual and automated identification of elements in images of web user interfaces (UIs), which is essential for machine learning (ML) models that describe user behavior. We consider the principal advantages and disadvantages of the two methods and compare the linear regression models built on their outputs. The constructed ML models describe users' subjective perception of web UIs in such dimensions as complexity, aesthetics and orderliness. Somewhat unexpectedly, the R<sup>2</sup> values of the models built with certain factors obtained from automated labeling turned out to be slightly higher. In particular, the shares of text and images in the web UI, as well as the sizes of the elements, were rather influential. We believe that the main disadvantage of manual labeling is the human factor, as mistakes made by the labelers and the variability of their output affect the quality of the models. In turn, the automated process has a number of drawbacks that must be taken into account and that we discuss in the paper. The results of our work might be of interest both to ML researchers and to usability engineers who seek to improve users' subjective satisfaction with websites.</p>
      </abstract>
      <kwd-group>
        <kwd>Image labeling</kwd>
        <kwd>human-computer interfaces</kwd>
        <kwd>machine learning</kwd>
        <kwd>linear regression</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Any design object needs effective presentation, in which structuring of textual and visual
information is highly important [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Many researchers and designers have been looking for the
principles of harmonious organization of compositional elements in architecture and website design.
For instance, the visual appearance of web user interfaces (UIs) is known to affect user behavior, and
its analysis can help to improve usability and thus increase the website's KPIs, such as the
conversion rate [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The visual complexity assessment helps to identify and describe problems in the
website UI. Visual complexity is affected by the number of elements in an object or image, their
structural relations, the detail of the information that these elements provide, etc. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It has been
shown that aesthetic preferences for the visual complexity of web pages are influenced
by users’ age and previous experience [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, our article focuses on how visual
complexity depends on the contents of web UI screenshots, namely the common compositional elements
of web pages (buttons, texts, lists, etc.) and multimedia elements (images, videos, etc.).
      </p>
      <p>
        The identification of UI elements in a web page screenshot for further assessment of visual
complexity can be performed through either manual labeling or an automated recognition process [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Automation simplifies a process and frees people from routine and
tedious tasks, but it often involves additional costs and resources (time, labor, etc.), especially at the initial
stage. Table 1 shows a comparison of the automatic and manual methods with respect to UI labeling.
      </p>
      <p>Thus, the purpose of the current work is to determine the types of elements that affect the
subjective perception of websites, as well as to compare the models built with the factors’ values
obtained via automated vs. manual labeling of web UI screenshots.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Study Description</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. The Manual Labeling</title>
      <p>In the experiment, the subjects were offered about 500 website interface screenshots and asked to
label UI elements in them: enclose each element in a bounding box and identify the element’s type. They
used a dedicated software tool, LabelImg (see Fig. 1). In total, 11 human labelers took part in this
activity, after providing informed consent.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. The Automated Labeling</title>
      <p>
        In addition to the manual labeling of the screenshots, we also performed their automated analysis,
using our dedicated Visual Analyzer (VA) software tool, available at http://va.wuikb.info/ and
described in detail in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. It identifies UI elements in images based on previously trained ML models
(see Fig. 2), but our previous studies suggest that its accuracy is somewhat limited, particularly in
determining the type of each UI element.
      </p>
      <p>
        In Table 2 we show the types of UI elements that were identified in the manual and the automated
labeling. As one can see, the main difference between the two methods
lies in determining the types of elements. In the manual labeling, the participants were able to
identify 26 different UI element types rather successfully, while the automated labeling distinguished only 8 types,
as defined by the pre-trained ML models. In addition, due to the high visual diversity of today’s
designs, the accuracy of the automated type detection was the most problematic dimension in the
semantic-spatial analysis that our VA tool performed (see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for more detail).
      </p>
      <p>From the labeling datasets, the heights and widths of the elements were calculated (also by the
elements’ types: text paragraphs, buttons, images, panels, etc.). There was also the data on the
height and width of the screenshot. Based on this information, the area of each UI element was
calculated, as well as the share occupied by this element in the total screenshot space (by the
elements’ types: texts, images, background, etc.).</p>
      <p>Thus, the behavior models that we further constructed for complexity, aesthetics and
orderliness (the dependent variables in the study) used 2 groups of factors:
1) the number of elements: the number of elements of a certain type located in one screenshot;
2) the proportions: the share of the total area of the elements of each type in the total area of the
screenshot.</p>
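      <p>The two groups of factors can be illustrated with a minimal Python sketch. It assumes that each labeled element is available as a (type, width, height) tuple in pixels; the function name labeling_factors and the sample boxes are hypothetical and are not taken from LabelImg or from the VA tool. Overlaps between boxes are ignored, so the shares may sum to more than 1.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def labeling_factors(elements, screen_width, screen_height):
    """Return (counts, shares) per element type for one screenshot.

    `elements` is a list of (type, width, height) tuples in pixels.
    This layout is an assumption made for illustration only.
    """
    screen_area = screen_width * screen_height
    counts = defaultdict(int)
    areas = defaultdict(int)
    for elem_type, width, height in elements:
        counts[elem_type] += 1               # factor group 1: number of elements
        areas[elem_type] += width * height   # summed area of this element type
    # factor group 2: share of the screenshot area occupied by each type
    shares = {t: area / screen_area for t, area in areas.items()}
    return dict(counts), shares


# Hypothetical 1000x800 screenshot with three labeled elements.
boxes = [("text", 400, 200), ("text", 200, 100), ("image", 500, 400)]
counts, shares = labeling_factors(boxes, 1000, 800)
print(counts)
print(shares)
```
]]></preformat>
      <p>The same function could be applied to the output of either labeling method, which would keep the factors comparable between the two.</p>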
    </sec>
    <sec id="sec-5">
      <title>2.3. The Subjective Perception Evaluation</title>
      <p>
        For each screenshot, we also had evaluations of complexity, orderliness and aesthetics, provided
by another 137 participants (67 female, 70 male). The majority of them were Russians (89.1%), while
the rest were from Bulgaria, Germany, South Africa, etc. More details on the participants and the
procedure can be found in one of our previous works [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
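      <p>Each of the subjective perception dimensions was assessed on a scale from 1 to 10. Then the
average values for each indicator were found using formulas (1)-(3), where <inline-formula><tex-math>i</tex-math></inline-formula> is the index of the web
page and <inline-formula><tex-math>n_i</tex-math></inline-formula> is the number of participants who provided the evaluations for the i-th web page.</p>
      <p>
        <disp-formula id="eq1"><label>(1)</label><tex-math>A_i = \frac{1}{n_i} \sum_{j=1}^{n_i} a_{ij}</tex-math></disp-formula>
where <inline-formula><tex-math>A_i</tex-math></inline-formula> is the indicator of the aesthetics of the interface of the i-th web page, and <inline-formula><tex-math>a_{ij}</tex-math></inline-formula> is the estimation of
the aesthetics of the interface of the i-th web page given by the j-th participant;
      </p>
      <p>
        <disp-formula id="eq2"><label>(2)</label><tex-math>C_i = \frac{1}{n_i} \sum_{j=1}^{n_i} c_{ij}</tex-math></disp-formula>
where <inline-formula><tex-math>C_i</tex-math></inline-formula> is the indicator of the complexity of the interface of the i-th web page, and <inline-formula><tex-math>c_{ij}</tex-math></inline-formula> is the estimation
of the complexity of the interface of the i-th web page given by the j-th participant;
      </p>
      <p>
        <disp-formula id="eq3"><label>(3)</label><tex-math>O_i = \frac{1}{n_i} \sum_{j=1}^{n_i} o_{ij}</tex-math></disp-formula>
where <inline-formula><tex-math>O_i</tex-math></inline-formula> is the indicator of the orderliness of the interface of the i-th web page, and <inline-formula><tex-math>o_{ij}</tex-math></inline-formula> is the estimation of the
orderliness of the interface of the i-th web page given by the j-th participant.
      </p>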
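      <p>The averaging in formulas (1)-(3) amounts to taking the mean of the ratings that each page received. A minimal Python sketch, assuming the ratings are stored per page as a list of integers from 1 to 10; the function name mean_scores and the sample data are hypothetical.</p>
      <preformat><![CDATA[
```python
from statistics import mean


def mean_scores(ratings):
    """Average each page's ratings over the participants who rated it.

    `ratings` maps a page identifier to the list of its 1-10 ratings;
    pages without any ratings are skipped.
    """
    return {page: mean(values) for page, values in ratings.items() if values}


# Hypothetical aesthetics ratings: three participants rated page_1, two rated page_2.
aesthetics = {"page_1": [7, 8, 6], "page_2": [4, 5], "page_3": []}
aesthetics_means = mean_scores(aesthetics)
print(aesthetics_means)
```
]]></preformat>
      <p>The same function would serve for the complexity and the orderliness ratings, since the three formulas differ only in the rated dimension.</p>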
      <p>
        Complexity reflects the number of elements on the screen and their arrangement. Orderliness is the
degree to which the data in the web interface are structured so that users can easily find the information they
need. Aesthetics is the user's assessment of the attractiveness of a product. This indicator is
important because the aesthetics of any web interface has a strong impact on users, even when they
try to evaluate the functionality of the system [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>3. Results</title>
      <p>Using the SPSS statistical analysis software, we built linear regression user behavior models
(linear regression being the most universal ML method) with the factors resulting from the manual and the
automated labeling. The dependent variables in the models were the three subjective perception
evaluation scores, while the independent variables were the factors resulting from the two labeling
processes. The stepwise method of regression analysis included the factors in the models one step
at a time, thereby discarding those that did not make a significant contribution to
explaining the dependent variables.</p>
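      <p>The idea of stepwise inclusion can be sketched in plain Python. This is a simplified illustration and not the SPSS procedure: SPSS decides on the entry and removal of factors with F-test probabilities, whereas the sketch below only adds the factor that gives the largest increase in R<sup>2</sup> and stops when the increase falls below a hypothetical threshold min_gain. All function names and the toy data are assumptions, and the factors are assumed not to be collinear.</p>
      <preformat><![CDATA[
```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # partial pivoting: move the largest remaining entry to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on `columns` plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # normal equations: (X'X) beta = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(factors, y, min_gain=0.01):
    """Greedily add the factor that raises R^2 most; stop when the gain is small."""
    selected, best = [], 0.0
    remaining = sorted(factors)
    while remaining:
        scores = {name: r_squared([factors[s] for s in selected + [name]], y)
                  for name in remaining}
        name = max(remaining, key=scores.get)
        if scores[name] - best < min_gain:
            break
        selected.append(name)
        remaining.remove(name)
        best = scores[name]
    return selected, best


# Toy data: the score depends on text_share only (score = 2 + 10 * text_share).
factors = {
    "text_share": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "n_buttons": [3, 1, 4, 1, 5, 9],
}
complexity = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
selected, r2 = forward_stepwise(factors, complexity)
print(selected, round(r2, 3))
```
]]></preformat>
      <p>In the toy data the second factor adds nothing once the text share is in the model, so it is left out, which mirrors how factors without a significant contribution are discarded.</p>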
      <p>The results of the regression analysis are presented in Tables 3-5, each corresponding to a different
subjective perception dimension. As usual in regression analysis, the null hypothesis H0 was that
the regression equation is not significant. For each of the dependent variables, the models turned out to
be significant (p &lt; 0.05), while the significance values for the factors selected by the stepwise method
are shown in the respective columns of the tables.</p>
      <p>As one can note from Table 3, the perceived complexity of websites was most influenced by the
amount of text, the number of radio buttons, and the number of buttons and tabs. The subjective
orderliness (Table 4) was most influenced by the proportion of radio buttons and buttons. The
perception of aesthetics (Table 5) was most influenced by the proportion and amount of text on the
page and by the number of buttons, labels and images.</p>
      <p>Although the constructed models had rather low coefficients of determination (R<sup>2</sup>), it is
still possible to draw a general conclusion about which elements affect users' assessment of website
perception, and whether the degree of this dependence is influenced by the labeling
method (Table 6).</p>
      <p>As one can see from Table 6, the share of text on the page was significant for all three
subjective perception dimensions. Moreover, the presence of text had the greatest impact on the
assessment of complexity and aesthetics. At the same time, the values obtained for the automated and
the manual labeling did not differ much (4-10%), but with the automated labeling, the models’ quality
indexes, as represented by the R<sup>2</sup> values, were somewhat higher.</p>
    </sec>
    <sec id="sec-7">
      <title>4. Conclusion</title>
      <p>[Table residue: factors retained in the models for Complexity, Orderliness and Aesthetics. Surviving row labels: number of elements on the page; average height of text; average width of buttons; average height of images; average width of labels; average height of labels; average height of buttons. Surviving value: R<sup>2</sup> (automatic method) = 0.071.]</p>
      <p>The study showed that the subjective assessment of the website perception is influenced by the
amount of text on the page and the share of images: the more text on the page, the more complex and
less aesthetic the website appears. However, this statement is not entirely accurate, since in addition to
the share of the text area, the text style (font, size, etc.), the presence of pictures that dilute the
text, and many other factors are also important. The number of images, their size, quality and position
on the web page affect the overall subjective perception of the website, including the assessment of
aesthetics and orderliness. As a general rule, these indicators are interchangeable: a lack of text is
often compensated by a variety of graphic elements and images. Therefore, for a web UI to
comply with usability standards and to be simpler and more understandable for its users, the
interface should not be cluttered with a large number of elements, and long texts should be broken into smaller
parts. At the same time, the number of element types did not significantly affect any of the subjective
perception dimensions.</p>
      <p>
        The quality of the user behavior models built on the factors resulting from the automated labeling
was slightly higher than for the manual one, which suggests the feasibility of our UI visual analysis tool.
In general, this may indicate that automating the labeling process makes sense, although implementing
it requires considerable costs and specific expertise. After all, our VA software was
initially trained with the data once provided by human labelers [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>It should be noted that the constructed linear models have low R<sup>2</sup> coefficients, so they should not
be used in production. The goal of our current study was merely to compare the two labeling methods,
while for real user behavior modeling, more advanced ML methods and architectures should be used.</p>
      <p>Our plans for further research include investigating how the subjective assessment of the perception
of a site is affected by web page layouts (the size of an element, its type and occupied area), as well as
by colors, fonts, types of buttons, animations, etc.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgment</title>
      <p>The reported study was funded by RFBR according to the research project No. 19-29-01017.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.P.</given-names>
            <surname>Rassadina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.V.</given-names>
            <surname>Ivanova</surname>
          </string-name>
          .
          <article-title>Assessment of visual perception of information design objects</article-title>
          .
          <source>Proc. XVI Int Conf on Cultural studies, philology, art history: urgent problems of modern science</source>
          ,
          <fpage>30</fpage>
          -
          <lpage>35</lpage>
          (
          <year>2018</year>
          ). In Russian.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.V.</given-names>
            <surname>Pesterev</surname>
          </string-name>
          ,
          <article-title>Influence of behavioral factors of ranking on the position of sites in search results</article-title>
          .
          <source>Bulletin of the Belgorod State Technological University</source>
          ,
          <volume>2</volume>
          ,
          <fpage>219</fpage>
          -
          <lpage>221</lpage>
          (
          <year>2017</year>
          ). In Russian.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kusumasondjaja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tjiptono</surname>
          </string-name>
          ,
          <article-title>Endorsement and visual complexity in food advertising on Instagram</article-title>
          .
          <source>Internet Research</source>
          ,
          <volume>29</volume>
          (
          <issue>4</issue>
          ),
          <fpage>659</fpage>
          -
          <lpage>687</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>An investigation into visual complexity and aesthetic preference to facilitate the creation of more appropriate learning analytics systems for children</article-title>
          .
          <source>Computers in Human Behavior</source>
          ,
          <volume>92</volume>
          ,
          <fpage>706</fpage>
          -
          <lpage>715</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.V.</given-names>
            <surname>Tsyplyaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Vinokurov</surname>
          </string-name>
          ,
          <article-title>System and method for selecting significant page elements with implicit indication of coordinates for identifying and viewing relevant information</article-title>
          .
          <source>Patent, RU 2708790 C2</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bakaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Khvorostov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaedke</surname>
          </string-name>
          ,
          <article-title>Auto-extraction and integration of metrics for web user interfaces</article-title>
          .
          <source>Journal of Web Engineering</source>
          ,
          <volume>17</volume>
          (
          <issue>6&amp;7</issue>
          ),
          <fpage>561</fpage>
          -
          <lpage>590</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bakaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Speicher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaedke</surname>
          </string-name>
          ,
          <article-title>I Don't Have That Much Data! Reusing Behavior Models for Websites from Different Domains</article-title>
          .
          <source>Proc. 20th International Conference on Web Engineering (ICWE 2020)</source>
          ,
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          (
          <year>2020</year>
          ). Springer, Cham.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Triberti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chirico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>La Rocca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Riva</surname>
          </string-name>
          ,
          <article-title>Developing emotional design: Emotions as cognitive processes and their role in the design of interactive technologies</article-title>
          .
          <source>Frontiers in Psychology</source>
          ,
          <volume>8</volume>
          ,
          <issue>1773</issue>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>