<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Case-Based System of User Interface Reverse Engineering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pavel Myznikov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Novosibirsk State University, 1 Pirogova str.</institution>
          ,
          <addr-line>Novosibirsk 630090</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>The paper is devoted to the implementation of case-based reasoning in reverse engineering of graphical user interfaces. In particular, web interfaces are considered. We suggest automating HTML/CSS markup building with aggregation of code samples from previous cases. The proposal is based on the hypothesis that analogy is employed to conceptualize HTML markup construction. The article considers the original theory of building an image structure, the modification of the case-based reasoning approach and the results of practical experiments.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Web technology is one of the most developed areas of
modern computer science. It is used not only in website
development but also as an important component of any IT infrastructure.
Consequently, the whole information technology industry
uses web development in one form or another.</p>
      <p>Nevertheless, automation of web development itself
remains incomplete. One of the key tasks – the creation of
HTML/CSS markup – does not have an extensive solution
yet. Some properties of this process prevent the use of
classical methods of automation. These properties are:
non-formalized requirements for a layout, variability of
industrial standards for code writing, and, finally, cross-browser
and cross-platform compatibility.</p>
      <p>The impact of automating HTML/CSS markup building
is not limited to speeding up the process of web application
development. It also makes development more flexible,
allowing developers to scale results, verify hypotheses, and test
applications.</p>
      <p>We can expand the results of the work to a more general
domain: any kind of technology where markup languages
are used for building GUIs.</p>
      <p>A methodological basis of the work is the Case-Based
Reasoning (CBR) approach (Kolodner 1992). Briefly
describing its essence, one can say it is a way of solving
problems by adapting solutions of similar problems from the past to the
current situation. This approach has been chosen due to the
hypothesis that formalization of images (which is, in fact,
creation of HTML markup) using transductive reasoning, or
reasoning by analogy, produces a result most similar to
what a human does. Also, such a scheme can address the
automation problems described above.</p>
      <p>The state-of-the-art solutions offer to generate code
without human involvement. They work as black boxes, where
the final result depends entirely on the collected dataset. The
CBR approach fills this gap with the acquisition of
experts’ knowledge in case storage. The ultimate role of CBR,
therefore, is to help specialists formalize their knowledge
in a relatively small dataset and then automatically combine
complex interfaces in such a way as if the specialists had done it
manually.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>To fill a gap in knowledge for readers who are not familiar
with the field, we overview two types of related work. The
first is papers about interface reverse engineering:
they are the benchmarks against which we compare our approach.
The second is an outline of CBR. Since this is the
core of the given approach, this information is crucial for
understanding the paper.</p>
      <sec id="sec-2-1">
        <title>Interfaces reverse engineering</title>
        <p>We consider two types of interfaces that are typical objects
of the reverse engineering task: mobile interfaces and web
interfaces.</p>
        <p>Mobile interfaces Mobile development is supposed to be
a more common area for interface reverse engineering than the web.
This can be explained by the fact that mobile interfaces are
usually relatively simpler and more unified, while web pages
have a great variance of layouts. Here we point at a
few mobile interface reverse engineering studies that we
consider robust baselines for comparing the OCR and
code generation parts.</p>
        <p>
          The framework of
          <xref ref-type="bibr" rid="ref19">(Nguyen and Csallner 2015)</xref>
          ,
REMAUI, is conceptually similar to our approach, especially
in the OCR part. REMAUI also faces the problem of
false-positive candidates and contains a merging step while
parsing an input. For both problems, they use a set of
heuristics specific to the mobile domain. There are differences, however.
Firstly, our OCR algorithm is mainly focused on image
detection rather than text detection. Secondly, we suggest
another model of hierarchy building. However, we can state
that the idea of filtering candidates in the OCR stage was
added to our framework after familiarization with their
paper.
        </p>
        <p>
          Based on REMAUI,
          <xref ref-type="bibr" rid="ref17">(Natarajan and Csallner 2018)</xref>
          developed a tool – P2A – for generating code for animated
mobile applications. This is the next step of the problem: not
only to recognize a layout but also to find a relation between
two similar interface screenshots. What is important is that
this tool implies interaction with a user while performing
a task, as does our approach. The difference is that we
suggest a long-term setup of the system at the beginning and no
involvement later, while P2A requires the user’s involvement
in each session of image processing.
        </p>
        <p>
          The work of
          <xref ref-type="bibr" rid="ref8">(Chen and Burrell 2001)</xref>
          is devoted to
solving a very specific subtask of mobile interface reverse
engineering: automatic conversion of an Android application
into an iOS application and vice versa based on interface
screenshots. Having such a restricted input and output for the
task, the authors were able to implement a more powerful
model. Namely, they reduced the task to component
detection and component classification, where convolutional
neural networks were used for the second part.
        </p>
        <p>Web interfaces Further, we would like to focus on web
reverse engineering because, if this task is solved, the
possible outcome can be more considerable, as web
technologies are applied in an extremely wide range of domains.
Research in this field is not so numerous, however.</p>
        <p>
          Asiroglu in
          <xref ref-type="bibr" rid="ref3">(Asiroglu et al. 2019)</xref>
          presented a quite
unusual reverse engineering case. Their tool accepts a
hand-drawn sketch and generates an HTML document using a
deep convolutional neural network. Such an application
seems very helpful in the industry. Unfortunately, it cannot
be used to solve the given research problem, because in
our case we should detect exact features of the interface,
e.g. color, font, size, etc., while that tool perceives only a
common idea of an interface.
        </p>
        <p>
          The most recent successful case of solving the problem in
the industry is the result of UIzard Technologies company
          <xref ref-type="bibr" rid="ref4">(Beltramelli 2017)</xref>
          with the title “pix2code”. The authors
note the similarity of the task of generating code based
on an interface screenshot to the task of generating text
in a natural language based on an image. Consequently, they
use a similar method: the basis of the solution is a cascade
of long short-term memory recurrent neural networks. At the
moment, this project is only at a proof-of-concept stage, so
it is too early to draw valid conclusions about the viability
of the idea.
        </p>
        <p>As shown above, there are many points of view on
the interface reverse engineering problem. Each of them is
suitable for specific conditions and goals. The contribution
of the given paper is building a reverse engineering
framework more as an expert system than an automation
tool. This means that, compared with the alternatives, it may
eventually generate less accurate results in terms of pixel-to-pixel
similarity, but, importantly, specialists can
contribute their knowledge to a case storage, thereby managing
the style or methodology that they expect to see in the final
result.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Case-Based Reasoning</title>
        <p>As mentioned above, case-based reasoning is the core of the
approach. This section outlines the basics of CBR.</p>
        <p>
          The origin of CBR is the work of Roger Schank
          <xref ref-type="bibr" rid="ref23">(Schank
1982)</xref>
          . His idea is to represent knowledge as memory
organization packets (MOPs) that hold the results of a person’s
experience. When a person solves a problem, he or she uses
MOPs to reapply a previously successful solution scheme in
a new, similar context. This approach contrasts with the
rule-based approach, where a person uses predefined scripts for
all cases. You can find more information in
          <xref ref-type="bibr" rid="ref1 ref28">(Watson and
Marir 1994)</xref>
          .
        </p>
        <p>
          A lot of works devoted to this approach were written
in the 1990s
          <xref ref-type="bibr" rid="ref1 ref1 ref20 ref28 ref28">(Kolodner 1992; Watson and Marir 1994;
Aamodt and Plaza 1994; Ram and Santamaría 1997)</xref>
          . In
the 2000s case-based reasoning was implemented in
different domains: medicine
          <xref ref-type="bibr" rid="ref11 ref12">(Gómez-Vallejo et al. 2016; Delir
Haghighi et al. 2013)</xref>
          , business
          <xref ref-type="bibr" rid="ref6 ref7 ref9">(Chou 2009; Chang, Liu,
and Lai 2008; Carrascosa et al. 2008)</xref>
          , finance
          <xref ref-type="bibr" rid="ref14 ref21 ref26">(Sartori,
Mazzucchelli, and Gregorio 2016; Sun et al. 2014)</xref>
          , government
(Lary et al. 2016), education
          <xref ref-type="bibr" rid="ref29">(Zeyen, Müller, and Bergmann
2017)</xref>
          , information technologies
          <xref ref-type="bibr" rid="ref10 ref18">(De Renzis et al. 2016;
Navarro-Cáceres et al. 2018; He 2013)</xref>
          , systems design
          <xref ref-type="bibr" rid="ref24">(Shen et al. 2017)</xref>
          , geosciences (Lary et al. 2016).
        </p>
        <p>In short, CBR is a method of solving a problem by
adaptation of solutions to similar problems in the past to a current
situation.</p>
        <p>The core concept in CBR is a case. A case is a triple
of elements:</p>
        <p>Problem is a state of the world when the case occurs.</p>
        <p>Solution is a suitable answer for the problem in the given
context.</p>
        <p>Outcome is a state of the world after the problem is
solved.</p>
        <p>The process of finding the solution is called a CBR-cycle
that consists of 4 stages (Kolodner 1992):
1. Retrieve. The best matching case is extracted from the
case base.
2. Reuse. The solution of the retrieved case is adapted to
solve the current problem.
3. Revise. The adapted solution is tested: if it is not
satisfactory, then either additional cases are retrieved, or the
retrieved solution is adapted again.
4. Retain. The case with the given solution is added to the
case base.</p>
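        <p>To make the four stages concrete, here is a toy sketch of the cycle in Python. The Case class, the distance-based retrieval and the template-based adaptation are our own illustrative stand-ins, not taken from the cited works.</p>
        <preformat><![CDATA[
```python
# A toy CBR cycle; all names and data here are illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class Case:
    problem: tuple          # feature vector describing the state of the world
    solution: str           # the answer that worked in that context
    outcome: str = ""       # the state of the world after solving

def retrieve(case_base, problem):
    # 1. Retrieve: the best matching case by squared Euclidean distance.
    return min(case_base,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(c.problem, problem)))

def reuse(retrieved, problem):
    # 2. Reuse: adapt the retrieved solution to the current problem.
    return retrieved.solution.format(*problem)

def revise(solution, accept):
    # 3. Revise: test the adapted solution with a caller-supplied check.
    return solution if accept(solution) else None

def retain(case_base, problem, solution):
    # 4. Retain: store the solved case for future reuse.
    case_base.append(Case(problem, solution, outcome="solved"))

case_base = [Case((0, 0), "place at ({}, {})"),
             Case((5, 5), "center near ({}, {})")]
problem = (4, 6)
adapted = reuse(retrieve(case_base, problem), problem)
if revise(adapted, accept=lambda s: bool(s)):
    retain(case_base, problem, adapted)
print(adapted)  # -> center near (4, 6)
```
]]></preformat>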
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Task definition</title>
      <p>
        When defining the task, we aim to reproduce the way that a
human performs HTML markup building. All criteria,
constraints and conditions given below are based on how the
industry operates. For more information, see
        <xref ref-type="bibr" rid="ref2">(Ali 2017)</xref>
        .
      </p>
      <p>The following life cycle of interaction with the system is
suggested. In the very beginning, an expert (or a group of
experts) populates a case storage with code templates to set up
the system. Alternatively, they can use a ready case storage
populated by someone else. This step needs to be done only
once. Then, the system is used to take a design of an interface
as an input and generate code automatically. At this point,
we do not expect an absolutely accurate result. A specialist,
in the end, makes final improvements to the code or uses it
as a basis for higher-level tasks.</p>
      <p>Consequently, the main requirements for the image
processing part are that the algorithm must recognize an image
structure, find as many features (color, type, margins, etc.)
of objects as possible, and generate the code implementing
the recognized structure. At the same time, it is not necessary
to recognize all features or to recreate the picture
pixel-to-pixel.</p>
      <p>The introduction and related work sections contain
conditions that should be considered when solving the task. It is
necessary to make a set of requirements for the system such
that positive results can be obtained under these
conditions.</p>
      <sec id="sec-3-1">
        <title>Condition 1. Non-formalized requirements</title>
        <p>An image itself does not contain complete information about
what the final HTML markup should be. Therefore, it must be
possible to include additional knowledge about the
requirements into the system.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Condition 2. Variability of industrial standards for writing code</title>
        <p>There are a few methodologies
of HTML markup (adaptive, BEM
(block-element-modifier), Bootstrap and so on). Moreover, as a rule,
each developer team has its own standards. Thus, the
system must generate code in several styles and standards.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Condition 3. Cross-browser and cross-platform compatibility</title>
        <p>HTML code must be not only valid but also
identically processed in all modern browsers and
operating systems.</p>
        <p>In addition, we set input constraints at the current stage.
An image must not include gradient backgrounds, animation
elements, or popup elements.</p>
      <p>In the future, we intend to make the algorithm work
beyond these conditions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Architecture</title>
      <p>The basis of the architecture is an attempt to recreate the
process of writing code by a human. The hypothesis is that a
human uses transductive reasoning when formalizing visual
information: one extracts structures of different levels from
an image and describes these structures by analogy with one’s
own or others’ previous experience.</p>
      <p>In particular, such a situation is observed in writing
HTML code. Meeting an unknown layout pattern, a
specialist finds in the literature how to implement it in HTML.
Over time, referrals to references become less and less frequent,
but the principle remains the same: facing another pattern, a
human retrieves a suitable example from memory and adapts
it to the current situation.</p>
      <p>The process is implemented in the following way (Fig. 2):
1. The elements are extracted from the image (text, pictures,
blocks, etc.).
2. The extracted elements are combined into a tree-like structure
according to a special algorithm.
3. A prefix tree traversal is performed; on each iteration,
there is a request to the case base to find a suitable HTML
description of the architectural pattern in the tree node.
4. Artifacts received in the previous step are saved to files.</p>
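      <p>The four steps above can be sketched as a pipeline. Every function below is a hypothetical stand-in for one stage of the architecture, stubbed with toy data; it is not the authors’ implementation.</p>
      <preformat><![CDATA[
```python
# Sketch of the four-step pipeline; each function stands in for one stage.

def extract_elements(image):
    # Step 1: detect text, pictures and blocks in the bitmap (stubbed).
    return [("text", (0, 0, 100, 20)), ("picture", (0, 30, 100, 90))]

def build_structure(elements):
    # Step 2: combine extracted elements into a tree-like structure.
    return {"kind": "internal",
            "children": [{"kind": k, "box": b, "children": []} for k, b in elements]}

def generate_code(node, case_base):
    # Step 3: prefix (pre-order) traversal; each node queries the case base
    # for a suitable HTML description of its layout pattern.
    open_tag, close_tag = case_base[node["kind"]]
    inner = "".join(generate_code(c, case_base) for c in node["children"])
    return open_tag + inner + close_tag

def save_artifacts(html, path="out.html"):
    # Step 4: persist the artifacts (kept in memory here instead of files).
    return {path: html}

# A minimal "case base": layout kind -> (opening, closing) markup.
case_base = {"internal": ("<div>", "</div>"),
             "text": ("<p>", "</p>"),
             "picture": ("<img/>", "")}
artifacts = save_artifacts(generate_code(build_structure(extract_elements(None)), case_base))
print(artifacts["out.html"])  # -> <div><p></p><img/></div>
```
]]></preformat>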
      <p>The next sections are devoted to the description of these
steps.</p>
    </sec>
    <sec id="sec-5">
      <title>Image structure building</title>
      <p>
        The goal of this step is to receive a so-called structure of
a bitmap, i.e. a mathematical object describing a mutual
arrangement of image parts. In
        <xref ref-type="bibr" rid="ref16">(Myznikov 2018)</xref>
        , we
described an approach to solving this problem. We detect
borders of objects (embedded images, texts, background) and
then use a greedy algorithm to build a hierarchy of these
objects (see algorithms 1 and 2). The hierarchy then is
processed with the CBR cycle (see the next section).
Algorithm 1 Algorithm of hierarchy construction
Require: nodes – a set of elements ordered by the horizontal and
vertical axes of the top left coordinate
Ensure: |nodes| == 1 and nodes[1] is the root node of the
tree containing all elements from the original set
1: while |nodes| &gt; 1 do
2:   for orient in [HOR, VERT] do
3:     i ← 0
4:     while i &lt; |nodes| do
5:       suitNode ← findNode(nodes[i], orient)
6:       if suitNode is not NULL then
7:         if nodes[i] is composite then
8:           nodes[i].addChild(suitNode)
9:         else
10:          newNode ← new Node
11:          newNode.addChild(nodes[i])
12:          newNode.addChild(suitNode)
13:          newNode.orientation ← orient
14:          nodes[i] ← newNode
15:        nodes ← nodes \ suitNode
16:      i ← i + 1
      </p>
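      <p>A runnable simplification of the greedy merge may help the reader. The neighbour search and the overlap test below are our own interpretation of the pseudocode above, reduced for brevity, not the authors’ exact implementation.</p>
      <preformat><![CDATA[
```python
# Simplified sketch of the greedy hierarchy construction (algorithms 1-2).

class Node:
    def __init__(self, box, children=None, orientation=None):
        self.box = box                     # (x1, y1, x2, y2)
        self.children = children or []
        self.orientation = orientation     # "HOR" or "VERT" for composites

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlaps(box, other):
    return not (other[2] <= box[0] or other[0] >= box[2]
                or other[3] <= box[1] or other[1] >= box[3])

def find_node(nodes, i, orient):
    # The nearest neighbour to the right (HOR) or below (VERT) whose joint
    # bounding rectangle covers no third node.
    cur = nodes[i]
    cands = [n for n in nodes if n is not cur
             and (n.box[0] >= cur.box[2] if orient == "HOR"
                  else n.box[1] >= cur.box[3])]
    if not cands:
        return None
    j = min(cands, key=lambda n: (n.box[0], n.box[1]) if orient == "HOR"
                                 else (n.box[1], n.box[0]))
    rect = union(cur.box, j.box)
    if any(k is not cur and k is not j and overlaps(rect, k.box) for k in nodes):
        return None
    return j

def build_hierarchy(nodes):
    # Greedily merge neighbours until a single root remains.
    while len(nodes) > 1:
        merged = False
        for orient in ("HOR", "VERT"):
            i = 0
            while i < len(nodes):
                suit = find_node(nodes, i, orient)
                if suit is not None:
                    if nodes[i].children and nodes[i].orientation == orient:
                        nodes[i].children.append(suit)     # composite: extend
                        nodes[i].box = union(nodes[i].box, suit.box)
                    else:                                  # wrap in a new node
                        nodes[i] = Node(union(nodes[i].box, suit.box),
                                        [nodes[i], suit], orient)
                    nodes.remove(suit)
                    merged = True
                i += 1
        if not merged:
            break   # guard against non-termination on pathological input
    return nodes[0]

root = build_hierarchy([Node((0, 0, 10, 10)), Node((20, 0, 30, 10)),
                        Node((0, 20, 30, 30))])
print(root.orientation, len(root.children))  # -> VERT 2
```
]]></preformat>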
    </sec>
    <sec id="sec-6">
      <title>Development of the CBR-system</title>
      <p>Remember that the generation of code is performed with the
prefix tree traversal, where the tree is an image structure. On
each iteration, the CBR-cycle runs. We need to define the
problem, solution, and outcome in the context of the task, in
other words, to describe a case format.</p>
      <p>Problem is a feature vector of a node and its children.
Solution is templates of HTML and CSS code that
implement the markup of the structure described.</p>
      <p>Outcome is HTML/CSS code generated from the
template applied to a specific context.</p>
      <p>Thus, when each node of a tree is processed, a “problem”
is formed: a description of the current element and the elements
on the lower level of the image structure. In storage, there are
cases whose problems are typical cases of element
layout and whose solutions are typical ways of markup. The task of the
CBR-cycle is to get HTML/CSS code to build the markup of
the current structure using the stored solution. Let us describe
the cycle in detail. As a notation, denote the case being processed
as the new case. Further terms will be introduced as they appear.
Algorithm 2 Search for a suitable node (findNode)
Require: node_i – a node for which a suitable node is being
found; (node_i.x1, node_i.y1) – its top left coordinate,
(node_i.x2, node_i.y2) – its bottom right coordinate
Require: orientation – a direction of joining nodes
Ensure: node_j is the node demanded or NULL if there is no
such node
1: if orientation is HOR then
2:   node_j ← the next node to the right of node_i
3: else if orientation is VERT then
4:   node_j ← the next node below node_i
5: x1 ← min(node_i.x1, node_j.x1)
6: y1 ← min(node_i.y1, node_j.y1)
7: x2 ← max(node_i.x2, node_j.x2)
8: y2 ← max(node_i.y2, node_j.y2)
9: R ← rectangle (x1, y1) (x2, y2)
10: for all node_k ∈ nodes do
11:   if node_k ∩ R ≠ ∅ then
12:     node_j ← NULL</p>
      <p>
Retrieval phase A new case has a problem but no
solution and outcome. The task of this stage is to retrieve the cases
that are most similar to the problem of the new case. In
general, it is a task of n-class classification, where n is the
number of cases in storage. At the proof-of-concept stage,
we choose the method of k-nearest neighbors with Euclidean
distance. For this purpose, we vectorize and normalize the
problems. This means that, first, all categorical features are
transformed to a numerical type and, second, absolute values are
replaced with relative ones by the formula xi = (xi − xmin) / (xmax − xmin).
Then, we calculate the distance between the given vector and the
vectors in the case base, d(x, y) = sqrt(Σ i=1..n (xi − yi)²), and select
the case where the distance is minimal. In terms of CBR, this
case is called retrieved.</p>
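      <p>A minimal sketch of the retrieval phase follows. The feature layout (width, height, number of children) is purely illustrative; the vectorization of real problems is richer than this.</p>
      <preformat><![CDATA[
```python
# Retrieval sketch: min-max normalization of problem vectors followed by a
# 1-nearest-neighbour search with Euclidean distance.
import math

def normalize(vectors):
    # Replace absolute values with relative ones: (x - min) / (max - min).
    cols = list(zip(*vectors))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(v, lo, hi)] for v in vectors]

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def retrieve(case_problems, query):
    # Normalize stored vectors and the query on the same scale, then pick
    # the index of the closest stored problem.
    normed = normalize(case_problems + [query])
    stored, q = normed[:-1], normed[-1]
    return min(range(len(stored)), key=lambda i: euclid(stored[i], q))

# Problems as raw feature vectors (width, height, number of children).
problems = [[100, 20, 1], [300, 300, 4], [120, 25, 1]]
print(retrieve(problems, [300, 300, 4]))  # -> 1 (an exact match)
```
]]></preformat>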
      <p>Adaptation phase The adaptation stage is performed by
an algorithm that receives the problem of a new case and the
solution of a retrieved case and returns an outcome. A case
containing the problem of a new case, the solution of a retrieved
case and the generated outcome is a solved case, and the
solution is called suggested.</p>
      <p>In the current work, a derivational type of adaptation is
suggested, which means an old solution is regenerated under
new conditions. Since a solution is a code template, the
result is built using a templating language. The difficulty
is that the results of different cases may conflict with each
other during processing. This is especially true for conflicts
in CSS code: situations when different results describe the
same class differently. That is why conflict resolution is an
important part of the adaptation stage.</p>
      <p>The base of conflict resolution is the resolution of three
cases: 1) absence of conflict; 2) a complete conflict, that is,
the classes describe incompatible styles; 3) one class is a special
case of another. The following strategies are applied
accordingly: 1) the code is saved as is; 2) different classes with
their own styles are created, and appropriate replacements are
made in the HTML document; 3) a new class, which is the special
case, is added to the style set, and appropriate insertions are
made in the HTML document.</p>
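      <p>The three strategies can be sketched as follows. A CSS class is modelled as a dict of property-value pairs; the naming scheme for the generated classes (“--ext”, “-2”) is our own illustrative convention, not the system’s.</p>
      <preformat><![CDATA[
```python
# Sketch of the three conflict-resolution strategies for CSS classes.

def resolve(styles, name, new_props):
    """Merge new_props for class `name` into `styles`; return the class
    name(s) the HTML document should reference."""
    if name not in styles:                       # first occurrence: just save
        styles[name] = dict(new_props)
        return name
    old = styles[name]
    if old == new_props:                         # 1) no conflict: keep as is
        return name
    overlap = set(old) & set(new_props)
    if all(old[k] == new_props[k] for k in overlap):
        # 3) one class is a special case of the other: add a subclass that
        #    carries only the extra properties; HTML gets both class names.
        sub = name + "--ext"
        styles[sub] = {k: v for k, v in new_props.items() if k not in old}
        return name + " " + sub
    # 2) complete conflict: create a separate class with its own styles and
    #    replace the reference in the HTML document.
    alt = name + "-2"
    styles[alt] = dict(new_props)
    return alt

styles = {"btn": {"color": "red", "padding": "4px"}}
r1 = resolve(styles, "btn", {"color": "red", "padding": "4px"})
r2 = resolve(styles, "btn", {"color": "red", "padding": "4px", "border": "1px"})
r3 = resolve(styles, "btn", {"color": "blue"})
print(r1, "|", r2, "|", r3)  # -> btn | btn btn--ext | btn-2
```
]]></preformat>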
      <p>Evaluation phase In classic CBR, an evaluation step
serves as validation of the suggested result. A case that
passes the test is called tested, and its solution is approved.
Regarding this work, the task of evaluating the quality of
HTML/CSS code deserves a separate study. In the current
state of the research, this step is skipped, and validation of
the result is performed for the whole document and not on each
iteration of the CBR-cycle.</p>
      <p>Updating phase The goal of the last step of the cycle is
to save an approved solution in case storage so that it can
be reused in the future. In the current work, this step differs
from the classic approach. Instead of using a centralized case
storage, the CBR-system developed maintains a “global”
and a few “local” case storages (see table 1). The global case
storage is used for all images. The local ones are used only
while processing a specific document. We can say that a “local”
storage is the context of a document: it contains the results of
problems solved during its processing.</p>
      <p>Specifically, a tested case is always saved only into a local
storage. At the same time, on the retrieval step, a search is performed
in both global and local storages, where local ones have
priority. This approach has several advantages:
1. A risk of different code generation for the same blocks of
an image is decreased. The system may process identical
blocks differently, because noise on an image can affect
retrieval of a suitable case. At the same time, the size
limitation of a local storage prevents generating redundant
solutions.
2. System performance is increased:
(a) a global storage is located in external memory, while a
local one is in RAM; as a result, the number of requests
to a hard disk reduces
(b) graphical user interfaces tend to contain a
lot of similar blocks; when processing the next block, a
search in a relatively small local storage, which already
contains a suitable case, is performed much faster than
a repeated search in a global storage; also, the
adaptation stage requires fewer resources than the initial adaptation
of an “unprepared” case.
3. There is no “clogging” of the global storage with specific
cases, which, first, positively influences the size of the global
storage and, second, reduces the level of overfitting of the
system.</p>
      <p>In other words, a global storage serves as a source of cases
that hold solutions for code generation of typical layouts.
Then, a local storage is used for accurate and rapid adaptation
of these solutions in the context of a specific document.
Case storage population The case storage is populated by
experts with partial automation. Namely, a set of cases
with their problem parts is built automatically, and the
corresponding solution parts are filled in by specialists. The
following are the steps for generating cases:
1. A list of real websites is collected.
2. DOM-trees of random pages are saved.
3. Trees are split by sub-trees with prefix traversal of a whole
tree and selecting each node with its children.
4. The resulting set of sub-trees is clustered with the DBSCAN
algorithm, where tree-edit distance is used to define the
similarity between objects. (See the tree-edit distance overview in
the section below.)
5. From each cluster, a random object is selected. The
selected objects form the case storage, where each object is
a problem part of a single case.</p>
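      <p>Steps 4 and 5 can be sketched as follows. Both the distance function and the clustering are deliberately reduced stand-ins (a node-count difference instead of tree-edit distance, a toy density grouping instead of a full DBSCAN); real code would use the actual algorithms named above.</p>
      <preformat><![CDATA[
```python
# Sketch of steps 4-5: group sub-trees by pairwise distance and keep one
# representative per cluster.

def tree_distance(a, b):
    # Stand-in for tree-edit distance: difference in node counts.
    count = lambda t: 1 + sum(count(c) for c in t)
    return abs(count(a) - count(b))

def dbscan_like(items, dist, eps, min_samples):
    # Heavily reduced DBSCAN-style grouping; returns labels, -1 for noise.
    labels = [-1] * len(items)
    cluster = 0
    for i in range(len(items)):
        if labels[i] != -1:
            continue
        neigh = [j for j in range(len(items))
                 if dist(items[i], items[j]) <= eps]
        if len(neigh) < min_samples:
            continue
        for j in neigh:
            if labels[j] == -1:
                labels[j] = cluster
        cluster += 1
    return labels

# Sub-trees as nested lists: [] is a leaf, [a, b] an internal node.
subtrees = [[], [[]], [[], []], [[[], []], []], [[], [[], []]]]
labels = dbscan_like(subtrees, tree_distance, eps=1, min_samples=2)
representatives = {}
for lab, tree in zip(labels, subtrees):
    if lab != -1:
        representatives.setdefault(lab, tree)   # one object per cluster
print(labels)  # -> [0, 0, 1, 2, 2]
```
]]></preformat>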
      <p>By adjusting the parameters eps and min_samples of DBSCAN,
one can control the size of a case storage. The bigger
the case storage, the more accurate the system, but the
more work for experts. One must find the right balance
between these two criteria to set up the system adequately for
the existing conditions. Then, when the set of cases with
their problems is built, experts can populate it with solutions:
HTML/CSS code.</p>
    </sec>
    <sec id="sec-7">
      <title>Results and further work</title>
      <p>Before the experiment evaluation, let us illustrate how the
system works in practice.</p>
      <p>As an example, let us consider a page of Novosibirsk State
University site.</p>
      <p>As an output of the first stage of image processing,
elements were extracted and grouped into nodes (Fig. 3). Then,
the algorithm (described earlier in the paper) created a
tree-like structure (Fig. 4). We estimate that the result is of good
quality because, despite some mistakes (a wrong font was
selected, some pictures were missing, and tiny errors in size
existed), the system managed to solve the main task: to recreate
the structure of elements and generate human-like code (Fig.
5).</p>
      <sec id="sec-7-1">
        <title>Evaluation method</title>
        <p>The question of evaluation is open. What should we compare
to estimate the result? One option is to compare images: the
initial screenshot and a screenshot of the interface generated
by the system. It is probably the most obvious way, but as we
stated in the task definition part, we do not aim to recreate an
interface pixel-to-pixel: it is much more important to
recognize the idea of the layout. Another option is to compare
source codes. We can collect a base of real applications with
their source codes, make screenshots, generate code, and
find the difference between the texts. This way is too
sensitive to implementation specifics. There are numerous ways
to write code for the same interface, and there is no
reason to consider the result incorrect if the generated code
differs from the real one.</p>
        <p>We believe the best way is to compare tree models of the
source codes, because a tree model represents the way
code is organized. We collect the data by crawling web
pages so that along with the screenshots we get the HTML code
and extract the DOM model from it. As a rule (except for
some tricky cases), the DOM model adequately describes
the element structure. The idea is to map the structures received by
our algorithm to DOM trees and roughly estimate their
similarity. Such a procedure allows us to evaluate our
approach, although we understand that it is limited to
a specific subset of images (only web interfaces) and that
some technical tricks of web technologies can affect the
correctness of DOM trees and thereby decrease the estimates.
Nevertheless, it is the best way of evaluating the method so
far, which can prove the viability of the solution and outline
the next steps.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Selection of a measure</title>
        <p>The crucial thing is the selection of a quality measure. There
are a few approaches to comparing trees: some of them count the
operations needed for the transformation of one tree into
another; others compare the longest common paths from a root
to a tree node; there are also variable-length don’t care
(VLDC) based methods.</p>
        <p>We select two measures: edit distance and bottom-up
distance.</p>
        <p>Tree edit distance. A review of edit distance methods
can be found in the survey by Philip Bille. This is how he
defines the problem:</p>
        <p>Let T be a labelled ordered tree (ordered means that the
order among siblings matters). Define tree operations as
follows:</p>
        <sec id="sec-7-2-1">
          <title>Relabel. Change the label of a node.</title>
          <p>Delete. Delete a non-root node v in T with parent v’,
keeping the order of the children of v.</p>
          <p>Insert. Insert a node v as a child of v’ in T, making v the
parent of a consecutive subsequence of the children of v’.</p>
          <p>Assume there is an edit cost function c : V × V → R, where
c(v, w) denotes a relabel, c(v, λ) denotes a delete and c(λ, w) denotes
an insert operation. An edit script S between T1 and T2
is a sequence of edit operations s1, s2, ..., sn, si ∈ V × V,
and the cost of S is d(S) = Σ i=1..n c(si). Denote the optimal
edit script between T1 and T2 as Sopt(T1, T2): d(Sopt) =
min over S of d(S), and d(Sopt) is the tree edit distance. (Bille 2005)</p>
          <p>
            Based on the survey
            <xref ref-type="bibr" rid="ref5">(Bille 2005)</xref>
          , we considered that the
best solution of the tree edit distance problem for our case is
Klein’s algorithm (Klein 1998), which requires a worst-case
time bound of O(|T1|²|T2| log |T2|) and a space bound
of O(|T1||T2|).
          </p>
          <p>In our case, we suggest the following edit cost function.
First, we denote a fixed finite alphabet containing the values
for labels:</p>
          <p>Σ = {t, p, i},
where t stands for an element with text, p stands for an
element with an embedded picture, and i stands for an internal
node.</p>
          <p>We denote the edit cost as follows:
c(v, w) =
1, if v ≠ λ and w = λ;
0.8, if v = λ and w ≠ λ;
0.1, if v ≠ λ and w ≠ λ and v, w ∈ {p, t};
0, otherwise.
(1)</p>
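          <p>Equation (1) translates directly into code. We use a LAMBDA constant for the empty node; the mapping of each branch to an operation follows the definitions above.</p>
          <preformat><![CDATA[
```python
# The edit cost function of equation (1), mirrored literally; LAMBDA plays
# the role of the empty node in delete and insert operations.
LAMBDA = None

def edit_cost(v, w):
    # v, w are labels from {"t", "p", "i"} or LAMBDA.
    if v is not LAMBDA and w is LAMBDA:
        return 1.0      # delete operation: the heaviest penalty
    if v is LAMBDA and w is not LAMBDA:
        return 0.8      # insert operation
    if v is not LAMBDA and w is not LAMBDA and v in ("p", "t") and w in ("p", "t"):
        return 0.1      # relabel between text and picture leaves
    return 0.0

print(edit_cost("t", LAMBDA), edit_cost(LAMBDA, "p"),
      edit_cost("t", "p"), edit_cost("i", "i"))  # -> 1.0 0.8 0.1 0.0
```
]]></preformat>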
          <p>Also, note that the tree received by our algorithm is
the first parameter, and the tree from the data set is
the second one.</p>
          <p>This edit cost function penalizes the method the most if
the resulting tree misses a node, a little less if it contains an
extra node, and very little if it mixes up a text with a picture.
Note that it does not penalize mixing up an
internal node with a non-internal one, because in this case the
algorithm either misses or adds an extra node, which already
implies a big penalty.</p>
          <p>Bottom-up distance. This method is presented in the
work of Valiente (Valiente ). The complexity of the
algorithm is O(|T1||T2| log(|T1| + |T2|)). For two trees T1 and T2,
the bottom-up distance equals 1 − f / max(|T1|, |T2|), where
f is the size of the largest common forest of T1 and T2. We
slightly change this formula and make it asymmetric:
basym(T1, T2) = 1 − f / |T2|
(2)</p>
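          <p>The two variants differ only in the denominator. In the sketch below, f is assumed to be precomputed, since finding the largest common forest is the substantial part of the algorithm; the sizes are illustrative.</p>
          <preformat><![CDATA[
```python
# Symmetric and asymmetric bottom-up distances over precomputed tree sizes
# and common-forest size f.

def bottom_up(size1, size2, f):
    # Symmetric form: 1 - f / max(|T1|, |T2|)
    return 1 - f / max(size1, size2)

def bottom_up_asym(size1, size2, f):
    # Asymmetric form, equation (2): 1 - f / |T2| (T2 is the reference tree).
    return 1 - f / size2

# A generated tree of 12 nodes vs a reference tree of 10 nodes, sharing a
# common forest of 6 nodes:
print(bottom_up(12, 10, 6), bottom_up_asym(12, 10, 6))  # -> 0.5 0.4
```
]]></preformat>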
          <p>Edit and bottom-up distances as defined above can be
roughly interpreted as “precision” and “recall”, respectively,
in terms of machine learning evaluation. Technically, they
are not the same, but this gives a good idea of how to read
the experiment results. Indeed, for a tree that correctly
represents one part of another tree but does not contain
the other part at all, edit distance would be relatively low, while
bottom-up distance would be relatively high. Conversely, if a
tree contains all nodes of another tree, but the structure is
different and some extra nodes exist, edit distance would be
relatively high and bottom-up distance would be relatively
low.</p>
          <p>To estimate the approach from different points of view, we
use both measures, as well as the F-score that generalizes
them into a common penalty. The problem with aggregating the
scores is that they have different scales: [0, +∞) for the edit
distance and [0, 1] for the bottom-up one. We solve this problem
by transforming the edit distance measure. First, we
normalize its value by the size of the sample tree:
dnorm(T1, T2) = d(Sopt(T1, T2)) / |T2|
(3)</p>
          <p>This allows us to compare edit distances for trees of different
sizes, but the measure is still not bounded from above.
Therefore, we apply the logarithm to the measure:
dlog(T1, T2) = −log(1 / (1 + dnorm(T1, T2)))
(4)</p>
          <p>As dnorm is always non-negative, dlog takes values in [0, +∞).
Moreover, the logarithmic form of the measure perfectly suits
our idea of estimating the method: we assume that the
more two trees differ, the less important the exact value of the
difference.</p>
          <p>Finally, we can define the F-measure. In order to move
from cost scores to quality measures, we subtract the distances
from one:
F = 2 · ((1 − b)(1 − dlog)) / ((1 − b) + (1 − dlog))
  = 2 · ((1 − b)(1 − dlog)) / (2 − b − dlog)
(5)</p>
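          <p>Equations (3)–(5) combine into a short evaluation routine (a sketch under the definitions above; d stands for the raw edit distance, b for the asymmetric bottom-up distance, and we read the logarithmic transform so that identical trees score zero):</p>

```python
import math

def d_norm(d, size_t2):
    # Eq. (3): normalize the edit distance by the sample-tree size
    return d / size_t2

def d_log(d, size_t2):
    # Eq. (4): logarithmic transform; equals 0 for identical trees
    return -math.log(1.0 / (1.0 + d_norm(d, size_t2)))

def f_score(b, dlog):
    # Eq. (5): harmonic mean of the two quality scores 1 - b and 1 - dlog
    p, r = 1.0 - dlog, 1.0 - b
    return 2.0 * p * r / (p + r)
```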
        </sec>
      </sec>
      <sec id="sec-7-3">
        <title>Data set collection</title>
        <p>We used the Alexa service 1 to get a list of the 1000 most
popular websites. Then we made screenshots of the main
page and two random pages of each website. We also crawled
the HTML/CSS code of each page and transformed it into
trees using a DOM parser in Python. Due to technical
restrictions in some cases, we were only able to collect a dataset of
2640 examples.</p>
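        <p>A minimal version of the page-to-tree transformation can be sketched with the standard html.parser module (the real pipeline used a DOM parser and also consulted CSS; the label assignment below is a deliberate simplification and an assumption of ours):</p>

```python
from html.parser import HTMLParser

class TreeBuilder(HTMLParser):
    """Builds a tree labelled over the alphabet {t, p, i}."""

    def __init__(self):
        super().__init__()
        self.root = {'label': 'i', 'children': []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # img elements become 'p' (picture) leaves, everything else 'i'
        node = {'label': 'p' if tag == 'img' else 'i', 'children': []}
        self.stack[-1]['children'].append(node)
        if tag != 'img':  # img is a void element with no closing tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():  # non-blank text content becomes a 't' leaf
            self.stack[-1]['children'].append({'label': 't', 'children': []})
```

        <p>Feeding a crawled HTML string to TreeBuilder().feed then leaves the labelled tree in the builder's root attribute.</p>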
        <p>In addition, to make the experiment more useful and to gain
more insights, we scored each web page with a
“Text-to-Image ratio” measure:
tti = T / (T + I)
(6)
where T is the number of nodes with text content and I is
the number of nodes with embedded pictures.</p>
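        <p>Given the label sequence of a page tree, the score (6) is straightforward to compute (a sketch; the list-of-labels representation and the function name are our assumptions):</p>

```python
def text_to_image_ratio(labels):
    # Eq. (6): T / (T + I) over the node labels of a page tree
    t = labels.count('t')  # nodes with text content
    i = labels.count('p')  # nodes with embedded pictures
    return t / (t + i)
```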
        <p>The reason we use this score is to estimate how the
share of text and images affects the result and to define the
next steps. It is important to understand which part of the
method is the most problematic, and where efforts should be
focused to increase the quality as much as possible.</p>
      </sec>
      <sec id="sec-7-4">
        <title>Experimental results</title>
        <p>In general, results are as follows (see Table 2):</p>
        <sec id="sec-7-4-1">
          <title>1https://www.alexa.com/topsites</title>
          <p>The mean F-score is 0.78, and the variance is 0.033,
which appears to be a good result. Note that the bottom-up
distance is much better than the edit distance. It means
that the recognition part, on average, works better than the
structure-building part.</p>
          <p>Let us analyze the dependence of the scores on the
“Text-to-Image ratio” (figures 7 and 8).</p>
          <p>We see that the biggest issues arise in cases where an
image contains both text and embedded pictures, approximately
in a proportion of seven to three. Comparing the extreme cases,
where an image consists mainly of embedded pictures or
mainly of text, the approach works far better in the first case
than in the second. Moreover, the edit distance decreases
when moving from mixed cases to more text-based ones, while
for the bottom-up distance this effect is not as strong.</p>
          <p>Also note the heteroskedasticity of the data: the variance
is larger in “mixed” cases and smaller in extreme cases.
That is, the method is less predictable when a picture
has diverse content.</p>
          <p>Based on these findings, we draw the following
conclusions:</p>
        </sec>
        <sec id="sec-7-4-2">
          <title>The method demonstrated satisfactory results.</title>
          <p>The results can be advanced by applying effort in the
following areas:
– enhancement of the text detection part
– improvement of the processing of cases where an
image has diverse content
In addition to making the improvements from the previous
section, the next steps include:
1. Development of a procedure for automatically filling the case
storage.</p>
          <p>The current paper considers the development of a
system where the case storage is populated manually by experts.
As explained above, this approach has strong advantages.
However, in general, we would like to collect cases
automatically, because it would be less time-consuming and
less prone to human error. Our suggestion is to analyze
the elements of existing websites with a clustering
method and to use the cluster centers as reference cases.
2. Development of a better similarity measure.</p>
          <p>One of the crucial elements of a case-based reasoning
solution is the selection of an appropriate similarity
measure. The current version uses a simple KNN principle.
Consequently, there is room for optimizing the measure
construction, because the nearest-neighbors algorithm is
insensitive to categorical features. As categorical features
make up the majority of the element properties processed, we
plan to use classifier models based on decision trees.
3. Development of a revise stage in the CBR-cycle.</p>
          <p>When building a CBR cycle, the revising stage has been
skipped, because the evaluation of HTML document
correctness is an unsolved problem. This question should be
investigated to make the cycle complete.</p>
          <p>In conclusion, we developed a system that generates
markup-language source code for a given interface
screenshot. The distinctive feature of the approach is the use of experts’
knowledge kept in a dedicated case storage. The experiments
demonstrated the satisfactory quality of the current solution
and provided grounds for further development.</p>
          <p>He, W. 2013. Improving user experience with case-based
reasoning systems using text mining and Web 2.0. Expert
Systems with Applications 40(2):500–507.</p>
          <p>Klein, P. N. 1998. Computing the edit-distance between
unrooted ordered trees. In ESA.</p>
          <p>Kolodner, J. L. 1992. An introduction to case-based
reasoning. Artificial Intelligence Review 6(1):3–34.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Aamodt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>1994</year>
          .
          <article-title>CBR: foundational issues, methodological variations and system approaches</article-title>
          .
          <source>AI Communications</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A study of software development life cycle process models</article-title>
          .
          <source>International Journal of Advanced Research in Computer Science</source>
          <volume>8</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Asiroglu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mete</surname>
            ,
            <given-names>B. R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yildiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nalcakan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sezen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dagtekin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ensari</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Automatic HTML Code Generation from Mock-Up Images Using Machine Learning Techniques</article-title>
          .
          <source>In 2019 Scientific Meeting on Electrical-Electronics &amp; Biomedical Engineering and Computer Science (EBBT)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Beltramelli</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>pix2code: Generating Code from a Graphical User Interface Screenshot</article-title>
          .
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Bille</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>A survey on tree edit distance and related problems</article-title>
          .
          <source>Theoretical Computer Science</source>
          <volume>337</volume>
          (
          <issue>1-3</issue>
          ):
          <fpage>217</fpage>
          -
          <lpage>239</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Carrascosa</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bajo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Julian</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Corchado</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Botti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Hybrid multi-agent architecture as a real-time problem-solving model</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>34</volume>
          (
          <issue>1</issue>
          ):
          <fpage>2</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>P.-C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.-H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>R. K.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>A fuzzy case-based reasoning model for sales forecasting in print circuit board industries</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>2049</fpage>
          -
          <lpage>2058</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Burrell</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Case-Based Reasoning System and Artificial Neural Networks: A Review</article-title>
          .
          <source>Neural Computing &amp; Applications</source>
          <volume>10</volume>
          :
          <fpage>264</fpage>
          -
          <lpage>276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Chou</surname>
            ,
            <given-names>J.-S.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Web-based CBR system applied to early cost budgeting for pavement maintenance project</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>36</volume>
          (
          <issue>2</issue>
          ):
          <fpage>2947</fpage>
          -
          <lpage>2960</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>De Renzis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Garriga</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Flores</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cechich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zunino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Case-based Reasoning for Web Service Discovery and Selection</article-title>
          .
          <source>Electronic Notes in Theoretical Computer Science</source>
          <volume>321</volume>
          :
          <fpage>89</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Delir Haghighi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zaslavsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Arbon</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Development and evaluation of ontology for intelligent decision support in medical emergency management for mass gatherings</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>54</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1192</fpage>
          -
          <lpage>1204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Gómez-Vallejo</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Uriel-Latorre</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sande-Meijide</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Villamarín-Bello</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pavón</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fdez-Riverola</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Glez-Peña</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>A case-based reasoning system for aiding detection and classification of nosocomial infections</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>84</volume>
          :
          <fpage>104</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          2016.
          <article-title>Machine learning in geosciences and remote sensing</article-title>
          .
          <source>Geoscience Frontiers</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Myznikov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Development of the Case-Based Approach of Web Interfaces Reverse Reengineering</article-title>
          .
          <source>Vestnik NSU. Series: Information Technologies</source>
          <volume>16</volume>
          (
          <issue>4</issue>
          ):
          <fpage>115</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Natarajan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Csallner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>P2A: A Tool for Converting Pixels to Animated Mobile Application User Interfaces</article-title>
          .
          <source>In 2018 IEEE/ACM 5th International Conference on Mobile Software Engineering and Systems (MOBILESoft)</source>
          ,
          <fpage>224</fpage>
          -
          <lpage>235</lpage>
          . Gothenburg, Sweden: IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          Navarro-Cáceres, M.; Rodríguez, S.;
          <string-name>
            <surname>Bajo</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Corchado</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Applying case-based reasoning in social computing to transform colors into music</article-title>
          .
          <source>Engineering Applications of Artificial Intelligence</source>
          <volume>72</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T. A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Csallner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Reverse engineering mobile application user interfaces with remaui (t)</article-title>
          .
          <source>In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)</source>
          ,
          <fpage>248</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Ram</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Santamaría</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>Continuous case-based reasoning</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>90</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>25</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Sartori</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mazzucchelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Gregorio</surname>
            ,
            <given-names>A. D.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Bankruptcy forecasting using case-based reasoning: The CRePERIE approach</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>64</volume>
          :
          <fpage>400</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Schank</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>1982</year>
          .
          <article-title>Dynamic memory: A theory of reminding and learning in computers and people</article-title>
          . Cambridge: Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>An integrated system of text mining technique and case-based reasoning (TM-CBR) for supporting green building design</article-title>
          .
          <source>Building and Environment</source>
          <volume>124</volume>
          :
          <fpage>388</fpage>
          -
          <lpage>401</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Q. H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K. Y.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Predicting financial distress and corporate failure: A review from the state-of-the-art definitions, modeling, sampling, and featuring approaches</article-title>
          .
          <source>Knowledge-Based Systems</source>
          <volume>57</volume>
          :
          <fpage>41</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Valiente</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>An efficient bottom-up distance measure between trees</article-title>
          .
          <source>In Proceedings Eighth Symposium on String Processing and Information Retrieval</source>
          ,
          <fpage>212</fpage>
          -
          <lpage>219</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Watson</surname>
            ,
            <given-names>I. A. N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Marir</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>1994</year>
          .
          <article-title>Case-based reasoning : A review</article-title>
          .
          <source>The Knowledge Engineering Review</source>
          <volume>9</volume>
          (
          <issue>4</issue>
          ):
          <fpage>327</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Zeyen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Bergmann</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2017</year>
          .
          <source>Conversational Process-Oriented Case-Based Reasoning</source>
          . Springer, Cham.
          <fpage>403</fpage>
          -
          <lpage>419</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>