<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Semi-automatic annotation of e-shops</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
<institution>Peter Gurský, Matej Perejda, Dávid Varga, Institute of Computer Science, Faculty of Science</institution>
          ,
          <addr-line>P. J. Šafárik University in Košice, Jesenná 5, 040 01 Košice</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2203</volume>
      <fpage>152</fpage>
      <lpage>156</lpage>
      <abstract>
        <p>Extraction of data from web pages has become a popular way to acquire data that support decision processes. Acquiring structured data from a new web portal requires annotating its web pages to locate the positions and types of the relevant information. We present methods for the semi-automatic annotation of e-shop content that create rules for extraction. The methods are implemented in a Chrome extension named Exago. Their aim is to generate XPaths and regular expressions. We use positive and negative examples to further narrow down which of the generated XPaths should be used for extraction. The annotation methods were tested on real data and the results show a high success rate.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Web scraping is a complex process that consists of web
page annotation, crawling, data extraction and data
processing. This kind of data acquisition is very valuable
for decision processes, because it can gather data from
various sources and process them together. The Kapsa project
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] deals with the extraction and unification of information
from web pages, focusing on products in e-shops. The aim
of the project is the creation and management of a collection
of products offered by e-shops. In this paper, we
focus on the first step of the web scraping process: web
page annotation.
      </p>
      <p>The annotation has two main goals: recognition of
the relevant pages of a portal and identification of the positions
on those pages where the data of our interest are located.</p>
      <p>The positions of relevant data are usually specified by
XPaths over the HTML source or by regular expressions, which
are applied by the extractor on each relevant page. It is also
possible to extract data using a procedural script. Writing
complex XPaths or regular expressions, as well as
creating scripts, is not an easy task: it requires the
annotator to be an IT expert.</p>
      <p>
        Our goal is to make the annotation process for e-shops
easier and feasible for an ordinary person. With our Chrome
extension Exago [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we would like to offer satisfactory web scraping results
with an annotation made only by mouse clicking.
      </p>
    </sec>
    <sec id="sec-2">
      <title>State of the art</title>
      <p>
        Web annotation and extraction systems can be
categorized into four groups [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Manual systems [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]
require programming in some (pseudo) language.
Automatically constructed extractors [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] create an
extraction system based on a complete user annotation and
examples of extracted data from several pages.
Automatically constructed extractors with partial user
support [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] create an extraction system without the need for
extraction examples. Automatic extractors with no user
support [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] analyze repeating patterns on web pages
and extract all data that seem to be interesting. Our tool
Exago belongs among the automatically constructed
extractors with partial user support.
      </p>
      <p>
        Currently, there are more than 50 web scrapers available
on the internet. We tried to use all of them for product
data extraction from two e-shops, Alza.sk and Heureka.sk.
The majority of them were not able to extract the product
data. The web scrapers that are at least partially applicable
to product data extraction are [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15 ref16 ref17 ref18 ref19 ref20 ref21 ref22 ref23 ref24 ref25 ref26 ref27 ref28">11-28</xref>
        ], so we examined
these web scrapers more deeply. The majority of these tools
provide the generation of one XPath by clicking on an element
on the page. Some tools hide this functionality and do not
show the final XPath [
        <xref ref-type="bibr" rid="ref12 ref18 ref19 ref21 ref22 ref24 ref25 ref26">12, 18, 19, 21, 22, 24, 25, 26</xref>
        ], while
others [
        <xref ref-type="bibr" rid="ref11 ref13 ref15 ref17 ref20 ref23 ref27 ref28">11, 13, 15, 17, 20, 23, 27, 28</xref>
        ] allow
modifying the XPath manually. The rest [
        <xref ref-type="bibr" rid="ref14 ref16">14,
16</xref>
        ] provide manual insertion of an XPath only. None of the
tools provide regular expression generation, but some of
them allow writing regular expressions manually.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methods for annotation</title>
      <p>Given an HTML element containing the relevant data,
we can easily create an XPath as the path from the root
element. This XPath points to the target element and is
used during the extraction process on this page. However,
on a similar page (e.g. the page about a different product in the
e-shop), the created XPath can be unsuccessful: it either
finds no element or an element with a different kind of data,
because the HTML source tree can be slightly different
(e.g. a missing subtree, different text highlighting, etc.)
from the one of the annotated product.</p>
      <p>Fortunately, many XPaths can be created that point to the
same element. The tree navigation of an XPath can be based
on element attributes, the order of an element among its
siblings, various conditions and more. Choosing the right
navigation for the XPath increases the success rate of extraction
considerably. The problem is that an inexperienced user is
not able to choose the right XPath from the hundreds of
possibilities. Therefore, we provide semi-automatic XPath
and regular expression generation and an interactive selection
of the right rules to make the annotation process easier.</p>
      <sec id="sec-3-1">
        <title>XPath generation</title>
        <p>XPath is a query language used to select nodes of
an XML DOM model. All browsers create a DOM model out
of an HTML file as the first step of page processing. The XPath
language provides various ways to specify the starting
node(s) and the traversal of the DOM model. The creator of an XPath
expression can utilize tag names, tag attributes and their
values, the order of elements, navigation functions, and
conditions with built-in functions. Such variability allows
many XPaths to localize the same node.</p>
        <p>The annotator’s goal is to specify the position of relevant
data in a way that is universal for all pages of the same type
(created from the same HTML template). In our case, we
focus primarily on the pages where the details of products
are located. Unfortunately, when the template is combined
with the structured data of products to create the final detail
pages, the differences between the resulting pages lead to a
variety of HTML tree structures. They can vary in element
attributes as well as in the absence of whole subtrees. Such
differences cause many XPaths that work on one page to
fail on another page. They can point to different nodes or to no nodes,
while the corresponding data is still present somewhere on
the page.</p>
        <p>It is impossible to know which parts of the template appear
on all resulting pages without a complex analysis of the pages,
because the templates are not public. Therefore, we do not
know which elements or attributes can be used as
navigation points of a universal XPath. Annotation experts
usually examine the HTML of various resulting pages and create
the universal XPath manually.</p>
        <fig id="fig1">
          <label>Figure 1</label>
          <caption><p>An example HTML source of a product detail page.</p></caption>
          <preformat>
&lt;h1&gt;Samsung 24FDX&lt;/h1&gt;
&lt;h2&gt;Specification&lt;/h2&gt;
&lt;table id="product parameters"&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style="color:powderblue;"&gt;Producer&lt;/td&gt;
      &lt;td style="color:pink;"&gt;Samsung&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style="color:powderblue;"&gt;Model&lt;/td&gt;
      &lt;td style="color:pink;"&gt;24FDX&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
          </preformat>
        </fig>
        <p>
          Our approach generates several XPaths as
candidates for the final universal XPath. Suppose the
annotator wants to create an XPath leading to the element with
the value “24FDX” in the HTML source in Figure 1. In this
example, the method for generating XPaths produces a set
of 36 different correct XPaths, all pointing
to the same element. The shortest of them are listed
below:
- //tr[2]/td[2]
- //tr[last()]/td[last()]
- //tr[last()]/td[2]
- //tr[2]/td[last()]
- //tr[2]/*[@style="color:pink;"]
- //tr[last()]/*[@style="color:pink;"]
        </p>
        <p>XPaths are generated gradually along the path from the
clicked element up to a specified root element. If no
root element is specified, the root element of the HTML
document is chosen as the element where generation
stops. At every element along this path, all
attributes, the element name and the element’s order among its
siblings are combined to create different XPaths.</p>
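        <p>The following TypeScript sketch illustrates how such a generation step can work. It is only an illustration with our own hypothetical helper names (stepAlternatives, generateXPaths), not the actual Exago implementation: every element on the path contributes several alternative location steps, and the alternatives are combined into complete XPaths.</p>
        <preformat>
// produce alternative XPath location steps for one element:
// tag with sibling position, tag with last(), and one
// wildcard step per attribute value
function stepAlternatives(el: Element): string[] {
  const steps: string[] = [];
  const tag = el.tagName.toLowerCase();
  const parent = el.parentElement;
  if (parent) {
    const sameTag = Array.from(parent.children)
      .filter(s => s.tagName === el.tagName);
    const pos = sameTag.indexOf(el) + 1;
    steps.push(`${tag}[${pos}]`);
    if (pos === sameTag.length) steps.push(`${tag}[last()]`);
  } else {
    steps.push(tag); // document root
  }
  for (const attr of Array.from(el.attributes)) {
    steps.push(`*[@${attr.name}="${attr.value}"]`);
  }
  return steps;
}

// combine the alternatives of every element on the path from
// the root down to the clicked element (a cartesian product)
function generateXPaths(el: Element, root: Element): string[] {
  if (el === root) return stepAlternatives(el).map(s => "//" + s);
  const prefixes = generateXPaths(el.parentElement!, root);
  const result: string[] = [];
  for (const prefix of prefixes)
    for (const step of stepAlternatives(el))
      result.push(prefix + "/" + step);
  return result;
}
        </preformat>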
        <p>
          The extractor process, which extracts the data from all
similar pages, needs only one XPath per each value
position. Our approach in Exago tool [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] chooses the
shortest XPath as the result candidate by default. The
annotation process is a process where annotator determines
whether the candidate XPath is the most universal one, to
be chosen for extraction.
        </p>
        <p>During the annotation process, the annotator can
navigate to another similar page (e.g. the page about another
product) and check the success of the chosen XPath.
There are three possibilities:
1. The element is correctly found using the XPath and
no changes are needed.
2. The XPath addresses no element, and the annotator can
mark the correct element as a positive example.
3. The XPath addresses a different element, and the
annotator can mark the addressed element as a
negative example.</p>
        <p>When the annotator marks a positive or negative
example by clicking on an HTML element, the method for
generating XPaths generates a new set of XPaths for this
element. Let this set of XPaths be named B, and the
original set A. In the case of a positive example, the new
set of XPaths that work on both pages is A∩B. In the case
of a negative example, the new set is A-B.</p>
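        <p>A minimal sketch of this refinement step, reusing the hypothetical generateXPaths helper from the previous sketch (again our own names, not the Exago code):</p>
        <preformat>
// A ∩ B: keep only XPaths that also locate the positive example
function intersect(a: string[], b: string[]): string[] {
  const inB = new Set(b);
  return a.filter(x => inB.has(x));
}

// A - B: drop XPaths that locate the negative example
function subtract(a: string[], b: string[]): string[] {
  const inB = new Set(b);
  return a.filter(x => !inB.has(x));
}

function refine(candidates: string[], clicked: Element,
                root: Element, isPositive: boolean): string[] {
  const B = generateXPaths(clicked, root);
  return isPositive ? intersect(candidates, B) : subtract(candidates, B);
}
        </preformat>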
        <p>After a few iterations of this procedure, the result set
contains only XPaths that work on all visited pages. The annotator
can choose any of them to be part of the final extraction rule,
or just keep the default one. As mentioned before,
our Exago tool chooses the shortest XPath by default.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Regular expression generation</title>
        <p>XPath is a very capable language for locating a whole
element of the HTML source. However, there are cases when we want
to extract a value from only a part of an element, or when a value
spreads across two or more elements. In these cases, XPath alone is
unusable.</p>
        <p>In Exago, we combine XPaths and regular expressions when
needed. The XPath localizes the HTML part in which the
regular expression is applied. It is also possible that the
XPath localizes more than one element, in which case the regular
expression is applied to each of them to find the target
value.</p>
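        <p>A sketch of this combination using the browser’s standard XPath evaluator (the extract function and its signature are our illustration, not the Exago API): the XPath selects the candidate elements, and the regular expression is then applied to the HTML of each of them.</p>
        <preformat>
function extract(doc: Document, xpath: string, re: RegExp): string[] {
  const values: string[] = [];
  const it = doc.evaluate(xpath, doc, null,
    XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
  // apply the regular expression inside every element the XPath found
  let node = it.iterateNext();
  while (node) {
    const match = (node as Element).outerHTML.match(re);
    // prefer the first capture group if the expression defines one
    if (match) values.push(match[1] ?? match[0]);
    node = it.iterateNext();
  }
  return values;
}
        </preformat>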
        <p>A regular expression is an effective tool for extracting a
substring based on complex conditions, but it has a quite
complicated syntax. Sometimes even IT experts have a
hard time constructing more complex regular
expressions.</p>
        <p>In Exago, we created a regular expression editor (Fig. 2)
that generates multiple regular expressions using mouse
events. The process of generating regular expressions can
be performed only by clicking on buttons in the editor and
highlighting the text needed for extraction. The editor therefore
allows a less experienced user to create regular
expressions and see the result. On the other hand, there is
still the possibility to edit the generated regular expressions or
write custom ones.</p>
        <p>Our editor supports two approaches. First, the user can select
the target text and click on the first Generate button. Exago
generates several regular expressions, collected in a
combo box. The method that generates the expressions
tries to generalize the two most common kinds of text parts
(see the sketch after this list):
- spaces and other whitespace characters are
converted to the expression \s or \s+,
- numbers are converted to \d+ or \d+((\.|,)\d+)?,
which also covers decimal numbers.</p>
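        <p>A minimal sketch of this generalization (our own helper names; the real editor offers several generated variants rather than a single one): digit runs and whitespace runs in the selected text are replaced by the patterns above, and everything else is escaped literally.</p>
        <preformat>
// escape characters that are special in regular expressions
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, m => "\\" + m);
}

// e.g. generalize("24 990") returns
// "\d+((\.|,)\d+)?\s+\d+((\.|,)\d+)?"
function generalize(selected: string): string {
  let out = "";
  let i = 0;
  while (i !== selected.length) {
    const rest = selected.slice(i);
    const num = rest.match(/^\d+((\.|,)\d+)?/);
    const ws = rest.match(/^\s+/);
    if (num) {
      out += "\\d+((\\.|,)\\d+)?"; // numbers, incl. decimals
      i += num[0].length;
    } else if (ws) {
      out += "\\s+"; // any whitespace run
      i += ws[0].length;
    } else {
      out += escapeRegex(selected.charAt(i)); // literal character
      i += 1;
    }
  }
  return out;
}
        </preformat>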
        <p>The second approach to regular expression generation is
the selection of a prefix and a suffix of the target value and hitting
the corresponding Generate button. In Figure 2, the user
selected the text “unit-17220&gt;” as the prefix and “&lt;/span&gt;” as
the suffix and generated the related regular expressions.
From the chosen regular expressions, a combined
expression is created and written into the text field at the top
of the screen. The result of the regular expression search is
highlighted at the bottom with a green background.</p>
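        <p>A sketch of the prefix/suffix approach (the helper is ours and reuses escapeRegex from the previous sketch): the generated expression captures whatever stands between the selected prefix and suffix.</p>
        <preformat>
// a lazy group keeps the captured value as short as possible
function fromPrefixSuffix(prefix: string, suffix: string): RegExp {
  return new RegExp(escapeRegex(prefix) + "(.*?)" + escapeRegex(suffix));
}

// e.g. with the selection from Figure 2:
// fromPrefixSuffix('unit-17220>', '&lt;/span>') captures the target value
        </preformat>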
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>In this section we analyze the annotation accuracy of
the new version of Exago, which uses the
semi-automatic annotation based on positive and negative
examples and the generation of regular expressions. The tests
compare our approach with the usual approach to element
localization used in other web scrapers and in our previous
version of Exago, i.e. the generation of one XPath per value.</p>
      <p>This analysis was done on 20 different e-shops.
During the testing phase, every annotation was
performed only by mouse clicks on HTML elements
and Exago components. No manual editing of XPaths or
regular expressions was used, in order to compare the
old and the new approach. Table 1 shows the
results of each annotation per e-shop, ordered by success
rate.</p>
      <p>We measured the following aspects for both approaches:
- successful annotation of elements – the number of
successfully annotated elements or values compared
to the number of all elements or values that could be
annotated,
- success rate – the percentage of correctly addressed
elements and values among the annotated elements and
values,
- use of regular expressions – information about the
use of generated regular expressions during
annotation; with every use of regular expressions in
Exago, the regular expressions were used in
combination with XPaths.</p>
      <p>With the new approach, we achieved a success rate of
100 % in 8 internet shops. In the remaining 12 e-shops we were
not able to achieve a 100 % success rate for
the following reasons:
- in 5 cases, the HTML structures of the detail web pages
within an e-shop varied too much,
- in 5 cases, lists of product parameters were
divided into several elements unrelated to each other,
- in 3 cases, complications occurred during the annotation
of images that used dynamic styles for their
presentation on a web page,
- in 3 cases, we were not able to annotate the prices
of products, because the e-shops displayed sale prices in
different elements than non-sale prices,
while both of these prices were present on a
web page,
- in 3 cases, we did not manage to annotate
product ratings represented by pictures without any
further information given, for example, the number of
stars awarded.</p>
      <p>The average success rate of the annotation of product
information was 59% with the common approach.
With the new version of Exago we achieved an
average success rate of 93%. This improvement was
achieved with the help of the generation of regular expressions
and the functionality of positive and negative examples.</p>
      <p>Some of the unsuccessful cases could be eliminated by
manual editing of XPaths and regular expressions. For
example, in the cases where internet shops presented multiple
lists of product parameters, we would be able to manually
create a regular expression that addresses all types of
elements representing these lists.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>This paper deals with improving the commonly
used data localization process in web annotation by
implementing semi-automatic annotation. We have
created a Chrome extension named Exago
containing this functionality. With the help of the methods for
generating XPaths, generating regular expressions and the
functionality of positive and negative examples, we are
able to annotate more information on product detail web pages
compared to the previous version of Exago.</p>
      <p>In the annotation accuracy analysis, we
compared the common approach with the new one and
discovered that the success rate has grown from 59% to
93%. We consider this result notable, but there is still
room for improvement. One improvement could be a
new component that would be able to distinguish
different types of detail web pages within one e-shop and use
the appropriate XPaths and regular expressions for annotation.</p>
      <p>Another improvement could be a more intelligent
generation of XPaths that would generate XPaths
faster and reduce the generation of very similar XPaths.</p>
      <p>Designing methods for the annotation of product ratings
represented by pictures of stars or other objects would
also be very beneficial. A solution to this problem could be
counting the matches of a regular expression addressing one
star within the element containing these stars.</p>
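      <p>As a brief sketch of this idea (a hypothetical helper, assuming the per-star expression is already known):</p>
      <preformat>
// count how many times the per-star expression matches
// inside the HTML of the rating element
function countStars(ratingHtml: string, starRe: RegExp): number {
  const matches = ratingHtml.match(new RegExp(starRe.source, "g"));
  return matches ? matches.length : 0;
}
      </preformat>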
      <p>Adding the sale price as a new type of known value
component would enable us to annotate sale prices as well
as non-sale prices and increase our success rate per e-shop.</p>
      <p>The new version of Exago could also be improved by
solving the problem with image annotation, where images are
displayed using various dynamic styles. The solution
could be a mechanism that would collect all the
pictures from a detail web page and show them to the user,
who would pick the desired image. The XPath addressing
this image would then be generated and used automatically.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Project</given-names>
            <surname>Kapsa</surname>
          </string-name>
          , web page: http://kapsa.sk/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          :
          <article-title>Web Data Mining: Exploring Hyperlinks, contents and Using Data, Second edition</article-title>
          ,
          <source>Springer 2011. ISBN 978-3-642-19459-7</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Crescenzi</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Mecca: Grammars have exceptions</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>23</volume>
          (
          <issue>8</issue>
          ):
          <fpage>539</fpage>
          -
          <lpage>565</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Furche</surname>
          </string-name>
          , G. Gottlob, G. Grasso,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schallhart</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Sellers: OXPath: A language for sca-lable data extraction, automation, and crawling on the deep web</article-title>
          .
          <source>The VLDB Journal</source>
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>47</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Dung: Generating finite-state transducers for semi-structured data extraction from the Web</article-title>
          .
          <source>Information Systems</source>
          ,
          <year>1998</year>
          ,
          <volume>23</volume>
          (
          <issue>8</issue>
          ): p.
          <fpage>521</fpage>
          -
          <lpage>538</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Muslea</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Minton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Knoblock: A hierarchical approach to wrapper induction</article-title>
          .
          <source>In Proce-edings of Intl. Conf. on Autonomous Agents (AGENTS-1999</source>
          )
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Chang</surname>
          </string-name>
          , S.-C.
          <article-title>Kuo: OLERA: A semi-supervised approach for Web data extraction with visual support</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>19</volume>
          (
          <issue>6</issue>
          ):
          <fpage>56</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hogue</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Karger: Thresher: Automating the Unwrapping of Semantic Content from the World Wide</article-title>
          .
          <source>Proceedings of the 14th International Conference on World Wide Web (WWW)</source>
          , Ja- pan, pp.
          <fpage>86</fpage>
          -
          <lpage>95</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Crescenzi</surname>
          </string-name>
          , G. Mecca, P. Merialdo:
          <article-title>RoadRunner: towards automatic data extraction from large Web sites</article-title>
          .
          <source>Proceedings of the 26th International Conference on Very Large Database Systems (VLDB)</source>
          , Rome, Italy, pp.
          <fpage>109</fpage>
          -
          <lpage>118</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Arasu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Garcia-Molina</surname>
          </string-name>
          :
          <article-title>Extracting structured data from Web pages</article-title>
          .
          <source>Proceedings of the ACM SIGMOD International Conference on Management of Data</source>
          , San Diego, California, pp.
          <fpage>337</fpage>
          -
          <lpage>348</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Content</given-names>
            <surname>Grabber</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (https://contentgrabber</article-title>
          .com/)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Data</given-names>
            <surname>Miner</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (https://data-miner</article-title>
          .io/)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Helium</given-names>
            <surname>Scraper</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (http://www</article-title>
          .heliumscraper.com/)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Import</surname>
          </string-name>
          .io.
          <article-title>Web scraper available on-line: (https://www</article-title>
          .import.io/)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Mozenda</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (https://www</article-title>
          .mozenda.com/)
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>ParseHub</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (https://www</article-title>
          .parsehub.com/)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Scrapinghub</surname>
          </string-name>
          (Portia) .
          <article-title>Web scraper available on-line: (https://scrapinghub</article-title>
          .com/)
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Web</given-names>
            <surname>Scraper</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (http://webscraper</article-title>
          .io/)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Agenty</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (https://www</article-title>
          .agenty.com)
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Data</given-names>
            <surname>Toolbar</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (http://datatoolbar</article-title>
          .com/)
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Dexi</surname>
          </string-name>
          .io.
          <article-title>Web scraper available on-line: (https://dexi</article-title>
          .io/)
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Easy</surname>
            <given-names>Web</given-names>
          </string-name>
          <string-name>
            <surname>Extract</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (http://webextract</article-title>
          .net/)
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Fminer</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (http://www</article-title>
          .fminer.com/)
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <article-title>GetData</article-title>
          .IO.
          <article-title>Web scraper available on-line: (https://getdata</article-title>
          .io/)
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Grepsr</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (https://www</article-title>
          .grepsr.com/chrome-extension/)
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Instant</surname>
            <given-names>Data</given-names>
          </string-name>
          <string-name>
            <surname>Scraper</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (https://webrobots</article-title>
          .io/instantdata/)
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Visual</surname>
            <given-names>Web</given-names>
          </string-name>
          <string-name>
            <surname>Ripper</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (http://visualwebripper</article-title>
          .com/)
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Web</given-names>
            <surname>Sundew</surname>
          </string-name>
          .
          <article-title>Web scraper available on-line: (http://www</article-title>
          .websundew.com)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>