<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Flexible Integration of Any Parts from Any Web Applications for Personal Use</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hao Han</string-name>
          <email>han@tt.cs.titech.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junxia Guo</string-name>
          <email>guo@tt.cs.titech.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takehiro Tokuda</string-name>
          <email>tokuda@tt.cs.titech.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Tokyo Institute of Technology Meguro</institution>
          ,
          <addr-line>Tokyo 152-8552</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>69</fpage>
      <lpage>80</lpage>
      <abstract>
        <p>Mashup has brought new creativity and functionality to Web applications by the integration of Web services from different Web sites. However, most existing Web sites do not provide Web services currently, and the Web applications are more widely used than Web services as a method of information distribution. In this paper, we present a method to integrate any parts from any Web applications for personal use. For this purpose, we propose a flexible integration method by the description and extraction of Web application contents. Our implementation shows that we can integrate any parts easily from not only the ordinary static HTML pages but also the dynamic HTML pages containing Web contents dynamically generated by client-side scripts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We defined WACDL (Web Application Contents Description Language), an
XML-based language that provides a model for describing Web application contents, to
configure the locations and scopes of the target contents of Web applications. We
also constructed an extraction and integration system, which extracts the
target contents and executes the contents integration to generate mashup Web
applications. Our implementation shows that we can integrate any parts easily,
from not only ordinary static HTML pages but also dynamic HTML
pages containing Web contents dynamically generated by client-side scripts.</p>
      <p>The organization of the rest of this paper is as follows. In Section 2 we give
the motivation of our research and an overview of related work. In Section
3 we construct an example mashup Web application, and explain our Web
application contents description and integration system in detail. We give an
evaluation of our approach in Section 4. Finally, we conclude and
discuss future work in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivation and Related Work</title>
      <p>
        Most integration technologies are based on the combination of Web services or
Web feeds. Yahoo Pipes [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Microsoft Popfly [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are composition tools
to aggregate, manipulate, and mash up Web services or Web feeds from different
Web sites with a graphical user interface. Mixup [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] can quickly build complex
user interfaces for easy integration by using the available Web service APIs.
Mashup Feeds [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] supports integrated Web service feeds as continuous queries.
It creates new services by connecting Web services using join, select and
map operations. Like these methods, Google Mashup Editor [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is also limited
to the combination of existing Web services or Web feeds.
      </p>
      <p>
        For the integration of parts of Web applications without Web service APIs,
the partial Web page clipping method is widely used. The users clip a selected
part of a Web page and paste it into a personal Web page. Internet Scrapbook
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a tool that allows users to interactively extract components of multiple
Web pages by clipping, and assembles them into a single personal Web page.
However, the extracted information is part of a static HTML document, and
the users cannot change the layout of the extracted parts. C3W [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] provides
an interface for automating data flows. With C3W, the users can clip elements
from Web pages to wrap an application and connect wrapped applications using
spreadsheet-like formulas, and clone the interface elements so that several sets
of parameters and results may be handled in parallel. However, it does not
appear to be easy to realize the interaction between different Web applications
and needs a special Web browser. Extracting data from multiple Web pages
by end-user programming [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is more suitable to generate mashup applications
at client side. Marmite [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], implemented as a Firefox plug-in using JavaScript
and XUL, uses a basic screen-scraping operator to extract the content from
Web pages and integrate it with other data sources. The operator uses a simple
XPath pattern matcher, and the data is processed in a manner similar to Unix
pipes. However, like Mashroom [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and Dapper [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], these methods can only extract Web contents from static HTML pages. MashMaker [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is a tool for editing,
querying, manipulating and visualizing continuously updated semi-structured
data. It allows users to create their own mashups based on data and queries
produced by other users and by remote sites. However, it does not appear to
support the integration of dynamically generated Web pages, such as the result
pages of form-based queries.
      </p>
      <p>These current methods are based on existing Web service APIs or Web
feeds, or need end-user programming, or have other limitations. They have
realized the integration of Web applications to some extent, but still cannot
extract and integrate the Web contents dynamically generated by client-side scripts,
which become more and more common in Web applications with the development of Web
2.0. To address these problems, we propose a novel approach to integrate any
parts of any Web applications by the description and extraction of the target
Web application contents. Compared with the existing work, our approach has
the following features.</p>
      <p>– We extend the range of Web contents from Web services to general Web
applications. Any parts of any Web applications are available.
– We propose WACDL to describe the target parts of Web applications. The
WACDL file is generated easily and does not require programming.
– We integrate any parts of not only ordinary static HTML pages but
also dynamic HTML pages containing Web contents dynamically
generated by client-side scripts.</p>
      <p>We explain our approach by constructing an example mashup Web
application, and give an evaluation in the following sections.</p>
    </sec>
    <sec id="sec-3">
      <title>Web Application Contents Description and Integration</title>
      <p>Our integration is based on the description and combination of target parts of
Web applications. In our approach, the target parts of Web applications are the
visible contents in the Web pages, such as text, links, graphics, video and Flash
objects. As shown in Figure 1, the whole process includes the following steps.
1. We describe the target parts of Web applications in a WACDL file.
2. We get the request from the client side and send it to the target Web
applications. We search for the target parts in the response Web pages
according to the description in WACDL.
3. We extract the contents and control their visibility.
4. We integrate the extracted contents and arrange their layouts to generate the
resulting page of the mashup Web application.</p>
      <p>We generated an example mashup Web application. It integrates parts
of the following five Web applications and realizes a search function for
country information. As shown in Figure 2, after the users input the country
name and send the request, the mashup Web application sends the request to
each target Web application and receives the response Web pages. It searches for
the target parts in the Web pages and shows them in an integrated resulting
Web page.</p>
      <p>
        – Part A: Country name and country flag from Country Fast Facts of CBS
News [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
– Part B: Weather information from Weatherbonk [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which is a mashup
application integrating a weather service and the Google Maps service. The part
showing weather information is created by client-side scripts, which can respond
to click and pan events.
– Part C: The country’s location, basic information and leader’s photo from
BBC Country Profiles [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
– Part D: The latest corresponding news articles from BBC News [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
– Part E: Pictures from Trippermap [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], shown on a map, which can
respond to click events and show the relevant pictures.
      </p>
      <p>We explain our integration approach based on the actual generation process
of this example.</p>
      <sec id="sec-3-1">
        <title>Web Application Contents Description Language</title>
        <p>We need a model to describe Web application contents if we want to use their
functionalities and contents like a Web service. Compared with Web services,
it is not easy to use Web applications through end-user programming. Without
an interface like SOAP or REST, we have to use extraction and emulation
technologies to interact with Web applications. The extraction is used to find
and extract the target contents from Web pages, and the emulation is used to
realize the process of sending requests and receiving responses.</p>
        <p>We propose the Web application contents description language (WACDL). It is
XML-based, as shown in Figure 3, and is used to describe the necessary information
for the extraction and emulation. A WACDL file represents the configuration of a
mashup Web application and includes the following items for each target Web
application.</p>
        <p>
          – StartPage: StartPage is a Web page of the target Web application. From this
StartPage, the end-user’s request is submitted. The value of StartPage is
the URL of the Web page.
– InputArea: InputArea is the position information of the request-input element
in the StartPage. If there are other elements with the same InputType in a
StartPage, we have to define the InputArea. For example, we need to select
one InputBox as the request-input element if there is more than one
InputBox in the StartPage. The value of InputArea is the XPath-like expression
of the request-input element. Otherwise, the InputArea value is set to null.
– InputType: InputType is the type of the request-input element in the StartPage.
Usually, the value is InputBox (text input field), OptionList (drop-down
option list in a selectbox), or LinkList (anchor list).
– ContentArea: ContentArea is the position information of the target contents in
the response Web page, and is used by the extraction. The value of ContentArea
is the XPath-like expression of the target parts.
– ContentType: ContentType is the type of the target contents. There are two
types of contents in a Web page: static Web contents and dynamic Web
contents. The static Web contents are the unchangeable parts shown on the
Web pages after the pages are fully loaded and during the viewing process.
They include two kinds of information: property and structure. The property
is text, image, link or object. Text is a character string in a Web page,
such as an article. Image is one instance of a graphic. Link is a reference
in a hypertext document to another document or resource. Object
is one instance of a video or other multimedia file. The structure is single
occurrence or continuous occurrence. A single occurrence is a part without
similar ones, such as the title of an article. A continuous occurrence is a
list of parts with similar ContentArea values, such as the result list in a search
result page. The dynamic Web contents are the parts dynamically generated
or changed by client-side scripts in dynamic HTML pages according to the
users’ operations.
– ContentStyle: ContentStyle is the layout of the target contents in the integrated
resulting Web page. It usually applies only to the static Web contents. For
the static Web contents, the extraction results are in XML format and the
ContentStyle refers to XSLT [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] files defined by the end-user. For the dynamic
Web contents, the extracted parts are shown in their original styles and the
ContentStyle value is null.
        </p>
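        <p>As an illustrative sketch only: the element name &lt;target&gt; appears in Section 3.4 and the six item names are defined above, but the exact schema is given in Figure 3 and Table 1, so the surrounding element structure and the example values below are assumptions.</p>

```xml
<!-- Hypothetical WACDL fragment: one <target> element per source Web
     application. The wrapper element names and the example values (URL,
     paths, style file) are guesses for illustration, not the real schema. -->
<wacdl>
  <target name="PartD">
    <StartPage>http://example.com/news</StartPage>
    <InputArea>null</InputArea>
    <InputType>InputBox</InputType>
    <ContentArea>/html/body/div[2]/ul</ContentArea>
    <ContentType>link-continuous</ContentType>
    <ContentStyle>newslist.xsl</ContentStyle>
  </target>
</wacdl>
```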
        <p>
          These six items describe how to get and integrate the target Web contents.
Like a batch file, each WACDL file represents a series of Web contents from different
sources. Table 1 gives the description of our example mashup Web application
shown in Figure 2. We developed Path Reader [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], a tool to read the path of a
target part through a GUI, which is modified from the Mouseover DOM Inspector [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. The
users can get the paths easily by mouse-clicking the target parts, and do not
need to read the HTML source codes manually.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Target Parts Searching</title>
        <p>According to the description in the WACDL file, we search for the target Web
contents of the Web applications. There are two steps in this process. First,
we get the response Web pages as the target Web pages. Then, we search for the
target parts in the response Web pages.</p>
        <p>
          Web applications provide request-submit functions for the users. For
example, search engine applications provide a text input field in the Web page
for users to input keywords. The users give the query keywords and
submit the requests to the server side. There are three basic types of methods to
send requests and get the response Web pages. The first type is to click an
option in the drop-down list of a selectbox in a Web page to view a new
Web page. The second type is to enter the query keywords into a form-input field
and click the submit button to send the query. The third
type is to click a link of a link list in a Web page to go to the target Web
page. For the request submitting, there are the POST and GET methods, and some
Web sites use encrypted or randomly generated codes. In order to get
the response Web pages from all kinds of Web applications, we use HtmlUnit
[
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] to emulate the submitting operation instead of a URL templating mechanism.
The emulation is based on triggering events on the element of InputType within
the InputArea of the StartPage, as follows.
        </p>
        <p>– In the case of InputBox, the text input field is found according to the
InputArea and the query keywords are entered. Then the click event of the
submit button is triggered to send the request and get the response Web
page.
– In the case of LinkList, the text contained inside each link tag within the
InputArea is compared with the keyword until a match is found. Then the
click event of the link is triggered to get the target Web page.
– In the case of OptionList, the text of each option within the InputArea is
compared with the keyword until a match is found. Then the select event
of the option is triggered to get the target Web page.</p>
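        <p>The three cases above amount to a dispatch over the InputType value. The following is a minimal sketch of that logic against a hypothetical mock page object; it is not HtmlUnit's actual API, whose page, form and anchor classes are far richer.</p>

```javascript
// Sketch of the request-emulation dispatch from Section 3.2. The "page"
// structure (inputBox, links, options, submit) is an assumption for
// illustration only; real code would drive HtmlUnit's page objects.
function emulateRequest(page, inputType, keyword) {
  switch (inputType) {
    case "InputBox":
      // Fill the text field found via InputArea, then trigger the submit click.
      page.inputBox.value = keyword;
      return page.submit();
    case "LinkList":
      // Compare each link's text with the keyword until a match is found,
      // then trigger the click event of that link.
      for (const link of page.links) {
        if (link.text === keyword) return link.click();
      }
      return null;
    case "OptionList":
      // Compare each option's text with the keyword until a match is found,
      // then trigger the select event of that option.
      for (const option of page.options) {
        if (option.text === keyword) return option.select();
      }
      return null;
    default:
      throw new Error("Unknown InputType: " + inputType);
  }
}
```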
        <p>
          ContentArea is used to find the target parts in the Web page. In the tree
structure of an HTML document, each path represents the root node of a subtree, and
each subtree represents a part of the Web page. Usually, the response Web pages have
the same or similar layouts if the requests are sent to the same request-submit
function. During the node searching, if a node cannot be found by a path,
similar paths are used to search for the node. The definition and
usage of similar paths are described in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] in detail.
        </p>
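        <p>The node searching with a similar-path fallback can be sketched as follows. The path encoding (an array of tag/index steps) and the fallback strategy of varying the occurrence index of the last step are assumptions for illustration; the precise definition of similar paths is given in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].</p>

```javascript
// Walk a node tree by a path of [tagName, occurrenceIndex] steps.
// Node shape ({ tag, children }) is a stand-in for the real HTML DOM.
function findByPath(root, path) {
  let node = root;
  for (const [tag, index] of path) {
    const matches = (node.children || []).filter(c => c.tag === tag);
    if (index >= matches.length) return null;
    node = matches[index];
  }
  return node;
}

// Try the exact path first; if it fails, try "similar" paths. Varying the
// occurrence index of the last step is one plausible variant, assumed here.
function findWithFallback(root, path) {
  const exact = findByPath(root, path);
  if (exact) return exact;
  const [tag] = path[path.length - 1];
  for (let i = 0; i < 5; i++) {
    const similar = path.slice(0, -1).concat([[tag, i]]);
    const node = findByPath(root, similar);
    if (node) return node;
  }
  return null;
}
```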
      </sec>
      <sec id="sec-3-3">
        <title>Contents Extraction and Visibility Control</title>
        <p>
          The target Web contents are mixed with other unnecessary elements, such as
advertisements, in a Web page, and are shown in different fonts, sizes or colors if they
come from different Web applications. In order to get a well-designed resulting
page, the users may define a customizable layout for the Web contents. After the
target parts are found, the static Web contents are extracted from the nodes in text
format, excluding the tags of the HTML document, according to the corresponding
ContentType. The detailed extraction algorithm can
be found in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. The extracted static contents are in XML format, and are
transformed into an HTML document by ContentStyle.
        </p>
        <p>For the dynamic Web contents, we use a novel hide-and-display method to
control their visibility instead of the static contents extraction method, because
we need to keep the functionalities of the client-side scripts. The scripts usually use
DOM operations to control the dynamic parts of Web pages, and sometimes
access elements outside the target parts, such as hidden values. If we
removed the other parts and kept only the target parts, the original execution environment
of the scripts would be broken and the scripts could not run normally. Here, we keep
all the parts of each Web page and change their visibility according to the following
steps, in order to show the target parts and hide the other parts of the Web page.
1. We create a node list L, and push the nodes found in Section 3.2 into L.
2. We create a node list L′, and push the ancestor nodes and descendant nodes
of each node in L into L′.
3. We push all the nodes in L′ into L.
4. We hide all the parts of the target Web page by setting the property "display"
of the attribute "style" of all the nodes to "none" (style.display="none"), except
the nodes in L.
5. We modify the HTML source to accelerate the Web page loading procedure.</p>
        <p>It is not necessary to load external files such as image or video
files if they are not within the target Web contents. These files are not shown
in the resulting Web page and would cost loading time. For example, we set
the attribute "src" of &lt;img&gt; to null so that the image files are not loaded.
6. We add a &lt;base&gt; tag between the &lt;head&gt; and &lt;/head&gt; tags. The value of
its attribute "href" is the URL of the target Web page. By adding the &lt;base&gt;
tag, all files invoked by relative paths can be found correctly, including
external image files, flash files, script files and so on.</p>
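        <p>Steps 1 to 4 above can be sketched over a simplified node tree as follows; the node shape (parent and children fields) is a stand-in for the real browser DOM, assumed for illustration.</p>

```javascript
// Hide-and-display, steps 1-4 of Section 3.3: keep the target nodes plus
// their ancestors and descendants visible, hide everything else.
function collectAncestors(node, out) {
  for (let p = node.parent; p; p = p.parent) out.add(p);
}
function collectDescendants(node, out) {
  for (const child of node.children || []) {
    out.add(child);
    collectDescendants(child, out);
  }
}
// targets: the nodes found by the path search (step 1 builds list L).
function hideAndDisplay(root, targets) {
  const keep = new Set(targets);          // step 1: node list L
  const related = new Set();              // step 2: node list L'
  for (const t of targets) {
    collectAncestors(t, related);
    collectDescendants(t, related);
  }
  for (const n of related) keep.add(n);   // step 3: merge L' into L
  // Step 4: set style.display = "none" on every node not in L.
  (function walk(node) {
    node.style = node.style || {};
    node.style.display = keep.has(node) ? "" : "none";
    for (const child of node.children || []) walk(child);
  })(root);
}
```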
        <p>By the Web contents extraction method and the hide-and-display method,
we can get any parts from any Web applications, and maintain the functionalities
of dynamic Web contents.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Integration and Layout Arrangement</title>
        <p>
          Finally, we integrate the parts from different Web applications in a resulting
page. We use iframe [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] as the Web content container of each part.
        </p>
        <p>An iframe (inline frame) is an HTML element that makes it possible to embed
an HTML document in another HTML document. While regular frames are
typically used to logically subdivide the Web contents of one Web page, iframes
are more commonly used to insert Web contents from another Web site into the
current Web page. Moreover, iframes are supported by all popular browsers.</p>
        <p>According to the WACDL file, we create as many iframes as the number of
&lt;target&gt; tags when the resulting page is loading. We show the Web contents
from each Web application in an iframe. Each iframe runs in an independent
manner, and iframes do not exchange data or events with each other. When
the users click a link in an iframe, a new window pops up to show the target
contents if the target of this link is another Web page or document. Our iframes
support layout arrangement by the users. In the resulting Web page, end-users
can move the iframes by drag-and-drop operations to adjust their locations, as
shown in Figure 4, which is more compact than the default layout arrangement
of iframes in Figure 2.</p>
        <p>Our mashup Web application is constructed by the integration of our Web
contents extraction method and hide-and-display method. It is developed in Java
and JavaScript, and works well on Internet Explorer 7 and JDK 1.6 under
Windows XP SP3 and Windows Vista SP1.</p>
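        <p>The per-target iframe creation can be sketched as follows. Rendering to an HTML string stands in for real DOM insertion, and the shape of the target list, the id scheme and the fixed iframe size are assumptions for illustration.</p>

```javascript
// Sketch of the resulting-page loader from Section 3.4: one iframe per
// <target> entry of the WACDL file, so each part's scripts run independently.
function buildResultPage(targets) {
  return targets
    .map(t =>
      // Each part is shown in its own iframe; size and id scheme are assumed.
      `<iframe id="${t.name}" src="${t.url}" style="width:400px;height:300px"></iframe>`
    )
    .join("\n");
}
```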
        <p>
          We integrated various parts from different kinds of Web applications, showing that
our approach is applicable to general Web applications [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], including
CNN News [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], Wikipedia [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] and Yahoo Finance [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. However, the
emulation by HtmlUnit is slow for some Web sites and costs more time than a URL
templating mechanism.
        </p>
        <p>Our extraction method is based on the fact that the response Web pages
from the same Web application use the same or similar layouts. If the response
Web pages use different layouts, the extraction precision becomes low,
because the paths of the target parts vary with the layouts of the Web pages.
Moreover, if the layout of the Web page is updated, the users have to change the
value of ContentArea in the WACDL file. Our WACDL files are still generated
manually. The users have to analyze the structure of the target Web applications
and fill in the corresponding information for each item, though they do not need to
read the source codes of the HTML documents.</p>
        <p>Although we add the &lt;base&gt; tag to deal with relative paths, unfortunately,
the client-side scripts of some Web applications use a relative path as a function
parameter, as in the following example, and then the external file cannot be loaded
correctly:
var flashfile = new ObjectLoad("flash.swf", "300", "400");</p>
        <p>Our integration approach makes it possible for end-users with little or no
programming experience to integrate Web contents from
various Web applications without Web service APIs. The range of Web contents
is extended from Web services to general Web applications. Any parts of
any Web applications are available: not only ordinary static HTML pages but
also dynamic HTML pages containing Web contents dynamically generated
by client-side scripts, and even parts of mashup Web applications. Compared
with programming, WACDL is easy to read, write and update.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we have presented a novel approach to integrate any parts from
any Web applications for personal use. Our approach uses WACDL to
describe the Web application contents and functionalities, and realizes the
integration by the Web contents extraction method and the hide-and-display method. With
our extraction and integration system, the users can construct mashup Web
applications without programming.</p>
      <p>As future work, we will provide a friendly GUI for
users to generate the WACDL file more easily. Moreover, we would like to explore
more flexible ways of integrating Web applications, Web services and other
Web contents. Additionally, besides the currently developed Java-based emulation
and extraction system, we will develop a JavaScript-based system in the future.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Google Maps API: http://code.google.com/apis/maps/.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. YouTube Data API: http://code.google.com/apis/youtube/.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. CNN: http://www.cnn.com.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. BBC Country Profiles: http://news.bbc.co.uk/2/hi/country profiles/.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Yahoo Pipes: http://pipes.yahoo.com/pipes/.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Microsoft Popfly: http://www.popfly.com.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benatallah</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saint-Paul</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Casati</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daniel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matera</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A framework for rapid integration of presentation components</article-title>
          .
          <source>In: The Proceedings of the 16th International Conference on World Wide Web</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Tatemura</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sawires</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Po</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Candan</surname>
            ,
            <given-names>K.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goveas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Mashup feeds: Continuous queries over Web services</article-title>
          .
          <source>In: The Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Google Mashup Editor: http://editor.googlemashups.com.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Koseki</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugiura</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Internet scrapbook: Automating Web browsing tasks by demonstration</article-title>
          .
          <source>In: ACM Symposium on User Interface Software and Technology</source>
          . (
          <year>1998</year>
          )
          <fpage>9</fpage>
          -
          <lpage>18</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Fujima</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lunzer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hornbaek</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanaka</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>C3W: clipping, connecting and cloning for the Web</article-title>
          .
          <source>In: The Proceedings of the 13th International World Wide Web conference</source>
          . (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tokuda</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A method for integration of Web applications based on information extraction</article-title>
          .
          <source>In: The Proceedings of the 8th International Conference on Web Engineering</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>J.I.</given-names>
          </string-name>
          :
          <article-title>Making mashups with marmite: Towards end-user programming for the Web</article-title>
          .
          <source>In: The Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Mashroom: end-user mashup programming using nested tables</article-title>
          .
          <source>In: The Proceedings of the 18th International Conference on World Wide Web</source>
          . (
          <year>2009</year>
          )
          <fpage>861</fpage>
          -
          <lpage>870</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Dapper: http://www.dapper.net.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ennals</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garofalakis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>MashMaker: Mashups for the masses</article-title>
          .
          <source>In: The Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. CBS News: http://www.cbsnews.com/stories/2007/08/30/country_facts/main3221371.shtml.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. WeatherBonk: http://www.weatherbonk.com.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. BBC News: http://www.bbc.co.uk.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. Trippermap: http://www.trippermap.com.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. XSL Transformations: http://www.w3.org/TR/xslt20/.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tokuda</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A new partial information extraction method for personal mashup construction</article-title>
          .
          <source>In: The Proceedings of the 19th European - Japanese Conference on Information Modelling and Knowledge Bases</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23. Mouseover DOM Inspector: http://slayeroffice.com/content/tools/modi.html.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>24. HtmlUnit: http://htmlunit.sourceforge.net/.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tokuda</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>WIKE: A Web information/knowledge extraction system for Web service generation</article-title>
          .
          <source>In: The Proceedings of the 8th International Conference on Web Engineering</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>26. iframe: http://en.wikipedia.org/wiki/iframe.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>27. Wikipedia: http://www.wikipedia.com.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>28. Yahoo Finance: http://finance.yahoo.com.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>