<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tim Furche, Giorgio Orsi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Bozzon, Chiara Pasini, Luca Tettamanti, Salvatore Vadacca</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oxford University, Department of Computer Science</institution>
          ,
          <addr-line>Wolfson Building, Parks Road, Oxford, OX1 3QD</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Politecnico di Milano</institution>
          ,
          <addr-line>Via Ponzio 34/5, 20133 Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Thanks to the Web, access to an increasing wealth and variety of information has become near instantaneous. To make informed decisions, however, we often need to access data from many different sources and integrate different types of information. Manually collecting data from scores of web sites and combining that data remains a daunting task. The ERC projects SeCo (Search Computing) and DIADEM (Domain-centric Intelligent Automated Data Extraction Methodology) address two aspects of this problem: SeCo supports complex search processes drawing on data from multiple domains, with a user interface capable of refining and exploring the search results; DIADEM aims to automatically extract structured data from a domain's websites. In this paper, we outline a first approach for integrating SeCo and DIADEM. We discuss how to use the DIADEM methodology to automatically turn nearly any website from a given domain into a SeCo search service. We describe how such services can be registered and exploited by the SeCo framework in combination with services from other domains (and possibly developed with other methodologies).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Recent years witnessed a paradigmatic shift in the way
people deal with information. The Web provides cheap
and ubiquitous access to an increasing wealth and variety
of data. Yet, making informed decisions, which often
requires complex and articulated information retrieval tasks
involving access to information from many different sources,
remains a daunting task. Queries such as "Retrieve jobs
as Java Developer in the Silicon Valley, nearby affordable
fully-furnished flats, and close to good schools" are,
unfortunately, not addressed by current search engines. From a
vast list of potential sources, it is left to the user to manually
extract and integrate the relevant data.</p>
      <p>The research leading to these results has received funding
from the European Research Council under the European
Community's Seventh Framework Programme (FP7/2007-2013)
/ ERC grant agreement no. 246858 (DIADEM) and
the 2008 Call for "IDEAS Advanced Grants" as part of the
Search Computing (SeCo) project.</p>
      <p>Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. This article was presented at:
WORKSHOP NAME. Copyright 2011.</p>
      <p>The Search Computing (SeCo) project [1] aims at
building concepts, algorithms, tools, and technologies to
support complex Web queries, through a new paradigm
based on combining data extraction from distinct sources
and data integration by means of specialized integration
engines. Web data is typically published in two ways: as
structured (and possibly linked) data accessible through Web
APIs (e.g., SPARQL, YQL, etc.), and as unstructured
resources (i.e., Web pages), possibly accessible only through
user interaction such as form filling or link navigation.</p>
      <p>Unstructured data is typically accessible to
general-purpose search engines, which exploit traditional
information retrieval techniques. To enable the consumption of such
data by automated processes, data accessible to humans
through existing Web interfaces needs to be transformed
into structured information; hence the need for
data extraction tools (e.g., screen scrapers). Unfortunately,
the interactive nature of modern Web interfaces poses a big
challenge: the dynamic behavior of these user interfaces,
driven by client- and server-side scripting, makes it hard
for automated processes to access this information.</p>
      <p>The DIADEM (Domain-centric Intelligent Automated
Data Extraction Methodology) project aims at developing
domain-specific data extraction systems that take as input the
URL of a Web site in a particular application domain,
automatically explore the Web site, and deliver as output a
structured data set containing all the relevant information present
on that site. It is based on a novel, knowledge-driven
approach that combines low-level annotations with high-level
domain knowledge and sophisticated analysis rules encoding
common Web design patterns. The first prototype for the
UK real-estate domain outperforms existing data extraction
tools and validates the premise that, with a thin layer of
domain-specific knowledge, nearly perfect automated data
extraction is feasible.</p>
      <p>Once a web site is analyzed, the DIADEM engine can
provide a one-time copy of all the data of that site, structured
according to the provided schema. Alternatively, an
extraction expression, formulated in OXPath [2], can be returned
that extracts all the data on demand at high speed.</p>
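      <p>For illustration only, an extraction expression in the style of OXPath [2] for a real-estate site might look as follows. This is a sketch, not DIADEM output: the URL, field position, button label, and CSS class names are invented for the example.</p>
      <p>
```oxpath
doc("http://estate.example.co.uk/")
  /descendant::field()[1]/{"Oxford"}
  /following::a[.#="Search"]/{click/}
  /(//a[.#="Next"]/{click/})*
  //div[@class="property"]:&lt;property&gt;
    [ .//span[@class="price"]:&lt;price=string(.)&gt; ]
```
      </p>
      <p>Read top to bottom: fill the first form field, click the search link, follow pagination links under Kleene star, then mark each result record and its price attribute for extraction.</p>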
    </sec>
    <sec id="sec-2">
      <title>1.1 Motivations and Outline</title>
      <p>As users get acquainted with on-line search and
decision support systems, their information needs evolve: their
queries become more and more complex, and their demand
for correct and updated data increases. While data extraction
approaches such as DIADEM can greatly improve the
quality of available information, the need arises for systems
and tools able to holistically tackle the problem of complex
queries, while enabling users to select, explore, and combine
data sources in a customized way. A tight integration of
DIADEM and SeCo can provide an answer to such a need by
combining high-precision data extraction, multi-domain
service integration, and exploratory search interaction [3]. We
demonstrate how the data extraction facilities provided by
DIADEM enable the data integration components in SeCo to
easily achieve novel, multi-domain search services over a large
number of Web sites.</p>
      <p>The paper is organized as follows: Section 2 describes the
search computing approach to information integration,
Section 3 presents the DIADEM approach to data extraction,
Section 4 discusses integration issues, and Section 5 concludes
the paper.</p>
      <p>2. WEB DATA INTEGRATION WITH SEARCH COMPUTING</p>
      <p>[Figure 1: Overview of the Search Computing framework]</p>
      <p>Figure 1 shows an overview of the Search Computing
framework, which comprises several sub-frameworks. The
service description framework (SDF) provides the scaffolding
for wrapping data and registering data sources in service
marts, describing the information sources at different levels
of abstraction. The user framework provides functionality
and storage for registering users, with different roles and
capabilities. The query framework supports the management
and storage of queries as first-class citizens: a query can be
executed, saved, modified, and published for other users to
see. The service invocation framework masks the technical
issues involved in the interaction with the service mart, e.g.,
the Web service protocol and data caching issues. The core
of the framework aims at executing multi-domain queries.
The query manager takes care of splitting the query into
sub-queries (e.g., "Which jobs as Java developer are available in
the Silicon Valley?", "Where are affordable, nearby flats?",
"Where are good schools?") and binding them to the relevant
data sources registered in the service mart repository;
starting from this mapping, the query planner produces an
optimized query execution plan, which dictates the sequence of
steps for executing the query. Finally, the execution engine
actually executes the query plan, by submitting the service
calls to designated services through the service invocation
framework, building the query results by combining the
outputs produced by the service calls, computing the global
ranking of query results, and producing the query result
outputs in an order that reflects their global relevance.</p>
      <p>Search Computing aims at building new communities of
users: content providers, who want to organize their content
(now in the form of existing Web sites, databases, Web pages)
in order to make it available for search access by third parties,
and expert users, who want to offer new services built by
composing domain-specific content in order to go "beyond"
general-purpose search engines such as Google. To obtain a
specific Search Computing application, the general-purpose
architecture of Figure 1 is customized with the help of tools
targeted to programmers, expert users, and end users.</p>
      <p>• Service Publishers register Service Mart definitions within
the service repository, and declare the connection patterns
usable to join them. The registration process is realized
through a Service Registration Tool that 1) helps the
publisher in the specification of the SM, AP, and SI attributes
and parameters, respectively, and 2) hides from the user the
Internal API that allows the communication between the
services and the engine. Publishers are in charge of
implementing mediators, wrappers, or data materialization
components, so as to make data sources compatible with the
Service Mart standard interface and expected behavior.</p>
      <p>• Expert Users configure Search Computing applications,
by selecting the Service Marts of interest, by choosing a data
source supporting the Service Mart, and by connecting them
through connection patterns. They also configure the
complexity of the user interface, in terms of control objects and
configurability choices to be left to the end user.</p>
      <p>• End Users use Search Computing applications configured
by expert users. They interact by submitting queries,
inspecting results, and refining/evolving their information need
according to an exploratory information seeking approach,
which we call Liquid Query [4].</p>
      <p>3. AUTOMATED DATA EXTRACTION WITH DIADEM</p>
      <p>A framework such as SeCo allows the user to search for
objects with a given specification rather than just for
potentially relevant Web documents, as keyword search engines do.
To that end, structured data is required, where objects and
their attributes are described in a well-understood schema.
Unfortunately, most commercial Web sites do not provide
their objects (such as job listings, properties, or products)
as structured data. This is particularly true for businesses
with little technical expertise. Automatically turning data
collections into structured data has been mostly an
unrealized dream in the past. Previous approaches to
fully-automated data extraction addressed the problem by
investigating general techniques that can be applied to any web
site [4]. W.r.t. existing approaches, DIADEM is based on a
fundamental observation: if we combine knowledge about a
domain (e.g., that a four-figure price is more likely a rent
price than a sales price in real estate) with knowledge about
the appearance of objects and search facilities in that domain
(phenomenology), we can automatically derive an extraction
program for nearly any web page in the domain. The resulting
program produces high-precision data, as we use domain
knowledge to improve recognition and alignment and to verify
the extraction program based on ontological constraints [5].</p>
      <p>DIADEM operates in two modes: in the analysis mode,
a web site is scrutinized to find relevant objects and search
forms and to understand how to extract all data from that
site. In the extraction mode, this knowledge is used to
extract all data at high speed, assuming that the site has not
changed fundamentally since the analysis.</p>
      <p>
        In analysis mode, DIADEM answers primarily three
questions: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) How do we have to navigate the site (e.g., by
clicking on links, following pagination links, etc.) to extract
all the results? (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) Are there any forms to fill, and how do we
fill them to find all results? (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) How are result records
and their attributes structured and displayed? For each of
these questions, DIADEM uses both domain-independent
heuristics encoding typical web design patterns and
domain-dependent clues and high-level knowledge to locate specific
objects and their attributes and to verify and align the
resulting structured data. Except for a thin browser
interaction layer and some off-the-shelf machine learning tools, the
whole process is encoded in logical rules, possibly involving
probabilistic knowledge.
      </p>
      <p>Finally, all the collected models are passed to the OXPath
generator that uses simple heuristics to create a generalized
OXPath expression for use in extraction mode.</p>
      <p>To illustrate how DIADEM analyses a Web site, we
focus on result page analysis (the third question); see
Figure 2. First, we extract the page model from a live
rendering of the Web page. This model logically represents the
DOM tree of the page along with information on the
visual rendering (e.g., CSS boxes) and linguistic annotations.</p>
      <p>The information provided by the browser model is mainly
domain-independent (e.g., DOM structure and CSS boxes),
while some of the linguistic annotations are generated by
domain-specific gazetteers and rules. In the next step, we
locate mandatory attributes of the records that we expect
to find on a web page of the given domain; then, we
proceed to the segmentation of the page into records through
domain-independent heuristics. The identified records are
then validated using a result-page model; see Figure 3.</p>
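      <p>The attribute-then-segment step can be sketched in a few lines of plain Python. This is our toy simplification, not DIADEM's actual rules: a flat list of annotated leaves stands in for the page model, and a single mandatory attribute (price) is assumed to open each record.</p>
      <p>
```python
# Toy sketch of attribute-driven record segmentation (not DIADEM code).
# Input: a flat sequence of (annotation, text) leaves from a rendered page,
# where a domain gazetteer has already labeled each leaf. Each occurrence
# of the mandatory attribute starts a new record; following leaves are
# attached to the open record.

def segment_records(nodes, mandatory="price"):
    """Group annotated leaves into records, one per mandatory attribute."""
    records = []
    current = None
    for label, text in nodes:
        if label == mandatory:
            if current is not None:
                records.append(current)
            current = {label: text}
        elif current is not None:
            current[label] = text
    if current is not None:
        records.append(current)
    return records

page = [
    ("price", "450 pcm"),
    ("location", "Oxford OX1"),
    ("bedrooms", "2"),
    ("price", "1200 pcm"),
    ("location", "Headington"),
]
print(segment_records(page))
```
      </p>
      <p>Real segmentation additionally exploits DOM repetition and CSS box geometry; the point of the sketch is only that mandatory attributes anchor the record boundaries.</p>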
      <p>Not only HTML. In many domains, non-HTML data makes
up a small but significant part of the description of objects,
usually as PDF documents, but sometimes just as bitmap
images. Sometimes this information merely supports the
structured data (e.g., the pictures of a car on an auto-trading
website); in other cases, however, these web resources carry
additional information that is not present in the structured
data and therefore cannot be accessed by either traditional
or object search engines.</p>
      <p>For instance, on almost all UK real-estate Web sites,
users cannot search for an apartment by energy efficiency
or by size of the rooms, even though this information is
clearly present on the websites. The reason is that the energy
efficiency of a house is published as an EPC (Energy
Performance Certificate) chart (see
wikipedia.org/wiki/Energy_Performance_Certificate) and the
sizes of the rooms are published in the floor-plan images.</p>
      <p>The automated extraction of this data is non-trivial, since
it might require computer vision and OCR techniques.
DIADEM addresses this problem by exploiting the knowledge
of the domain to improve existing image and PDF/PS
analysis techniques. As an example, the structure of the EPC
charts is standardized by an EU directive; therefore, it is easy
to "reverse-engineer" their semantics. For PDF brochures,
it is possible to adopt analysis techniques similar to those
adopted for HTML, since the structure of such documents
is also reducible to a few patterns that can be easily identified
by an automatic analysis.</p>
    </sec>
    <sec id="sec-3">
      <title>4. TOWARD MULTI-DOMAIN, AUTOMATED WEB DATA CONSUMPTION</title>
      <p>Our approach for the integration of structured and
unstructured Web data sources is based on a service-oriented
vision of the resources. The source integration operates at
three levels: wrapping, registration, and invocation.</p>
      <p>Service wrapping consists in implementing appropriate
wrapping components that take care of invoking the
services and manipulating the input and output so as to be
consistent with the formats expected by the integration
platform. The SeCo platform natively supports generic Web
services, relational databases, YQL services, SPARQL
endpoints, etc. However, the system is open to support
additional data source types.</p>
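      <p>The wrapping level can be illustrated with a minimal Python stand-in. The class and method names are ours, not SeCo's; the example shows only the contract: a wrapper adapts one source type to a single tuple-based format expected by the platform.</p>
      <p>
```python
# Sketch of the service wrapping level (illustrative, not SeCo code).
# Each wrapper invokes its source and normalizes the output into a list
# of attribute dictionaries, the format the integration platform expects.

class Wrapper:
    def invoke(self, params):
        raise NotImplementedError

class CsvLikeWrapper(Wrapper):
    """Adapts a source that answers with semicolon-separated rows."""
    def __init__(self, fetch, schema):
        self.fetch = fetch      # callable standing in for the raw source
        self.schema = schema    # attribute names, in source column order
    def invoke(self, params):
        rows = self.fetch(params)
        return [dict(zip(self.schema, row.split(";"))) for row in rows]

source = lambda params: ["Oxford;950", "Headington;700"]
w = CsvLikeWrapper(source, ["city", "price"])
print(w.invoke({}))
```
      </p>
      <p>A SPARQL, YQL, or relational wrapper would differ only in how fetch and the normalization are realized; the invoke contract stays the same.</p>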
      <p>We suggest two ways for integrating DIADEM data
sources into SeCo. In both cases, we assume that the schema
used in SeCo matches (a fragment of) the domain ontology
used in DIADEM. The first, off-line approach extracts all
the data of a site contextually with the analysis and stores it,
e.g., in an RDF database together with the domain ontology.
This database can be accessed as any other SPARQL
endpoint. The advantage of this approach is that it provides
very good query performance, but at the cost of storage
and consistency. In domains with fast-changing data, the
database will often be outdated compared to the data on
the live web site.</p>
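      <p>The off-line flow reduces to "materialize triples at analysis time, filter them at query time". A dict-based stand-in for the RDF store makes this concrete; the subject and predicate names below are illustrative, not the actual DIADEM ontology.</p>
      <p>
```python
# Sketch of the off-line approach: extracted data is materialized as
# triples and queried at search time (a set of tuples stands in for the
# RDF store; names are illustrative, not DIADEM's ontology).

def query(store, predicate, value):
    """Return all subjects s with a triple (s, predicate, value)."""
    return sorted(s for (s, p, o) in store if p == predicate and o == value)

store = {
    ("flat1", "rdf:type", "re:Flat"),
    ("flat1", "re:city", "Oxford"),
    ("flat2", "rdf:type", "re:Flat"),
    ("flat2", "re:city", "Milano"),
}
print(query(store, "re:city", "Oxford"))
```
      </p>
      <p>In the real setting the store is a SPARQL endpoint and the filter is a SPARQL query, but the trade-off discussed above (fast answers, possibly stale data) is already visible here: the store only knows what the last analysis run materialized.</p>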
      <p>This deficit is addressed by the on-line approach, where
an OXPath expression is generated by the DIADEM
analysis and that expression is executed to extract the data at
query time. A slightly specialized OXPath invoker is needed
for this approach, as it needs to store the OXPath expression
together with possible parameters for form filling. OXPath
returns the extracted data in XML or RDF format, structured
according to the SeCo schema. The latter is ensured
by the construction process in the analysis, where the SeCo
schema, in the form of the high-level DIADEM ontology, is
used to verify the extraction expression.</p>
      <p>[Figure 4: The chain of service invokers. CachingInvoker
(scheduler : ScheduledExecutorService, cache : Cache,
wrappedInvoker, poolsize : Integer, connectionPools :
Map&lt;String, BasicDataSource&gt;) wraps the low-level invokers
SPARQLInvoker (endpointURL : String, prefixes : Prefix [0..*]),
YQLinvoker (urlTemplate : String), and GoogleBaseInvoker
(authorizationKey : String, urlTemplate : String).]</p>
      <p>The disadvantage of this approach is that, for large or
complex Web sites, extraction may take too long for on-line
queries. This can be partly alleviated by the high-level
caching provided in SeCo. In the future, we plan to
investigate techniques for incremental data extraction, where only
new data is extracted. This is also useful for the off-line
approach if frequent updates are desired.</p>
      <p>Service description in SeCo is based on the registration of
services within the Service Description Framework model,
which describes services at three levels of abstraction:
Service Marts (abstractions of several Web services dealing with
the same conceptual objects available on the Web, such as
"flights", "hotels", "restaurants"), Access Patterns (a
specific signature of the Service Mart with the
characterization of each attribute as input, output, and/or ranking),
and Service Interfaces (a description of the invocation
interface of an actual source service), leading from the
conceptual representation of Web objects to the implementation
of search services. If we combine SeCo with DIADEM, we
can easily instantiate service descriptions for any website of
a domain. Starting from a description of the conceptual objects
of a domain, shared between the SeCo service marts and
the DIADEM high-level ontology, DIADEM can
automatically recognize existing access patterns (by form analysis)
and translate them into SeCo service descriptions.</p>
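      <p>The three abstraction levels, and the step from a DIADEM-analysed site to a registered service, can be sketched with plain records. All names and the trivial derivation are our illustration; the real SeCo registration model is considerably richer.</p>
      <p>
```python
# Sketch of the three SDF levels as plain records (illustrative names).
from collections import namedtuple

ServiceMart = namedtuple("ServiceMart", "name attributes")
AccessPattern = namedtuple("AccessPattern", "mart inputs outputs ranking")
ServiceInterface = namedtuple("ServiceInterface", "pattern endpoint kind")

def register_diadem_site(mart, form_inputs, result_outputs, oxpath_expr):
    """Derive an access pattern (from form analysis) and an on-line
    service interface (backed by an OXPath expression) for one site.
    The derivation here is a trivial stand-in for DIADEM's analysis."""
    pattern = AccessPattern(mart.name, tuple(form_inputs),
                            tuple(result_outputs), ranking=None)
    return ServiceInterface(pattern, oxpath_expr, kind="oxpath")

flats = ServiceMart("Flat", ("city", "price", "bedrooms"))
iface = register_diadem_site(flats, ["city"], ["price", "bedrooms"],
                             "doc('http://estate.example.co.uk/')//...")
print(iface.pattern)
```
      </p>
      <p>The mart names the conceptual object, the pattern fixes which attributes are inputs and outputs, and the interface binds the pattern to one concrete invocation mechanism, here an OXPath expression.</p>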
      <p>Service execution is performed by an engine, which
exploits the Service Description Framework. The execution
engine consists of a runtime (a Panta Rhei [6] interpreter
able to translate an execution plan into a coordinated sequence
of service invocations) and a set of service invokers.
Low-level service invokers (one for each data source type,
including the one for on-line DIADEM sources) are implemented
following the chain of responsibility pattern (see Figure 4).
There is no need for a special invoker for off-line DIADEM
sources, as those reduce to SPARQL invokers where the data
is the result of the off-line extraction. A high-level caching
invoker wraps the sequence of low-level invokers to read
results from the cache.</p>
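      <p>The caching invoker wrapping a low-level invoker can be sketched in a few lines. This is a minimal Python stand-in for the classes of Figure 4, not their actual code: the low-level invoker here just counts real invocations so the effect of the cache is visible.</p>
      <p>
```python
# Sketch of the invoker chain of Figure 4 (illustrative, not SeCo code):
# a caching invoker delegates to a wrapped low-level invoker and serves
# repeated calls from its cache.

class SparqlLikeInvoker:
    """Low-level invoker; here it only counts real invocations."""
    def __init__(self):
        self.calls = 0
    def invoke(self, query):
        self.calls += 1
        return "results for " + query

class CachingInvoker:
    """High-level invoker wrapping a low-level one (chain of
    responsibility): answer from the cache when possible."""
    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.cache = {}
    def invoke(self, query):
        if query not in self.cache:
            self.cache[query] = self.wrapped.invoke(query)
        return self.cache[query]

inner = SparqlLikeInvoker()
invoker = CachingInvoker(inner)
invoker.invoke("q1")
invoker.invoke("q1")
print(inner.calls)  # the low-level invoker ran only once
```
      </p>
      <p>An on-line DIADEM invoker would slot into the same chain, with the stored OXPath expression executed inside invoke; the caching layer is what makes its higher latency tolerable.</p>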
    </sec>
    <sec id="sec-5">
      <title>5. CONCLUSIONS</title>
      <p>
        Rich object search is one of the major challenges in Web
research. In this paper, we show how a combination of
SeCo and DIADEM has the potential to address the
major challenges involved in object search: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) the integration
of multi-domain data sources, including an easy interface for
formulating and refining expressive, multi-domain queries, and
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) the automatic extraction of highly accurate, structured
data from most existing web sites.
      </p>
      <p>We plan to further investigate the integration of SeCo
and DIADEM. In particular, a further alignment of the
conceptual descriptions, access patterns, and service
interfaces would be useful. We are currently investigating the
automatic extraction of rich access patterns and integrity
constraints from existing Web forms. We also plan to
develop techniques for incremental data extraction to allow the
wrapping of time-sensitive services.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ceri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brambilla</surname>
          </string-name>
          , M., eds.
          <source>: Search Computing Trends and Developments</source>
          . Volume
          <volume>6585</volume>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Furche</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grasso</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schallhart</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sellers</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>OXPath: A language for scalable, memory-efficient data extraction from web applications</article-title>
          . In: VLDB. (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bozzon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brambilla</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ceri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fraternali</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Liquid query: multi-domain exploratory search on the web</article-title>
          .
          <source>In: Proceedings of the 19th international conference on World wide web. WWW '10</source>
          , New York, NY, USA, ACM (
          <year>2010</year>
          )
          <volume>161</volume>
          -
          <fpage>170</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Kayed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kayed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girgis</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shaalan</surname>
            ,
            <given-names>K.F.</given-names>
          </string-name>
          :
          <article-title>A survey of web information extraction systems</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>18</volume>
          (
          <issue>10</issue>
          ) (
          <year>2006</year>
          )
          <volume>1411</volume>
          -
          <fpage>1428</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Furche</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , et al.:
          <article-title>Real understanding of real estate forms</article-title>
          .
          <source>In: WIMS '11</source>
          , New York, NY, USA, ACM (
          <year>2011</year>
          )
          <volume>13</volume>
          :
          <fpage>1</fpage>
          -
          <fpage>13</fpage>
          :
          <fpage>12</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Braga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcoglioniti</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grossniklaus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vadacca</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Panta rhei: Optimized and ranked data processing over heterogeneous sources</article-title>
          .
          <source>In: ICSOC 2010. Volume 6470 of Lecture Notes in Computer Science</source>
          . Springer (
          <year>2010</year>
          )
          <volume>715</volume>
          -
          <fpage>716</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>