<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Conceiving a Multiscale Dataspace for Data Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matheus Silva Mota</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>André Santanchè</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>A consequence of the intensive growth of information shared online is an increase in opportunities to link and integrate distinct sources of knowledge. This linking and integration can be hampered by different levels of heterogeneity in the available sources. Existing approaches focusing on heavyweight integration (e.g., schema mapping or ontology alignment) require costly upfront efforts to handle specific formats and schemas. In this scenario, dataspaces emerge as a modern alternative for integrating heterogeneous sources. The classic heavyweight, upfront, one-step integration is replaced by an incremental integration that starts from lightweight connections and tightens and improves them when the benefits are worth the effort. Based on several previous works on data integration for data analysis, this work discusses the conception of a multiscale-based dataspace architecture called LinkedScales. It departs from the notion of integration-scales within a dataspace and defines a systematic and progressive integration process via graph-based transformations over a graph database. LinkedScales aims to provide a homogeneous view of heterogeneous sources, allowing systems to reach and produce different integration levels on demand, going from raw representations (lower scales) towards ontology-like structures (higher scales).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>To integrate available sources, classical data integration approaches found
in the literature usually require an upfront effort on schema recognition and mapping
in an all-or-nothing fashion [Halevy et al. 2006a]. On-demand integration of distinct
and heterogeneous sources requires ad hoc solutions and repeated effort from specialists
[Franklin et al. 2005].</p>
      <p>Franklin et al. propose the notion of dataspaces to address the problems
mentioned above [Franklin et al. 2005]. The dataspace vision aims to provide the benefits of
the classical data integration approach, but via a progressive “pay-as-you-go” integration
[Halevy et al. 2006a]. They argue that linking many “fine-grained” information
particles bearing “little semantics” already brings benefits to applications, and that more links can
be produced on demand, as lightweight integration steps.</p>
      <p>Related work proposals address distinct aspects of dataspaces. Regarding the
architectural aspect, each work explores a different issue of a dataspace system. Among
all efforts, no dominant proposal of a complete architecture has emerged until now. We
observed that, in a progressive integration process, steps are not all alike. They can be
distinguished by interdependent roles, which we organize here as abstraction layers. They
are materialized in our LinkedScales, a graph-based dataspace architecture. Inspired by
a common backbone found in related work, LinkedScales aims to provide an
architecture for dataspace systems that supports progressive integration and the management of
heterogeneous sources.</p>
      <p>LinkedScales takes advantage of the flexibility of graph structures and proposes
the notion of scales of integration. Scales are represented as graphs, managed in graph
databases. Operations become transformations of such graphs. LinkedScales also
systematically defines a set of scales as layers, where each scale focuses on a different level
of integration and its respective abstraction. In a progressive integration, each scale
groups homologous lightweight steps. Scales are interconnected, supporting provenance
traceability. Furthermore, LinkedScales supports a complete dataspace lifecycle,
including automatic initialization, maintenance and refinement of the links.</p>
      <p>This paper discusses the conception of the LinkedScales architecture and is
organized as follows. Section 2 discusses basic concepts and related work. Section 3
introduces the LinkedScales proposal and relates it to the experiences that led to it.
Section 4 presents our previous work on data integration
and discusses how those experiences are reflected in the current proposal. Finally, Section 5
presents conclusions and future steps.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. The Classical Data Integration</title>
      <p>Motivated by the increasing need to handle multiple and heterogeneous data
sources, data integration has been a focus of attention in the database community for the
past two decades [Hedeler et al. 2013]. One predominant strategy is based on
providing a virtual unified view under a global schema (GS) [Kolaitis 2005]. Within GS
systems, the data stay in their original data sources, maintaining their original schemas,
and are dynamically fetched and mapped to a global schema upon client request
[Lenzerini 2002, Hedeler et al. 2013]. In a nutshell, applications send queries to a
mediator, which maps them into several sub-queries dispatched to wrappers, according to
metadata regarding capabilities of the participating DBMSs. Wrappers map queries to the
underlying DBMSs and the results back to the mediator, guided by the global schema.
Queries are optimized and evaluated according to each DBMS within the set, providing
the illusion of a single database to applications [Lenzerini 2002].</p>
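      <p>The mediator/wrapper flow described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the class names <monospace>Wrapper</monospace> and <monospace>Mediator</monospace>, the attribute mappings and the sample rows are hypothetical, in-memory lists stand in for the underlying DBMSs, and query optimization is left out.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class Wrapper:
    """Hypothetical wrapper: translates global attributes to one source."""
    rows: List[Row]           # stands in for the underlying DBMS
    mapping: Dict[str, str]   # global attribute -> local attribute

    def query(self, attrs: List[str]) -> List[Row]:
        # Rewrite the sub-query to local names, then map results back
        # to the vocabulary of the global schema.
        return [{g: r[self.mapping[g]] for g in attrs} for r in self.rows]


class Mediator:
    """Hypothetical mediator: fans a query out and merges the results."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, attrs: List[str]) -> List[Row]:
        results: List[Row] = []
        for w in self.wrappers:
            # Dispatch only to wrappers able to answer every attribute.
            if all(a in w.mapping for a in attrs):
                results.extend(w.query(attrs))
        return results


source_a = Wrapper(rows=[{"sp": "Apis mellifera", "cnt": 3}],
                   mapping={"species": "sp", "count": "cnt"})
source_b = Wrapper(rows=[{"taxon": "Bombus terrestris"}],
                   mapping={"species": "taxon"})
mediator = Mediator([source_a, source_b])
all_species = mediator.query(["species"])
counted = mediator.query(["species", "count"])
```
      </preformat>
      <p>In this sketch every mapping from global to local attributes must be written before the first query can be answered, which is a small-scale picture of the upfront cost discussed next.</p>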
      <p>A main problem of this “classical” data integration strategy is the large
upfront effort required to produce a global schema definition [Halevy et al. 2006b]. Since
in some domains different DBMSs may emerge and schemas change constantly,
such a costly initial step can become impracticable [Hedeler et al. 2013]. Moreover, several
approaches focus on a particular data model (e.g., relational), while new models also
become popular [Elsayed et al. 2006]. As we present in the next section, an alternative
to this classical, all-or-nothing, costly upfront data integration strategy is a strategy based
on small, progressive integration steps.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. The “Pay-as-you-go” Dataspace Vision</title>
      <p>Since upfront mappings between schemas are labor-intensive and schema-static domains
are rare, pay-as-you-go integration strategies have gained momentum. Classical data
integration approaches (presented in Section 2.1) work successfully when integrating modest
numbers of stable databases in controlled environments, but lack an efficient solution
for scenarios in which schemas often change and new data models must be considered
[Hedeler et al. 2013]. In a data integration spectrum, classical data integration is at the
high-cost/high-quality end, while an incremental integration based on small progressive
steps starts at the opposite end. However, this incremental integration can be
continuously refined to improve the connections among sources.</p>
      <p>In 2005, Franklin et al. published a paper proposing the notion of
dataspaces [Franklin et al. 2005]. The dataspace vision aims at providing the benefits of the classical data
integration approach, but in a progressive fashion [Halevy et al. 2006a, Singh and Jain 2011,
Hedeler et al. 2010]. The main argument behind the dataspace proposal is that, in the
current scenario, instead of waiting a long time for a global integration schema before accessing the
data, users would rather have early access to the data through small cycles of integration;
i.e., if the user needs the data now, some integration is better than nothing. This second-generation
approach to data integration can be divided into a bootstrapping stage and
subsequent improvements. Progressive integration refinements can be based, for instance, on
structural analysis [Dong and Halevy 2007], on user feedback [Belhajjame et al. 2013], or
on manual or automatic mappings among sources, if the benefits are worth the effort.</p>
      <p>Dataspaces comprise several challenges related to the design of Dataspace
Support Platforms (DSSPs). The main goal of a DSSP is to provide basic support for
operations among all data sources within a dataspace, allowing developers to focus
on specific challenges of their applications, rather than handling low-level tasks
related to data integration [Singh and Jain 2011]. Many DSSPs have been proposed
recently addressing a variety of scenarios, e.g., SEMEX [Cai et al. 2005] and iMeMex
[Dittrich et al. 2009] in the personal information management (PIM) context; PayGo [Madhavan et al. 2007], focusing on
Web-related sources; and a justice-related DSSP [Dijk et al. 2013]. As far as we know,
to date the proposed DSSPs provide specialized solutions, targeting only specific
scenarios [Singh and Jain 2011, Hedeler et al. 2009].</p>
    </sec>
    <sec id="sec-5">
      <title>3. LinkedScales: A Multiscale Dataspace Architecture</title>
      <p>The goal of LinkedScales is to systematize the dataspace-based integration process in
an architecture. It slices integration levels into progressive layers, whose abstraction is
inspired by the notion of scales. As an initial effort, the LinkedScales strategy focuses on
a specific goal within the dataspace scope: to provide a homogeneous view of data, hiding
details about heterogeneous and specific formats and schemas. To keep this focus, the
current proposal does not address issues related to access policies, broadcast updates or
distributed access management.</p>
      <p>LinkedScales is an architecture for systematic and incremental data integration,
based on graph transformations, materialized in different scales of abstraction. It aims
to support algorithms and common tools for integrating data within dataspaces.
Integration-scales are linked, and data in lower scales are connected to their
corresponding representations in higher scales. As discussed in the next section, each integration-scale
builds on experience acquired in three previous projects related to data integration.</p>
      <p>The lowest part of Figure 1 – the Graph Dumper and the Sources – represents the
different data sources handled by our DSSP in their original format. Even though we are
conceiving an architecture that can be extended to any desired format, we are currently
focusing on spreadsheets, XML files and textual documents as underlying sources. Data
at this level are treated as black boxes. Therefore, data items inside the sources are not
yet addressable by links.</p>
      <p>The lowest scale, the Physical Scale, aims at mapping the sources available in
the dataspace to a graph inside a graph database. This type of database stores graphs
in their native model and is optimized to store and handle them. Its operations
and query languages are tailored for graphs. There are several competing approaches to
represent graphs inside the database [Angles 2012, Angles and Gutierrez 2008].</p>
      <p>The Physical Scale is the lowest-level raw content+format representation of data
sources with addressable/linkable component items. It reflects in a graph, as far as
possible, the original structure and content of the underlying data sources. The
role of this scale in an incremental integration process is to make the data within
sources explicit and linkable. In a dataspace fashion, this effort to make raw content
explicit can be improved on demand.</p>
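      <p>As an illustration only, the sketch below shows one possible way to turn a spreadsheet-like source into a graph whose items are individually addressable. It is not the LinkedScales implementation: the function name <monospace>dump_csv_to_graph</monospace>, the node identifier scheme and the <monospace>contains</monospace> edge label are assumptions, a CSV string stands in for the spreadsheet, and plain Python dictionaries stand in for a graph database.</p>
      <preformat preformat-type="code">
```python
import csv
import io
from typing import Dict, List, Tuple

Graph = Tuple[Dict[str, dict], List[Tuple[str, str, str]]]


def dump_csv_to_graph(source_id: str, text: str) -> Graph:
    """Map a CSV source to one node per sheet, row and cell."""
    nodes: Dict[str, dict] = {source_id: {"kind": "sheet"}}
    edges: List[Tuple[str, str, str]] = []
    for r, row in enumerate(csv.reader(io.StringIO(text))):
        row_id = f"{source_id}/r{r}"
        nodes[row_id] = {"kind": "row", "index": r}
        edges.append((source_id, "contains", row_id))
        for c, value in enumerate(row):
            cell_id = f"{row_id}/c{c}"
            # Each cell keeps its raw content and its position in the
            # source, so it can later be the target of a link.
            nodes[cell_id] = {"kind": "cell", "index": c, "value": value}
            edges.append((row_id, "contains", cell_id))
    return nodes, edges


nodes, edges = dump_csv_to_graph("s1", "species,count\nApis mellifera,3\n")
```
      </preformat>
      <p>No interpretation is attempted at this point: the header row is stored like any other row, and deciding that it is a header is left to higher scales.</p>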
      <p>The Logical Scale aims at offering a common view of data inside similar or
equivalent structural models. Examples of structural models are tables and hierarchical
documents. In the previous scale, there will be differences in the representation of a table within
a PDF, a table from a spreadsheet and a table within an HTML file, since they preserve
specificities of their formats. In the Logical Scale, on the other hand, the three tables should
be represented in the same fashion, since they refer to the same structural model. This
leads to a homogeneous approach to processing tables, independently of how they were
represented in their original specialized formats. To design the structural models of the
Logical Scale we will investigate initiatives such as the OMG’s<sup>1</sup> Information
Management Metamodel<sup>2</sup> (IMM). IMM addresses the heterogeneity among the models behind
Information Management systems, proposing a general interconnected metamodel that
aligns several existing metamodels. Figure 2 presents an overview of the current state of
the IMM and its supported metamodels. For instance, it shows that the XML and Relational
metamodels can be aligned into a common metamodel.</p>
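      <p>The idea of representing tables from different formats in the same fashion can be sketched as follows. This is a hedged illustration rather than the IMM or the LinkedScales design: the <monospace>Table</monospace> class, the two adapter functions and the shapes assumed for the Physical Scale nodes are all hypothetical.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Table:
    """Hypothetical common structural model for the Logical Scale."""
    header: List[str]
    rows: List[List[str]]
    origin: str   # id of the Physical Scale node the table came from


def from_sheet(node: Dict) -> Table:
    # Assumed Physical Scale shape for a spreadsheet: a list of cell rows.
    cells = node["rows"]
    return Table(header=cells[0], rows=cells[1:], origin=node["id"])


def from_html(node: Dict) -> Table:
    # Assumed Physical Scale shape for HTML: nested tr/th/td element nodes.
    header = [c["text"] for c in node["children"][0]["children"]]
    rows = [[c["text"] for c in tr["children"]]
            for tr in node["children"][1:]]
    return Table(header=header, rows=rows, origin=node["id"])


sheet_node = {"id": "s1",
              "rows": [["species", "count"], ["Apis mellifera", "3"]]}
html_node = {"id": "h1", "children": [
    {"tag": "tr", "children": [{"tag": "th", "text": "species"},
                               {"tag": "th", "text": "count"}]},
    {"tag": "tr", "children": [{"tag": "td", "text": "Apis mellifera"},
                               {"tag": "td", "text": "3"}]},
]}
t1, t2 = from_sheet(sheet_node), from_html(html_node)
```
      </preformat>
      <p>Both adapters yield the same header and rows, while the <monospace>origin</monospace> field keeps the link back to the lower scale that provenance tracking would need.</p>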
      <p><sup>1</sup> Object Management Group – http://www.omg.org
<sup>2</sup> http://www.omgwiki.org/imm</p>
      <p>In the Description Scale, the focus is on the content (e.g., labels of tags within
an XML file or values in spreadsheet cells) and their relationships. Structural information
pertaining to specific models (e.g., aggregation nodes of XML) is discarded if it
does not affect the semantic interpretation of the data; otherwise, it is transformed
into a relation between nodes following common patterns. For example, cells in the same
row of a table are usually values for attributes of a given entity. Here, the structures from
previous scales will be reflected as RDF triples.</p>
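      <p>The row-as-entity pattern mentioned above can be sketched in a few lines. The example is an assumption-laden illustration: plain Python tuples stand in for RDF triples, the <monospace>http://example.org/</monospace> namespace is a placeholder, and the function name <monospace>table_to_triples</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.org/"   # placeholder namespace, not from the paper


def table_to_triples(table_id: str, header: List[str],
                     rows: List[List[str]]) -> List[Triple]:
    """Treat each row as one entity and each column as an attribute."""
    triples: List[Triple] = []
    for i, row in enumerate(rows):
        subject = f"{BASE}{table_id}/row{i}"
        for column, value in zip(header, row):
            # One (subject, predicate, object) statement per cell.
            triples.append((subject, f"{BASE}attr/{column}", value))
    return triples


triples = table_to_triples("s1", ["species", "count"],
                           [["Apis mellifera", "3"]])
```
      </preformat>
      <p>The tabular structure itself disappears in the output; only the content and the relation between a row and its values remain.</p>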
      <p>The highest scale of Figure 1 is the Conceptual Scale. It unifies the data of the
lower scale in a common semantic framework. Algorithms that map content to this scale
exploit relationships between nodes of the Description Scale to discover the latent
semantics in the existing content and make it explicit as ontologies. As we discuss in the next
section, it is possible in several scenarios to infer semantic entities (e.g., instances of
classes in ontologies) and their properties from the content. We are also considering
the existence of predefined ontologies, mapped straight to this scale, which will support
the mapping process and will be connected to the inferred entities. Here, algorithms
for entity linking should be investigated.</p>
    </sec>
    <sec id="sec-6">
      <title>4. Previous Work</title>
      <p>This proposal was conceived after experiences acquired during three previous research
projects. Although with different strategies, they addressed complementary issues
concerning data integration. In each project, experiments were conducted in a progressive
integration fashion, starting from independent artifacts – represented by proprietary
formats, in many cases – going towards the production of connections in lightweight or
heavyweight integration approaches. As we show here, our heavyweight integration
took a different perspective from an upfront one-step integration. It is the end of a
chain of integration steps, in which the semantics inferred from the content in the first
integration steps influences the following ones.</p>
      <p>We further detail and discuss the role of each work in the LinkedScales
architecture. While [Mota and Medeiros 2013] explores a homogeneous representation
model for textual documents independently of their formats, [Bernardo et al. 2013] and
[Miranda and Santanchè 2013] focus, respectively, on extracting and recognizing relevant
information stored in spreadsheets and XML artifacts, to exploit their latent semantics in
integration tasks.</p>
    </sec>
    <sec id="sec-7">
      <title>4.1. Homogeneous Model – Universal Lens for Textual Document Formats</title>
      <p>One of the key obstacles to indexing, handling, integrating and summarizing sets of documents is the
heterogeneity of their formats. To address this problem, we envisaged a
“document space” in which several document sources represented in heterogeneous formats are
mapped to a homogeneous model we call Shadow [Mota and Medeiros 2013].</p>
      <p>Figure 3 illustrates a typical Shadow (serialized in XML). The content and
structure of a document in a specific format (e.g., PDF, ODT, DOC) are extracted and mapped
to a previously defined open structure. The model behind this new structure, which
is homogeneous across documents in the space, is a common hierarchical denominator
found in most textual documents – e.g., sections, paragraphs, images. In the new
document space a shadow represents format+structure of a document, decoupled from its
specialized format.</p>
      <p>Shadow documents are abstractions of documents in specific formats, i.e., they
do not fully represent the information of the source, focusing instead on the common
information that can be extracted according to the context. This abstract homogeneous model
allowed us to develop applications for document content integration and
semantic enrichment [Mota et al. 2011], and for searching a document collection considering
structural elements, such as labels of images or references [Mota and Medeiros 2013].</p>
      <p>Figure 4 illustrates how this homogeneous view for a document space fits in the
LinkedScales architecture. This document space is equivalent to the Logical Scale,
restricted to the document context. Unlike the LinkedScales approach, Shadows map
the documents in their original format straight to the generic model, without an
intermediary Physical Scale.</p>
      <p>After the Shadows experience we identified three important arguments for
representing such an intermediary scale: (i) since this scale is not aimed at mapping the resources
to a common model, it focuses on the specific concern of making the content explicit and
addressable; (ii) it preserves the best-effort graph representation of the source, with
provenance benefits; (iii) the large effort of the original one-batch conversion is factored into
smaller steps with intermediary benefits.</p>
      <p>In the LinkedScales’ Logical Scale, the Shadows’ document-driven common
model will be expanded towards a generic perspective involving a family of models.</p>
    </sec>
    <sec id="sec-7-2">
      <title>4.2. Connecting Descriptive XML Data – a Linked Biology Perspective</title>
      <p>[Miranda and Santanchè 2013] studied a particular problem in the biology domain,
related to phenotypic descriptions and their relations with phylogenetic trees. Phenotypic
descriptions are a fundamental starting point for several biology tasks, such as the
identification of living beings or phylogenetic tree construction. Tools for this kind of description
usually store data in independent files following open standards (e.g., XML). The
descriptions are still based on textual sentences in natural language, limiting machine
support for integration, correlation and comparison operations.</p>
      <p>Even though modern phenotype description proposals are based on ontologies,
there are still open problems regarding how to take advantage of the existing legacy of
descriptions. In this scenario, [Miranda and Santanchè 2013] propose a progressive
integration approach based on successive graph transformations, which exploits the
latent semantics in the descriptions to guide this integration and semantic enrichment.</p>
      <p>Since the focus is on the content, this approach departs from a graph-based schema
that is a minimal common denominator among the main phenotypic description
standards. Operations that analyze the content, discovering hidden relations, drive the
integration process. Figure 5 draws the intersection between our architecture and the
integration approach proposed by [Miranda and Santanchè 2013]. Data from the original
artifacts are mapped straight to the Description Scale, in which structures have a secondary
role and the focus is on the content.</p>
      <p>In spite of the benefits of focusing on the content and simplifying the structures, this
approach loses information that is relevant for provenance. Moreover, in an
interactive integration process, the user can perceive the importance of some information
not previously considered in the Description Scale. In this case, since the mapping comes
straight from the original sources, it becomes hard to update the extraction/mapping
algorithms to accommodate each new requirement. The Physical and Logical Scales simplify this
interactive process, since new requirements only mean updating graph transformations from
lower to upper scales.</p>
    </sec>
    <sec id="sec-8">
      <title>4.3. Progressively Integrating Biology Spreadsheet Data</title>
      <p>Even though spreadsheets play an important role as “popular databases”, they were designed
as self-contained units. This characteristic becomes an obstacle when users need to
integrate data from several spreadsheets, since the content is strongly coupled to file formats,
and schemas are implicit and geared to human consumption. In [Bernardo et al. 2013], we
decoupled the content from the structure to discover and make explicit the implicit schema
embedded in the spreadsheets.</p>
      <p>Figure 6 illustrates the [Bernardo et al. 2013] approach from a LinkedScales
perspective. The work is divided into four steps, going from the original spreadsheet formats
straight to the Conceptual Scale. The first step is to recognize the nature of the spreadsheet. The
work assumes that users follow and share domain-specific practices when
constructing spreadsheets, which result in recurring construction patterns. Such patterns are exploited
to capture the nature of the spreadsheet and to infer a conceptual model behind
the pattern, which is reflected as an ontology class in the Conceptual Scale.</p>
      <p>This work stresses the importance of recognizing data as semantic entities to guide
further operations of integration and articulation. Via this strategy, authors are able to
transform several spreadsheets into a unified and integrated data repository. Figure 7
shows an example summarizing how they are articulated, starting from the recognition of
semantic entities behind implicit schemas. Two different spreadsheets (S1 and S2) related
to the biology domain have their schema recognized and mapped to specific ontology
classes – shown in Figure 7 as (A) and (B).</p>
      <p>Semantic entities can be properly interpreted, articulated and integrated with other
sources – such as DBPedia, GeoSpecies and other open datasets. In an experiment
involving more than 11,000 spreadsheets, we showed that it is possible to automatically
recognize and merge entities extracted from several spreadsheets.</p>
      <p>Figure 8 shows a screenshot of our query and visualization prototype for data<sup>3</sup>
extracted from spreadsheets (available at http://purl.org/biospread/?task=pages/txnavigator).</p>
      <p>This work supported our proposal of a Conceptual Scale as the topmost layer
of our LinkedScales architecture. Several intermediary transformation steps from the
original data sources towards entities are hidden inside the extraction/mapping program.
As in the previous cases, the process can be improved by materializing these intermediate
steps in scales of our architecture.</p>
    </sec>
    <sec id="sec-9">
      <title>5. Concluding Remarks</title>
      <p>This work presented a proposal for a graph-based dataspace system architecture. It
systematizes progressive integration steps in layers (scales), based on graph
transformations. The model is founded on previous work, which explored different aspects of the
proposal. LinkedScales is aligned with the modern perspective of treating several
heterogeneous data sources as parts of the same dataspace, addressing integration issues in
progressive steps, triggered on demand. Although our focus is on the architectural aspects,
we are designing a generic architecture that can be extended to several contexts.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgments</title>
      <p>Work partially financed<sup>4</sup> by CNPq (#141353/2015-5), Microsoft Research FAPESP
Virtual Institute (NavScales project), FAPESP/Cepid in Computational Engineering and
Sciences (#2013/08293-7), CNPq (MuZOO Project), INCT in Web Science,
FAPESP-PRONEX (eScience project), and individual grants from CAPES.</p>
      <p><sup>3</sup> All data is available at our SPARQL endpoint: http://sparql.lis.ic.unicamp.br
<sup>4</sup> The opinions expressed in this work do not necessarily reflect those of the funding agencies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Angles 2012]
          <string-name>
            <surname>Angles</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>A comparison of current graph database models</article-title>
          .
          In
          <source>Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on</source>
          , pages
          <fpage>171</fpage>
          -
          <lpage>177</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Angles and Gutierrez 2008]
          <string-name>
            <surname>Angles</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Survey of graph database models</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>40</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1:1</fpage>
          -
          <lpage>1:39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Belhajjame et al. 2013]
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paton</surname>
            ,
            <given-names>N. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Embury</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandes</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hedeler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Incrementally improving dataspaces based on user feedback</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>38</volume>
          (
          <issue>5</issue>
          ):
          <fpage>656</fpage>
          -
          <lpage>687</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Bernardo et al. 2013]
          <string-name>
            <surname>Bernardo</surname>
            ,
            <given-names>I. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Santanchè</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Extracting and semantically integrating implicit schemas from multiple spreadsheets of biology based on the recognition of their nature</article-title>
          .
          <source>Journal of Information and Data Management</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>104</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Cai et al. 2005]
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Madhavan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Personal information management with semex</article-title>
          .
          <source>In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD '05</source>
          , pages
          <fpage>921</fpage>
          -
          <lpage>923</lpage>
          , New York, NY, USA. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Dijk et al. 2013]
          <string-name>
            <surname>Dijk</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choenni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leertouwer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spruit</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Brinkkemper</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>A data space system for the criminal justice chain</article-title>
          . In
          <string-name>
            <surname>Meersman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panetto</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dillon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eder</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellahsene</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritter</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leenheer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , editors,
          <source>On the Move to Meaningful Internet Systems: OTM 2013 Conferences</source>
          , volume
          <volume>8185</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>755</fpage>
          -
          <lpage>763</lpage>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Dittrich et al. 2009]
          <string-name>
            <surname>Dittrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salles</surname>
            ,
            <given-names>M. A. V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Blunschi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>iMeMex: From search to information integration and back</article-title>
          .
          <source>IEEE Data Eng. Bull.</source>
          ,
          <volume>32</volume>
          (
          <issue>2</issue>
          ):
          <fpage>28</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Dong and Halevy 2007]
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Indexing dataspaces</article-title>
          .
          In
          <source>Proceedings of the 2007 ACM SIGMOD international conference on Management of data, SIGMOD '07</source>
          , pages
          <fpage>43</fpage>
          -
          <lpage>54</lpage>
          , New York, NY, USA. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Elsayed and Brezany 2010]
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Brezany</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Towards large-scale scientific dataspaces for e-science applications</article-title>
          .
          In
          <source>Database Systems for Advanced Applications</source>
          , pages
          <fpage>69</fpage>
          -
          <lpage>80</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Elsayed et al. 2006]
          <string-name>
            <surname>Elsayed</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brezany</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tjoa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Towards realization of dataspaces</article-title>
          .
          In
          <source>Database and Expert Systems Applications, 2006. DEXA '06. 17th International Workshop on</source>
          , pages
          <fpage>266</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Franklin et al. 2005]
          <string-name>
            <surname>Franklin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Maier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>From databases to dataspaces</article-title>
          .
          <source>ACM Sigmod Record</source>
          ,
          <volume>34</volume>
          (
          <issue>4</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Halevy et al. 2006a]
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franklin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Maier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2006a</year>
          ).
          <article-title>Principles of dataspace systems</article-title>
          .
          In
          <source>Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART, PODS '06</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          , New York, NY, USA. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Halevy et al. 2006b]
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajaraman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ordille</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2006b</year>
          ).
          <article-title>Data integration: The teenage years</article-title>
          .
          In
          <source>Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB '06</source>
          , pages
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          . VLDB Endowment.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Heath and Bizer 2011]
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Linked data: Evolving the web into a global data space</article-title>
          .
          <source>Synthesis lectures on the semantic web: theory and technology</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Hedeler et al. 2009]
          <string-name>
            <surname>Hedeler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Embury</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Paton</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Dimensions of dataspaces</article-title>
          . In
          <string-name>
            <surname>Sexton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , editor,
          <source>Dataspace: The Final Frontier</source>
          , volume
          <volume>5588</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>55</fpage>
          -
          <lpage>66</lpage>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Hedeler et al. 2010]
          <string-name>
            <surname>Hedeler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paton</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Embury</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Chapter 7: Dataspaces</article-title>
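          . In
          <string-name>
            <surname>Ceri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Brambilla</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , editors,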
          <source>Search Computing</source>
          , volume
          <volume>5950</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>114</fpage>
          -
          <lpage>134</lpage>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Hedeler et al. 2013]
          <string-name>
            <surname>Hedeler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paton</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Embury</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>A functional model for dataspace management systems</article-title>
          . In
          <string-name>
            <surname>Catania</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>L. C.</given-names>
          </string-name>
          , editors,
          <source>Advanced Query Processing</source>
          , volume
          <volume>36</volume>
          of Intelligent Systems Reference Library
          , pages
          <fpage>305</fpage>
          -
          <lpage>341</lpage>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Hey et al. 2009]
          <string-name>
            <surname>Hey</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tansley</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tolle</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , editors (
          <year>2009</year>
          ).
          <source>The Fourth Paradigm: Data-Intensive Scientific Discovery</source>
          .
          <publisher-name>Microsoft Research</publisher-name>
          , Redmond, Washington.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Kolaitis 2005]
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Schema mappings, data exchange, and metadata management</article-title>
          .
          In
          <source>Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '05</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>75</lpage>
          , New York, NY, USA. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Lenzerini 2002]
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Data integration: A theoretical perspective</article-title>
          .
          In
          <source>Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '02</source>
          , pages
          <fpage>233</fpage>
          -
          <lpage>246</lpage>
          , New York, NY, USA. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Madhavan et al. 2007]
          <string-name>
            <surname>Madhavan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeffery</surname>
            ,
            <given-names>S. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Web-scale data integration: You can afford to pay as you go</article-title>
          .
          In
          <source>CIDR</source>
          , pages
          <fpage>342</fpage>
          -
          <lpage>350</lpage>
          . www.cidrdb.org.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Miranda and Santanchè 2013]
          <string-name>
            <surname>Miranda</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Santanchè</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Unifying phenotypes to support semantic descriptions</article-title>
          .
          <source>Brazilian Conference on Ontological Research - ONTOBRAS</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Mota and Medeiros 2013]
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Medeiros</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Introducing shadows: Flexible document representation and annotation on the web</article-title>
          .
          In
          <source>Proceedings of Data Engineering Workshops (ICDEW), IEEE 29th International Conference on Data Engineering - ICDE</source>
          , pages
          <fpage>13</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Mota et al. 2011]
          <string-name>
            <surname>Mota</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Longo</surname>
            ,
            <given-names>J. S. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cugler</surname>
            ,
            <given-names>D. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Medeiros</surname>
            ,
            <given-names>C. B.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Using linked data to extract geo-knowledge</article-title>
          .
          In
          <source>GeoInfo</source>
          , pages
          <fpage>111</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Singh and Jain 2011]
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>A survey on dataspace</article-title>
          . In
          <string-name>
            <surname>Wyld</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wozniak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaki</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meghanathan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nagamalai</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , editors,
          <source>Advances in Network Security and Applications</source>
          , volume
          <volume>196</volume>
          of Communications in Computer and Information Science, pages
          <fpage>608</fpage>
          -
          <lpage>621</lpage>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>