<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data collection architecture for Big Data - a framework for a research agenda</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wout Hofman TNO</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kampweg</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>DE Soesterberg The Netherlands</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>As big data is expected to contribute substantially to economic growth, scalability of solutions becomes important for deployment by organisations. It requires automatic collection and processing of large, heterogeneous data sets from a variety of resources, dealing with various aspects like improving quality, and fusing and linking data sets to provide homogeneous data to data analytics algorithms that are able to detect patterns or anomalies. This paper introduces two components in a data sharing architecture for big data. It presents the state of the art as a basis for a research agenda. Aspects like transport, storage, and processing are important for big data, but are not addressed by this paper.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Big and open data are mentioned among the most important technology trends, contributing to the growth of our society and
economy (Brynjolfsson, 2012),
        <xref ref-type="bibr" rid="ref29">(OECD, 2013)</xref>
        . Organisations can make better predictions and better decisions by
increasing situational awareness
        <xref ref-type="bibr" rid="ref7">(Endsley, 1995)</xref>
        . To collect, enhance, and process large data sets, a data value chain
has to be put in place
        <xref ref-type="bibr" rid="ref40 ref9">(Esmeijer, et al., 2013)</xref>
        , supported by technology. Since different data sources provide
heterogeneous data with a variety of technical formats
        <xref ref-type="bibr" rid="ref3">(Berners-Lee, 2009)</xref>
        , functions like data transformation,
matching and linking of data sets are required
        <xref ref-type="bibr" rid="ref27">(Ngonga Ngomo &amp; Auer, 2011)</xref>
        . Other important aspects for data
sharing are data quality
        <xref ref-type="bibr" rid="ref2">(Batini &amp; Scannapieco, 2006)</xref>
        that can be expressed by a large number of properties
        <xref ref-type="bibr" rid="ref40">(Zaveri,
et al., 2013)</xref>
        , and data governance, that can lead to particular data sharing interventions
        <xref ref-type="bibr" rid="ref6">(Eckartz, et al., 2014)</xref>
        . Many
solutions for data sharing are based on Application Programming Interfaces (APIs) that need interpretation and thus
require a lot of time to make data processable by analytics
        <xref ref-type="bibr" rid="ref18 ref6">(Hofman &amp; Rajagopal, 2014)</xref>
        . As the number of open data
sources increases
        <xref ref-type="bibr" rid="ref41">(Zuiderwijk, et al., 2014)</xref>
        , it becomes hard to access the proper sources. Properties
        <xref ref-type="bibr" rid="ref40">(Zaveri, et al.,
2013)</xref>
        need to be added to support the data value chain
        <xref ref-type="bibr" rid="ref40 ref9">(Esmeijer, et al., 2013)</xref>
        .
      </p>
      <p>
        An analysis of data sharing platforms has already been done
        <xref ref-type="bibr" rid="ref18 ref6">(Hofman &amp; Rajagopal, 2014)</xref>
        , but it did not yet consider all
functionality required by the data value chain mentioned before. By proposing a data collection architecture, this paper
introduces a number of research questions for interoperability in big data applications. The architecture is based on
the results of research informed by practice with an alpha version solution developed and validated by practitioners in
various national and international projects
        <xref ref-type="bibr" rid="ref24 ref34">(Sein, et al., 2011)</xref>
        . This paper does not address transport, (distributed)
processing, and storage of large data sets, but assumes the availability of technology for these functions.
      </p>
      <p>First, the data value chain and resource orientation are discussed; secondly, the architectural components
are presented for identifying research questions. Where applicable, available software components that partly provide the
required functionality are presented.</p>
      <p>Copyright © 2015 by the paper’s authors. Copying permitted only for private and academic purposes.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Introducing the data value chain</title>
      <p>
        This section introduces the data value chain
        <xref ref-type="bibr" rid="ref40 ref9">(Esmeijer, et al., 2013)</xref>
        providing requirements to an architecture and
presents two design principles for developing the components, namely the use of semantic web standards and a
resource oriented architecture.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Data value chain</title>
        <p>
          This section briefly introduces the data value chain and discusses its functionality. In order to create a data collection
architecture as a framework for research questions, a more elaborate and detailed model is required. Using expert
interviews, case interviews, and workshops as primary input – combined with desk research
          <xref ref-type="bibr" rid="ref29">(OECD, 2013)</xref>
          ,
          <xref ref-type="bibr" rid="ref19 ref40">(Kaisler,
et al., 2013)</xref>
          –
          <xref ref-type="bibr" rid="ref40 ref9">(Esmeijer, et al., 2013)</xref>
          have identified the following steps in a data value chain (see also figure 1):
1. Data generation and collection (e.g. inventory of data sources and their qualities, enabling access to data sources)
2. Data preparation (e.g. filtering, cleaning, verification, adding metadata)
3. Data integration (establishing a common data representation of data from multiple sources)
4. Data storage (e.g. local databases, cloud storage, hybrid solutions)
5. Data analysis (e.g. text mining, network analysis, anomaly detection)
6. Data output (e.g. visualization)
7. Data driven action (e.g. decision making, customer segmentation)
8. Data governance &amp; security (e.g. governance, privacy)
Taking a closer look at the steps in the data value chain, some design issues become apparent from a technical
perspective. First of all, metadata is added to data at the data preparation stage, although this metadata is probably always
identical each time data is collected from that resource. Secondly, the data value chain positions data integration after
data preparation, implying that functionality like data cleaning is always specific for a particular data set and needs to
consider the particular formats and semantics of that data set. It would be better to first transform data to a common
format before filtering and cleaning the data. Filtering and cleaning could be based on machine learning technology
          <xref ref-type="bibr" rid="ref38">(Witten &amp; Frank, 2005)</xref>
          . Data analysis also comprises more than the examples given; basic algorithms include
machine learning and multi-view clustering, where the latter can support for instance customer segmentation.
Furthermore, the data value chain does not distinguish between real-time streaming data like video and (un)structured data with discrete
values as stored in databases of organisations. Technically, these types of data require different technology, for
instance in terms of storage but also analytics. Analytics might be applied to extract structured data out of multimedia
data, where the extracted data can be analysed further. Transformation of multimedia data is probably supported by
all types of open source components, either with or without a decrease in quality.
        </p>
        <p>
          According to the data value chain, data governance and security is part of the core of analytics. However, data
governance with interventions like Identification and Authentication is a requirement of a data owner to stay in
control of its particular data
          <xref ref-type="bibr" rid="ref6">(Eckartz, et al., 2014)</xref>
          . These considerations are the basis for developing the architecture.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Design principles</title>
        <p>
          This section briefly presents two design principles. The first is to apply semantic web standards like the Web Ontology
Language (OWL) and the Resource Description Framework (RDF) for both modelling knowledge and linking data sets by
constructing networked ontologies of those sets, thus creating Linked Data
          <xref ref-type="bibr" rid="ref3">(Berners-Lee, 2009)</xref>
          . The ultimate goal is
to refer to data at its source and only access the data when required, which has several advantages like being in
control
          <xref ref-type="bibr" rid="ref6">(Eckartz, et al., 2014)</xref>
          and always having access to the most up-to-date data. With respect to data collection of Linked
Data, different strategies can be taken, e.g. data crawling for collecting data of known resources, on-the-fly
dereferencing by evaluating links to data when required, and query federation by applying dereferencing by a
resource when data of that resource is required
          <xref ref-type="bibr" rid="ref15">(Heath &amp; Bizer, 2011)</xref>
          . Data crawling can be applied when data
resources are known; data can be collected from these resources at regular time intervals. On-the-fly dereferencing is
used when additional data is required to decide on an action and a reference (URI: Uniform Resource Identifier) is
available. Customs introduces this mechanism for improving risk analysis
          <xref ref-type="bibr" rid="ref16">(Hofman, 2013)</xref>
          . Query federation is
applied to answer queries formulated by an application or user, where the data resources providing data are not
known in advance. These patterns can be applied in the architecture for data collection.
        </p>
        <p>
          The second design principle is to consider every thing, person, and organization as a data resource from an
Information and Communication Technology (ICT) perspective. A package, a container, a truck, a truck driver, and
a carrier are for instance all considered as a ‘resource’ in logistics. Each of these resources can have some type of
autonomy represented by its goals and capabilities
          <xref ref-type="bibr" rid="ref35">(Spohrer, May 2009)</xref>
          . A resource can have an owner, a user, and a
virtual representation in an ICT system, e.g. a carrier has a fleet management application storing truck data as a
virtual representation of its trucks and a driver uses that truck as an employee of that enterprise on a particular trip.
Physical objects can also have processing capabilities, i.e. their virtual representation is attached to them somehow,
which makes them active objects that can share data and make decisions in the context of a goal (see also
          <xref ref-type="bibr" rid="ref26 ref40">(Montreuil,
et al., 2013)</xref>
          ). One resource can have one or more data sets, where these data sets can be quite simple, e.g. a sensor
providing a temperature in degrees Celsius, or complex, e.g. an Enterprise Resource Planning (ERP) system
providing transaction and master product data files.
        </p>
        <p>
          Resources are autonomous, which imposes data sharing restrictions from a legal perspective, like privacy or liability, or from a
commercial perspective. Since data is currently also considered as an asset
          <xref ref-type="bibr" rid="ref29">(OECD, 2013)</xref>
          , autonomous resources
also expect to gain economic benefits from sharing data, which is a barrier to process improvement (Brynjolfsson,
2012). Resource autonomy will be reflected in a data policy resulting from data governance
          <xref ref-type="bibr" rid="ref6">(Eckartz, et al., 2014)</xref>
          .
Another aspect of importance from the perspective of big data is to decompose a query into queries to one or more
resources, depending on particular query characteristics. Query decomposition can be compared with dynamic
service composition based on matching of goals with capabilities or services
          <xref ref-type="bibr" rid="ref35">(Spohrer, May 2009)</xref>
          , but differs by the
fact that a query is formulated on a common semantic model of one or more resources. Characteristics of data sets of these
resources in terms of for instance data quality are considered. Query decomposition, which differs from the three
patterns identified for Linked Data
          <xref ref-type="bibr" rid="ref15">(Heath &amp; Bizer, 2011)</xref>
          , requires further research. In the context of this paper, it is
identified as a required functionality for data collection.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Towards data collection architecture</title>
      <p>The previous section introduced the data value chain and two design principles, resulting in requirements for ICT
components. This section introduces two ICT components that can be further decomposed, namely a registry
component and a data processing engine. Both components have to interface with each other and can be deployed in
various ways, e.g. as a data-sharing platform or as peer-to-peer solutions. Deployment aspects will not be discussed;
the functional requirements of both components are discussed further. By identifying functionality, research
questions can be formulated.</p>
      <sec id="sec-3-1">
        <title>3.1. A registry component</title>
        <p>
          In the past, various forms of registries have been specified, e.g. Universal Description, Discovery and Integration
(UDDI) for web services
          <xref ref-type="bibr" rid="ref8">(Erl, 2005)</xref>
          and the Unified Service Description Language (USDL,
          <xref ref-type="bibr" rid="ref1">(Barros &amp; Oberle,
2012)</xref>
          ). Whereas UDDI comprises technical services in WSDL format (Web Services Description Language), USDL
distinguishes several aspects of a service separately, e.g. an abstract specification of its functionality, pricing
structures, and its technical interfaces. The semantics of services is modelled by parameters, but it is not clear how
these are to be specified. However, we consider semantics as the core of data collection in big data (Big Data Value
Association, 2015) and of transformation between various (technical) formats. The data of a resource is accessed, for
instance, through a REST API or as streaming data.
        </p>
        <p>
          Semantics of a resource, which we will express as an ontology (see before), can be specified at different levels, e.g. per
API, per data set of a resource, or as a common information model of various APIs or data sets
          <xref ref-type="bibr" rid="ref8">(Erl, 2005)</xref>
          . We propose
to construct a networked ontology
          <xref ref-type="bibr" rid="ref28">(Noy &amp; McGuiness, 2011)</xref>
          based on matching and linking different ontologies
          <xref ref-type="bibr" rid="ref10">(Euzenat &amp; Shvaiko, 2010)</xref>
          . Resources can express their data sets in a common ontology if they all have data in the
same domain, e.g. logistics, energy, and health. Most business ontologies can be matched or linked, since
organisations exchange value based on services
          <xref ref-type="bibr" rid="ref35">(Spohrer, May 2009)</xref>
          . Semantics can be developed with tools like
Protégé and TopBraid Composer, where the latter can also be applied to automatically construct an OWL
representation of an XSD. Such an OWL representation of an XSD has to be matched to a networked ontology.
SPARQL Inferencing Notation (SPIN), a Member Submission to the World Wide Web Consortium, can represent those
mappings. On the other hand, a resource can also express its API in terms of an ontology, for instance by specifying its
own ontology and generating an XSD of that API or by implementing an RDF interface on top of its data.
        </p>
        <p>
          As we have seen, not only semantics and the technical interface, with its communication protocols and Uniform Resource
Locator (URL), are important, but also the annotation of a resource and its data in terms of data quality
          <xref ref-type="bibr" rid="ref40">(Zaveri, et al.,
2013)</xref>
          and data policies including pricing and security. Linked data quality dimensions are context, trust, intrinsic
dimensions, accessibility, representation, and dynamicity
          <xref ref-type="bibr" rid="ref40">(Zaveri, et al., 2013)</xref>
          . These dimensions are further
decomposed into for instance completeness, amount of data, and relevance of the retrieved data for contextual
dimensions. They do not distinguish between parameters of a data resource and a data set, although for instance trust can
be assigned to a resource and completeness to a data set. They also consider value assignment of these parameters by
data users, whereas resources will also assign values like licensing and completeness to address issues like liability
from a data governance perspective
          <xref ref-type="bibr" rid="ref6">(Eckartz, et al., 2014)</xref>
          . Further research is required with respect to metadata of
resources and data sets and annotation mechanisms by a resource and data user(s). From the perspective of a data
user, particular metadata is required to clean the data, whereas a user assigns this metadata based on machine learning
technology
          <xref ref-type="bibr" rid="ref22">(Lehman, 2009)</xref>
          . Feedback to a resource might improve data quality, creating mechanisms similar to
those used by Zoover or AirBnB, which are common in travel and leisure. Research is also required into whether annotations are specific
to each instance of a data set collected from a resource, i.e. each instance of a data set is different, or whether a data set can be
annotated for all instances.
        </p>
        <p>
          In open data communities, platforms like CKAN or WSO2 API manager are applied as registries, whereas the
functionality of these platforms differs
          <xref ref-type="bibr" rid="ref18 ref6">(Hofman &amp; Rajagopal, 2014)</xref>
          . These platforms do not support semantics or
annotation of APIs of data sets. They support data providers in publishing their APIs; searching for proper data sets is
done by browsing through a huge collection, possibly grouped in some reasonable way like geographical data sets.
        </p>
        <p>Queries can also be expressed in semantics using for instance SPARQL. Such a query might retrieve one or more
URLs of data sets or resources with a required annotation, or it might be decomposed into queries sent directly to two or
more resources (query federation). The latter might result in a particular sequence in which these resources should
be queried, also including functionality like combining data sets of different resources into one or validating
consistency across data sets of different resources. Such sequencing is represented by a (data) workflow with
particular data manipulation functions (see next section). A data collection strategy might provide rules that govern
the data workflow. These rules are a subject for further research.</p>
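        <p>The registry lookup described above can be illustrated with a minimal sketch in Python. It is not the SPARQL-based mechanism itself: the profile fields (<monospace>url</monospace>, <monospace>domain</monospace>, <monospace>completeness</monospace>, <monospace>licence</monospace>), the function name and the example URLs are assumptions made for illustration only. The sketch shows a query that returns the URLs of data sets carrying a required annotation.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetProfile:
    """Hypothetical registry entry describing one data set of a resource."""
    url: str             # where instances of the data set can be collected
    domain: str          # application domain covered by the semantic model
    completeness: float  # annotation assigned by the resource, 0.0 to 1.0
    licence: str         # data policy annotation


def find_data_sets(registry, domain, min_completeness):
    """Return URLs of data sets in a domain that meet a required annotation."""
    return [
        p.url
        for p in registry
        if p.domain == domain and p.completeness >= min_completeness
    ]


# Illustrative registry content; none of these entries come from the paper.
REGISTRY = [
    DataSetProfile("https://example.org/traffic/a", "logistics", 0.95, "open"),
    DataSetProfile("https://example.org/traffic/b", "logistics", 0.60, "open"),
    DataSetProfile("https://example.org/meters", "energy", 0.99, "restricted"),
]

urls = find_data_sets(REGISTRY, "logistics", 0.9)
```
]]></preformat>
        <p>In a deployed registry, the same selection would presumably be expressed as a query over ontology-based annotations; the list comprehension only stands in for that step.</p>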
      </sec>
      <sec id="sec-3-2">
        <title>3.2. A data collection engine</title>
        <p>
          The objective of a data collection engine is to homogenise data of heterogeneous data sets, potentially link or fuse
those data sets, and improve data quality (see before). Data collection utilizes one or more of the three mechanisms
mentioned before
          <xref ref-type="bibr" rid="ref15">(Heath &amp; Bizer, 2011)</xref>
          . A data collection engine has to deal with the differing quality of the data sets
provided. It has to support a data collection strategy that addresses aspects like ‘inconsistency’ by fusing data of two
or more resources and ‘trustworthiness’ as part of ‘provenance’
          <xref ref-type="bibr" rid="ref40">(Zaveri, et al., 2013)</xref>
          by validating the content of one
data set of a resource against a similar data set of one or two other resources.
        </p>
        <p>
          Data semantics and annotation, expressing data quality and other aspects of both a data set and a data resource,
determine data processing. More specifically, the following functionality is required to automatically collect and
process large (structured) data sets:
- Data workflow engine. A data collection strategy based on data set annotation governs collection and
preprocessing. A data workflow engine provides such control by for instance collecting and comparing data of two
or more resources for validating (in)consistencies. Such an engine can also support the various ways of data
collection
          <xref ref-type="bibr" rid="ref15">(Heath &amp; Bizer, 2011)</xref>
          . A data workflow can be modelled as any other workflow utilising standards like
Business Process Model and Notation (BPMN,
          <xref ref-type="bibr" rid="ref30">(OMG, 2011)</xref>
          ).
- Data set manipulation. This particular function processes an individual data set collected from a resource. Data is
transformed to a common format with known semantics and potentially cleansed based on data set annotations.
Cleansing can deal with incompleteness using for instance Bayesian Belief Networks
          <xref ref-type="bibr" rid="ref14">(Han &amp; Kamber, 2006)</xref>
          . It can
also be useful to extract data from unstructured, multi-media data sets using analytics
          <xref ref-type="bibr" rid="ref33">(Schutte, et al., 2014)</xref>
          .
- Data integration. This function considers more than one data set as input. Two types of approaches for data
integration are distinguished, namely fusion of two or more data sets to one data set and linking two or more data
sets. Fusion is performed on two or more data sets that are similar, i.e. their semantic models are for a large part
identical (there might be some deviations in two models). An example is the fusion of traffic data from two or more
resources. Notice that the instances of two data sets may be different, e.g. they contain traffic data of two areas.
Fusion can be applied for various reasons, e.g. completeness, consistency and provenance, and is thus closely related
to a data collection strategy. Linking data sets is only done on those data instances that have common concepts in two
or more data sets. A common example of linked data is linking on geographical coordinates and visualising data sets on
maps.
        </p>
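        <p>The difference between fusion and linking can be sketched as follows in Python. This is an illustration under stated assumptions, not a prescribed implementation: records are plain dictionaries that already share a common format, the function names <monospace>fuse</monospace> and <monospace>link</monospace> are hypothetical, and the rule that the first data set wins on conflict is only one possible data collection strategy.</p>
        <preformat><![CDATA[
```python
def fuse(first, second, key):
    """Fuse two similar data sets into one, keyed on a shared identifier.

    Records present in both sets are merged. On conflicting values the
    value from `first` is kept and the conflict is reported for validation.
    """
    fused = {rec[key]: dict(rec) for rec in first}
    conflicts = []
    for rec in second:
        target = fused.setdefault(rec[key], {})
        for field, value in rec.items():
            if field in target and target[field] != value:
                conflicts.append((rec[key], field))
            else:
                target[field] = value
    return list(fused.values()), conflicts


def link(left, right, concept):
    """Link instances of two data sets that share a value for a concept."""
    index = {}
    for rec in right:
        index.setdefault(rec[concept], []).append(rec)
    return [(a, b) for a in left for b in index.get(a[concept], [])]


# Illustrative traffic data from two hypothetical resources.
traffic_a = [{"road": "A12", "speed": 80}, {"road": "A27", "speed": 100}]
traffic_b = [{"road": "A12", "speed": 75, "lanes": 3}, {"road": "A28", "speed": 90}]
fused, conflicts = fuse(traffic_a, traffic_b, "road")

# Linking on geographical coordinates as the common concept.
sensors = [{"id": "s1", "location": (52.1, 5.3)}]
incidents = [
    {"incident": "i7", "location": (52.1, 5.3)},
    {"incident": "i8", "location": (51.9, 4.5)},
]
links = link(sensors, incidents, "location")
```
]]></preformat>
        <p>Fusion yields one data set and a list of inconsistencies to validate, whereas linking leaves both data sets intact and only relates the instances that share the common concept.</p>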
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Research questions</title>
      <p>This section introduces two main research questions, which are detailed further below:
1. Resource profiles to find and collect (large) data sets in adaptive complex systems.
2. Seamless interoperability for creating adaptive complex systems.</p>
      <p>Besides these basic research questions, there are also systems architecture questions, like the interface between a
registry and a data collection engine. Furthermore, individual components need to interface to construct a solution
that can be deployed. We will not address these questions in this paper, although they are very relevant for actual deployment.</p>
      <sec id="sec-4-1">
        <title>4.1. Resource profiles</title>
        <p>
          In the previous sections, we have argued that it is difficult to find useful, annotated data for all types of reasons.
We have also argued for the importance of data set annotation for data processing. Lack of semantics is one of the
reasons, which we will address in section 4.2. We have also introduced some definitions loosely, e.g. resource,
data set, and instance of a data set. Proper definitions are required to be able to develop software, for example: a resource is
the owner or keeper of one or more data sets, where each data set has known semantics and many instances.
This section identifies the following research questions that build upon these definitions:
- Profile of a data set of a resource. No profiles have yet been specified that express available instances of a data set
in a known semantic model and provide technical details as to how instances can be collected. Such a profile should
also include the value that instances of a data set can provide, expressed in known semantics (e.g. a temperature in degrees
Celsius) or more complex ones like utilization of water depths for vessels. An example of profiles can be found in
          <xref ref-type="bibr" rid="ref20">(Kotok &amp; Webber, 2002)</xref>
          , but these focus on business processes instead of semantics of data sets.
- Annotation and metadata. There is not yet a set of metadata of data sets expressing all types of features, such as those
presented in
          <xref ref-type="bibr" rid="ref40">(Zaveri, et al., 2013)</xref>
          , that can be used to annotate data sets. Metadata to annotate services is given by
          <xref ref-type="bibr" rid="ref1">(Barros &amp; Oberle, 2012)</xref>
          , but a standardised set still needs to be identified to express various features of data sets like quality
          <xref ref-type="bibr" rid="ref2">(Batini &amp; Scannapieco, 2006)</xref>
          , pricing, etc. There are tools that are able to improve data quality (e.g. ORE –
Ontology Repair and Enrichment – and RDFUnit), but it is not clear which quality features they address and
thus which metadata and annotations they could benefit from.
- Data governance rules and interventions, resulting in data policies. A decision model
          <xref ref-type="bibr" rid="ref6">(Eckartz, et al., 2014)</xref>
          is
available, but interventions have not yet been specified, let alone supported by tooling. Tools like WSO2 API
Manager provide some sort of access management, but only at API level, resulting in management of an API for
a (group of) data user(s). Access mechanisms are also not yet specified, e.g. Attribute Based Access Control
          <xref ref-type="bibr" rid="ref12">(Goyal,
et al., 2006)</xref>
          and homomorphic encryption
          <xref ref-type="bibr" rid="ref11">(Gentry, 2009)</xref>
          are examples of potential interventions implementing
particular data policies.
- Query decomposition and (federated) search to find data sets. Linked data addresses query federation
          <xref ref-type="bibr" rid="ref15">(Heath &amp;
Bizer, 2011)</xref>
          and SPARQL can be used to query data according to a known semantic model, but as annotations are
lacking and queries need to be formulated in languages like SPARQL, it is still difficult to query for data. Patent applications
have been filed that address particular types of queries on images (e.g.
          <xref ref-type="bibr" rid="ref33">(Schutte, et al., 2014)</xref>
          ).
        </p>
        <p>
          A resource can have one or more data sets. In some cases, a profile is already standardised by a community,
e.g. based on a common semantic model with its supporting technical solutions. In those cases, that profile acts
as a template implemented by more than one resource. Its semantics is probably expressed by an ontology that can be
matched or linked to existing ontologies
          <xref ref-type="bibr" rid="ref10">(Euzenat &amp; Shvaiko, 2010)</xref>
          to construct networked ontologies. Tools like
LogMap or YAM++ can be used for this purpose.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Seamless interoperability</title>
        <p>
          Various sources have stressed the importance of seamless interoperability, e.g.
          <xref ref-type="bibr" rid="ref26 ref40">(Montreuil, et al., 2013)</xref>
          and
          <xref ref-type="bibr" rid="ref4">(Chituc,
et al., 2009)</xref>
          . Seamless interoperability is also loosely defined as ‘the ability to easily configure the transformation
between any two data structures’. Whereas a lot of research effort has focussed on service orientation (see for
instance
          <xref ref-type="bibr" rid="ref23 ref40">(Li, et al., 2013)</xref>
          ), research towards constructing semantic models for interoperability and integration of
database management systems or other data resources has been limited. Common data models for internal integration
have been developed by organisations
          <xref ref-type="bibr" rid="ref8">(Erl, 2005)</xref>
          , but only on a limited scale for interoperability between
organisations, e.g. for logistics
          <xref ref-type="bibr" rid="ref24 ref32 ref34">(Pedersen, et al., 2011)</xref>
          and customs
          <xref ref-type="bibr" rid="ref39">(World Customs Organization, 2010)</xref>
          , to support
message specification and generating message implementation guides
          <xref ref-type="bibr" rid="ref17">(Hofman, 2011)</xref>
          . Research in the semantic web
community has focused on ontology development and linked data, see for instance
          <xref ref-type="bibr" rid="ref3">(Berners-Lee, 2009)</xref>
          , but the issue of
semantics and ease of transformation between any two data structures is not solved yet.
        </p>
        <p>
          Seamless interoperability can be expressed at different levels
          <xref ref-type="bibr" rid="ref36">(Wang, et al., 2009)</xref>
          . In this paper, we define
seamless interoperability as the ability of a resource to support its specified profile, which can be detailed to the
levels indicated by
          <xref ref-type="bibr" rid="ref36">(Wang, et al., 2009)</xref>
          :
- Syntax transformation, which considers the conversion between any two syntaxes for technically sharing data.
An example is the support of data extraction from a database by transforming SQL into RDF. Tools like KARMA,
OnTop, and DataTank provide this kind of capability. Note that syntax transformation requires knowledge of
semantic transformation to assure data quality.
- Semantic transformation. Automatic transformation between any two data sets without any human intervention is
not yet solved. Semantic transformation is required to fuse and link data sets of different sources, detect
inconsistencies, etc. There are examples of applying machine learning for matching data sets, but only if these have the
same technical representation
          <xref ref-type="bibr" rid="ref10">(Euzenat &amp; Shvaiko, 2010)</xref>
          with tools like LogMapLite and LogMap. Although there is
an open standard for expressing semantic transformations, SPIN, it has so far been implemented by only one enterprise. If
machine learning is applied to automatic semantic transformations, it requires a large number of large data sets like
messages and XML documents with their implementation guides or XSDs.
- Governance. This is a separate aspect for further research that needs to address ontologies and transformations.
Ontologies can easily be incorporated into other ontologies, either by reference or as import, to construct networked
ontologies and match ontologies. In case of referencing, one needs to be sure that the referred ontology
will still be available. Applying machine learning to semantic transformation results in value since it dramatically
reduces interoperability implementation time. Governance mechanisms should be in place to assure this knowledge is
available to integrate resources.
        </p>
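        <p>
          As a rough illustration of the syntax transformation level, the sketch below maps the rows of a relational table to RDF triples in N-Triples syntax. It is a minimal sketch under stated assumptions and does not show how KARMA, OnTop, or DataTank work internally: the table <monospace>shipment</monospace>, its columns, the namespace <monospace>http://example.org/</monospace>, and the function <monospace>rows_to_ntriples</monospace> are all hypothetical and introduced for this example only.
        </p>
        <preformat><![CDATA[
```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, not taken from the paper


def rows_to_ntriples(conn, table, key):
    """Map each row of `table` to N-Triples lines, one triple per non-null column.

    `table` and `key` are assumed to come from trusted configuration,
    since the table name is interpolated into the SQL statement.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        # The primary key identifies the subject resource.
        subject = f"<{BASE}{table}/{record[key]}>"
        for column, value in record.items():
            if column == key or value is None:
                continue
            # The column name is used mechanically as the predicate.
            predicate = f"<{BASE}{table}#{column}>"
            # Escape backslashes and double quotes for a plain literal.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


# Hypothetical sample data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipment (id INTEGER PRIMARY KEY, origin TEXT, weight_kg REAL)"
)
conn.execute("INSERT INTO shipment VALUES (1, 'Rotterdam', 1250.0)")

triples = rows_to_ntriples(conn, "shipment", "id")
for line in triples:
    print(line)
```
]]></preformat>
        <p>
          The predicates in this sketch are derived mechanically from column names, so the output is syntactically valid RDF whose meaning is still that of the source schema. Aligning those predicates with a shared ontology is the semantic transformation step, which is why syntax transformation cannot assure data quality on its own.
        </p>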
        <p>
          Semantic transformation is required if a profile of a data set is going to be implemented by a data
resource or at data collection. Semantic and syntactic behaviour is required for adaptive complex systems like
smart cities and logistics, in which various application domains have to be interoperable and where the constellation
of participating resources changes dynamically. These systems have a network (see for instance
          <xref ref-type="bibr" rid="ref5">(Dekker, 2004)</xref>
          )
structure for collaboration. Interoperability implementation time, required for participating in such a system, needs to
be minimal or even zero.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and further work</title>
      <p>
        We have constructed an architecture with two basic components to support a big data value chain and
used this architecture to identify research questions. Besides the basic research questions of profiling and seamless
interoperability, there needs to be further research towards deployment of an architecture that utilizes distributed
transport, processing, and storage of large data sets. For instance, to prevent unnecessary data sharing across the
Internet, it might be feasible to analyse data sets locally by transporting a data collection engine and an analytics tool
to a resource and provide the results to another data collection engine. These aspects, covered by for instance
Software Defined Networks
        <xref ref-type="bibr" rid="ref21">(Kreutz, 2015)</xref>
        , need further research, including the integration between structured and
unstructured, multimedia data, e.g. by extracting structured data from unstructured data sets. Notice also that
utilizing RDF as syntax during data transport can substantially increase data size, since every value is serialised
together with the identifiers of its subject and predicate. The architecture also has to be extended
to support behaviour in a more complex manner than data collection. Situational awareness
        <xref ref-type="bibr" rid="ref7">(Endsley, 1995)</xref>
        with
different types of analytics, like descriptive, diagnostic, predictive, and prescriptive, results in actions that have to be
shared amongst participants in a complex system. As the number of participants in those networks increases, the
amount of data to be shared and the interoperability complexity are also expected to increase. Whereas we only addressed
research questions for interoperability in big data, complex systems require additional functionality. Data governance
rules and data policies will require functionality like identification and authentication, trust, and privacy-enhancing
technology, which is also required by complex systems. These types of architectural questions need further
research.
      </p>
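      <p>
        The remark on RDF and data size can be made concrete with a small comparison. The sketch below serialises the same records once as CSV and once as N-Triples and compares the byte counts. The records, field names, and the namespace <monospace>http://example.org/shipment</monospace> are hypothetical, and the resulting ratio depends heavily on URI length and on the serialisation chosen, so it should be read as an illustration and not as a measurement.
      </p>
      <preformat><![CDATA[
```python
import csv
import io

BASE = "http://example.org/shipment"  # hypothetical namespace

# Hypothetical sample records; all values are kept as strings for simplicity.
records = [
    {"id": "1", "origin": "Rotterdam", "destination": "Hamburg", "weight_kg": "1250"},
    {"id": "2", "origin": "Antwerp", "destination": "Lyon", "weight_kg": "830"},
]


def as_csv(rows):
    """Serialise the records as CSV with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def as_ntriples(rows):
    """Serialise the records as N-Triples, one triple per non-key field."""
    lines = []
    for row in rows:
        subject = f"<{BASE}/{row['id']}>"
        for field, value in row.items():
            if field != "id":
                # Subject and predicate URIs are repeated in full for every value.
                lines.append(f'{subject} <{BASE}#{field}> "{value}" .')
    return "\n".join(lines) + "\n"


csv_size = len(as_csv(records).encode("utf-8"))
nt_size = len(as_ntriples(records).encode("utf-8"))
print(csv_size, nt_size, round(nt_size / csv_size, 1))
```
]]></preformat>
      <p>
        The overhead comes from repeating full subject and predicate URIs for every value. More compact serialisations with namespace prefixes, or general-purpose compression during transport, would reduce it, which is one of the trade-offs a deployment of the architecture would have to weigh.
      </p>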
      <p>
        Another research issue is to apply semantics in data analytics. Current algorithms are agnostic of data semantics;
they just require homogeneous data sets to detect patterns and anomalies using statistical or heuristic methods
        <xref ref-type="bibr" rid="ref38">(Witten &amp; Frank, 2005)</xref>
        . A research question is how to interface between data collection and data analytics in terms
of technical representation of data sets.
      </p>
      <p>
        Although this is not explicitly mentioned in our paper, the focus has been primarily on structured data sets
        <xref ref-type="bibr" rid="ref3">(Berners-Lee, 2009)</xref>
        . Unstructured, multimedia data might require additional functionality that is left for further research.
      </p>
      <p>Big Data Value Association, 2015. European Big Data Value Strategic Research &amp; Innovation Agenda. [Online] Available at: bigdatavalue.eu [Accessed 22 March 2015].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Barros</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Oberle</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2012</year>
          .
          <article-title>Handbook of Service Description - USDL and its methods</article-title>
          . s.l.:Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Batini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Scannapieco</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Data quality: concepts, methodologies, and techniques</article-title>
          . Heidelberg: Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <year>2009</year>
          .
          <article-title>Linked Data - four rules</article-title>
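          . [Online] Available at: www.w3.org/DesignIssues/LinkedData
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          <string-name>
            <surname>Brynjolfsson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,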
          <year>2012</year>
          .
          <article-title>Big Data: the management revolution</article-title>
          .
          <source>Harvard Business Review</source>
          ,
          <volume>10</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Chituc</surname>
            ,
            <given-names>C.-M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azevedo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Toscano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2009</year>
          .
          <article-title>A framework proposal for seamless interoperability in a collaborative networked environment</article-title>
          .
          <source>Computers in Industry</source>
          , Volume
          <volume>60</volume>
          , pp.
          <fpage>317</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Dekker</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <year>2004</year>
          .
          <article-title>Control of inter-organisational relationships: evidence on appropriation concerns and coordination mechanisms</article-title>
          .
          <source>Accounting, Organizations and Society</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>27</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Eckartz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofman</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Veenstra</surname>
            ,
            <given-names>A. F. v.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>A decision model for data sharing</article-title>
          . Dublin, Ireland, Springer.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Endsley</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          ,
          <year>1995</year>
          .
          <article-title>Toward a theory of situation awareness in dynamic systems</article-title>
          .
          <source>Human Factors</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>32</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Erl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <year>2005</year>
          .
          <article-title>Service Oriented Architecture - concepts, technology and design</article-title>
          . s.l.:Prentice-Hall.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Esmeijer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakker</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Munck</surname>
            ,
            <given-names>S. d.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Thriving and surviving in a data-driven society</article-title>
          , Delft: TNO.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <year>2010</year>
          . Ontology Matching. s.l.:Springer.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Gentry</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2009</year>
          .
          <article-title>A fully homomorphic encryption scheme</article-title>
          . s.l.:Stanford University.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pandey</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahai</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Waters</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Attribute-based encryption for fine-grained access control of encrypted data</article-title>
          . s.l.,
          <source>ACM</source>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Grefen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luftenegger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Linden</surname>
            ,
            <given-names>E. v. d.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Weisleder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>BASE/X Business Agility through Cross-Organizational Service Engineering - the business and service design approach developed in the CoProFind project</article-title>
          . [Online] [Accessed March 2015].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Kamber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Data mining - concepts and techniques (second edition)</article-title>
          . s.l.:Elsevier.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>Linked Data - evolving the Web into a Global Data Space, Synthesis lectures on the Semantic Web: Theory and Technology</article-title>
          . s.l.:Morgan &amp; Claypool Publishers.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Hofman</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Compliance management by business event mining in supply chain networks</article-title>
          .
          Delft, s.n.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Hofman</surname>
            ,
            <given-names>W. J.</given-names>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>Applying semantic web technology to interoperability in freight logistics</article-title>
          . Munich, s.n.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Hofman</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Rajagopal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Constructing a Data Ecosystem - an overview of available software solutions</article-title>
          .
          <source>Journal of Theoretical and Applied Electronic Commerce Research</source>
          , provisionally accepted.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Kaisler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Armour</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Espinosa</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Money</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Big data: issues and challenges moving forward</article-title>
          . Hawaii, IEEE, pp.
          <fpage>995</fpage>
          -
          <lpage>1004</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Kotok</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2002</year>
          .
          <article-title>ebXML - the new global standard for doing business over the Internet</article-title>
          . s.l.:New Riders.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Kreutz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>Software-defined networking: a comprehensive survey</article-title>
          .
          <source>Proceedings of the IEEE</source>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2009</year>
          .
          <article-title>DL-Learner: Learning Concepts in Description Logics</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          , Volume
          <volume>10</volume>
          , pp.
          <fpage>2639</fpage>
          -
          <lpage>2642</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.-H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>S.-M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yen</surname>
            ,
            <given-names>D. C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.-C.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Semantic-based transaction model for web service</article-title>
          .
          <source>Inf Syst Front</source>
          <volume>15</volume>
          , p.
          <fpage>249</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Lorenz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.,
          <year>2011</year>
          .
          <article-title>Discovery services in the EPC network</article-title>
          .
          <source>Designing and Deploying RFID Applications</source>
          , Intech, pp.
          <fpage>109</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Maier</surname>
            ,
            <given-names>M. W.</given-names>
          </string-name>
          ,
          <year>1998</year>
          .
          <article-title>Architecting Principles of Systems-of-Systems</article-title>
          .
          <source>Systems Engineering</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>267</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Montreuil</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meller</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ballot</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Physical Internet Foundations</article-title>
          . In:
          <source>Service Orientation in Holonic and Multi Agent Manufacturing and Robotics</source>
          . Heidelberg: Springer-Verlag, pp.
          <fpage>151</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.-C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>LIMES - A time-efficient approach for large-scale Link discovery on the web of data</article-title>
          . s.l.: http://svn.aksw.org/papers/2011/WWW_LIMES/public.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>Ontology Development 101: a guide to creating your first ontology</article-title>
          , Stanford: Technical report KSL-01-05, Computer Science Department, Stanford University.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>OECD</surname>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Exploring Data-Driven Innovation as a New Source of Growth: mapping policy issues raised by "Big Data"</article-title>
          .
          <source>OECD Digital Economy Papers, Issue</source>
          <volume>222</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>OMG</surname>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>Business Process Model and Notation (BPMN) - version 2.0</article-title>
          . [Online] Available at: www.omg.org/spec/BPMN/20100501
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Ostadzadeh</surname>
            ,
            <given-names>S. S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Shams</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Towards a software architecture maturity model for improving Ultra-Large scale systems interoperability</article-title>
          .
          <source>The International Journal of Soft Computing and Software Engineering</source>
          , March,
          <volume>3</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>69</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>J. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paganelli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Westerheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>A Common Framework Freightwise, Euridice and Smartfreight</article-title>
          . [Online] Available at: www.efreightproject.eu [Accessed 12 November 2012].
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Schutte</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schavemaker</surname>
            ,
            <given-names>J. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dijk</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lange</surname>
            ,
            <given-names>D. d.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Method for detecting a moving object in a sequence of images captured by a moving camera, computer system and computer program product</article-title>
          . U.S., Patent No. 8,629,908.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Sein</surname>
            ,
            <given-names>M. K.</given-names>
          </string-name>
          et al.,
          <year>2011</year>
          . Action Design Research.
          <source>MIS Quarterly</source>
          ,
          <volume>35</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>37</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>Spohrer</surname>
            ,
            <given-names>J. K. S.</given-names>
          </string-name>
          , May
          <year>2009</year>
          .
          <article-title>Service Science, Management, Engineering, and Design (SSMED) - An emerging discipline - Outline and References</article-title>
          .
          <source>International Journal on Information Systems in the Service Sector.</source>
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tolk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <year>2009</year>
          .
          <article-title>The levels of conceptual interoperability model: applying systems engineering principles to M&amp;S</article-title>
          . s.l., Society for Computer Simulation International.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <year>2008</year>
          .
          <article-title>Intelligent Transport System standards</article-title>
          . s.l.:Artech House Inc.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I. H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <year>2005</year>
          .
          <article-title>Data Mining: Practical Machine Learning tools and techniques - second edition</article-title>
          . s.l.:Elsevier.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <surname>World Customs Organization</surname>
          </string-name>
          ,
          <year>2010</year>
          .
          <article-title>WCO Data model - cross border transactions on the fast track</article-title>
          . s.l.: World Customs Organization.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          et al.,
          <year>2013</year>
          .
          <article-title>Quality assessment methodologies for Linked Open Data</article-title>
          .
          Semantic-web-journal.net.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <surname>Zuiderwijk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Helbig</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gil-Garcia</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Janssen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <year>2014</year>
          . Guest Editors' Introduction.
          <article-title>Innovation through open data: a review of the state-of-the-art and an emerging research agenda</article-title>
          .
          <source>Journal of Theoretical and Applied Electronic Commerce Research</source>
          ,
          <volume>9</volume>
          (
          <issue>2</issue>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>