=Paper=
{{Paper
|id=Vol-2357/paper7
|storemode=property
|title=Mapping Metadata from Different Research Infrastructures Into a Unified Framework for Use in a Virtual Research Environment
|pdfUrl=https://ceur-ws.org/Vol-2357/paper7.pdf
|volume=Vol-2357
|authors=Paul Martin,Laurent Remy,Maria Theodoridou,Keith Jeffery,Zhiming Zhao
|dblpUrl=https://dblp.org/rec/conf/iwsg/0002RTJZ18
}}
==Mapping Metadata from Different Research Infrastructures Into a Unified Framework for Use in a Virtual Research Environment==
10th International Workshop on Science Gateways (IWSG 2018), 13-15 June 2018
Mapping metadata from different research
infrastructures into a unified framework for use in a
virtual research environment
Paul Martin∗ , Laurent Remy† , Maria Theodoridou‡ , Keith Jeffery§ and Zhiming Zhao∗
∗ Institute for Informatics, University of Amsterdam, Amsterdam, Netherlands
† euroCRIS / IS4RI, France
‡ Institute of Computer Science, Foundation for Research and Technology—Hellas, Heraklion, Greece
§ Keith G Jeffery Consultants, United Kingdom
Emails: {p.w.martin, z.zhao}@uva.nl, lremy@is4ri.com, maria@ics.forth.gr, keith.jeffery@keithgjefferyconsultants.co.uk
Abstract: Virtual Research Environments (VREs) augment research activities by integrating tools for data discovery, data retrieval, workflow management and researcher collaboration, often coupled with a specific computing infrastructure. However, the drive towards open data science discourages ‘walled garden’ solutions, and has led to the creation of dedicated research infrastructures (RIs) that gather data and provide services to particular research communities without prejudice towards any particular science gateway or virtual laboratory technology. There is a need for generic VREs that can be easily customised to the needs of specific communities and coupled with the services and resources of many different RIs, but the resource metadata produced by these RIs rarely adheres perfectly to any particular standard or vocabulary, making it difficult to search and discover resources independently of their provider. Cross-RI search can be expedited by metadata mapping services that harvest metadata published under different standards to build unified resource catalogues, although such an approach poses a number of challenges. In this paper we take the example of the VRE4EIC e-VRE metadata service, which uses X3ML mappings to build a single CERIF catalogue for describing data products and other resources provided by multiple RIs. We consider the extent to which it addresses the challenge of cross-RI search, and we also discuss how it might take advantage of semantic harmonisation efforts in the environmental science domain.
Keywords: virtual research environment, research infrastructure, metadata catalogue, metadata mapping.
I. INTRODUCTION
Virtual Research Environments (VREs) [1], also known as virtual laboratories or science gateways, are one of three types of science support environment developed to support researchers in data science [2], focusing on supporting research activities on a holistic rather than infrastructural or service level. VREs provide integrated environments that typically include tools for activities such as data discovery and retrieval, collaboration, process scheduling and workflow management, and many are coupled with a particular computational infrastructure, often making use of public e-infrastructures or the Cloud. Data are brought into that infrastructure and manipulated via a particular data processing platform or scientific workflow management system [3]. However, this approach runs contrary to the recent drive towards open science and open data, which discourages ‘walled garden’ solutions.
Increasingly, what we observe instead is the creation of dedicated research infrastructures (RIs) that aggregate and curate scientific data (including real-time observations) for a particular research community, and then provide access to these data via unified services [4], usually without prejudice towards any particular VRE. Complicating this matter, there is now a substantive push to better integrate these efforts into a cohesive multidisciplinary commons for open science and open research data, as embodied by initiatives such as the European Open Science Cloud (EOSC) [5].
Developing generic VREs that can be easily coupled with different RIs and customised for specific communities is a goal of many recent research projects, including VRE4EIC (https://www.vre4eic.eu/) and BlueBRIDGE (http://www.bluebridge-vres.eu/), and is particularly challenging given the lack of conformity of standards and vocabularies in environmental science and similar domains. Significant software engineering effort is often required on the part of data scientists to build specific adaptors for such couplings, but even then it remains crucial to provide the capability to search across different RIs for similar data products or services to support integrative and transdisciplinary research. This entails either a complex interaction between a VRE and multiple RIs, distributing queries through multiple adaptors and then aggregating the results, or else a prior harvesting of metadata from all providers to allow preliminary queries to be conducted on a single logical catalogue.
In this paper we investigate how the use of a flexible metadata mapping and publication service can expedite the coupling of a VRE with RI resources using different metadata schemes to provide cross-RI metadata search and discovery. As a case study, we take the VRE4EIC metadata service, developed as a building block for an RI-agnostic VRE, and we detail how X3ML mappings [6] from standards such as ISO 19139 [7] and DCAT [8] to CERIF [9] are used to automatically ingest metadata published by different RIs to
produce a single resource catalogue. We weigh the benefits of this approach and discuss some ways in which such catalogues can be further augmented, for example to facilitate semantic search based on the harmonisation of vocabularies used for describing ecosystem and biodiversity data.
II. BACKGROUND
Modern environmental research depends on the collection and analysis of large volumes of data gathered via sensors, observations, simulations and experimentation. Researchers are called upon to address societal challenges that are inextricably tied to the stability of our native ecosystems, such as food security and climate management. These challenges are intrinsically interdisciplinary in nature, requiring collaboration across traditional disciplinary boundaries. The role of RIs in this context is to support researchers with data, platforms and tools, but no single RI can hope to encompass the full research ecosystem. The challenge therefore is to help researchers to freely and effectively interact with the full range of research assets potentially available to them across many RIs, allowing them to collaborate and conduct their research more effectively.
Publishing metadata about resources online (indicating type, coverage, provenance, etc.) allows RIs to advertise their facilities and researchers to browse and discover data and other resources useful to their research. While standards such as ISO 19115 [10] and ISO 19139 [7] exist for geospatial metadata, the implementation of such standards by RIs can be somewhat idiosyncratic. Resource catalogues themselves can be described using standards such as DCAT [8] and harvested via CSW [11] or OAI-PMH [12], but many RIs also use Semantic Web [13] technologies such as OWL [14] and SKOS [15] to describe their resources, adapting ontologies such as OBOE [16] (for observations) and vocabularies such as EnvThes [17] (for ecology) to meet their own community’s needs. Harmonisation of vocabulary and metadata between RIs thus remains a concern, with cluster projects such as ENVRIplus (http://www.envriplus.eu/) working to promote common models. Concurrently, initiatives like RDA (https://rd-alliance.org/) address broader research data management issues such as metadata standards cataloguing, standards for data collections and interoperability between repositories, providing recommendations to such projects.
From the VRE perspective, it is necessary to be pragmatic when coupling with the services provided by RIs, a process that can also be assisted by the use of standard models and vocabularies. Jeffery et al. [18] define a reference architecture for enhanced VREs (‘e-VREs’) able to work with many different RIs and e-infrastructures. In this architecture, microservices are used to implement each of six key building blocks split across three tiers of operation, as shown in Figure 1 for the case of metadata management.
Fig. 1. Providing a metadata service: the recommended microservice stack to implement the metadata manager in the e-VRE reference architecture.
Meanwhile, Nieva de la Hidalga et al. [19] describe a reference model (ENVRI RM) for environmental science RIs, defining their archetypical elements in the context of the research data lifecycle. Being based on RM-ODP [20], it models RIs from five viewpoints: science, information, computation, engineering and technology. Each view has its own concerns that correspond to those of the other views, and is able to describe various key RI activities (e.g. Figure 2). Open Information Linking for Environmental RIs (OIL-E) [21] is a small set of OWL specifications based on ENVRI RM that provide an upper ontology for RI descriptions and which can be used to contextualise different kinds of RI asset from an architectural or interaction-based perspective, as opposed to being a general-purpose ontology for describing scientific phenomena like BFO [22]. A conceptual model with a similar focus on the products and tools of research rather than on scientific classification itself is CERIF [9], a European standard for describing research information systems. CERIF provides a framework for describing relationships between people, projects, tools and research products (and more), and has been applied to describing solid earth science RIs [23].
Fig. 2. A computational view of raw data acquisition: ENVRI RM specifies components and activities using UML (in this case, a component diagram).
These models provide the means to talk about research support environments such as VREs and RIs in a standard way, and can also be leveraged as a means to better classify different kinds of resource as part of a faceted search mechanism, as we shall discuss later in Section IV. For now, we consider how VREs can be constructed that support rather than are hindered by the heterogeneity of RI resources and resource metadata, and how a VRE can facilitate cross-RI search and discovery.
III. METHODOLOGY AND CHALLENGES
According to Jeffery et al. [18], VREs can retrieve descriptions of RIs’ resources either via separate interfaces with each
RI’s own resource catalogue, or via a joint resource catalogue that already encompasses all of the RIs’ resources. The former approach relies on the construction of separate discovery and access interfaces with every RI, and makes it difficult to search over multiple RI resource catalogues simultaneously, requiring the translation and distribution of queries over every interface. Meanwhile, the latter approach simplifies search and discovery, but requires initial harvesting of metadata from all separate RI catalogues, translation of all metadata into a single common denominator standard, and careful management as the number of original data sources scales upwards.
In terms of the e-VRE reference architecture [18], a few steps are needed to harvest resource metadata from an RI:
1) A resource catalogue provided by an RI is identified for harvesting. Identification might be performed by a discovery service, or be part of the manual configuration of a customised VRE metadata catalogue.
2) The VRE’s interoperability manager must provide an adaptor for the given resource catalogue: essentially, the VRE must have the means to interact with the catalogue via the correct protocol (e.g. OAI-PMH or SPARQL [24]), but also have a model for (at least partially) mapping metadata retrieved from the source scheme to the scheme used internally by the VRE.
3) The adaptor can then be used to harvest metadata records from the source, mapping them into a format suitable for ingestion into the VRE’s own metadata catalogue.
4) This ingested data is then made available to users of the VRE via its own search and query interface.
The main entities involved in this process are shown in Figure 3. In this example, the result is that metadata can now be harvested by the VRE’s metadata manager using the adaptors provided by the interoperability manager. This activity may be a one-off event, but more likely the metadata harvested will need to be periodically updated.
Fig. 3. An e-VRE produces adaptors to harvest and convert metadata from different catalogues, building a common metadata catalogue for its users.
Whatever the chosen approach, however, any VRE cataloguing solution should try to address certain challenges:
1) How best to discover new resources: a VRE catalogue may be carefully curated for a given community, but even if automation is rejected, there should be a clear process for how to expand the catalogue.
2) How to ensure the freshness of catalogue data: ensuring that updates to source catalogues are propagated to VRE catalogues in reasonable time.
3) How to manage the underlying catalogue schema: given new vocabularies, standards or simply evolution in how standards are applied, how to update the model underlying a catalogue without losing existing data coherence.
4) How to manage ever larger quantities of data: whether by relying on more capable database technologies, distribution of the catalogue, or dynamic construction of the catalogue ‘on demand’ based on prior queries.
In light of these challenges, we consider a particular implementation of the resource metadata harvesting approach described above based on certain key technologies.
IV. IMPLEMENTATION
The VRE4EIC Metadata Portal has been developed in accordance with the e-VRE reference architecture, providing the necessary components to implement the metadata manager functionality. The purpose of the portal is to provide faceted search over catalogue data harvested from multiple RIs, aggregated within a single CERIF-based VRE catalogue. Search relies on composing queries based on the context of the research data, filtering by organisations, projects, sites, instruments, people, etc., for example as shown in Figure 4. The portal supports map-based search, the export and storing of specific queries, and the export of results in various formats.
Fig. 4. The VRE4EIC metadata portal: searching for data publications published by Anna Artese through CNR Pisa’s mass spectrometry analytical laboratory.
The CERIF catalogue itself is implemented in RDF (based on an OWL ontology) as a Blazegraph (https://www.blazegraph.com/) triple store and is structured according to CERIF version 1.6 (https://www.eurocris.org/cerif/main-features-cerif).
Metadata harvested from external sources is converted to CERIF RDF using the X3ML mapping framework [6]. The mapping process is as illustrated in Figure 5:
1) Sample metadata, along with their corresponding metadata schemes, are retrieved for analysis.
2) Mappings are defined that dictate the transformation of the selected RDF and XML based schemas to CERIF.
3) Metadata is retrieved from different data sources in their native format, e.g. as ISO 19139 or CKAN (https://ckan.org/) data.
4) These mappings are used to transform the source data into CERIF format.
5) The transformed data are ingested into the CERIF metadata catalogue.
Fig. 5. e-VRE metadata acquisition and retrieval workflow: metadata records are acquired from multiple sources, mapped to CERIF RDF and stored in the VRE catalogue; authenticated VRE users query data via the e-VRE.
Once ingested, these data become available to users of the metadata portal, who can query and browse data upon authentication by the front-end authentication/authorisation service.
X3ML mappings are described using the 3M Mapping Memory Manager (https://github.com/isl/Mapping-Memory-Manager). Mappings are described by mapping rules relating subject-property-object triples from the source scheme to equivalent structures in the target scheme, subject to various syntactic conditions, as illustrated in Figure 6. 3M supports the specification of generators to produce identifiers for new concepts constructed during translation of terms, and provides test and analytics facilities. Mappings into CERIF RDF have been produced for Dublin Core, CKAN, DCAT-AP, and ISO 19139 metadata, as well as RI architecture descriptions in OIL-E, as part of the technical output of the VRE4EIC project (the mappings are accessible at http://www.ics.forth.gr/isl/3M-VRE4EIC, with username ‘vre4eicGuest’ and password ‘vre4eic’).
Fig. 6. Example of mapping rules generated in 3M: result metadata in CKAN is mapped to a CERIF product with data properties corresponding to each possible attribute in the original CKAN XML scheme.
In summary, the Portal has many desirable characteristics: a flexible model in CERIF for integrating heterogeneous metadata, a tool-assisted metadata mapping pipeline to easily create new mappings or refine existing ones, and a mature technology base for unified VRE catalogues. Where we foresee more development being needed is in the discovery of new resources and the acquisition of updates. In this respect, RI-side services for advertisement of new resources or updates, to which a VRE can subscribe to trigger automated ingestion of new or modified metadata, would be particularly useful.
The VRE4EIC Metadata Portal has been provided as a demonstrator to the cluster of environmental science RIs in Europe via the ENVRIplus project as well as directly to the European Plate Observing System (EPOS) (https://www.epos-ip.org/), with sample data harvested from a subset of those RIs. Evaluation of the demonstrator indicates a number of possible avenues of development, particularly with regard to supporting richer cross-RI search, the two most noteworthy here being:
1) Further exploitation of CERIF’s semantic layer.
2) Integration of semantic search facilities.
A notable feature of CERIF is how it separates its semantic layer from its primary entity-relationship model. Most CERIF relations are semantically agnostic, lacking any particular interpretation beyond identifying a link. Almost every entity and relation can, however, be assigned a classification that indicates a particular semantic interpretation (e.g. that the relationship between a Person and a Product is that of a creator), allowing a CERIF database to be enriched with concepts from an external semantic model (or several linked models).
The vocabulary provided by OIL-E (http://oil-e.net/ontology/) has been identified within VRE4EIC as a means to further classify objects in CERIF in terms of their role in an RI, e.g. classifying individuals and facilities by the roles they play in research activities, datasets in terms of the research data lifecycle, or computational services by the functions they enable. This provides additional operational context for faceted search (e.g. identifying which processes generated a given data product), but additional scientific context for data products (e.g. categorising the experimental method applied or the branch of science to which it belongs) is also necessary. Environmental science RIs such as AnaEE (https://www.anaee.com/) and LTER-Europe (http://www.lter-europe.net/lter-europe) are actively developing better vocabularies for describing ecosystem and biodiversity research data, building upon existing SKOS vocabularies. The AnaEE data vocabulary (anaeeThes) [25] and LTER’s environmental thesaurus EnvThes [17] have mappings to other established domain vocabularies such as Agrovoc (http://aims.fao.org/standards/agrovoc) and GEMET (http://www.eionet.europa.eu/gemet/). These RIs are now collaborating with other RIs involved in ENVRIplus to harmonise their vocabularies in order to provide semantic linking between terms used in their respective sub-domains.
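To make the mapping steps and the classified links described in this section more concrete, the following is a minimal sketch in Python. It is not X3ML, 3M or the CERIF data model: the rule table `RULES`, the `Link` and `Catalogue` classes, the flattened source records and all identifiers are assumptions introduced purely for illustration. It shows records from two source schemes being mapped into one catalogue of entities and semantically agnostic links, where the meaning of each link comes only from its classification, which then serves as a search facet.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """A semantically agnostic link; its meaning comes only from `classification`."""
    source: str
    target: str
    classification: str  # e.g. "creator", taken from an external vocabulary


@dataclass
class Catalogue:
    """Hypothetical stand-in for a unified VRE catalogue."""
    entities: dict = field(default_factory=dict)  # id -> attribute dict
    links: list = field(default_factory=list)

    def ingest(self, entities, links):
        self.entities.update(entities)
        self.links.extend(links)

    def facet(self, entity_type, classification, source_id):
        """Ids of entities of `entity_type` linked from `source_id` with `classification`."""
        return sorted(
            link.target
            for link in self.links
            if link.source == source_id
            and link.classification == classification
            and self.entities.get(link.target, {}).get("type") == entity_type
        )


# Assumed mapping rules: which field of each (flattened) source record
# supplies the identifier, title and creator of a data product.
RULES = {
    "dcat": {"id": "dct:identifier", "title": "dct:title", "creator": "dct:creator"},
    "ckan": {"id": "id", "title": "title", "creator": "author"},
}


def map_record(scheme, record):
    """Map one source record to catalogue entities plus a classified link."""
    rule = RULES[scheme]
    product_id = f"product:{record[rule['id']]}"
    person_id = f"person:{record[rule['creator']]}"
    entities = {
        product_id: {"type": "Product", "title": record[rule["title"]]},
        person_id: {"type": "Person", "name": record[rule["creator"]]},
    }
    return entities, [Link(person_id, product_id, "creator")]


# Invented sample records standing in for metadata harvested from two RIs.
harvested = [
    ("dcat", {"dct:identifier": "a1", "dct:title": "Soil moisture 2017",
              "dct:creator": "A. Researcher"}),
    ("ckan", {"id": "b7", "title": "Seismic waveforms", "author": "A. Researcher"}),
]

catalogue = Catalogue()
for scheme, record in harvested:
    catalogue.ingest(*map_record(scheme, record))

# Faceted query: all products created by a given person, whatever their source.
results = catalogue.facet("Product", "creator", "person:A. Researcher")
```

In this sketch, adding a new source scheme only requires a new entry in the rule table, which loosely mirrors the idea of defining one mapping per metadata standard while keeping a single target model.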
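Semantic linking between vocabulary terms lends itself to query expansion at search time. The sketch below is a hedged illustration only: the term identifiers, the `MAPPINGS` and `RECORDS` data and the `expand` and `search` helpers are all invented, and the link names merely echo the SKOS mapping properties `exactMatch`, `narrowMatch` and `broadMatch`. A query term is expanded to its synonyms and narrower terms before matching catalogue records, so that records annotated with different vocabularies can be retrieved by one query.

```python
from collections import deque

# Invented mapping links between terms of three hypothetical vocabularies.
MAPPINGS = [
    ("vocA:soil-moisture", "exactMatch", "vocB:soil-water-content"),
    ("vocB:soil-water-content", "narrowMatch", "vocB:topsoil-water-content"),
    ("vocC:soil-temperature", "broadMatch", "vocA:soil-property"),
]

# Invented catalogue records, each annotated with a term from its own vocabulary.
RECORDS = [
    {"id": "ri1:ds1", "keyword": "vocA:soil-moisture"},
    {"id": "ri2:ds9", "keyword": "vocB:topsoil-water-content"},
    {"id": "ri3:ds4", "keyword": "vocC:soil-temperature"},
]


def expand(term, mappings):
    """Return `term` plus all terms that are equivalent to or narrower than it."""
    neighbours = {}
    for subject, relation, obj in mappings:
        if relation == "exactMatch":      # synonyms: follow in both directions
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
        elif relation == "narrowMatch":   # subject has narrower term obj
            neighbours.setdefault(subject, set()).add(obj)
        elif relation == "broadMatch":    # subject has broader term obj
            neighbours.setdefault(obj, set()).add(subject)
    seen = {term}
    queue = deque([term])
    while queue:                          # breadth-first walk over the links
        for nxt in neighbours.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def search(term):
    """Ids of records whose keyword matches the expanded term set."""
    terms = expand(term, MAPPINGS)
    return [record["id"] for record in RECORDS if record["keyword"] in terms]
```

For example, `search("vocA:soil-moisture")` would match the records from the first two hypothetical RIs even though they use different vocabularies, while a search for a narrower term does not pull in records annotated only with broader ones.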
The identification of synonymous, subsuming and intersecting terms (and the publication of links on the Semantic Web) provides the basis for better semantic search, whereby a greater range of data products with similar characteristics can be retrieved on query without necessarily sharing precisely the same controlled vocabulary for their metadata. Making use of such linked vocabulary would simplify the task of integrating resource metadata from multiple catalogues as it would reduce the need to map all metadata values into a single master vocabulary (with the likely resulting loss of nuance), while still retaining the benefits of cross-RI search and discovery.
V. DISCUSSION
The use of linked data [26] for describing resources (of all kinds) is already well-established, with research now focusing on different approaches to generating linked data from various sources and on how to navigate and query distributed information. For example, recent research includes the generation of a navigable Graph of Things from an array of live IoT data sources [27] and the use of crowdsourcing to provide real-time transport data in rural areas [28], both topics with relevance to how RIs gather and expose field observations acquired via sensors or human experts. On the topic of distributed query, various languages/frameworks have been proposed such as LDQL [29] and LILAC [30], which may make linked data based search over distributed catalogues more practical and efficient than is currently the case.
The Semantic Web is plagued by many of the problems of knowledge representation in AI including computability, inconsistency and incompleteness, adding data redundancy, unreliability and limited performance versus more tightly integrated data models. Considerable attention has been given to the openness, extensibility and computability of Semantic Web standards, weighing different options (e.g. the use of SKOS over OWL [31], [32]). However, most geospatial technologies used by environmental science RIs today have been developed independently of the Semantic Web, with recommendations such as INSPIRE (https://inspire.ec.europa.eu/) being mostly disjoint from it, though technologies such as OGC’s GeoSPARQL (http://www.opengeospatial.org/standards/geosparql) attempt to address this. This poses a barrier for integration of geospatial catalogues published via CSW or OAI-PMH into the Semantic Web, and adaptors are still needed to query such data sources and present responses in RDF format (e.g. [33]).
For mapping between a modest set of standards, manual mapping with tool support remains most practical, but automation may help to accelerate the construction of new mappings. How to best map between ontologies (or other kinds of schema) remains an open question, but mapping techniques can be evaluated by comparing performance against ontology sets covering the same domain (e.g. OntoFarm for conference organisation [34]). Multi-lingual support is also important in collaboration; for example Bella et al. [35] address how to conduct mapping based on more than just English syntax.
It is not only resource metadata that can be usefully accessed via a VRE. Access to provenance data (which might be structured according to a standard such as PROV-O [36]) for data products and processes would also be useful to researchers, and VREs can also be contributors of provenance data via their own workflow systems (e.g. for Kepler [37]). CERIF is able to represent time-bounded role-based semantic relationships, but the source metadata provided by RIs still often lacks this kind of information; the adoption of standardised and ubiquitous provenance by RIs would address this either by enriching the basic metadata for resources, or by providing additional sources of provenance data that could be integrated with the base metadata when producing unified catalogues.
The e-VRE reference architecture also addresses the need for a workflow manager component, for composing processing tasks in series or parallel on available computational resources. Most scientific investigations do follow a clear workflow, and there have been a number of workflow management systems developed with different characteristics and target applications [38], several of which have been applied to science [39]. The use of ontologies for verification and validation of workflows has already been explored (e.g. [40]), and the ability to construct and validate such workflow specifications using metadata from service catalogues demonstrates that the cataloguing problem is not wholly centred on datasets.
VI. CONCLUSION
In this paper we linked the development of VREs (also science gateways and virtual laboratories) to the outgrowth of dedicated RIs in Europe and beyond, and argued the need for new VREs that can be freely coupled with different RI resources based on the requirements of researchers and the evolving data research environment. We asserted that metadata mapping is needed to facilitate cross-RI search and discovery due to the diversity of metadata schemes, vocabularies and protocols used to access resource catalogue data published by different RIs, and furthermore that it is useful to be able to aggregate distributed resource metadata into a single logical catalogue. We outlined a methodology for building such a catalogue based on the e-VRE reference architecture and the adoption of a robust metadata mapping pipeline for handling heterogeneous data sources. We provided an example in the VRE4EIC Metadata Portal of how the methodology is applied, using CERIF as a framework for aggregating resource metadata from different metadata catalogues provided by EPOS and ENVRIplus. We described the application of X3ML mappings, constructed using the 3M editor, to translate ISO 19139 XML, CKAN, Dublin Core, DCAT-AP and OIL-E data into CERIF RDF for ingestion into a CERIF catalogue. We considered how the CERIF semantic layer can be augmented with vocabulary from OIL-E to further contextualise research entities, and how recent semantic harmonisation work in environmental science RIs can further augment the capabilities of VREs as clients for semantic faceted search of RI resources. Finally, we discussed the role that some of the technologies identified have in other research literature, examined some related work, and suggested
future avenues of investigation for coupling VREs with other types of service provided by RIs, e.g. provenance services.
ACKNOWLEDGEMENTS
This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreements 654182 (ENVRIplus project), 676247 (VRE4EIC project) and 643963 (SWITCH project).
REFERENCES
[1] L. Candela, D. Castelli, and P. Pagano, “Virtual research environments: an overview and a research agenda,” Data Science Journal, vol. 12, pp. 75–81, 2013.
[2] Z. Zhao, P. Martin, C. de Laat, K. Jeffery, A. Jones, I. Taylor, A. Hardisty, M. Atkinson, A. Zuiderwijk, Y. Yin, and Y. Chen, “Time critical requirements and technical considerations for advanced support environments for data-intensive research,” in 2nd International workshop on Interoperable infrastructures for interdisciplinary big data sciences (IT4RIs 16), in the context of IEEE Real-time System Symposium (RTSS), Porto, Portugal, 2016.
[3] E. Deelman, D. Gannon, M. Shields, and I. Taylor, “Workflows and e-Science: An overview of workflow system features and capabilities,” Future Generation Computer Systems, vol. 25, no. 5, pp. 528–540, 2009.
[4] P. Martin, Y. Chen, A. Hardisty, K. Jeffery, and Z. Zhao, “Computational challenges in global environmental research infrastructures,” in Terrestrial Ecosystem Research Infrastructures: Challenges and Opportunities, A. Chabbi and H. W. Loescher, Eds. CRC Press, 2017, ch. 12, pp. 305–340.
[5] European Commission, “Realising the european open science cloud,” 2016.
[6] Y. Marketakis, N. Minadakis, H. Kondylakis, K. Konsolaki, G. Samaritakis, M. Theodoridou, G. Flouris, and M. Doerr, “X3ML mapping framework for information integration in cultural heritage and beyond,” International Journal on Digital Libraries, pp. 1–19, 2016.
[7] ISO 19139:2007, “Geographic information - Metadata - XML schema implementation,” International Organization for Standardization, ISO/TS Standard, 2007.
[8] J. Erickson and F. Maali, “Data catalog vocabulary (DCAT),” W3C, W3C Recommendation, 2014, http://www.w3.org/TR/2014/REC-vocab-dcat-20140116/.
[9] B. Jörg, “CERIF: The common european research information format model,” Data Science Journal, vol. 9, pp. 24–31, 2010.
[10] ISO 19115-1:2014, “Geographic information - Metadata - Part 1: Fundamentals,” International Organization for Standardization, ISO Standard, 2014.
[11] D. Nebert, U. Voges, and L. Bigagli, “OGC catalogue services 3.0 - general model,” Open Geospatial Consortium, OGC Implementation Standard, 2016, http://docs.opengeospatial.org/is/12-168r6/12-168r6.html.
[12] C. Lagoze and H. Van de Sompel, “The making of the open archives initiative protocol for metadata harvesting,” Library hi tech, vol. 21, no. 2, pp. 118–128, 2003.
[19] A. Nieva de la Hidalga, B. Magagna, M. Stocker, A. Hardisty, P. Martin, Z. Zhao, M. Atkinson, and K. Jeffery, “The ENVRI Reference Model (ENVRI RM) version 2.2, 30th October 2017,” Nov. 2017. [Online]. Available: https://doi.org/10.5281/zenodo.1050349
[20] ISO 10746-1, “Information technology - Open Distributed Processing - Reference model: Overview,” International Organization for Standardization, ISO/IEC Standard, 1998.
[21] P. Martin, P. Grosso, B. Magagna, H. Schentz, Y. Chen, A. Hardisty, W. Los, K. Jeffery, C. de Laat, and Z. Zhao, “Open information linking for environmental research infrastructures,” in 2015 IEEE 11th International Conference on e-Science (e-Science). IEEE, 2015, pp. 513–520.
[22] R. Arp, B. Smith, and A. D. Spear, Building ontologies with Basic Formal Ontology. The MIT Press, 2015.
[23] D. Bailo, D. Ulbricht, M. L. Nayembil, L. Trani, A. Spinuso, and K. G. Jeffery, “Mapping solid earth data and research infrastructures to CERIF,” Procedia Computer Science, vol. 106, pp. 112–121, 2017.
[24] W3C SPARQL Working Group, “SPARQL 1.1 overview,” W3C, W3C Recommendation, 2013, http://www.w3.org/TR/2013/REC-sparql11-overview-20130321/.
[25] Anaee-France semantic group, “AnaEE Thesaurus,” 2016. [Online]. Available: http://dx.doi.org/10.15454/1.4894016754286177E12
[26] T. Berners-Lee, “Linked data,” W3C Design Issues, 2006, accessed 26th February 2018. [Online]. Available: https://www.w3.org/DesignIssues/LinkedData.html
[27] D. Le-Phuoc, H. N. M. Quoc, H. N. Quoc, T. T. Nhat, and M. Hauswirth, “The graph of things: A step towards the live knowledge graph of connected things,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 37, pp. 25–35, 2016.
[28] D. Corsar, P. Edwards, J. Nelson, C. Baillie, K. Papangelis, and N. Velaga, “Linking open data and the crowd for real-time passenger information,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 43, pp. 18–24, 2017.
[29] O. Hartig and J. Pérez, “LDQL: A query language for the web of linked data,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 41, pp. 9–29, 2016.
[30] G. Montoya, H. Skaf-Molli, P. Molli, and M.-E. Vidal, “Decomposing federated queries in presence of replicated fragments,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 42, pp. 1–18, 2017.
[31] A. Stellato, “Dictionary, thesaurus or ontology? disentangling our choices in the semantic web jungle,” Journal of Integrative Agriculture, vol. 11, no. 5, pp. 710–719, 2012.
[32] T. Baker, S. Bechhofer, A. Isaac, A. Miles, G. Schreiber, and E. Summers, “Key choices in the design of simple knowledge organization system (SKOS),” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 35–49, 2013.
[33] K. Patroumpas, N. Georgomanolis, T. Stratiotis, M. Alexakis, and S. Athanasiou, “Exposing INSPIRE on the semantic web,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 35, pp. 53–62, 2015.
[34] O. Zamazal and V. Svátek, “The ten-year OntoFarm and its fertilization within the onto-sphere,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 43, pp. 46–53, 2017.
[35] G. Bella, F. Giunchiglia, and F. McNeill, “Language and domain aware
[13] T. Berners-Lee, J. Hendler, O. Lassila et al., “The semantic web,” lightweight ontology matching,” Web Semantics: Science, Services and
Scientific american, vol. 284, no. 5, pp. 28–37, 2001. Agents on the World Wide Web, vol. 43, pp. 1–17, 2017.
[14] W3C OWL Working Group, “OWL 2 web ontology language,” W3C, [36] D. McGuinness, S. Sahoo, and T. Lebo, “PROV-O: The PROV ontology,”
W3C Recommendation, 2012, https://www.w3.org/TR/2012/REC-owl2- W3C, W3C Recommendation, 2013, http://www.w3.org/TR/2013/REC-
overview-20121211/. prov-o-20130430/.
[15] S. Bechhofer and A. Miles, “SKOS simple knowledge orga- [37] I. Altintas, O. Barney, and E. Jaeger-Frank, “Provenance collection
nization system reference,” W3C, W3C Recommendation, 2009, support in the Kepler scientific workflow system,” Provenance and
http://www.w3.org/TR/2009/REC-skos-reference-20090818/. annotation of data, pp. 118–132, 2006.
[16] J. Madin, S. Bowers, M. Schildhauer, S. Krivov, D. Pennington, and [38] C. S. Liew, M. P. Atkinson, M. Galea, T. F. Ang, P. Martin, and
F. Villa, “An ontology for describing and synthesizing ecological obser- J. I. V. Hemert, “Scientific workflows: Moving across paradigms,” ACM
vation data,” Ecological informatics, vol. 2, no. 3, pp. 279–296, 2007. Comput. Surv., vol. 49, no. 4, pp. 66:1–66:39, Dec. 2016. [Online].
[17] H. Schentz, J. Peterseil, and N. Bertrand, “Envthes-interlinked thesaurus Available: http://doi.acm.org/10.1145/3012429
for long term ecological research, monitoring, and experiments.” in [39] R. Mork, P. Martin, and Z. Zhao, “Contemporary challenges for data-
EnviroInfo, 2013, pp. 824–832. intensive scientific workflow management systems,” in Proceedings of
[18] K. G. Jeffery, C. Meghini, C. Concordia, T. Patkos, V. Brasse, J. v. the 10th Workshop on Workflows in Support of Large-Scale Science.
Ossenbruck, Y. Marketakis, N. Minadakis, and E. Marchetti, “A refer- ACM, 2015, p. 4.
ence architecture for virtual research environments,” in Proceedings of [40] T. Miksa and A. Rauber, “Using ontologies for verification and valida-
the 15th International Symposium of Information Science (ISI 2017). tion of workflow-based experiments,” Web Semantics: Science, Services
Verlag Werner Hulsbusch, 2017, pp. 76–88. and Agents on the World Wide Web, vol. 43, pp. 25–45, 2017.