10th International Workshop on Science Gateways (IWSG 2018), 13-15 June 2018

Mapping metadata from different research infrastructures into a unified framework for use in a virtual research environment

Paul Martin∗, Laurent Remy†, Maria Theodoridou‡, Keith Jeffery§ and Zhiming Zhao∗
∗ Institute for Informatics, University of Amsterdam, Amsterdam, Netherlands
† euroCRIS / IS4RI, France
‡ Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Greece
§ Keith G Jeffery Consultants, United Kingdom
Emails: {p.w.martin, z.zhao}@uva.nl, lremy@is4ri.com, maria@ics.forth.gr, keith.jeffery@keithgjefferyconsultants.co.uk

Abstract: Virtual Research Environments (VREs) augment research activities by integrating tools for data discovery, data retrieval, workflow management and researcher collaboration, often coupled with a specific computing infrastructure. The drive towards open data science discourages 'walled garden' solutions however, and has led to the creation of dedicated research infrastructures (RIs) that gather data and provide services to particular research communities without prejudice towards any particular science gateway or virtual laboratory technology. There is a need for generic VREs that can be easily customised to the needs of specific communities and coupled with the services and resources of many different RIs, but the resource metadata produced by these RIs rarely adheres perfectly to any particular standard or vocabulary, making it difficult to search and discover resources independently of their provider. Cross-RI search can be expedited by metadata mapping services that can harvest metadata published under different standards to build unified resource catalogues; such an approach poses a number of challenges however. In this paper we take the example of the VRE4EIC e-VRE metadata service, which uses X3ML mappings to build a single CERIF catalogue for describing data products and other resources provided by multiple RIs. We consider the extent to which it addresses the challenge of cross-RI search, and we also discuss how it might take advantage of semantic harmonisation efforts in the environmental science domain.

Keywords: virtual research environment, research infrastructure, metadata catalogue, metadata mapping.

I. INTRODUCTION

Virtual Research Environments (VREs) [1], also known as virtual laboratories or science gateways, are one of three types of science support environment developed to support researchers in data science [2], focusing on supporting research activities on a holistic rather than infrastructural or service level. VREs provide integrated environments that typically include tools for activities such as data discovery and retrieval, collaboration, process scheduling and workflow management, and many are coupled with a particular computational infrastructure, often making use of public e-infrastructures or the Cloud. Data are brought into that infrastructure and manipulated via a particular data processing platform or scientific workflow management system [3]; however, this approach is contrary to the recent drive towards open science and open data, which discourages 'walled garden' solutions.

Increasingly, what we observe instead is the creation of dedicated research infrastructures (RIs) that aggregate and curate scientific data (including real-time observations) for a particular research community, which then provide access to these data via unified services [4], usually without prejudice towards any particular VRE. Complicating this matter, there is now a substantive push to better integrate these efforts into a cohesive multidisciplinary commons for open science and open research data, as embodied by initiatives such as the European Open Science Cloud (EOSC) [5].

Developing generic VREs that can be easily coupled with different RIs and customised for specific communities is a goal of many recent research projects, including VRE4EIC¹ and BlueBRIDGE², and is particularly challenging given the lack of conformity of standards and vocabularies in environmental science and similar domains. Significant software engineering effort is often required on the behalf of data scientists to build specific adaptors for such couplings, but even then it remains crucial to provide the capability to search across different RIs for similar data products or services to support integrative and transdisciplinary research. This entails a complex interaction between a VRE and multiple RIs, distributing queries through multiple adaptors and then aggregating the results, or else a prior harvesting of metadata from all providers to allow preliminary queries to be conducted on a single logical catalogue.

¹ https://www.vre4eic.eu/
² http://www.bluebridge-vres.eu/

In this paper we investigate how the use of a flexible metadata mapping and publication service can expedite the coupling of a VRE with RI resources using different metadata schemes to provide cross-RI metadata search and discovery. As a case study, we take the VRE4EIC metadata service, developed as a building block for an RI-agnostic VRE, and we detail how X3ML mappings [6] from standards such as ISO 19139 [7] and DCAT [8] to CERIF [9] are used to automatically ingest metadata published by different RIs to produce a single resource catalogue.
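To give a feel for what mapping from a source scheme into a common target scheme involves, the following is a minimal Python sketch of rule-based mapping from a flat source record to subject-property-object triples. It is not X3ML and does not use real CERIF identifiers: the source field names, the `cerif:`-prefixed terms, the base URI and the `map_record` helper are all hypothetical placeholders chosen for illustration.

```python
# Illustrative only: a toy rule-based mapping, NOT the X3ML language.
# All field names and "cerif:" terms below are hypothetical placeholders.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Each rule relates a source field to a property in the (assumed) target scheme.
DCAT_LIKE_RULES: Dict[str, str] = {
    "dct:title": "cerif:hasName",
    "dct:description": "cerif:hasDescription",
    "dct:publisher": "cerif:isPublishedBy",
}


def map_record(record: Dict[str, str], rules: Dict[str, str], base: str) -> List[Triple]:
    """Map one source record to triples describing a single target resource."""
    subject = base + record["id"]  # very simple identifier generator
    triples: List[Triple] = [(subject, "rdf:type", "cerif:Product")]
    for source_field, target_property in rules.items():
        if source_field in record:  # fields missing from the record are skipped
            triples.append((subject, target_property, record[source_field]))
    return triples


record = {"id": "ds-1", "dct:title": "Soil moisture 2017", "dct:publisher": "RI-A"}
triples = map_record(record, DCAT_LIKE_RULES, "http://example.org/product/")
```

Keeping the rules as data rather than code is the point of the sketch: supporting another source scheme would, under these assumptions, only require another rule table.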
We weigh the benefits of this approach and discuss some ways in which such catalogues can be further augmented, for example to facilitate semantic search based on the harmonisation of vocabularies used for describing ecosystem and biodiversity data.

II. BACKGROUND

Modern environmental research depends on the collection and analysis of large volumes of data gathered via sensors, observations, simulations and experimentation. Researchers are called upon to address societal challenges that are inextricably tied to the stability of our native ecosystems, such as food security and climate management. These challenges are intrinsically interdisciplinary in nature, requiring collaboration across traditional disciplinary boundaries. The role of RIs in this context is to support researchers with data, platforms and tools, but no single RI can hope to encompass the full research ecosystem. The challenge therefore is to help researchers to freely and effectively interact with the full range of research assets potentially available to them across many RIs, allowing them to collaborate and conduct their research more effectively.

Publishing metadata about resources online (indicating type, coverage, provenance, etc.) allows RIs to advertise their facilities and researchers to browse and discover data and other resources useful to their research. While standards such as ISO 19115 [10] and ISO 19139 [7] exist for geospatial metadata, the implementation of such standards by RIs can be somewhat idiosyncratic. Resource catalogues themselves can be described using standards such as DCAT [8] and harvested via CSW [11] or OAI-PMH [12], but many RIs also use Semantic Web [13] technologies such as OWL [14] and SKOS [15] to describe their resources, adapting ontologies such as OBOE [16] (for observations) and vocabularies such as EnvThes [17] (for ecology) to meet their own community's needs. Harmonisation of vocabulary and metadata between RIs thus remains a concern, with cluster projects such as ENVRIplus³ working to promote common models. Concurrently, initiatives like RDA⁴ address broader research data management issues such as metadata standards cataloguing, standards for data collections and interoperability between repositories, providing recommendations to such projects.

³ http://www.envriplus.eu/
⁴ https://rd-alliance.org/

From the VRE perspective, it is necessary to be pragmatic when coupling with the services provided by RIs, a process that can also be assisted by the use of standard models and vocabularies. Jeffery et al. [18] define a reference architecture for enhanced VREs ('e-VREs') able to work with many different RIs and e-infrastructures. In this architecture, microservices are used to implement each of six key building blocks split across three tiers of operation, as shown in Figure 1 for the case of metadata management. Meanwhile Nieva de la Hidalga et al. [19] describe a reference model (ENVRI RM) for environmental science RIs, defining their archetypical elements in the context of the research data lifecycle. Being based on RM-ODP [20], it models RIs from five viewpoints: science, information, computation, engineering and technology. Each view has its own concerns that correspond to those of the other views, and is able to describe various key RI activities (e.g. Figure 2). Open Information Linking for Environmental RIs (OIL-E) [21] is a small set of OWL specifications based on ENVRI RM that provide an upper ontology for RI descriptions and which can be used to contextualise different kinds of RI asset from an architectural or interaction-based perspective, as opposed to being a general-purpose ontology for describing scientific phenomena like BFO [22]. A conceptual model with a similar focus on the products and tools of research rather than on scientific classification itself is CERIF [9], a European standard for describing research information systems. CERIF provides a framework for describing relationships between people, projects, tools and research products (and more), and has been applied to describing solid earth science RIs [23].

Fig. 1. Providing a metadata service: the recommended microservice stack to implement the metadata manager in the e-VRE reference architecture.

Fig. 2. A computational view of raw data acquisition: ENVRI RM specifies components and activities using UML (in this case, a component diagram).

These models not only provide the means to talk about research support environments such as VREs and RIs in a standard way, but can also be leveraged as a means to better classify different kinds of resource as part of a faceted search mechanism, as we shall discuss later in Section IV. For now, we consider how VREs can be constructed that support, rather than are hindered by, the heterogeneity of RI resources and resource metadata, and how a VRE can facilitate cross-RI search and discovery.

III. METHODOLOGY AND CHALLENGES

According to Jeffery et al. [18], VREs can retrieve descriptions of RIs' resources either via separate interfaces with each RI's own resource catalogue, or via a joint resource catalogue that already encompasses all of the RIs' resources. The former approach relies on the construction of separate discovery and access interfaces with every RI, and makes it difficult to search over multiple RI resource catalogues simultaneously, requiring the translation and distribution of queries over every interface. Meanwhile, the latter approach simplifies search and discovery, but requires initial harvesting of metadata from all separate RI catalogues, translation of all metadata into a single common denominator standard, and careful management as the number of original data sources scales upwards.

In terms of the e-VRE reference architecture [18], a few steps are needed to harvest resource metadata from an RI:
1) A resource catalogue provided by an RI is identified for harvesting. Identification might be performed by a discovery service, or be part of the manual configuration of a customised VRE metadata catalogue.
2) The VRE's interoperability manager must provide an adaptor for the given resource catalogue. Essentially, the VRE must have the means to interact with the catalogue via the correct protocol (e.g. OAI-PMH or SPARQL [24]), but also have a model for (at least partially) mapping metadata retrieved from the source scheme to the scheme used internally by the VRE.
3) The adaptor can then be used to harvest metadata records from the source, mapping them into a format suitable for ingestion into the VRE's own metadata catalogue.
4) This ingested data is then made available to users of the VRE via its own search and query interface.

The main entities involved in this process are shown in Figure 3. In this example, the result is that metadata can now be harvested by the VRE's metadata manager using the adaptors provided by the interoperability manager. This activity may be a one-off event, but more likely the metadata harvested will need to be periodically updated.

Fig. 3. An e-VRE produces adaptors to harvest and convert metadata from different catalogues, building a common metadata catalogue for its users.

Whatever the chosen approach, any VRE cataloguing solution should try to address certain challenges:
1) How best to discover new resources: a VRE catalogue may be carefully curated for a given community, but even if automation is rejected, there should be a clear process for how to expand the catalogue.
2) How to ensure the freshness of catalogue data: ensuring that updates to source catalogues are propagated to VRE catalogues in reasonable time.
3) How to manage the underlying catalogue schema: given new vocabularies, standards or simply evolution in how standards are applied, how to update the model underlying a catalogue without losing existing data coherence.
4) How to manage ever larger quantities of data: whether by relying on more capable database technologies, distribution of the catalogue, or dynamic construction of the catalogue 'on demand' based on prior queries.

In light of these challenges, we consider a particular implementation of the resource metadata harvesting approach described above based on certain key technologies.

IV. IMPLEMENTATION

The VRE4EIC Metadata Portal has been developed in accordance with the e-VRE reference architecture, providing the necessary components to implement the metadata manager functionality. The purpose of the portal is to provide faceted search over catalogue data harvested from multiple RIs, aggregated within a single CERIF-based VRE catalogue. Search works by composing queries from the context of the research data, filtering by organisations, projects, sites, instruments, people, etc., for example as shown in Figure 4. The portal supports map-based search, the export and storing of specific queries, and the export of results in various formats. The CERIF catalogue itself is implemented in RDF (based on an OWL ontology) as a Blazegraph⁵ triple store and is structured according to CERIF version 1.6⁶.

⁵ https://www.blazegraph.com/
⁶ https://www.eurocris.org/cerif/main-features-cerif

Fig. 4. The VRE4EIC metadata portal: searching for data publications published by Anna Artese through CNR Pisa's mass spectrometry analytical laboratory.

Metadata harvested from external sources is converted to CERIF RDF using the X3ML mapping framework [6]. The mapping process is as illustrated in Figure 5:
1) Sample metadata, along with their corresponding metadata schemes, are retrieved for analysis.
2) Mappings are defined that dictate the transformation of the selected RDF and XML based schemas to CERIF.
3) Metadata is retrieved from different data sources in their native format, e.g. as ISO 19139 or CKAN⁷ data.
4) These mappings are used to transform the source data into CERIF format.
5) The transformed data are ingested into the CERIF metadata catalogue.

Fig. 5. e-VRE metadata acquisition and retrieval workflow: metadata records are acquired from multiple sources, mapped to CERIF RDF and stored in the VRE catalogue; authenticated VRE users query data via the e-VRE.

Once ingested, these data become available to users of the metadata portal, who can query and browse data upon authentication by the front-end authentication/authorisation service.

X3ML mappings are described using the 3M Mapping Memory Manager⁸. Mappings are described by mapping rules relating subject-property-object triples from the source scheme to equivalent structures in the target scheme, subject to various syntactic conditions, as illustrated in Figure 6. 3M supports the specification of generators to produce identifiers for new concepts constructed during translation of terms, and provides test and analytics facilities. Mappings into CERIF RDF have been produced for Dublin Core, CKAN, DCAT-AP, and ISO 19139 metadata, as well as RI architecture descriptions in OIL-E, as part of the technical output of the VRE4EIC project⁹.

⁷ https://ckan.org/
⁸ https://github.com/isl/Mapping-Memory-Manager
⁹ Mappings are accessible at http://www.ics.forth.gr/isl/3M-VRE4EIC, username 'vre4eicGuest' and password 'vre4eic'.

Fig. 6. Example of mapping rules generated in 3M: result metadata in CKAN is mapped to a CERIF product with data properties corresponding to each possible attribute in the original CKAN XML scheme.

In summary, the Portal has many desirable characteristics: a flexible model in CERIF for integrating heterogeneous metadata, a tool-assisted metadata mapping pipeline to easily create new metadata mappings or refine existing ones, and a mature technology base for unified VRE catalogues. Where we foresee more development being needed is in the discovery of new resources and the acquisition of updates. In this respect, RI-side services for advertisement of new resources or updates, to which a VRE can subscribe to trigger automated ingestion of new or modified metadata, would be particularly useful.

The VRE4EIC Metadata Portal has been provided as a demonstrator to the cluster of environmental science RIs in Europe via the ENVRIplus project as well as directly to the European Plate Observing System (EPOS)¹⁰, with sample data harvested from a subset of those RIs. Evaluation of the demonstrator indicates a number of possible avenues of development, particularly with regard to supporting richer cross-RI search, the two most noteworthy here being:
1) Further exploitation of CERIF's semantic layer.
2) Integration of semantic search facilities.

A notable feature of CERIF is how it separates its semantic layer from its primary entity-relationship model. Most CERIF relations are semantically agnostic, lacking any particular interpretation beyond identifying a link. Almost every entity and relation can, though, be assigned a classification that indicates a particular semantic interpretation (e.g. that the relationship between a Person and a Product is that of a creator), allowing a CERIF database to be enriched with concepts from an external semantic model (or several linked models).

The vocabulary provided by OIL-E¹¹ has been identified within VRE4EIC as a means to further classify objects in CERIF in terms of their role in an RI, e.g. classifying individuals and facilities by the roles they play in research activities, datasets in terms of the research data lifecycle, or computational services by the functions they enable. This provides additional operational context for faceted search (e.g. identifying which processes generated a given data product), but additional scientific context for data products (e.g. categorising the experimental method applied or the branch of science to which it belongs) is also necessary. Environmental science RIs such as AnaEE¹² and LTER-Europe¹³ are actively developing better vocabularies for describing ecosystem and biodiversity research data, building upon existing SKOS vocabularies. The AnaEE data vocabulary (anaeeThes) [25] and LTER's environmental thesaurus EnvThes [17] have mappings to other established domain vocabularies such as Agrovoc¹⁴ and GEMET¹⁵. These RIs are now collaborating with other RIs involved in ENVRIplus to harmonise their vocabularies in order to provide semantic linking between terms used in their respective sub-domains.

¹⁰ https://www.epos-ip.org/
¹¹ http://oil-e.net/ontology/
¹² https://www.anaee.com/
¹³ http://www.lter-europe.net/lter-europe
¹⁴ http://aims.fao.org/standards/agrovoc
¹⁵ http://www.eionet.europa.eu/gemet/

The identification of synonymous, subsuming and intersecting terms (and the publication of links on the Semantic Web) provides the basis for better semantic search, whereby a greater range of data products with similar characteristics can be retrieved on query without necessarily sharing precisely the same controlled vocabulary for their metadata. Making use of such linked vocabulary would simplify the task of integrating resource metadata from multiple catalogues as it would reduce the need to map all metadata values into a single master vocabulary (with the likely resulting loss of nuance), while still retaining the benefits of cross-RI search and discovery.

V. DISCUSSION

The use of linked data [26] for describing resources (of all kinds) is already well-established, with research now focusing on different approaches to generating linked data from various sources and on how to navigate and query distributed information. For example, recent research includes the generation of a navigable Graph of Things from an array of live IoT data sources [27] and the use of crowdsourcing to provide real-time transport data in rural areas [28], both topics with relevance to how RIs gather and expose field observations acquired via sensors or human experts. On the topic of distributed query, various languages/frameworks have been proposed such as LDQL [29] and LILAC [30], which may make linked data based search over distributed catalogues more practical and efficient than is currently the case.

The Semantic Web is plagued by many of the problems of knowledge representation in AI including computability, inconsistency and incompleteness, adding data redundancy, unreliability and limited performance versus more tightly integrated data models. Considerable attention has been given to the openness, extensibility and computability of Semantic Web standards, weighing different options (e.g. the use of SKOS over OWL [31], [32]). Most geospatial technologies used by environmental science RIs today have been developed independently of the Semantic Web however, with recommendations such as INSPIRE¹⁶ being mostly disjoint from it, though technologies such as OGC's GeoSPARQL¹⁷ attempt to address this. This poses a barrier for integration of geospatial catalogues published via CSW or OAI-PMH into the Semantic Web, and adaptors are still needed to query such data sources and present responses in RDF format (e.g. [33]).

For mapping between a modest set of standards, manual mapping with tool support remains most practical, but automation may help to accelerate the construction of new mappings. How to best map between ontologies (or other kinds of schema) remains an open question, but mapping techniques can be evaluated by comparing performance against ontology sets covering the same domain (e.g. OntoFarm for conference organisation [34]). Multi-lingual support is also important in collaboration; for example Bella et al. [35] address how to conduct mapping based on more than just English syntax.

It is not only resource metadata that can be usefully accessed via a VRE. Access to provenance data (which might be structured according to a standard such as PROV-O [36]) for data products and processes would also be useful to researchers, and VREs can also be contributors of provenance data via their own workflow systems (e.g. for Kepler [37]). CERIF is able to represent time-bounded role-based semantic relationships, but the source metadata provided by RIs still often lacks this kind of information; the adoption of standardised and ubiquitous provenance by RIs would address this either by enriching the basic metadata for resources, or by providing additional sources of provenance data that could be integrated with the base metadata when producing unified catalogues.

The e-VRE reference architecture also addresses the need for a workflow manager component, for composing processing tasks in series or parallel on available computational resources. Most scientific investigations do follow a clear workflow, and there have been a number of workflow management systems developed with different characteristics and target applications [38], several of which have been applied to science [39]. The use of ontologies for verification and validation of workflows has already been explored (e.g. [40]), and the ability to construct and validate such workflow specifications using metadata from service catalogues demonstrates that the cataloguing problem is not wholly centred on datasets.

VI. CONCLUSION

In this paper we linked the development of VREs (also science gateways and virtual laboratories) to the outgrowth of dedicated RIs in Europe and beyond, and argued the need for new VREs that can be freely coupled with different RI resources based on the requirements of researchers and the evolving data research environment. We asserted that metadata mapping is needed to facilitate cross-RI search and discovery due to the diversity of metadata schemes, vocabularies and protocols used to access resource catalogue data published by different RIs, and furthermore that it is useful to be able to aggregate distributed resource metadata into a single logical catalogue. We outlined a methodology for building such a catalogue based on the e-VRE reference architecture and the adoption of a robust metadata mapping pipeline for handling heterogeneous data sources. We provided an example in the VRE4EIC Metadata Portal of how the methodology is applied, using CERIF as a framework for aggregating resource metadata from different metadata catalogues provided by EPOS and ENVRIplus. We described the application of X3ML mappings, constructed using the 3M editor, to translate ISO 19139 XML, CKAN, Dublin Core, DCAT-AP and OIL-E data into CERIF RDF for ingestion into a CERIF catalogue. We considered how the CERIF semantic layer can be augmented with vocabulary from OIL-E to further contextualise research entities, and how recent semantic harmonisation work in environmental science RIs can further augment the capabilities of VREs as clients for semantic faceted search of RI resources.
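As a rough illustration of how linked vocabularies could widen a faceted search without forcing every record into one master vocabulary, the Python sketch below expands a query term through synonym and narrower-term links before matching catalogue records. The term identifiers, the link tables and the `expand` and `search` helpers are hypothetical; in practice such links might be published as SKOS mapping relations (e.g. `skos:exactMatch`) and followed with a query over the catalogue's triple store.

```python
# Illustrative only: query expansion over linked vocabulary terms.
# Term names and links are invented; they are not taken from any real thesaurus.
from typing import Dict, List, Set, Tuple

# Symmetric "same meaning" links between terms from different vocabularies.
EXACT_MATCH: List[Tuple[str, str]] = [("vocA:soilMoisture", "vocB:soil_water_content")]
# term -> narrower terms (the key subsumes each listed term).
NARROWER: Dict[str, List[str]] = {"vocA:soilProperty": ["vocA:soilMoisture", "vocA:soilPH"]}


def expand(term: str) -> Set[str]:
    """Return the term plus everything reachable via synonym or narrower links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(NARROWER.get(current, []))
        for a, b in EXACT_MATCH:  # follow synonym links in both directions
            if current == a:
                stack.append(b)
            elif current == b:
                stack.append(a)
    return seen


def search(catalogue: Dict[str, str], term: str) -> List[str]:
    """Return ids of records whose keyword is the term or a linked term."""
    wanted = expand(term)
    return sorted(rid for rid, keyword in catalogue.items() if keyword in wanted)


catalogue = {
    "ri-a/1": "vocA:soilMoisture",
    "ri-b/7": "vocB:soil_water_content",
    "ri-b/9": "vocB:airTemp",
}
hits = search(catalogue, "vocA:soilProperty")
```

In this toy setup a query on the broader term retrieves records from both providers even though each keeps its own vocabulary, which is the behaviour the harmonisation work aims to enable.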
Finally, we discussed the role that some of the technologies identified have in other research literature, examined some related work, and suggested future avenues of investigation for coupling VREs with other types of service provided by RIs, e.g. provenance services.

¹⁶ https://inspire.ec.europa.eu/
¹⁷ http://www.opengeospatial.org/standards/geosparql

ACKNOWLEDGEMENTS

This work was supported by the European Union's Horizon 2020 research and innovation programme under grant agreements 654182 (ENVRIplus project), 676247 (VRE4EIC project) and 643963 (SWITCH project).

REFERENCES

[1] L. Candela, D. Castelli, and P. Pagano, "Virtual research environments: an overview and a research agenda," Data Science Journal, vol. 12, pp. 75–81, 2013.
[2] Z. Zhao, P. Martin, C. de Laat, K. Jeffery, A. Jones, I. Taylor, A. Hardisty, M. Atkinson, A. Zuiderwijk, Y. Yin, and Y. Chen, "Time critical requirements and technical considerations for advanced support environments for data-intensive research," in 2nd International workshop on Interoperable infrastructures for interdisciplinary big data sciences (IT4RIs 16), in the context of IEEE Real-time System Symposium (RTSS), Porto, Portugal, 2016.
[3] E. Deelman, D. Gannon, M. Shields, and I. Taylor, "Workflows and e-Science: An overview of workflow system features and capabilities," Future Generation Computer Systems, vol. 25, no. 5, pp. 528–540, 2009.
[4] P. Martin, Y. Chen, A. Hardisty, K. Jeffery, and Z. Zhao, "Computational challenges in global environmental research infrastructures," in Terrestrial Ecosystem Research Infrastructures: Challenges and Opportunities, A. Chabbi and H. W. Loescher, Eds. CRC Press, 2017, ch. 12, pp. 305–340.
[5] European Commission, "Realising the european open science cloud," 2016.
[6] Y. Marketakis, N. Minadakis, H. Kondylakis, K. Konsolaki, G. Samaritakis, M. Theodoridou, G. Flouris, and M. Doerr, "X3ML mapping framework for information integration in cultural heritage and beyond," International Journal on Digital Libraries, pp. 1–19, 2016.
[7] ISO 19139:2007, "Geographic information - Metadata - XML schema implementation," International Organization for Standardization, ISO/TS Standard, 2007.
[8] J. Erickson and F. Maali, "Data catalog vocabulary (DCAT)," W3C, W3C Recommendation, 2014, http://www.w3.org/TR/2014/REC-vocab-dcat-20140116/.
[9] B. Jörg, "CERIF: The common european research information format model," Data Science Journal, vol. 9, pp. 24–31, 2010.
[10] ISO 19115-1:2014, "Geographic information - Metadata - Part 1: Fundamentals," International Organization for Standardization, ISO Standard, 2014.
[11] D. Nebert, U. Voges, and L. Bigagli, "OGC catalogue services 3.0 - general model," Open Geospatial Consortium, OGC Implementation Standard, 2016, http://docs.opengeospatial.org/is/12-168r6/12-168r6.html.
[12] C. Lagoze and H. Van de Sompel, "The making of the open archives initiative protocol for metadata harvesting," Library hi tech, vol. 21, no. 2, pp. 118–128, 2003.
[13] T. Berners-Lee, J. Hendler, O. Lassila et al., "The semantic web," Scientific american, vol. 284, no. 5, pp. 28–37, 2001.
[14] W3C OWL Working Group, "OWL 2 web ontology language," W3C, W3C Recommendation, 2012, https://www.w3.org/TR/2012/REC-owl2-overview-20121211/.
[15] S. Bechhofer and A. Miles, "SKOS simple knowledge organization system reference," W3C, W3C Recommendation, 2009, http://www.w3.org/TR/2009/REC-skos-reference-20090818/.
[16] J. Madin, S. Bowers, M. Schildhauer, S. Krivov, D. Pennington, and F. Villa, "An ontology for describing and synthesizing ecological observation data," Ecological informatics, vol. 2, no. 3, pp. 279–296, 2007.
[17] H. Schentz, J. Peterseil, and N. Bertrand, "Envthes-interlinked thesaurus for long term ecological research, monitoring, and experiments." in EnviroInfo, 2013, pp. 824–832.
[18] K. G. Jeffery, C. Meghini, C. Concordia, T. Patkos, V. Brasse, J. v. Ossenbruck, Y. Marketakis, N. Minadakis, and E. Marchetti, "A reference architecture for virtual research environments," in Proceedings of the 15th International Symposium of Information Science (ISI 2017). Verlag Werner Hulsbusch, 2017, pp. 76–88.
[19] A. Nieva de la Hidalga, B. Magagna, M. Stocker, A. Hardisty, P. Martin, Z. Zhao, M. Atkinson, and K. Jeffery, "The ENVRI Reference Model (ENVRI RM) version 2.2, 30th October 2017," Nov. 2017. [Online]. Available: https://doi.org/10.5281/zenodo.1050349
[20] ISO 10746-1, "Information technology - Open Distributed Processing - Reference model: Overview," International Organization for Standardization, ISO/IEC Standard, 1998.
[21] P. Martin, P. Grosso, B. Magagna, H. Schentz, Y. Chen, A. Hardisty, W. Los, K. Jeffery, C. de Laat, and Z. Zhao, "Open information linking for environmental research infrastructures," in 2015 IEEE 11th International Conference on e-Science (e-Science). IEEE, 2015, pp. 513–520.
[22] R. Arp, B. Smith, and A. D. Spear, Building ontologies with Basic Formal Ontology. The MIT Press, 2015.
[23] D. Bailo, D. Ulbricht, M. L. Nayembil, L. Trani, A. Spinuso, and K. G. Jeffery, "Mapping solid earth data and research infrastructures to CERIF," Procedia Computer Science, vol. 106, pp. 112–121, 2017.
[24] W3C SPARQL Working Group, "SPARQL 1.1 overview," W3C, W3C Recommendation, 2013, http://www.w3.org/TR/2013/REC-sparql11-overview-20130321/.
[25] Anaee-France semantic group, "AnaEE Thesaurus," 2016. [Online]. Available: http://dx.doi.org/10.15454/1.4894016754286177E12
[26] T. Berners-Lee, "Linked data," W3C Design Issues, 2006, accessed 26th February 2018. [Online]. Available: https://www.w3.org/DesignIssues/LinkedData.html
[27] D. Le-Phuoc, H. N. M. Quoc, H. N. Quoc, T. T. Nhat, and M. Hauswirth, "The graph of things: A step towards the live knowledge graph of connected things," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 37, pp. 25–35, 2016.
[28] D. Corsar, P. Edwards, J. Nelson, C. Baillie, K. Papangelis, and N. Velaga, "Linking open data and the crowd for real-time passenger information," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 43, pp. 18–24, 2017.
[29] O. Hartig and J. Pérez, "LDQL: A query language for the web of linked data," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 41, pp. 9–29, 2016.
[30] G. Montoya, H. Skaf-Molli, P. Molli, and M.-E. Vidal, "Decomposing federated queries in presence of replicated fragments," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 42, pp. 1–18, 2017.
[31] A. Stellato, "Dictionary, thesaurus or ontology? disentangling our choices in the semantic web jungle," Journal of Integrative Agriculture, vol. 11, no. 5, pp. 710–719, 2012.
[32] T. Baker, S. Bechhofer, A. Isaac, A. Miles, G. Schreiber, and E. Summers, "Key choices in the design of simple knowledge organization system (SKOS)," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 35–49, 2013.
[33] K. Patroumpas, N. Georgomanolis, T. Stratiotis, M. Alexakis, and S. Athanasiou, "Exposing INSPIRE on the semantic web," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 35, pp. 53–62, 2015.
[34] O. Zamazal and V. Svátek, "The ten-year OntoFarm and its fertilization within the onto-sphere," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 43, pp. 46–53, 2017.
[35] G. Bella, F. Giunchiglia, and F. McNeill, "Language and domain aware lightweight ontology matching," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 43, pp. 1–17, 2017.
[36] D. McGuinness, S. Sahoo, and T. Lebo, "PROV-O: The PROV ontology," W3C, W3C Recommendation, 2013, http://www.w3.org/TR/2013/REC-prov-o-20130430/.
[37] I. Altintas, O. Barney, and E. Jaeger-Frank, "Provenance collection support in the Kepler scientific workflow system," Provenance and annotation of data, pp. 118–132, 2006.
[38] C. S. Liew, M. P. Atkinson, M. Galea, T. F. Ang, P. Martin, and J. I. V. Hemert, "Scientific workflows: Moving across paradigms," ACM Comput. Surv., vol. 49, no. 4, pp. 66:1–66:39, Dec. 2016. [Online]. Available: http://doi.acm.org/10.1145/3012429
[39] R. Mork, P. Martin, and Z. Zhao, "Contemporary challenges for data-intensive scientific workflow management systems," in Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science. ACM, 2015, p. 4.
[40] T. Miksa and A. Rauber, "Using ontologies for verification and validation of workflow-based experiments," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 43, pp. 25–45, 2017.
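The harvesting steps listed in Section III (identify a catalogue, provide an adaptor, harvest and map records, expose them for search) can be sketched as follows. This is a hedged, self-contained Python illustration using only in-memory data: the `Adaptor` and `VRECatalogue` classes, the record fields and the two fake source catalogues are assumptions made for the example, and they do not correspond to the VRE4EIC implementation or to any real protocol client.

```python
# Illustrative only: adaptors harvest from (fake) RI catalogues into one VRE catalogue.
# Class names, field names and sample records are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


@dataclass
class Adaptor:
    """Couples a way of fetching source records with a mapping to the internal scheme."""
    source_name: str
    fetch: Callable[[], Iterable[Record]]  # stands in for an OAI-PMH or SPARQL client
    mapping: Dict[str, str]                # source field -> internal field

    def harvest(self) -> List[Record]:
        harvested = []
        for raw in self.fetch():
            mapped = {target: raw[src] for src, target in self.mapping.items() if src in raw}
            mapped["source"] = self.source_name
            harvested.append(mapped)
        return harvested


@dataclass
class VRECatalogue:
    records: Dict[str, Record] = field(default_factory=dict)

    def ingest(self, adaptor: Adaptor) -> None:
        # Re-harvesting overwrites records with the same key, refreshing stale entries.
        for rec in adaptor.harvest():
            self.records[f"{rec['source']}:{rec['id']}"] = rec

    def search(self, **facets: str) -> List[str]:
        """Return keys of records matching every given facet value."""
        return sorted(key for key, rec in self.records.items()
                      if all(rec.get(name) == value for name, value in facets.items()))


ri_a = Adaptor("RI-A",
               lambda: [{"identifier": "1", "title": "Seismic events", "org": "Org-A"}],
               {"identifier": "id", "title": "name", "org": "organisation"})
ri_b = Adaptor("RI-B",
               lambda: [{"pid": "x9", "label": "Lake temperature", "owner": "Org-B"}],
               {"pid": "id", "label": "name", "owner": "organisation"})

catalogue = VRECatalogue()
for adaptor in (ri_a, ri_b):
    catalogue.ingest(adaptor)
```

Calling `catalogue.search(organisation="Org-B")` then filters the aggregated records by a single facet, regardless of which source scheme each record originally used. Periodic updates could be approximated in this sketch by calling `ingest` again on a schedule.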