<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data2Services: enabling automated conversion of data to services</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vincent Emonet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Malic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amrapali Zaveri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreea Grigoriu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michel Dumontier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Data Science, Maastricht University</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>While data are becoming increasingly easy to find and access on the Web, significant effort and skill are still required to process the amount and diversity of data into convenient formats that are friendly to the user. Moreover, these efforts are often duplicated and are hard to reuse. Here, we describe Data2Services, a new framework to semi-automatically process heterogeneous data into target data formats, databases and services. Data2Services uses Docker to faithfully execute data transformation pipelines. These pipelines automatically convert target data into a semantic knowledge graph that can be further refined to conform to a particular data standard. The data can be loaded into a number of databases and are made accessible through native and autogenerated APIs. We describe the architecture and a prototype implementation for data in the life sciences.</p>
      </abstract>
      <kwd-group>
        <kwd>ETL</kwd>
        <kwd>data transformation</kwd>
        <kwd>data conversion</kwd>
        <kwd>API</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
There is a large and growing amount of valuable data available on the Web.
These data contain relevant information to answer questions and make novel
predictions. However, data come in a myriad of formats (e.g. CSV, XML, DB),
which makes them difficult to integrate into a coherent knowledge graph for
unspecified downstream use. Unsurprisingly, many tools have emerged to
facilitate the integration and analysis of diverse data. However, data transformation
and integration often require substantial technical and domain expertise to do
correctly. Moreover, such transformations are hard to find and largely
incompatible across tool chains. Users duplicate effort and are ultimately less productive
in achieving their true objectives. Thus, easy, reliable, and reproducible
transformation and publication of different kinds of data sources in target formats
are needed to maximize the potential to find and reuse data in a manner that
follows the spirit of the FAIR (Findable, Accessible, Interoperable, Reusable)
principles [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
<p>An important use case in the life sciences is the collection and analysis of
clinical and biomedical data to elucidate the molecular mechanisms that underlie the
pathology and treatment of human disease. The National Center for
Advancing Translational Sciences (NCATS) Biomedical Data Translator program
(https://ncats.nih.gov/translator/about; hereafter the "Translator program") is
an iterative effort to develop new software architectures and corresponding data
and software ecosystems to generate and explore biomedical hypotheses. This
project utilizes over 40 different datasets that are dispersed and represented in
largely incompatible data formats and standards. This lack of interoperability
makes it difficult to find answers to even the simplest questions (e.g. how many
treatable diseases are there?), let alone the more sophisticated questions that
are important for the implementation of personalized medicine.</p>
<p>In this paper, we propose a software framework called Data2Services to
(semi-)automatically process heterogeneous data (e.g. CSV, TSV, XML) into
a set of user-facing services (e.g. SPARQL endpoint, GraphQL endpoint, API).
Data2Services aims to enhance the availability of structured data to domain
experts by automatically providing a variety of services on top of the source data. This
open-source tool is composed of different Docker containers for effortless usage
by expert and non-expert users alike. We demonstrate the utility of our tool by
applying it to a specific query relevant to the Translator program.</p>
    </sec>
    <sec id="sec-2">
      <title>Data2Services Framework</title>
      <p>
The overall framework, illustrated in Figure 1, is based on 4 key steps:
1. Automatic generation of RDF data from input data
2. Loading of RDF data into an RDF store
3. Transformation of RDF data to an RDF standard
4. Auto-configuration and deployment of data access services
Data2Services makes use of the Resource Description Framework (RDF) [
        <xref ref-type="bibr" rid="ref3">3</xref>
] as a
formal, shared, accessible and broadly applicable knowledge representation
language, in line with the FAIR principles.
RDF offers a common data format for both data and their metadata, as well as for
data and the vocabularies used to describe them. RDF statements are largely
in the form of "subject", "predicate", "object", "graph" quads, and can be
serialized in a number of standard formats, including
JSON-LD, Turtle, RDF/XML and N-Triples. RDF, in combination with the Web
Ontology Language [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], is thereby more expressive than simple file
formats such as CSV, JSON, and XML.
      </p>
      <sec id="sec-2-1">
        <title>Automated generation of RDF data</title>
        <p>
          The generation of RDF data from input data is performed in a semi-automated
manner. A semi-automated approach is currently needed, because the while RDF
data can be automatically generated from a wide variety of data formats, these
1 https://ncats.nih.gov/translator/about, which we shall refer to as \Translator
program" in the rest of the paper.
resulting data may generate incorrect relations and lack the intended semantics.
To produce more accurate RDF data, we need a language to either add the
RDF semantics prior to the transformation or after the transformation. We use
of relational mapping languages such as R2RML2, a W3C standard, while we
turn to SPARQL [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] to transform an existing RDF dataset to some community
standard. Input data are processed by an R2RML processor along with the
R2RML mapping le to generate the output data.
        </p>
      </sec>
      <sec id="sec-2-2">
<title>Tabular files and relational database transforms</title>
        <p>Tabular files (e.g. CSV, TSV, PSV) are exposed as relational database tables using Apache Drill
(https://github.com/amalic/apache-drill).
Each file is represented as a table, and each column is considered an attribute
(a property in RDF) of the table. Attributes are named after the column headers.</p>
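<p>The naming scheme described above can be sketched as follows. This is a minimal, hypothetical illustration of how a TSV header row could be turned into predicate IRIs; the base IRI and the CamelCase normalization mirror the listings later in this paper, but the helper names and exact rules are assumptions, not the AutoR2RML implementation.</p>

```python
import csv
import io

# Assumed base IRI, matching the http://data2services/data/... IRIs shown
# in the paper's listings.
BASE = "http://data2services/data"

def header_to_predicates(file_path: str, tsv_text: str) -> dict:
    """Map each column header of a TSV file to a predicate IRI (sketch)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    headers = next(reader)

    # Normalize headers like "hgnc_id" to "HgncId", as seen in Listing 1.5.
    def camel(header: str) -> str:
        return "".join(p.capitalize() for p in header.replace("-", "_").split("_"))

    return {h: f"{BASE}/{file_path}/{camel(h)}" for h in headers}

preds = header_to_predicates(
    "hgnc/hgnc_complete_set.tsv",
    "hgnc_id\tsymbol\tname\nHGNC:3535\tF2\tcoagulation factor II, thrombin\n",
)
print(preds["hgnc_id"])
# http://data2services/data/hgnc/hgnc_complete_set.tsv/HgncId
```

<p>In the real pipeline this mapping is emitted as an R2RML mapping file (see Listing 1.5) rather than computed in memory.</p>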
<p>R2RML mapping files are automatically generated from relational databases
and Apache Drill-accessible files through SQL queries issued by our AutoR2RML
tool (https://github.com/amalic/AutoR2RML). Table names are used to generate the subject type identifiers, while attribute
names are the basis for predicate identifiers. An R2RML processor
(https://github.com/chrdebru/r2rml) generates
the resulting RDF using the mapping files in combination with the source data.</p>
        <p>XML transforms: We developed an xml2rdf tool
(https://github.com/MaastrichtU-IDS/xml2rdf) to stream-process an XML
file to RDF, largely following the XML data structure. The output RDF captures the
name of each XML node, its XPath location, its value, any children and their
attributes. We envision a broader version of this tool to process any kind of
tree-like document format (JSON, YAML).</p>
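<p>The streaming XML-to-RDF idea can be sketched with an incremental parser. This is an illustrative toy, not xml2rdf itself: the node identifiers and prefixes are invented, while the hasChild/hasValue predicates echo the generic model shown later in Listing 1.7.</p>

```python
import io
import xml.etree.ElementTree as ET

# Sketch of streaming an XML document into generic triples: each element
# becomes a node typed by its path, with hasValue for text content and
# hasChild linking parents to children. Identifiers (n1, n2, ...) and the
# d2sdata:/d2smodel: prefixes are illustrative assumptions.
def xml_to_triples(xml_text):
    triples = []
    path, counter = [], 0
    ids = {}  # element object -> generated node id
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            counter += 1
            ids[elem] = f"d2sdata:n{counter}"
            triples.append((ids[elem], "rdf:type", "d2smodel:" + "/".join(path)))
        else:  # "end": text and children are known once the element closes
            if elem.text and elem.text.strip():
                triples.append((ids[elem], "d2smodel:hasValue", elem.text.strip()))
            for child in elem:
                triples.append((ids[elem], "d2smodel:hasChild", ids[child]))
            path.pop()
    return triples

for t in xml_to_triples("<drug><drugbank-id>DB00001</drugbank-id></drug>"):
    print(t)
```

<p>Because the parser is incremental, a large XML file never needs to be fully loaded into memory, which is the point of the streaming design.</p>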
      </sec>
      <sec id="sec-2-3">
        <title>RDF Upload</title>
<p>The generated RDF data are then loaded into an RDF database through its
REST API or SPARQL interface using RdfUpload
(https://github.com/MaastrichtU-IDS/RdfUpload), a tool that
automatically uploads an RDF file into a specified GraphDB SPARQL or
HTTP repository endpoint.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Transform RDF to target model</title>
<p>Finally, SPARQL INSERT queries are run to transform the generic RDF representation of the
XML data structure into the target data model. These SPARQL queries are
manually designed by a user aware of the input file structure, the target data
model and the SPARQL query language.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Access the data through services</title>
<p>Once the data are transformed into the target database, they can be accessed using a
variety of services. RDF databases typically provide a SPARQL endpoint to
query RDF data.</p>
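<p>A client reaches such an endpoint over plain HTTP following the SPARQL 1.1 Protocol: the query travels in the query parameter and the Accept header selects the result serialization. The sketch below only builds the request; the endpoint URL is the GraphDB repository used later in this paper.</p>

```python
import urllib.parse
import urllib.request

# Build a SPARQL 1.1 Protocol GET request: query in the "query" parameter,
# JSON results requested via the Accept header.
def build_sparql_request(endpoint, query):
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )

req = build_sparql_request(
    "http://localhost:7200/repositories/test",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
)
print(req.full_url.split("?")[0])  # http://localhost:7200/repositories/test
# urllib.request.urlopen(req) would return JSON results from a live endpoint.
```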
        <p>
Here, we envision the development or inclusion of a variety of service
interfaces on top of the RDF databases. These can include the automatic generation
of REST APIs to manipulate entities and relations in the knowledge graph
(http://www.dfki.uni-kl.de/~mschroeder/demo/sparql-rest-api/),
the encapsulation of SPARQL queries as REST operations
(https://github.com/CLARIAH/grlc), the provision of
standardized dataset metadata (https://www.w3.org/TR/hcls-dataset/) and API metadata [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], the use of standardized
hypermedia controls (https://www.w3.org/TR/ldp, https://spring.io/understanding/HATEOAS),
the use of graph query languages (GraphQL,
HyperGraphQL, openCypher, Gremlin, SPARQL) and user interfaces
(http://yasgui.laurensrietveld.nl, http://www.irisa.fr/LIS/ferre/sparklis/).
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Data2Services Evaluation</title>
<p>To demonstrate the Data2Services framework, we use the following query, relevant
to the Translator program: Q1. Which drugs, or compounds, target gene
products of a [gene]? To answer the query, we use two datasets from a pool of 40
datasets used by the Translator program. These datasets are: (i) the HUGO Gene
Nomenclature Committee (HGNC, https://www.genenames.org), a curated repository of HGNC-approved
gene names, gene families and associated resources that provides detailed
information about genes and gene products, available in TSV format, and (ii)
DrugBank (https://www.drugbank.ca/), a resource containing detailed information on drugs and the gene
products they target, available in XML format. Listings 1.1 and 1.2 show
excerpts of the datasets used. We chose these particular datasets because they
contain the information needed to answer the query.</p>
<p>In the following, we outline the steps taken to execute the Data2Services
pipeline on these two datasets, using two different services to answer this query.</p>
      <p>hgnc_id	symbol	name
HGNC:3535	F2	coagulation factor II, thrombin
HGNC:3537	F2R	coagulation factor II thrombin receptor</p>
<p>Listing 1.1. Excerpt of the HGNC TSV dataset.
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;drugbank xmlns="http://www.drugbank.ca" version="5.1"&gt;
  &lt;drug type="biotech" created="2005-06-13" updated="2018-07-02"&gt;
    &lt;drugbank-id primary="true"&gt;DB00001&lt;/drugbank-id&gt;
    &lt;name&gt;Lepirudin&lt;/name&gt;
    &lt;target&gt;
      &lt;id&gt;BE0000048&lt;/id&gt;
      &lt;name&gt;Prothrombin&lt;/name&gt;
      &lt;external-identifier&gt;
        &lt;identifier&gt;HGNC:3535&lt;/identifier&gt;</p>
<p>Listing 1.2. Excerpt of the DrugBank XML dataset.</p>
      <sec id="sec-3-1">
        <title>Automated generation of generic RDF</title>
<p>As the first step, we downloaded the HGNC dataset from ftp://ftp.ebi.ac.uk/pub/databases/genenames/hgnc_complete_set.txt.gz and the
DrugBank dataset from https://www.drugbank.ca/releases/5-1-1/downloads/all-full-database
(an account needs to be created to download it). To execute the Data2Services pipeline, the downloaded
files need to be uncompressed and placed in different directories mapped into
the /data directory inside the Apache Drill Docker container. For convenience, we
have created two shell scripts that build the Data2Services pipeline Docker
containers and start both Apache Drill and Ontotext GraphDB as services, as shown
in Listing 1.3.
$ git clone --recursive https://github.com/MaastrichtU-IDS/data2services-pipeline.git
$ cd data2services-pipeline
$ git checkout tags/swat4ls
$ ./build.sh
$ ./startup.sh</p>
<p>Listing 1.3. Building the pipeline Docker images from GitHub.</p>
<p>Before continuing, a repository needs to be created for GraphDB. This can be
done by accessing GraphDB at http://localhost:7200 and going to Setup →
Repositories → Create new repository. Choose "test" as the repository ID and check
"Use context index".</p>
<p>To automatically run the data2services-pipeline that generates RDF from an
input dataset, the only requirement is to define a YAML configuration for
each dataset, as shown in Listing 1.4. The YAML file allows the user to configure
different parameters, such as the path to the input file and the Apache Drill and GraphDB
parameters.
WORKING_DIRECTORY: "/data/hgnc/hgnc_complete_set.txt"    # for HGNC
WORKING_DIRECTORY: "/data/drugbank/full_database.xml"    # for DrugBank

JDBC_URL: "jdbc:drill:drillbit=drill:31010"
JDBC_CONTAINER: "drill"
GRAPHDB_URL: "http://graphdb:7200"
GRAPHDB_REPOSITORY: "test"
GRAPHDB_USERNAME: "import_user"
GRAPHDB_PASSWORD: "test"</p>
<p>Listing 1.4. YAML configuration file for HGNC or DrugBank.</p>
<p>Then, Data2Services can be executed by providing the YAML configuration
file to the run.sh script in the data2services-pipeline directory with the command
$ ./run.sh /path/to/config.yaml.</p>
<p>HGNC processing: As a result of executing the YAML configuration file, an
R2RML mapping file is produced for HGNC in the directory where the input
file is stored, as shown in Listing 1.5.
@prefix rr: &lt;http://www.w3.org/ns/r2rml#&gt;.
&lt;#HgncMapping&gt;
  rr:logicalTable [ rr:sqlQuery """
    select row_number() over (partition by filename) as autor2rml_rownum
      , columns[0] as `HgncId`
      , columns[1] as `ApprovedSymbol`
    from dfs.root.`/data/hgnc/hgnc_complete_set.tsv`;"""];
  rr:subjectMap [
    rr:termType rr:IRI;
    rr:template "http://data2services/data/hgnc/hgnc_complete_set.tsv/{autor2rml_rownum}";
  ];
  rr:predicateObjectMap [
    rr:predicate &lt;http://data2services/data/hgnc/hgnc_complete_set.tsv/HgncId&gt;;
    rr:objectMap [ rr:column "HgncId" ];
  ].</p>
<p>Listing 1.5. Excerpt of the R2RML mapping file for the HGNC TSV file.</p>
<p>Then the R2RML implementation is executed to extract the data from the
TSV input file and produce the generic RDF from the R2RML mapping files, as
shown in Listing 1.6.
PREFIX d2s: &lt;http://data2services/data/hgnc/hgnc_complete_set.txt/&gt;
d2s:3535 d2s:symbol "F2" ;
  d2s:name "coagulation factor II, thrombin" .</p>
<p>Listing 1.6. Excerpt of the produced triples for the HGNC TSV file.
DrugBank processing: Executing the Data2Services pipeline on the
DrugBank XML file produces a generic RDF model representing the DrugBank XML
structure, as shown in Listing 1.7.
d2sdata:3a7d47b8-c734 rdf:type d2smodel:drugbank/drug/drugbank-id .
d2sdata:0c1f3d83-5563 d2smodel:hasChild d2sdata:3a7d47b8-c734 .
d2sdata:3a7d47b8-c734 d2smodel:model/hasValue "DB00001" .</p>
<p>Listing 1.7. Excerpt of triples generated from the DrugBank XML structure.</p>
      </sec>
      <sec id="sec-3-2">
        <title>RDF Upload</title>
<p>As the next step, RdfUpload is executed to load the RDF data into the "test"
repository of the GraphDB service running on http://localhost:7200.
RdfUpload has so far only been tested with GraphDB, but we envision developing it
into a generic tool that can be used with most popular RDF databases, such as
Virtuoso.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Mapping to the BioLink model</title>
<p>As part of the standardization of knowledge graphs, the Translator project has
created BioLink (https://biolink.github.io/biolink-model/), a high-level data model of biological entities (genes, diseases,
phenotypes, pathways, individuals, substances, etc.) and their associations. We
crafted two SPARQL INSERT queries
(https://github.com/vemonet/ncats-grlc-api/blob/master/insert_biolink_drugbank.rq and
https://github.com/vemonet/ncats-grlc-api/blob/master/insert_biolink_hgnc.rq)
to generate new RDF datasets that
are compliant with the BioLink model, as shown for HGNC in Listing 1.8.
PREFIX d2s: &lt;http://data2services/data/hgnc/hgnc_complete_set.tsv/&gt;
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX bioentity: &lt;http://bioentity.io/vocab/&gt;
INSERT {
  ?hgncUri a bioentity:Gene .
  ?hgncUri rdfs:label ?geneName .
  ?hgncUri &lt;http://purl.org/dc/terms/identifier&gt; ?hgncid .
  ?hgncUri bioentity:id ?hgncUri .
  ?hgncUri bioentity:systematic_synonym ?symbol .
} WHERE {
  SELECT ?s ?hgncid ?geneName ?symbol ?hgncUri {
    ?s d2s:ApprovedName ?geneName .
    ?s d2s:HgncId ?hgncid .
    ?s d2s:ApprovedSymbol ?symbol .
    ?s ?p ?o .
    BIND(iri(concat("http://identifiers.org/", lcase(?hgncid))) AS ?hgncUri)
  }
}</p>
<p>Listing 1.8. SPARQL INSERT query to convert HGNC to BioLink.</p>
<p>The transformed RDF data contain drug data from DrugBank, linked to
gene data from HGNC, in a manner that is compliant with the BioLink model.
The next step is to use the available interfaces to answer the research question.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Services</title>
        <p>Executing the Data2Services pipeline enables the BioLink data to be queried
through two services: (i) SPARQL and (ii) an HTTP API.</p>
<p>SPARQL: The original question can be answered by executing a SPARQL
query that retrieves all drugs linked to a given gene, as shown for the gene
"coagulation factor II, thrombin" in Listing 1.9. The results show that 23 drugs
affect this gene. Any other question can likewise be translated from natural
language to a SPARQL query that extracts the requested information from
the graph.
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX bioentity: &lt;http://bioentity.io/vocab/&gt;
SELECT DISTINCT ?gene ?geneLabel ?geneProductLabel ?drug ?drugLabel
{ ?gene a bioentity:Gene .
  ?gene bioentity:id ?geneId .
  ?gene rdfs:label ?geneLabel .
  ?gene bioentity:has_gene_product ?geneProduct .
  ?geneProduct rdfs:label ?geneProductLabel .
  ?drug bioentity:affects ?geneProduct .
  ?drug a bioentity:Drug .
  ?drug rdfs:label ?drugLabel .
  FILTER regex(str(?geneLabel), "coagulation factor II, thrombin") .
}</p>
<p>Listing 1.9. SPARQL query to answer which drugs, or compounds, target gene
products of thrombin.</p>
<p>HTTP API: We used grlc (https://github.com/CLARIAH/grlc) to expose the SPARQL query that answers the
question as an HTTP web service. grlc is a lightweight server that takes SPARQL
queries curated in GitHub repositories and translates them to Linked Data Web
APIs. Users are not required to know SPARQL to query their data; instead, they
can access a web API. We implemented a Swagger API with one call that
retrieves the URI and label of drugs that affect a gene defined by the user through its
HGNC identifier, e.g. HGNC:3535. This API is available at http://grlc.io/api/vemonet/ncats-grlc-api.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>
A significant amount of research has been conducted in the domain of data
conversion to a standard, semantically meaningful format. OpenRefine
(http://openrefine.org/) offers
a web user interface to manipulate tabular data and generate RDF through an
extension (https://github.com/fadmaa/grefine-rdf-extension/releases).
However, the user must be knowledgeable about RDF data modeling to generate
sensible RDF data. Karma [
        <xref ref-type="bibr" rid="ref6">6</xref>
] allows customizable RDF conversion through a
web browser, but producing high-quality mappings from ontologies and
various structured sources (formats including databases, spreadsheets, delimited
text files, XML, JSON, KML) requires expert ontology knowledge. Another
example of user-controlled RDF conversion is Sparqlify
(https://github.com/SmartDataAnalytics/Sparqlify), which depends on the
non-standardized Sparqlification Mapping Language (SML) and can be difficult
for inexperienced users [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Other approaches also involve the transformation of
XML data by using XSLT stylesheets [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or templates [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. SETLr [
        <xref ref-type="bibr" rid="ref8">8</xref>
] is another
tool that can convert a variety of data types to RDF using JSLDT, together
with Jinja templates and Python expressions, and is a great option for
professionals familiar with those languages.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions, Limitations and Future Work</title>
<p>In this paper, we describe Data2Services, a software framework to
automatically process heterogeneous data and provide multiple interfaces to access those data.
This open-source framework makes use of Docker containers to properly
configure software components and the execution of the workflow. We demonstrate
the utility of Data2Services by transforming life science data and answering a
question that has arisen in the Translator program.</p>
      <p>
This work represents a preliminary effort with several
limitations. While the automatic conversion of data does produce an RDF graph, it
lacks the strong semantics that is obtained by mapping data to domain ontologies.
Our strategy in this work was to show how a second transformation, expressed
as a SPARQL INSERT query over the automatically converted data, could be
mapped to a community data model (BioLink) for use by that community. We
also acknowledge that it may be possible to edit the autogenerated R2RML file to
produce BioLink data directly, but this is not available for the XML conversion.
Indeed, it would be desirable to have one declarative language for the
transformation of a greater set of initial data formats into RDF. RML [
        <xref ref-type="bibr" rid="ref4">4</xref>
] promises
such a language, with existing processors for relational, XML and JSON data.
However, preliminary work with the RML processor revealed that it currently does not scale
to large data files, because it loads the entire dataset into memory
into so-called logical structures, which enable joining data from so-called
logical sources (http://rml.io/spec.html#logical-join). Nonetheless, efforts to craft friendly user interfaces that hide the
complexity of mapping languages could be useful to help non-traditional users
generate mappings for all kinds of data files.
      </p>
      <p>
Our future work will explore the automated capture of metadata, such as the
provenance of data collection and processing, in a manner that is compliant
with community standards; the incorporation of data quality assessments on RDF
data [
        <xref ref-type="bibr" rid="ref12">12</xref>
], and the evaluation of the framework in terms of performance with
other use cases and data sources, both large and small.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
<p>Support for the preparation of this project was provided by NCATS through
the Biomedical Data Translator program (NIH awards OT3TR002019 [Orange]
and OT3TR002027 [Red]). Any opinions expressed in this document are those
of the Translator community writ large and do not necessarily reflect the views
of NCATS, individual Translator team members, or affiliated organizations and
institutions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bechhofer</surname>
          </string-name>
          , S., van
          <string-name>
            <surname>Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
<surname>Patel-Schneider</surname>
            ,
            <given-names>P.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          :
          <article-title>OWL Web Ontology Language Reference</article-title>
          .
          <source>Tech. rep., W3C</source>
          , http://www.w3.org/TR/owl-ref/ (
          <year>February 2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Breitling</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A standard transformation from xml to rdf via xslt</article-title>
          .
          <source>Astronomische Nachrichten: Astronomical Notes</source>
          <volume>330</volume>
          (
          <issue>7</issue>
          ),
<fpage>755</fpage>
          -
          <lpage>760</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brickley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guha</surname>
          </string-name>
          , R.:
<article-title>RDF Vocabulary Description Language 1.0: RDF Schema</article-title>
          .
          <source>Tech. rep., W3C Recommendation</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vander</surname>
            <given-names>Sande</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Mannens</surname>
          </string-name>
          , E., Van de Walle, R.:
          <article-title>RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</article-title>
          .
          <source>In: Proceedings of the 7th Workshop on Linked Data on the Web (Apr</source>
          <year>2014</year>
), http://events.linkeddata.org/ldow2014/papers/ldow2014_paper_01.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ermilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stadler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Csv2rdf: User-driven csv to rdf mass conversion framework</article-title>
          .
          <source>In: Proceedings of the ISEM</source>
          . vol.
          <volume>13</volume>
          , pp.
<fpage>04</fpage>
          -
          <lpage>06</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambite</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lerman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muslea</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taheriyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mallick</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Semi-automatically mapping structured sources into the semantic web</article-title>
          .
          <source>In: Extended Semantic Web Conference</source>
          . pp.
<fpage>375</fpage>
          -
          <lpage>390</lpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lange</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Krextor &#8211; an extensible XML&#8594;RDF extraction framework</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>McCusker</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chastain</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rashid</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norris</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>SETLr: the semantic extract, transform, and load-r</article-title>
          .
          <source>PeerJ Preprints</source>
          <volume>6</volume>
          ,
          <issue>e26476v1</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Prud'hommeaux</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SPARQL Query Language for RDF</article-title>
          . W3C Recommendation (January
          <year>2008</year>
          ), http://www.w3.org/TR/rdf-sparql-query/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al:
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          .
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ). https://doi.org/10.1038/sdata.2016.18
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dastgheib</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Avillach</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korodi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terryn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jagodnik</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Assis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumontier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>smartAPI: Towards a more intelligent network of Web APIs</article-title>
          . In:
          <string-name>
            <surname>Blomqvist</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          (eds.)
          <source>The Semantic Web</source>
          . pp.
          <fpage>154</fpage>
          &#8211;
          <lpage>169</lpage>
          . Springer International Publishing, Cham
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rula</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maurino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pietrobon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Quality assessment for Linked Data: A survey</article-title>
          .
          <source>Semantic Web Journal</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>