<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A SPARQL Endpoint Profiler for Efficient Question Answering Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yasunori YAMAMOTO</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Center for Life Science, Univ. of Tokyo Kashiwa-no-ha Campus Station Satellite 6F.</institution>
          <addr-line>178-4-4 Wakashiba, Kashiwa-shi, Chiba 277-0871</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A question answering (QA) system that queries SPARQL endpoints must know which endpoints have data matching the triple patterns of a given question. To help such a system locate endpoints more efficiently, we propose a tool that collects metadata about a SPARQL endpoint using SPARQL queries only. The metadata include the IRIs of explicitly declared classes (i.e., objects of the rdf:type predicate) as well as the number of triples whose subject and object belong to the respective classes. In addition, our tool counts triples whose objects are literals or IRIs that do not appear as the subject of the rdf:type predicate. The result is expressed in RDF using the Vocabulary of Interlinked Datasets (VoID), Service Description (SD), and a newly developed vocabulary. We have collected these data from various SPARQL endpoints and provide them through our own SPARQL endpoint. QA systems can use these data to identify candidate endpoints in advance and thus find endpoints more efficiently.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Source Selection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        With the advent of triple data stores with the provision of SPARQL APIs and
openly available datasets in Resource Description Framework (RDF) format, such as
DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and UniProt [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], querying multiple SPARQL endpoints is becoming
a practical way to gather the data needed to answer a question. Shared-task workshops
reflect this trend. Question Answering over Linked Data
(QALD) has taken place three times<sup>1</sup>. The latest QALD (QALD-4) set up three tasks,
and their target datasets were DBpedia, SIDER, Diseasome, and Drugbank. All
but DBpedia are in the life science domain, which has a relatively
long history of involvement with RDF and the Semantic Web, and a considerable
number of RDF datasets, such as UniProt and Bio2RDF<sup>2</sup>, are openly available in addition
to those used in QALD.
      </p>
    </sec>
    <sec id="sec-2">
      <title>1 http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/ 2 http://lod-cloud.net/state/</title>
      <p>
        To address the source selection issue, that is, how to find or choose an endpoint
that provides datasets relevant to a given question effectively and efficiently, the
industry held a workshop called PROFILES (1st International Workshop on Dataset
PROFIling &amp; fEderated Search for Linked Data). This workshop focused on data
source contextualization for searching and exploring linked data, responding to the
lack of scalable and usable methods for formulating and distributing semantic and
keyword queries across the Web of Data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        We, the Database Center for Life Science (DBCLS), share this issue: additional
datasets in Life Sciences have been recently released in RDF format. Examples include
ChEMBL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Reactome [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], LinkDB<sup>3</sup>, and Allie [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To address this situation, we have
developed a tool that obtains endpoint profiles to help a QA system find
relevant datasets efficiently and effectively. In this context, a QA system accepts a
question in natural language, converts it into a SPARQL query or a set of sub-queries,
determines the remote SPARQL endpoints, issues queries to them, and shows the result
to the user. Our primary concern in profiling is the lack of knowledge about the
relationships among classes and literals and about their statistics. Although some
endpoints provide their ontologies, statistics such as the number of triples whose subjects
and objects belong to certain classes cannot be obtained. A QA system uses these metadata
to determine which SPARQL endpoint can supply the data needed to answer a given
query. For example, the question "What genes are associated with Alzheimer
disease?" can be translated to a pseudo SPARQL query as follows<sup>4</sup>:
      </p>
      <sec id="sec-2-1">
        <preformat>SELECT ?t1
WHERE {
  ?t1 [:isa] [genes] .
  ?t2 [:isa] [alzheimer disease] .
  ?t1 [associated with] ?t2 .
}</preformat>
      </sec>
      <sec id="sec-2-2">
        <p>
          In this query, terms within brackets are mapped to existing IRIs or literals by a QA
system needing to find endpoints that provide datasets to answer the above question.
More specifically, the first QA system task is to search for vocabularies that have
terms mapped to each bracketed term. The second task is to find endpoints that have
datasets corresponding to the vocabularies. In this study, we focus on the second task.
Several groups [
          <xref ref-type="bibr" rid="ref7 ref8 ref9">7-9</xref>
          ] have studied this topic and demonstrated that VoID [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] datasets
can be used. Although VoID is useful because a VoID description can be obtained over
HTTP, it lacks several elements that are useful for QA. It does not
provide the relationships among classes and literals with respect to predicates, nor
the statistics of these relationships. Using the above question as an example, a QA system attempts to
determine which endpoints can provide triples that connect instances of [genes] and
[alzheimer disease] with meaningful predicates related to [associated with]. VoID
does provide the number of instances of each class but does not provide
the statistics for such relationships.
        </p>
        <p>Our objective is to collect these statistics from SPARQL endpoints and provide
them through a public SPARQL endpoint; to our knowledge, this is the first
attempt to do so. Our approach requires only the URLs of the endpoints of interest. By using the
metadata from the endpoints, a QA system can choose relevant endpoints before
issuing specific SPARQL queries to answer a given question. In addition, our tool stores
the obtained data in RDF format, allowing everyone to share the data easily. We used
the VoID and SPARQL 1.1 Service Description (SD) vocabularies and a vocabulary
developed by our project called SPARQL Builder Metadata (SBM)<sup>5</sup>.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 http://www.genome.jp/linkdb/linkdb_rdf.html 4 http://lodqa.dbcls.jp</title>
      <sec id="sec-3-1">
        <title>Our approach</title>
        <p>To collect the statistics of a SPARQL endpoint, our tool obtains all combinations
of subject and object relationships with respect to classes and literals in the dataset
provided. There are six triple types, that is, combinations of two cases for the
subject and three cases for the object. A subject is a resource (i.e., an IRI or a blank
node) that is an instance of either a locally declared class or a locally undeclared class,
and an object is either a literal or a resource falling under one of the same two cases.
A locally declared class is a resource that appears as the object of the rdf:type
predicate in the dataset. For example, Tree is a locally
declared class if the dataset contains the following triple:</p>
        <p>:sycamore rdf:type :Tree .</p>
        <p>Accordingly, sycamore is a locally declared class instance. In contrast, we call a
resource a locally undeclared class (LUC) instance if it does not appear as a subject of
the rdf:type predicate in the dataset. Our tool then counts triples per predicate for
each triple type mentioned above and for each pair of subject and object classes.</p>
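        <p>The six triple types can be illustrated with a minimal in-memory sketch. It is not taken from our tool, which is written in Java and works through SPARQL queries; the names Lit, typed_resources, and triple_type and the http://example.org/ namespace are assumptions made for this example only.</p>
        <preformat><![CDATA[
```python
from typing import NamedTuple, Set, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Lit(NamedTuple):
    """A literal value; IRIs and blank nodes are plain strings in this sketch."""
    value: str


Term = Union[str, Lit]
Triple = Tuple[str, str, Term]


def typed_resources(triples: Set[Triple]) -> Set[str]:
    """Resources that appear as the subject of an rdf:type triple."""
    return {s for s, p, _ in triples if p == RDF_TYPE}


def triple_type(triple: Triple, typed: Set[str]) -> Tuple[str, str]:
    """Return (subject kind, object kind) for one triple."""
    s, _, o = triple
    subject_kind = "declared" if s in typed else "LUC"
    if isinstance(o, Lit):
        object_kind = "literal"
    else:
        object_kind = "declared" if o in typed else "LUC"
    return subject_kind, object_kind


EX = "http://example.org/"  # hypothetical namespace
data = {
    (EX + "sycamore", RDF_TYPE, EX + "Tree"),
    (EX + "sycamore", EX + "height", Lit("30")),
    (EX + "sycamore", EX + "growsIn", EX + "park"),
    (EX + "park", EX + "contains", EX + "sycamore"),
}
typed = typed_resources(data)
# One entry per predicate, since each predicate is used once in this toy data.
kinds = {t[1]: triple_type(t, typed) for t in data}
```
]]></preformat>
        <p>In this toy data only sycamore is a locally declared class instance, so the park resource is treated as an LUC instance on either side of a triple.</p>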
        <p>Figure 1 shows our proposed algorithm. In the algorithm, LOOP0 corresponds to a
process for each graph, and LOOP1 is an inner loop of LOOP0 that addresses each
class in the graph. The inner loop corresponds to the following cases:</p>
        <list list-type="order">
          <list-item>
            <p>Both the subject and the object of a triple belong to respective locally
declared classes (i.e., combinations of locally declared classes).</p>
          </list-item>
          <list-item>
            <p>The subject of a triple belongs to a locally declared class, and its object is
a literal.</p>
          </list-item>
          <list-item>
            <p>Either the subject or the object of a triple, but not both, belongs to a locally
declared class, and the other is an LUC instance.</p>
          </list-item>
        </list>
        <p>Here, process number 1 of the inner loop meets the first case, process number 2 meets
the second case, and process numbers 3 and 4 meet the third case. As for process
numbers 2 and 3 of LOOP0, the former corresponds to the case where the subject and
the object of a triple are an LUC instance and a literal, respectively. The latter
corresponds to the case where both the subject and the object of a triple are LUC instances.</p>
        <p>5 http://www.sparqlbuilder.org/doc/?page_id=20</p>
        <sec id="sec-3-1-1">
          <preformat>Obtain a graph set G.
LOOP0:
for each g in G
  1: obtain a locally declared class set C in g.
  LOOP1:
  for each c in C
    1: count the number of triples w.r.t. each predicate
       whose subject and object belong to c and c'
       where c' is a locally declared class.
    2: count the number of triples w.r.t. each predicate
       whose subject belongs to c,
       and whose object is literal.
    3: count the number of triples w.r.t. each predicate
       whose subject belongs to c,
       and whose object is an LUC instance.
    4: count the number of triples w.r.t. each predicate
       whose subject is an LUC instance,
       and whose object belongs to c.
  end for (LOOP1)
  2: count the number of triples w.r.t. each predicate
     whose subject is an LUC instance,
     and whose object is literal.
  3: count the number of triples w.r.t. each predicate
     whose subject and object are both LUC instances.
end for (LOOP0)</preformat>
        </sec>
        <sec id="sec-3-1-2">
          <p>After obtaining the data, our tool stores them in a triple store. We use three
vocabularies to create an RDF dataset: VoID<sup>6</sup>, SD<sup>7</sup>, and SBM<sup>8</sup>. SBM was developed
to express relationships between classes where a triple connects instances of the classes
as its subject and object. For example, an instance of the sbm:ClassRelation class has
the properties sbm:subjectClass and sbm:objectClass, and the instance is expected to be
the object of a triple whose subject is a void:Dataset instance and whose predicate is
sbm:classRelation. Therefore, SBM can be used to extend the VoID data.</p>
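          <p>The counting performed for one graph can be sketched in memory as follows. This is an illustration under stated assumptions, not the implementation of our tool, which issues SPARQL queries to a remote endpoint. The function name profile, the LUC and LITERAL placeholder labels, and the example data are hypothetical. The sketch also assumes that rdf:type triples are counted like any other triple and that an instance belonging to several classes contributes once per class.</p>
          <preformat><![CDATA[
```python
from collections import Counter, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LUC = "LUC"          # placeholder label for locally undeclared class instances
LITERAL = "LITERAL"  # placeholder label for literal objects


def profile(triples):
    """Count triples per (subject class, predicate, object class) in one graph.

    `triples` is an iterable of (subject, predicate, object, object_is_literal).
    """
    triples = list(triples)

    # Locally declared classes of each resource, taken from rdf:type triples.
    classes_of = defaultdict(set)
    for s, p, o, is_literal in triples:
        if p == RDF_TYPE and not is_literal:
            classes_of[s].add(o)

    counts = Counter()
    for s, p, o, is_literal in triples:
        subject_classes = classes_of.get(s) or {LUC}
        if is_literal:
            object_classes = {LITERAL}
        else:
            object_classes = classes_of.get(o) or {LUC}
        for sc in subject_classes:
            for oc in object_classes:
                counts[(sc, p, oc)] += 1
    return counts


EX = "http://example.org/"  # hypothetical namespace
graph = [
    (EX + "gene1", RDF_TYPE, EX + "Gene", False),
    (EX + "cui1", RDF_TYPE, EX + "Concept", False),
    (EX + "gene1", EX + "xUmls", EX + "cui1", False),
    (EX + "gene1", EX + "label", "APCS", True),
    (EX + "page1", EX + "about", EX + "gene1", False),
]
counts = profile(graph)
```
]]></preformat>
          <p>Each key of the resulting counter corresponds to one class relation that could be recorded with sbm:subjectClass and sbm:objectClass together with its triple count.</p>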
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Scenario</title>
        <p>Taking the above question as an example, we show how an obtained dataset can be
used to determine an endpoint. There are three triple patterns as follows:</p>
        <preformat>1: ?t1 [:isa] [genes] .
2: ?t2 [:isa] [alzheimer disease] .
3: ?t1 [associated with] ?t2 .</preformat>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6 http://rdfs.org/ns/void# 7 http://www.w3.org/ns/sparql-service-description# 8 http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#</title>
      <sec id="sec-4-2">
        <p>As for the first pattern, a QA system needs to know that something is a gene, and
Bio2RDF endpoints<sup>9</sup> can be used. Bio2RDF provides a set of SPARQL endpoints
for life science datasets that include GenBank, Gene Ontology, and OMIM.
Therefore, we assume that mapping from a pseudo class to a specific IRI is done based on
vocabularies used in Bio2RDF. We use the prefix bio2rdf: to denote
http://bio2rdf.org/. The question of interest is related to genes and diseases, and the
following classes are relevant:
bio2rdf:omim_vocabulary:Gene and
bio2rdf:umls_vocabulary:Resource.</p>
        <p>
          The latter class holds 29,602 instances<sup>10</sup>, each of which corresponds to a UMLS
Metathesaurus concept identifier (CUI). The UMLS Metathesaurus [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is a very
large multi-lingual vocabulary database in the life science domain, and CUIs are used
to identify concepts and meanings in the Metathesaurus. The term Alzheimer's disease
is represented as C0002395 using its CUI, and therefore the term can be expressed as
bio2rdf:umls:C0002395 in the Bio2RDF dataset. At this point, the system must find
relationships between bio2rdf:umls:C0002395 and instances that belong to
bio2rdf:omim_vocabulary:Gene (we refer to this as the Gene class hereafter). As
previously mentioned, mapping a pseudo class to a specific URI is a significant task,
but not the focus of this study.
        </p>
        <p>Once the mapping is completed, the next task is to identify a path between terms.
By looking up the statistics obtained using our approach, the system can discover two
groups of triples. Those of one group have a subject belonging to the Gene class, and
those of the other have a subject belonging to the class
bio2rdf:umls_vocabulary:Resource (we refer to this as the UMLS class hereafter)<sup>11</sup>.
However, triples of the latter group fall into one of the following three subgroups<sup>12</sup>:</p>
        <list list-type="order">
          <list-item>
            <p>Taking rdf:type<sup>13</sup> as its predicate</p>
          </list-item>
          <list-item>
            <p>Taking void:inDataset as its predicate</p>
          </list-item>
          <list-item>
            <p>Taking a literal as its object</p>
          </list-item>
        </list>
        <p>The first and second subgroups indicate that those triples are not statements relevant to
the question in the life science domain. The last subgroup indicates that those triples do
not connect instances in the UMLS class to instances in another class. Therefore, the
system can determine that it must first search for triples whose subject belongs to the Gene
class.</p>
        <p>9 http://download.bio2rdf.org/release/3/release.html</p>
        <preformat>10 SELECT ?n { &lt;http://bio2rdf.org/umls_vocabulary:Resource&gt; ^void:class/void:entities ?n }</preformat>
        <preformat>11 SELECT (count(distinct ?c1) as ?cc1) (count(distinct ?c2) as ?cc2) {
   ?c1 sbm:subjectClass &lt;http://bio2rdf.org/omim_vocabulary:Gene&gt; .
   ?c2 sbm:subjectClass &lt;http://bio2rdf.org/umls_vocabulary:Resource&gt; . }</preformat>
        <preformat>12 SELECT ?p ?o {
   [] sbm:subjectClass &lt;http://bio2rdf.org/umls_vocabulary:Resource&gt; ;
      sbm:objectClass ?o ;
      ^sbm:classRelation/void:property ?p }</preformat>
        <p>13 http://www.w3.org/1999/02/22-rdf-syntax-ns#</p>
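        <p>The lookup described above can be sketched as a filter over collected class relations. The sketch is illustrative only: the Relation record, the LITERAL marker, the function names, the http://example.org/ IRIs, and all triple counts are assumptions made for the example and do not reproduce the stored SBM data.</p>
        <preformat><![CDATA[
```python
from typing import List, NamedTuple, Optional, Sequence

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_IN_DATASET = "http://rdfs.org/ns/void#inDataset"
LITERAL = "LITERAL"  # hypothetical marker meaning "the object is a literal"


class Relation(NamedTuple):
    subject_class: str
    predicate: str
    object_class: str
    triples: int


def informative(relations: Sequence[Relation], subject_class: str) -> List[Relation]:
    """Relations that can link subject_class to another class, largest first."""
    skipped = {RDF_TYPE, VOID_IN_DATASET}
    kept = [
        r for r in relations
        if r.subject_class == subject_class
        and r.predicate not in skipped
        and r.object_class != LITERAL
    ]
    return sorted(kept, key=lambda r: r.triples, reverse=True)


def first_class_to_explore(relations: Sequence[Relation],
                           candidates: Sequence[str]) -> Optional[str]:
    """First candidate class with at least one informative outgoing relation."""
    for candidate in candidates:
        if informative(relations, candidate):
            return candidate
    return None


GENE = "http://bio2rdf.org/omim_vocabulary:Gene"
UMLS = "http://bio2rdf.org/umls_vocabulary:Resource"
EX = "http://example.org/"  # hypothetical namespace; the counts below are made up
stats = [
    Relation(UMLS, RDF_TYPE, EX + "AnyClass", 30),
    Relation(UMLS, VOID_IN_DATASET, EX + "Dataset", 30),
    Relation(UMLS, EX + "label", LITERAL, 30),
    Relation(GENE, EX + "label", LITERAL, 12),
    Relation(GENE, EX + "refersTo", EX + "Phenotype", 7),
]
start = first_class_to_explore(stats, [UMLS, GENE])
```
]]></preformat>
        <p>With these made-up records, every relation leaving the UMLS class is filtered out, so the search starts from the Gene class, which mirrors the reasoning in the scenario.</p>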
        <p>The system then looks up the statistics of the triples whose subject belongs to the
Gene class. Table 1 shows the top five predicates and object classes of triples whose
subject belongs to the Gene class, in descending order of their counts<sup>14</sup>. Next, the system
looks up the statistics of each object class from the top down. Similar to the case of the UMLS
class, triples whose subject belongs to the top class have literals as their objects or have objects in classes
not relevant to the life science domain, such as rdfs:Resource and dcterms:Dataset<sup>15</sup>.
As for the second class, 33,659 triples connect its instances to ones belonging to the
class bio2rdf:umls_vocabulary:Resource with the predicate
bio2rdf:omim_vocabulary:x-umls<sup>16</sup>. In this way, the system discovers possible
connections from the Gene class to the UMLS class. The obtainable statistics are
merely binary relations, and the desired path may not exist. However, the system
can determine whether a path is possible, and this decision can be made
without further inquiry to remote SPARQL endpoints. Therefore, the system will
choose only promising endpoints to answer a given (sub)question.</p>
        <preformat>14 SELECT ?c ?p ?o {
   [] sbm:subjectClass &lt;http://bio2rdf.org/omim_vocabulary:Gene&gt; ;
      sbm:objectClass ?o ;
      void:triples ?c ;
      ^sbm:classRelation/void:property ?p }
   ORDER BY DESC(?c) LIMIT 5</preformat>
        <preformat>15 SELECT distinct ?o {
   &lt;http://bio2rdf.org/gi_vocabulary:Resource&gt; ^sbm:subjectClass/sbm:objectClass ?o }</preformat>
        <preformat>16 SELECT ?p ?o ?c {
   [] sbm:subjectClass &lt;http://bio2rdf.org/omim_vocabulary:Resource&gt; ;
      sbm:objectClass ?o ;
      void:triples ?c ;
      ^sbm:classRelation/void:property ?p .
   FILTER(regex(str(?o),"umls","i"))}</preformat>
        <p>Once the system learns that the endpoint of interest may have a dataset to answer
the question, it issues a SPARQL query to retrieve an answer. By traversing the
dataset, the system identifies the paths from instances of the Gene class to
bio2rdf:umls:C0002395. Figure 2 shows the obtained paths. In Figure 2, an oval
denotes a class and an arrow denotes a predicate. Note that we omit the prefix
http://bio2rdf.org/ from the class and predicate IRIs. As Figure 2 shows, there are
multiple paths. For example, the omim_vocabulary:Phenotype class and the
omim_vocabulary:Resource class are connected to the Gene class by the
omim_vocabulary:refers-to predicate. At the time of this writing, 170 instances of the
Gene class have been retrieved, as shown in the following example:</p>
        <preformat>AMYLOID P COMPONENT, SERUM; APCS [omim:104770]
PRION PROTEIN; PRNP [omim:176640]
REELIN; RELN [omim:600514]</preformat>
        <p>The SPARQL query to obtain this result is as follows.</p>
      </sec>
      <sec id="sec-4-3">
        <preformat>PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX omim: &lt;http://bio2rdf.org/omim_vocabulary:&gt;
PREFIX umls: &lt;http://bio2rdf.org/umls:&gt;
SELECT distinct (str(?l) as ?gene)
WHERE {
  ?o a omim:Gene ;
     rdfs:label ?l ;
     ?p ?s .
  ?s omim:x-umls umls:C0002395 .
} ORDER BY ?gene</preformat>
      </sec>
      <sec id="sec-4-6">
        <p>We have proposed a solution to the source selection issue. A QA system can
choose a SPARQL endpoint efficiently and effectively by using metadata obtained by
our proposed method. The efficiency is gained because a QA system issues the
minimum possible number of SPARQL queries to remote endpoints. The effectiveness is
gained because a QA system searches for possible paths corresponding to a graph
pattern prior to issuing a query.</p>
        <p>Currently, we note three issues concerning our approach. First, it takes time to
gather endpoint data because our tool must issue multiple SPARQL queries. The more
classes an endpoint graph has, the more queries the tool must issue. Letting C denote
the number of classes, the tool issues approximately 4C+3 queries. To alleviate the
load on an endpoint, we added a default wait time of 1/3 second between queries.
Although collecting data requires time, our approach remains practical because
datasets provided by currently accessible endpoints normally have small numbers of
classes. According to the statistics provided by LODStats<sup>17</sup>, representing 365 datasets,
the median number of classes per dataset is 5.0. We have already collected data
from 22 endpoints and 39 datasets in the life science domain, and our
approach performs as expected. The collected data are accessible through the
SPARQL interface at http://tm.dbcls.jp/tdp/ .</p>
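        <p>As a rough worked example of this cost, the sketch below evaluates 4C+3 and the time spent waiting between queries. One reading of Figure 1 that yields this count is one query to obtain the class set, four queries per class in LOOP1, and two further queries in LOOP0. The function names are hypothetical, and the sketch assumes that a wait occurs only between consecutive queries and ignores query execution time.</p>
        <preformat><![CDATA[
```python
def query_count(num_classes: int) -> int:
    """Approximate number of queries for a graph with num_classes classes."""
    return 4 * num_classes + 3


def minimum_wait_seconds(num_classes: int, wait: float = 1 / 3) -> float:
    """Total waiting time, assuming one wait between consecutive queries."""
    return (query_count(num_classes) - 1) * wait


median_queries = query_count(5)        # median class count reported by LODStats
median_wait = minimum_wait_seconds(5)  # seconds spent waiting, excluding execution
```
]]></preformat>
        <p>For the median of five classes this gives 23 queries and a little over seven seconds of waiting, whereas a graph with 1,000 classes would need about 4,003 queries and more than 22 minutes of waiting alone.</p>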
        <p>The second issue is attributed to the implementations of target endpoints. Some
implementations do not support all operations defined in SPARQL 1.1. For
example, to obtain the number of triples whose subject is an LUC instance, a query
must contain the MINUS keyword, which was newly introduced in SPARQL 1.1.
Newly released implementations are expected to support these operations,
so this issue should diminish in the near future.</p>
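        <p>A query of the kind mentioned here might look like the one built by the following sketch, which counts, per predicate, the triples whose subject is an LUC instance and whose object is a literal. The query text is our illustration and is not necessarily the query issued by the tool; the function name and the graph IRI are hypothetical.</p>
        <preformat><![CDATA[
```python
def luc_literal_count_query(graph_iri: str) -> str:
    """Build a SPARQL 1.1 query counting LUC-subject, literal-object triples."""
    return f"""SELECT ?p (COUNT(*) AS ?n)
WHERE {{
  GRAPH <{graph_iri}> {{
    ?s ?p ?o .
    FILTER(isLiteral(?o))
    MINUS {{ ?s a ?c }}
  }}
}}
GROUP BY ?p"""


query = luc_literal_count_query("http://example.org/graph")  # hypothetical graph IRI
```
]]></preformat>
        <p>The MINUS clause removes every solution whose subject has an rdf:type triple in the same graph. On an endpoint limited to SPARQL 1.0, a similar effect is usually obtained with OPTIONAL { ?s a ?c } followed by FILTER(!bound(?c)), although we have not verified this variant against the endpoints discussed here.</p>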
        <p>
          The third issue is that not all the datasets accessible through their SPARQL
endpoints have domain-specific classes like Bio2RDF. For example, there is a SPARQL
endpoint provided by the BioPortal project [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. We can also obtain OMIM data from
this endpoint, but the only declared class is owl:Class<sup>18</sup>. Therefore, the tool cannot
obtain any useful statistics when relying solely on the relationships among the locally
declared classes.
        </p>
        <p>We continue to obtain the statistics of several endpoints and utilize them to
develop a QA system in the life science domain. Beyond our proposed approach, we will
study how to effectively map a given pseudo class to existing obtained classes. In
addition, we hope that many endpoints provide the proposed statistics under the
conditions that they can be openly accessed and shared with others to minimize the
number of gathering actions at these endpoints. The tool is implemented in Java and
distributed from https://bitbucket.org/yayamamo/tripledataprofiler under the terms of the
MIT license.</p>
        <sec id="sec-4-6-1">
          <title>Acknowledgments</title>
          <p>This work has been supported by the National Bioscience Database Center
(NBDC) of the Japan Science and Technology Agency (JST). We thank the
SparqlBuilder team for providing us the SBM vocabulary before making it public.</p>
          <p>17 http://stats.lod2.eu/</p>
          <p>18 http://www.w3.org/2002/07/owl#Class</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>DBpedia: A Nucleus for a Web of Open Data</article-title>
          .
          <source>Proceedings of the 6th International Semantic Web Conference (ISWC2007)</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bairoch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duvaud</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Phan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Redaschi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzek</surname>
            ,
            <given-names>B. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGarvey</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gasteiger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Infrastructure for the life sciences: design and implementation of the UniProt website</article-title>
          .
          <source>BMC Bioinformatics</source>
          .
          <volume>10</volume>
          :
          <fpage>136</fpage>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Demidova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dietze</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szymanski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breslin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Preface. Proceedings of the 1st International Workshop on Dataset PROFIling &amp; fEderated Search for Linked Data</source>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Willighagen</surname>
            ,
            <given-names>E. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waagmeester</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spjuth</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ansell</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tkachenko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hastings</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wild</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          :
          <article-title>The ChEMBL database as linked open data</article-title>
          .
          <source>J Cheminform</source>
          .
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>23</fpage>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jupp</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malone</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bolleman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandizi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davies</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaulton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehant</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laibe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Redaschi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wimalaratne</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Novère</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parkinson</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birney</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenkinson</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          :
          <article-title>The EBI RDF platform: linked open data for the life sciences</article-title>
          .
          <source>Bioinformatics</source>
          .
          <volume>30</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1338</fpage>
          -
          <lpage>9</lpage>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Yamamoto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamaguchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yonezawa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Building Linked Open Data towards integration of biomedical scientific literature with DBpedia</article-title>
          .
          <source>J Biomed Semantics</source>
          .
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>8</fpage>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Rakhmawati</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karnstedt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasnain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hausenblas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A Comparison of Federation over SPARQL Endpoints Frameworks</article-title>
          .
          <source>In: Knowledge Engineering and the Semantic Web</source>
          (pp.
          <fpage>132</fpage>
          -
          <lpage>146</lpage>
          ). Springer Berlin Heidelberg. (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Görlitz</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions</article-title>
          .
          <source>In: Proc. of the 2nd Int. Workshop on Consuming Linked Data</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwarte</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hütter</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>FedSearch: Efficiently Combining Structured Queries and Full-Text Search in a SPARQL Federation</article-title>
          .
          <source>In: The Semantic Web - ISWC 2013</source>
          (pp.
          <fpage>427</fpage>
          -
          <lpage>443</lpage>
          ). Springer Berlin Heidelberg. (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <article-title>Describing Linked Datasets with the VoID Vocabulary</article-title>
          , http://www.w3.org/TR/void/
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. UMLS® Reference Manual [Internet], http://www.ncbi.nlm.nih.gov/books/NBK9684/</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>P. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexander</surname>
            ,
            <given-names>P. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nyulas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tudorache</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          :
          <article-title>BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>39</volume>
          (
          <issue>Web Server issue</issue>
          ):
          <fpage>W541</fpage>
          -
          <lpage>5</lpage>
          . (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>