<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Assessing Trust with PageRank in the Web of Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>José M. Giménez-García</string-name>
          <email>jose.gimenez.garcia@univ-st-etienne.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harsh Thakkar</string-name>
          <email>hthakkar@uni-bonn.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoine Zimmermann</string-name>
          <email>antoine.zimmermann@emse.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Enterprise Information Systems Lab, University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Univ Lyon, MINES Saint-Etienne, CNRS, Laboratoire Hubert Curien UMR 5516</institution>
          ,
          <addr-line>F-42023 Saint-Etienne</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Univ Lyon, UJM-Saint-Etienne, CNRS, Laboratoire Hubert Curien UMR 5516</institution>
          ,
          <addr-line>F-42023 Saint Etienne</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>While a number of quality metrics have been successfully proposed for datasets in the Web of Data, there is a lack of trust metrics that can be computed for any given dataset. We argue that reuse of data can be seen as an act of trust. In the Semantic Web environment, datasets regularly include terms from other sources, and each of these connections expresses a degree of trust in that source. However, determining what constitutes a dataset in this context is not straightforward. We study the concepts of dataset and dataset link, and finally use the concept of Pay-Level Domain to differentiate datasets, considering the usage of external terms as connections among them. Using these connections, we compute the PageRank value of each dataset and examine the influence of ignoring predicates in the computation. This process has been performed for more than 300 datasets extracted from the LOD Laundromat. The results show that reuse of a dataset is not correlated with its size, and provide some insight into the limitations of the approach and ways to improve its efficacy.</p>
      </abstract>
      <kwd-group>
        <kwd>linked data</kwd>
        <kwd>trust</kwd>
        <kwd>reuse</kwd>
        <kwd>interlinking</kwd>
        <kwd>PageRank</kwd>
        <kwd>metric</kwd>
        <kwd>assessment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The WDAqua project<sup>1</sup> aims to advance the state of the art in data-driven
question answering, with a special focus on the Web of Data. The Web of Data
comprises thousands of interrelated datasets on varied topics,
which contain large quantities of data relevant to answering a question. Nonetheless,
in an environment where information is published independently by many different
actors, data veracity is usually uncertain [
        <xref ref-type="bibr" rid="ref17 ref19">17, 19</xref>
        ], and there is always the risk of
consuming misleading data.
      </p>
      <p>Copyright held by the authors.</p>
    </sec>
    <sec id="sec-2">
      <title>1 http://wdaqua.informatik.uni-bonn.de/</title>
      <p>
        While some quality metrics have been proposed that
can help to identify good datasets [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], there is a lack of trust metrics to provide
confidence in the veracity of the data [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>
        In this context, we argue that actual usage of data can be seen as an act of
trust. In this paper we focus on the reuse of resources by other datasets as a usage
metric. We consider the reuse of a resource of one dataset by another dataset
as an outlink from the latter to the former. From this viewpoint, we can compute
the PageRank [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] value of each dataset and rank the datasets according to their reuse.
PageRank has been successfully used to obtain trust metrics on individual triples.
In order to obtain a good measure of reuse, we perform the process on a large
scale. We make use of the tools provided by the LOD Laundromat [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] to go
beyond the LOD Cloud, and process more than 38 billion triples, distributed over more
than 600 thousand documents. The LOD Laundromat provides data from data
dumps collected from the Internet, so it is not limited to dereferenceable linked
data. However, what is regarded as a dataset is an important issue when dealing
with data dumps. We make use of the concept of Pay-Level Domain (or PLD,
also known as Top-Private Domain) to draw a distinction between datasets,
and consider the influence of ignoring predicates when extracting outlinks. We
group the triples into datasets according to their PLD and
compute their PageRank values as a first measure of trust. Finally, we discuss
the results and limitations of the approach, suggesting improvements for future
work.
      </p>
      <p>This document is organized as follows: in Section 2, we first discuss what
should be considered a dataset in our context in order to clarify the problem
we address; in Section 3 we present the tools we are using, namely the LOD
Laundromat and the PageRank algorithm; Section 4 describes the experiments
and results, which we further discuss; Section 5 presents relevant related work;
finally, we provide some conclusions and directions for future work in Section 6.</p>
      <sec id="sec-2-1">
        <title>Ranking the Web of Data</title>
        <p>We would like to assess trust in datasets by measuring their popularity, based
on the reuse of resources from one dataset in another dataset. To do this, we
rely on the PageRank algorithm (presented in more detail in Section 3). To
compute PageRank on a set of datasets, it is first necessary to define what is
considered a dataset and what is a link between datasets. RDF graphs, although
formally defined as sets of triples, can be seen as directed multigraphs in which
predicates play the role of arcs. This view suggests that if a triple contains a
resource of dataset A as subject and a resource of dataset B as object, it can be
seen as a link from dataset A to dataset B. However, the links formed by arcs
in an RDF graph are irrelevant to the notion of dataset linking. In fact, the mere
presence of a hyperlink suffices to indicate a link between a source and a
destination; therefore any HTTP IRI in an RDF graph can be seen as a link. The
question, then, is what it means for a resource to belong to a dataset, and which
dataset a hyperlink "points to". A naive approach would be to consider that any
IRI occurring in a dataset belongs to that dataset and thus that links connect two
datasets that share a resource. However, this would imply, for instance, that
any triple anywhere that uses a DBpedia IRI is considered to be linked to from
the DBpedia dataset. As a result, any dataset that reuses a DBpedia IRI would
increase its own PageRank under this definition.</p>
        <p>Alternatively, we could take advantage of the linked data principles, which
stipulate that IRIs should be addresses pointing to a location on the Web. Again,
one could naively assume that the location that the address points to is what
defines the dataset, that is, the document retrieved when one gets the resource
using the HTTP protocol. However, this would lead us, for instance, to define
each DBpedia article as an individual dataset.</p>
        <p>
          A second possibility would be to use the domain part of the URL, so that data
is grouped into datasets by publisher. This approach is taken by Ding and Finin
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to characterize data in the Semantic Web. This way, it would be easy to
determine which dataset is being linked to. Such an approach would work well if
all datasets were accessible from dereferenceable IRIs. However, large
portions of the Web of Data provide access to data dumps only [
          <xref ref-type="bibr" rid="ref16 ref9">9, 16</xref>
          ]. In
this case, the domain of the dump does not necessarily match the domain of the
individual IRIs found in the dataset. As an example, the DBpedia dumps are
found at http://downloads.dbpedia.org/ while all DBpedia IRIs start with
http://dbpedia.org/.
        </p>
        <p>
          The last approach is to use the concept of PLD, i.e., the domain label
that immediately precedes a public suffix, together with that suffix, to identify a dataset. Then,
datasets are grouped not necessarily by the same publisher, but by the same
publisher authority. This approach has already been used by other works [
          <xref ref-type="bibr" rid="ref15 ref22">15, 22</xref>
          ].
As an example, if a file found at http://download.dbpedia.org/ contains the
following triple:
        </p>
        <preformat>&lt;http://dbpedia.org/wiki/Europe&gt;
&lt;http://www.w3.org/2002/07/owl#sameAs&gt;
&lt;http://sws.geonames.org/6255148/&gt;</preformat>
        <p>we consider that the dataset having the PLD dbpedia.org is linking to the
dataset with PLD geonames.org. It is important to notice that the source of the
link (dbpedia.org) is obtained from the URL of the document that contains the
triple (http://download.dbpedia.org/), not from the subject of the example.
This approach enables us to extract outlinks from datasets published in dumps,
and therefore to cover the majority of accessible Semantic Web data.</p>
        <p>Definition 1 (Dataset). A dataset is a non-empty collection of triples that
can be retrieved from a source accessible at a URL having a common Pay-Level
Domain. The PLD identifies the dataset.</p>
        <p>In the previous example, we see that the predicate IRI is linking to the
standard OWL vocabulary. It is very likely that predicates in general will be
linking to vocabularies that are extensively reused. However, our intent is to
evaluate trust in actual data that can be used to answer questions, and not
vocabularies used to describe the data. We predict that extracting outlinks from
predicates will lead to higher values for datasets containing only vocabularies. For
this reason, we perform the same experiment with and without taking predicates
into consideration.</p>
        <p>Definition 2 (Dataset link). There exists a link from a dataset A to a dataset
B if and only if there exists a triple in a file at a location having the PLD that
identifies A in which the PLD of its subject, its object, or both matches the PLD
that identifies B.</p>
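        <p>Definitions 1 and 2 can be illustrated with a short Python sketch. It is not the authors' implementation (which relies on Google Guava): the function names are hypothetical, the public-suffix set is a tiny hand-picked stand-in for the full Public Suffix List, and skipping links whose target equals the source PLD follows the extraction step described in Section 4 rather than Definition 2 itself.</p>
        <preformat><![CDATA[
```python
from urllib.parse import urlsplit

# Assumption: a tiny illustrative subset of the Public Suffix List.
# A real implementation would load the complete list.
PUBLIC_SUFFIXES = {"org", "com", "net", "eu", "me", "uk", "co.uk"}


def pay_level_domain(url):
    """Return the PLD of a URL, or None if no known public suffix matches."""
    host = urlsplit(url).hostname
    if not host:
        return None
    labels = host.lower().split(".")
    # Try the longest candidate suffix first, so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None


def dataset_links(document_url, triples, use_predicates=False):
    """Return the set of (source PLD, target PLD) links found in a document.

    The source PLD comes from the URL of the document that holds the
    triples, not from the subjects of those triples.
    """
    source = pay_level_domain(document_url)
    links = set()
    for subject, predicate, obj in triples:
        terms = [subject, obj]
        if use_predicates:
            terms.append(predicate)
        for term in terms:
            target = pay_level_domain(term)
            # Literals and unknown suffixes yield None and are skipped.
            if source and target and target != source:
                links.add((source, target))
    return links
```
]]></preformat>
        <p>Applied to the example triple above with the document URL http://download.dbpedia.org/, the sketch yields a single link from dbpedia.org to geonames.org; with use_predicates set, a link to w3.org is added as well.</p>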
        <p>
          This definition is in line with the PageRank algorithm [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], where the number
of links between the same two nodes is irrelevant. Note that since datasets must
be non-empty, links to PLDs that do not host RDF have to be ignored. In the
next section, we describe the tools that we used in our experiments.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Preliminaries</title>
        <p>In order to provide a realistic assessment of reuse in Linked Open Data, we
exploited a large number of datasets by way of the LOD Laundromat (described
in Section 3.1), from which we compute the dataset links that form the input of
the PageRank algorithm (described in Section 3.2).</p>
        <sec id="sec-2-2-1">
          <title>The LOD Laundromat and Frank</title>
          <p>
            The LOD cloud<sup>2</sup>, and Linked Open Data in general, contains a wide variety of
formats, publishing schemes, and errors that make it difficult to perform a
large-scale evaluation. Yet, to be accurate, our study needs to be comprehensive.
Fortunately, the LOD Laundromat [
            <xref ref-type="bibr" rid="ref1 ref21">1, 21</xref>
            ] makes this data available by
gathering dataset dumps from the Web, including archived data. The LOD Laundromat
cleans the data by fixing syntactic errors and removing duplicates, and then
makes it available through download (either as gzipped N-Triples or N-Quads,
or as HDT [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] files), a SPARQL endpoint, and Triple Pattern Fragments [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ].
Using the LOD Laundromat is also a better solution than trying to use
documents obtained by dereferencing URIs, because most datasets available online are data
dumps [
            <xref ref-type="bibr" rid="ref16 ref9">9, 16</xref>
            ], and thus not accessible by dereferencing.
          </p>
          <p>
            Frank [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] is a command-line tool that serves as an interface to the LOD
Laundromat, and makes it easy to run evaluations against very large numbers
of datasets.
          </p>
        </sec>
        <sec id="sec-2-2-2">
          <title>PageRank</title>
          <p>
            PageRank [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ] is the original algorithm developed by Page et al. that Google
uses to rank its search results. It takes advantage of the graph structure of
the web, considering each link from one page as a "vote" from the source to the
destination. Using the links, the importance of a page is propagated across the
graph, dividing the value of a page among its outlinks.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2 http://lod-cloud.net/</title>
      <p>This process is repeated until convergence is reached. The final result of PageRank corresponds to a
stationary distribution, in which the value of each page is the probability that a
random surfer is on that page at any given moment.</p>
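      <p>The propagation described above can be sketched as a plain power iteration in Python. This is only an illustration, not the igraph routine used in the experiments: the function name, the damping factor of 0.85 (a conventional value, not stated in this paper), the convergence tolerance, and the uniform redistribution of the score of nodes without outlinks are all assumptions.</p>
      <preformat><![CDATA[
```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Compute PageRank for a set of (source, target) links by power iteration.

    Duplicate links between the same two nodes are ignored, because the
    outlinks of each node are stored as a set.
    """
    nodes = sorted({node for link in links for node in link})
    outlinks = {node: set() for node in nodes}
    for source, target in links:
        outlinks[source].add(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Assumption: nodes without outlinks spread their score uniformly.
        dangling = sum(rank[node] for node in nodes if not outlinks[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each node divides its current value equally among its outlinks.
        for node in nodes:
            for target in outlinks[node]:
                new_rank[target] += damping * rank[node] / len(outlinks[node])
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank
```
]]></preformat>
      <p>The returned values sum to one, which matches the reading of PageRank as a stationary distribution over nodes.</p>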
      <sec id="sec-3-1">
        <title>Experiments and Results</title>
        <p>The process to compute PageRank involves the following steps, detailed further
below and illustrated in Figure 1. The code and results are provided online<sup>3</sup>.</p>
        <list list-type="order">
          <list-item><p>Extracting the document list from the LOD Laundromat.</p></list-item>
          <list-item><p>Parsing the content of each document to extract the outlinks.</p></list-item>
          <list-item><p>Consolidating the results.</p></list-item>
          <list-item><p>Computing PageRank.</p></list-item>
        </list>
        <p>[Figure 1: pipeline diagram. The LOD Laundromat feeds "Extract List of Documents"; the resulting List of Documents is processed by several parallel "Parse Documents" steps; their Outlinks are merged by "Consolidate Results"; "Compute PageRank" then produces the PageRank Values.]</p>
        <p>
          We use the Frank command-line tool [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] to obtain a snapshot of the contents
of the LOD Laundromat. While the output of Frank can be piped directly
into our process, the next step is performed in parallel on several machines. For
this reason, every machine needs to read exactly the same input; an update
to the contents of the LOD Laundromat during that step could otherwise
affect the results. We retrieve the list of documents in the LOD
Laundromat with the following command.
        </p>
        <p>$ frank documents &gt; documents.dat</p>
        <p>This command retrieves a list of pairs (downloadURL-resourceURL), where
the first is the URL from which to download the gzipped dataset, and the second is the
resource identifier in the LOD Laundromat ontology. At the time of the
experiments, it retrieved 649,855 documents.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 https://github.com/jm-gimenez-garcia/LODRank</title>
      <sec id="sec-4-1">
        <title>Parsing the content of each document to extract the outlinks.</title>
        <p>A prototype tool<sup>4</sup> has been developed to stream the contents of the
documents and extract the outlinks. This tool reads the list of pairs
(downloadURL-resourceURL) from standard input, and accepts two optional parameters for
partial processing: Step and Start. The first one tells how many lines the process reads in every
iteration, processing the last one, while the second denotes which line to use as
the first input. For each line processed, it queries the SPARQL endpoint to
retrieve the URL from which that document was crawled. This information can be found
in the LOD Laundromat ontology, connected either to the resource, if the
document was crawled as a single file, or to the archive that contains
the document, if it was crawled inside a compressed file, possibly along with
other documents. In the first case, we retrieve the URL with Query 1; in the
second case, we retrieve it with Query 2, where %s is substituted by the
resourceURL. The Pay-Level Domain is then extracted and stored. It is
used as the identifier of the dataset.</p>
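        <p>As a rough illustration of how Step and Start might split the document list among parallel workers, and of the %s substitution in the two queries, consider the following Python sketch. It reflects one plausible reading of the description above (line Start is processed first, then every Step-th line after it); the function names are hypothetical and the prototype tool may behave differently.</p>
        <preformat><![CDATA[
```python
# Query templates taken from Query 1 and Query 2 of this section.
QUERY_SINGLE_FILE = (
    "SELECT ?url WHERE {<%s> <http://lodlaundromat.org/ontology/url> ?url}"
)
QUERY_ARCHIVE_ENTRY = (
    "SELECT ?url WHERE { "
    "?archive <http://lodlaundromat.org/ontology/containsEntry> <%s> . "
    "?archive <http://lodlaundromat.org/ontology/url> ?url }"
)


def select_lines(lines, step=1, start=1):
    """Return the lines one worker handles (line numbers are 1-based).

    Assumed reading: the worker handles line `start`, then every
    `step`-th line after it.
    """
    return [
        line
        for number, line in enumerate(lines, start=1)
        if number >= start and (number - start) % step == 0
    ]


def crawl_url_queries(resource_url):
    """Fill both query templates for one LOD Laundromat resource."""
    return QUERY_SINGLE_FILE % resource_url, QUERY_ARCHIVE_ENTRY % resource_url
```
]]></preformat>
        <p>Under this reading, running eight workers with Step set to 8 and Start set to 1 through 8 would cover every line of the list exactly once.</p>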
        <preformat>SELECT ?url
WHERE {&lt;%s&gt; &lt;http://lodlaundromat.org/ontology/url&gt; ?url}</preformat>
        <p>Query 1: Query to retrieve crawled URL of a non-archived document</p>
        <preformat>SELECT ?url
WHERE {
  ?archive &lt;http://lodlaundromat.org/ontology/containsEntry&gt; &lt;%s&gt; .
  ?archive &lt;http://lodlaundromat.org/ontology/url&gt; ?url
}</preformat>
        <p>Query 2: Query to retrieve crawled URL of an archived document</p>
        <p>Then, the gzipped file is streamed from the downloadURL and its
triples are parsed. The Pay-Level Domain of the subject and of the object (if it is a URI) is extracted
and compared against the PLD of the dataset. If a resource has a valid PLD
that differs from the dataset's Pay-Level Domain, the pair
(datasetPLD-resourcePLD) is stored as an outlink of the dataset. The output for each dataset
is stored in a separate file, to which more pairs are appended if another
document is identified as belonging to the same dataset.</p>
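        <p>The streaming step can be sketched in Python as follows. This is a simplified illustration, not the Java prototype: the regular expression only handles well-formed N-Triples lines, crude_pld keeps the last two host labels as a stand-in for a real Public Suffix List lookup (so it is wrong for suffixes such as co.uk), and all names are hypothetical.</p>
        <preformat><![CDATA[
```python
import gzip
import io
import re
from urllib.parse import urlsplit

# Captures subject, predicate IRI and object of one N-Triples line.
TRIPLE = re.compile(r"^(\S+)\s+<([^>]*)>\s+(.+?)\s*\.\s*$")


def crude_pld(iri):
    """Assumption: approximate the PLD by the last two labels of the host."""
    try:
        host = urlsplit(iri).hostname
    except ValueError:
        return None
    if not host or "." not in host:
        return None
    return ".".join(host.lower().split(".")[-2:])


def extract_outlinks(stream, dataset_pld, pld=crude_pld):
    """Read a gzipped N-Triples stream and return its (dataset, target) pairs."""
    outlinks = set()
    with gzip.open(stream, mode="rt", encoding="utf-8", errors="replace") as lines:
        for line in lines:
            match = TRIPLE.match(line)
            if not match:
                continue  # blank line, comment or malformed triple
            subject, _predicate, obj = match.groups()
            for term in (subject, obj):
                # Only IRIs count; literals and blank nodes are skipped.
                if term.startswith("<") and term.endswith(">"):
                    target = pld(term[1:-1])
                    if target and target != dataset_pld:
                        outlinks.add((dataset_pld, target))
    return outlinks


# Tiny in-memory dump standing in for a file streamed from a downloadURL.
SAMPLE = gzip.compress(
    b"<http://dbpedia.org/wiki/Europe> "
    b"<http://www.w3.org/2002/07/owl#sameAs> "
    b"<http://sws.geonames.org/6255148/> .\n"
    b"<http://dbpedia.org/wiki/Europe> "
    b"<http://www.w3.org/2000/01/rdf-schema#label> \"Europe\"@en .\n"
)
print(extract_outlinks(io.BytesIO(SAMPLE), "dbpedia.org"))
```
]]></preformat>
        <p>Because the outlinks are collected in a set, repeated links from the same document collapse into one pair, in line with Definition 2.</p>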
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 https://github.com/jm-gimenez-garcia/LODRank/tree/master/src/com/chemi2g/lodrank/outlink_extractor</title>
      <p>This process makes use of Apache Jena<sup>5</sup> v3.0.1 to query the SPARQL
endpoint of the LOD Laundromat and Google Guava<sup>6</sup> v19.0 to extract the Pay-Level
Domain of the datasets.</p>
      <p>In the experiments, the process was launched in parallel on 8 virtual machines
using Google Cloud Platform<sup>7</sup> free trial resources, each one processing a
different subset of the list downloaded in the previous step. A statistical description
of the results of each process, with and without considering predicates, is
given in Table 1. "Documents" is the number of dump files in the
LOD Laundromat, while "Datasets" is the number of PLDs that the process
is dealing with. The datasets of several processes can overlap, so the
total number of datasets is not equal to the sum. We can see that the number of
triples processed by each process is not proportional to the number of documents
processed.</p>
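      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Data extracted from the LOD Laundromat by each process</p>
        </caption>
        <table>
          <thead>
            <tr><th>Process</th><th>Documents</th><th>Triples</th><th>Datasets (with predicates)</th><th>Datasets (without predicates)</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>81,220</td><td>3,994,446,393</td><td>135</td><td>121</td></tr>
            <tr><td>2</td><td>81,226</td><td>3,742,870,561</td><td>137</td><td>118</td></tr>
            <tr><td>3</td><td>83,422</td><td>4,146,249,367</td><td>140</td><td>127</td></tr>
            <tr><td>4</td><td>81,225</td><td>3,376,784,600</td><td>135</td><td>120</td></tr>
            <tr><td>5</td><td>81,225</td><td>3,623,413,245</td><td>142</td><td>120</td></tr>
            <tr><td>6</td><td>88,198</td><td>3,377,773,585</td><td>131</td><td>116</td></tr>
            <tr><td>7</td><td>81,226</td><td>4,132,960,522</td><td>137</td><td>115</td></tr>
            <tr><td>8</td><td>89,781</td><td>3,911,917,919</td><td>134</td><td>123</td></tr>
          </tbody>
        </table>
      </table-wrap>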
      <sec id="sec-5-1">
        <title>Consolidating the results</title>
        <p>Once the outlinks have been extracted, the different files have to be appended
and duplicates removed using a simple tool<sup>8</sup>. In the experiments, the data from
each virtual machine was downloaded to a separate folder on a single machine.
Then, files with the same name in each folder were concatenated and
duplicates were removed. The total number of datasets after consolidating the results is
412 when considering predicates, and 319 when not. The result was then
concatenated into a single file.</p>
        <p>5 https://jena.apache.org/</p>
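        <p>A minimal Python sketch of this consolidation step is given below. It is not the tool referenced above: it assumes, for illustration only, that each worker folder holds one text file per dataset, named after the dataset PLD, with one space-separated outlink pair per line. The function names and the demo data are hypothetical.</p>
        <preformat><![CDATA[
```python
import tempfile
from collections import defaultdict
from pathlib import Path


def consolidate(folders):
    """Merge same-named outlink files from several folders, dropping duplicates."""
    merged = defaultdict(set)
    for folder in folders:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                with path.open(encoding="utf-8") as handle:
                    # A set removes pairs repeated within or across folders.
                    merged[path.name].update(
                        line.strip() for line in handle if line.strip()
                    )
    return dict(merged)


def write_edge_list(merged, target):
    """Concatenate all deduplicated pairs into one file, one pair per line."""
    with open(target, "w", encoding="utf-8") as out:
        for name in sorted(merged):
            for pair in sorted(merged[name]):
                out.write(pair + "\n")


def demo():
    """Build two worker folders in a temporary directory and consolidate them."""
    with tempfile.TemporaryDirectory() as root:
        outputs = {
            "vm1": ["dbpedia.org geonames.org"],
            "vm2": ["dbpedia.org geonames.org", "dbpedia.org w3.org"],
        }
        for worker, pairs in outputs.items():
            folder = Path(root, worker)
            folder.mkdir()
            Path(folder, "dbpedia.org").write_text(
                "\n".join(pairs) + "\n", encoding="utf-8"
            )
        return consolidate([Path(root, worker) for worker in outputs])
```
]]></preformat>
        <p>The single file written by write_edge_list plays the role of the edge list that is passed to the PageRank computation.</p>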
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 https://github.com/google/guava</title>
    </sec>
    <sec id="sec-7">
      <title>7 https://cloud.google.com/</title>
    </sec>
    <sec id="sec-8">
      <title>8 https://github.com/jm-gimenez-garcia/LODRank/tree/master/src/com/chemi2g/lodrank/duplicate_remover</title>
      <sec id="sec-8-1">
        <title>Computing PageRank</title>
        <p>
          For PageRank computation we make use of the igraph R package [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The ordered
PageRank values for all datasets are shown in Figure 2 and Figure 3, on a
logarithmic scale. The complete list of results is published online<sup>9</sup>. We can see
that in both cases the top-ranked datasets score far higher than the rest;
the slope then becomes more regular until it reaches a plateau at the end, with
a minimum value shared by several datasets that have no inlinks at all. Tables
2 and 3 show the 10 highest-ranked datasets.
        </p>
        <p>Discussion. Here we provide additional information about the datasets,
especially the top-ranked ones, in order to understand how ranking correlates with
other statistical values, such as the number of triples and the number of documents. We also
discuss how our own choices influenced the results.</p>
        <p>The datasets appearing in the top 10 lists are generally not surprising,
the only exception being holygoat.co.uk, the only domain in the top 10 owned by
an individual person, Richard Newman, a computer scientist who wrote several
ontologies in the early days of the Semantic Web. This is even more remarkable
considering that the dataset has only 7 inlinks. The reason is that rdfs.org
includes resources from holygoat.co.uk. Because rdfs.org has only 2
outlinks, half of its PageRank score is forwarded to holygoat.co.uk, which accounts
for 96% of the PageRank value of holygoat.co.uk.</p>
        <p>As predicted, when predicates are included, the first positions incorporate more
datasets about vocabularies. When the predicates are removed, w3.org, xmlns.com,
schema.org, and ogp.me no longer appear in the top positions, and datasets
with factual data move upwards. lodlaundromat.org seems to appear when
considering predicates because the LOD Laundromat adds information about
the cleaning process when processing the data. While not an optimal solution
(considering that purl.org and rdfs.org are still in the top positions), ignoring
the predicates proves to be a simple but useful technique.</p>
        <p>We used two queries (Query 3 and Query 4) to obtain the number of
documents and triples for each PLD from the LOD Laundromat.</p>
        <preformat>PREFIX llo: &lt;http://lodlaundromat.org/ontology/&gt;
PREFIX ll: &lt;http://lodlaundromat.org/resource/&gt;
SELECT (COUNT(DISTINCT ?resource) AS ?count)
WHERE {
  {
    ?resource llo:url ?url
    FILTER regex(?url, "[^/\\.]*\\.?%s/", "")
  }
  UNION
  {
    ?archive llo:containsEntry ?resource ;
             llo:url ?url
    FILTER regex(?url, "[^/\\.]*\\.?%s/", "")
  }
}</preformat>
        <p>Query 3: Query to retrieve the number of documents per dataset</p>
        <p>The results of the queries are given in Table 4 for all the datasets that appear
in the top 10 of both experiments.</p>
        <p>As we can see, popularity is not correlated with the size of the datasets.
Indeed, a number of the top ten datasets have fewer than 200 triples, while
dbpedia.org and europa.eu both have billions of triples.</p>
        <p>The extremely high PageRank of purl.org should be qualified by the
fact that purl.org does not actually host any data: it is a redirection service
that many data publishers use. This result highlights a drawback of our
heuristic for identifying datasets: a PLD does not always refer to a single
dataset. To overcome this particular case, we could consider the PLD of the
URL of the document obtained after dereferencing the IRI.</p>
        <p>Another possible drawback of the approach is that triples with rdf:type in
predicate position have their object pointing to a class in an ontology. This is
in contradiction with our remark in Section 2, where we say that we want to
rank instance data rather than terminological knowledge. This can have a major
impact on the results, since purl.org is more often used to redirect to vocabularies
than to datasets, and rdfs.org only hosts ontologies.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>9 https://github.com/jm-gimenez-garcia/LODRank/tree/master/results</title>
      <preformat>PREFIX llo: &lt;http://lodlaundromat.org/ontology/&gt;
PREFIX ll: &lt;http://lodlaundromat.org/resource/&gt;
SELECT (COUNT(DISTINCT ?resource) AS ?count) (SUM(?triples) as ?sum)
WHERE {
  {
    ?resource llo:url ?url ;
              llo:triples ?triples
    FILTER (?triples &gt; 0)
    FILTER regex(?url, "[^/\\.]*\\.?%s/", "")
  }
  UNION
  {
    ?archive llo:containsEntry ?resource ;
             llo:url ?url .
    ?resource llo:triples ?triples
    FILTER (?triples &gt; 0)
    FILTER regex(?url, "[^/\\.]*\\.?%s/", "")
  }
}</preformat>
      <p>Query 4: Query to retrieve the number of documents with triples and the number of
triples</p>
      <sec id="sec-9-1">
        <title>Related work</title>
        <p>
          The authors of Semantic Web Search Engine (SWSE [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]) argue that
a ranking mechanism is crucial for prioritizing data elements in the
search process. Their work is inspired by the Google PageRank algorithm, which
treats hyperlinks to other pages as a positive score. The PageRank algorithm
targets hyperlinked documents, and its adaptation to LOD is
non-trivial, as we have seen. They point out that the primary reason for this
is that LOD datasets may not have direct hyperlinks to other datasets but
rather, in most cases, make use of implicit links to other web pages via the
reuse of dereferenceable URIs. Here the unit of search becomes the entity and
not the document itself. The authors briefly re-introduce the concept of naming
authority from their previous work [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] in order to rank structured data from an
open distributed environment. They assume that the naming authority should
match the Pay-Level Domain, so that PageRank is computed on a
naming authority graph whose nodes are PLDs. Their intuition is therefore
in accordance with our reasoning in Section 2. They discuss and
contrast the interpretation of naming authorities at the document level (e.g.
http://www.danbri.org/foaf.rdf) and at the PLD level (danbri.org). They also
use a generalization of the method discussed in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for ranking
entities, and carry out link analysis on the PLD abstraction layer.
        </p>
        <p>
          The authors of Swoogle [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] developed the OntoRank algorithm in order to rank
documents. OntoRank, a variation of Google PageRank, is an iterative algorithm
for calculating the ranks of documents, built on references to terms (i.e., classes
and properties) that are defined in other documents.
        </p>
        <p>
          In the paper [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], the authors calculate the rank of entities (or as they call
them objects) based on the logarithm of the number of documents where that
particular object is mentioned.
        </p>
        <p>
          The authors of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] present LinkQA, an extensible
framework for assessing the quality of linked data mappings using network
measures. To this end, they assess the degree of interlinking of datasets using five
network measures, two of which are specifically designed
for Linked Data (namely, Open Same-As chains and Description Richness), while
the other three are standard network measures (namely, degree, centrality, and the
clustering coefficient), in order to assess variation in the quality of the overall
linked data with respect to a certain set of links.
        </p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], PageRank is used to compute a measure that is in turn associated
to individual statements in datasets for the purpose of incorporating trust in
reasoning. Therefore, as in our own approach, they consider that PageRank is
an indication of trustworthiness. However, they only compute PageRank on a
per document basis, and report on the PageRank values of the top 10 documents
obtained from their web crawl.
        </p>
      </sec>
      <sec id="sec-9-2">
        <title>Conclusion &amp; Future work</title>
        <p>Data-driven question answering, the aim of the WDAqua project mentioned in the
introduction to this paper, requires quality data that one can trust. Our
aim has been to provide insight into how a trust measure can be based on dataset
interlinking. To that end, we consider Pay-Level Domains as identifiers of unique
datasets and compute PageRank on them. Our results show that the design
choices greatly affect the results. Whether or not predicates are taken into account
for outlink extraction affects how vocabularies are ranked, and the choice of
PLD as the definition of a dataset seems questionable, as some PLDs group many data
dumps. In order to improve this, we could associate well-known datasets with IRI
patterns, such as it.dbpedia.org for the Italian version of DBpedia.</p>
        <p>
          In addition, we intend to explore further applications of PageRank that
may be useful for question answering. User interaction that provides trust
values for a number of datasets could be used to compute PageRank values with
those datasets as a teleport set, as suggested by Gyöngyi et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Also,
Topic-Sensitive PageRank [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] could help a question-answering system to select
different datasets when a question is identified as belonging to a specific topic.
        </p>
        <p>Finally, this work is part of a broader objective that we want to pursue:
to ascertain the relationship between the perceived trust in a dataset and its
objective quality. We will explore this area in future work, where other data
reuse metrics will be considered and compared against different quality metrics.</p>
      </sec>
      <sec id="sec-9-3">
        <title>Acknowledgement</title>
        <p>This project is supported by funding received from the European Union's Horizon
2020 research and innovation program under the Marie Skłodowska-Curie grant
agreement No 642795. We would like to thank Elena Simperl, whose idea
jump-started the project that led to this article, and also Elena Demidova, Kemele
Endris, and Christoph Lange for the useful discussions related to it.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Beek</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rietveld</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bazoobandi</surname>
            ,
            <given-names>H.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wielemaker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>LOD laundromat: A uniform way of publishing other people's dirty data</article-title>
          .
          <source>In: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23</source>
          ,
          <year>2014</year>
          . Proceedings, Part I. pp.
          <fpage>213</fpage>
          –
          <lpage>228</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bonatti</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sauro</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Robust and scalable Linked Data reasoning incorporating provenance and trust annotations</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>9</volume>
          (
          <issue>2</issue>
          ),
          <fpage>165</fpage>
          –
          <lpage>201</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] Cheng, G.,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Searching Linked Objects with Falcons: Approach, Implementation and Evaluation</article-title>
          .
          <source>International Journal of Semantic Web and Information Systems</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>49</fpage>
          –
          <lpage>70</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Csárdi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nepusz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The igraph software package for complex network research</article-title>
          .
          <source>InterJournal, Complex Systems 1695(5)</source>
          ,
          <fpage>1</fpage>
          –
          <lpage>9</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Debattista</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Londoño,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          :
          <article-title>Quality assessment of linked datasets using probabilistic approximation</article-title>
          . In: Gandon,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Sabou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Sack</surname>
          </string-name>
          , H.,
          <string-name>
            <surname>d'Amato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . (eds.)
          <article-title>The Semantic Web</article-title>
          .
          <source>Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC</source>
          <year>2015</year>
          , Portorož, Slovenia, May 31 - June 4,
          <year>2015</year>
          .
          <source>Proceedings. Lecture Notes in Computer Science</source>
          , vol.
          <volume>9088</volume>
          , pp.
          <fpage>221</fpage>
          –
          <lpage>236</lpage>
          . Springer (
          <year>2015</year>
          ), http://dx.doi.org/10.1007/978-3-319-18818-8_14
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Characterizing the semantic web on the web</article-title>
          .
          <source>In: International Semantic Web Conference. Lecture Notes in Computer Science</source>
          , vol.
          <volume>4273</volume>
          , pp.
          <fpage>242</fpage>
          –
          <lpage>257</lpage>
          . Springer (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cost</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddivari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doshi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachs</surname>
          </string-name>
          , J.:
          <article-title>Swoogle: a search and metadata engine for the semantic web</article-title>
          .
          <source>In: Proceedings of the thirteenth ACM international conference on Information and knowledge management</source>
          . pp.
          <fpage>652</fpage>
          –
          <lpage>659</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Finding and ranking knowledge on the semantic web</article-title>
          .
          <source>In: The Semantic Web – ISWC</source>
          <year>2005</year>
          , pp.
          <fpage>156</fpage>
          –
          <lpage>170</lpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Ermilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Linked open data statistics: Collection and exploitation</article-title>
          .
          <source>In: Knowledge Engineering and the Semantic Web - 4th International Conference, KESW</source>
          <year>2013</year>
          ,
          St. Petersburg, Russia, October 7-9
          ,
          <year>2013</year>
          . Proceedings. pp.
          <fpage>242</fpage>
          –
          <lpage>249</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez-Prieto</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arias</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Binary RDF Representation for Publication and Exchange (HDT)</article-title>
          .
          <source>Journal of Web Semantics</source>
          (
          <year>2013</year>
          ), http://dataweb.infor.uva.es/wp-content/uploads/2013/01/jws2013.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Guéret</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stadler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Assessing linked data mappings using network measures</article-title>
          .
          <source>In: The Semantic Web: Research and Applications - 9th Extended Semantic Web Conference, ESWC</source>
          <year>2012</year>
          , Heraklion, Crete, Greece, May 27-31,
          <year>2012</year>
          . Proceedings. pp.
          <fpage>87</fpage>
          –
          <lpage>102</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] Gyongyi,
          <string-name>
            <given-names>Z.</given-names>
            ,
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.O.</surname>
          </string-name>
          :
          <article-title>Combating web spam with trustrank</article-title>
          .
          <source>In: (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases</source>
          , Toronto, Canada,
          <source>August 31 - September 3 2004</source>
          . pp.
          <fpage>576</fpage>
          –
          <lpage>587</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kinsella</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Using naming authority to rank data and ontologies for web search</article-title>
          .
          <source>In: The Semantic Web - ISWC</source>
          <year>2009</year>
          , 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29,
          <year>2009</year>
          . Proceedings. pp.
          <fpage>277</fpage>
          –
          <lpage>292</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Haveliwala</surname>
            ,
            <given-names>T.H.</given-names>
          </string-name>
          :
          <article-title>Topic-sensitive pagerank</article-title>
          .
          <source>In: Proceedings of the Eleventh International World Wide Web Conference, WWW 2002, May 7-11</source>
          ,
          <year>2002</year>
          , Honolulu, Hawaii. pp.
          <fpage>517</fpage>
          –
          <lpage>526</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kinsella</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Searching and browsing Linked Data with SWSE: The Semantic Web Search Engine</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>9</volume>
          (
          <issue>4</issue>
          ) (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>An empirical survey of Linked Data conformance</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>14</volume>
          ,
          <fpage>14</fpage>
          –
          <lpage>44</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>d'Aquin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
          </string-name>
          , E.:
          <article-title>Towards linked data fact validation through measuring consensus</article-title>
          . In: Rula,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Zaveri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Knuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <surname>D</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the 2nd Workshop on Linked Data Quality co-located with 12th Extended Semantic Web Conference (ESWC</source>
          <year>2015</year>
          ), Portorož, Slovenia, June 1,
          <year>2015</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1376</volume>
          . CEUR-WS.org (
          <year>2015</year>
          ), http://ceur-ws.org/Vol-1376/LDQ2015_paper_04.pdf
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motwani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winograd</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The PageRank citation ranking: bringing order to the web</article-title>
          . (
          <year>1999</year>
          ), http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Improving the quality of linked data using statistical distributions</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst</source>
          .
          <volume>10</volume>
          (
          <issue>2</issue>
          ),
          <fpage>63</fpage>
          –
          <lpage>86</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Rietveld</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beek</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>LOD lab: Experiments at LOD scale</article-title>
          .
          <source>In: The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference</source>
          , Bethlehem, PA, USA, October 11-15,
          <year>2015</year>
          , Proceedings, Part II. pp.
          <fpage>339</fpage>
          –
          <lpage>355</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Rietveld</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beek</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sande</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schlobach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Linked data-as-a-service: The semantic web redeployed</article-title>
          .
          <source>In: The Semantic Web. Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC</source>
          <year>2015</year>
          , Portorož, Slovenia, May 31 - June 4,
          <year>2015</year>
          . Proceedings. pp.
          <fpage>471</fpage>
          –
          <lpage>487</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Schmachtenberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>Adoption of the linked data best practices in different topical domains</article-title>
          . In: Mika,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Tudorache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Welty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.A.</given-names>
            ,
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.T.</given-names>
            ,
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.F.</given-names>
            ,
            <surname>Janowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Goble</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.A</surname>
          </string-name>
          . (eds.)
          <source>The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23</source>
          ,
          <year>2014</year>
          .
          <source>Proceedings, Part I. Lecture Notes in Computer Science</source>
          , vol.
          <volume>8796</volume>
          , pp.
          <fpage>245</fpage>
          –
          <lpage>260</lpage>
          . Springer (
          <year>2014</year>
          ), http://dx.doi.org/10.1007/978-3-319-11964-9_16
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Thakkar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Endris</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giménez-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.M.</given-names>
            ,
            <surname>Debattista</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          :
          <article-title>Are linked datasets fit for open-domain question answering? A quality assessment</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sande</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coppens</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>de Walle</surname>
            ,
            <given-names>R.V.</given-names>
          </string-name>
          :
          <article-title>Web-scale querying through linked data fragments</article-title>
          . In: Bizer,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <surname>T</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW</source>
          <year>2014</year>
          ), Seoul, Korea, April 8,
          <year>2014</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1184</volume>
          . CEUR-WS.org (
          <year>2014</year>
          ), http://ceur-ws.org/Vol-1184/ldow2014_paper_04.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>