<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Assessing Quantity and Quality of Links Between Linked Data Datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ciro Baron Neto</string-name>
          <email>cbaron@informatik.uni-leipzig.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leipzig University</institution>
          ,
          <addr-line>AKSW/KILT</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Linked Data Web is growing, and it becomes increasingly necessary to analyze the relationship between datasets to exploit its full value. LOD datasets can range from datasets with low cohesion (containing data from different Fully Qualified Domain Names (FQDN) and namespaces) to highly cohesive datasets. This paper evaluates the quantity and quality of links between distributions, datasets and ontologies, categorizing and defining different types of links. We streamed and indexed 2.5 billion triples and extracted 0.5 billion links using probabilistic data structures. Our results show the analysis of datasets w.r.t. valid links, dead links, and number of namespaces described by distributions and datasets. Our results indicate that 7.9% of the links we indexed and verified are actually dead.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Open Data</kwd>
        <kwd>Linksets</kwd>
        <kwd>Dead Links</kwd>
        <kwd>RDF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        In this paper, we present a thorough analysis of the links
between the datasets participating in the 2014 LOD cloud [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
and Linked Open Vocabularies<sup>2</sup>. The aim of this paper
is to evaluate the quality of links between these
knowledge bases. The analysis was conducted with the engine of
LODVader<sup>3</sup>, a real-time LOD Visualisation, Analytics and
DiscovEry tool. Our novel approach based on Bloom filters
allows us to accurately measure the number of links
between datasets and distributions, as well as to identify dead
and unverified links (cf. Section 2) between datasets.
The remainder of this work is structured as follows: We
provide a description of metadata vocabularies, link
granularity and linksets in Section 2, followed by details of
the methodology in Section 3. Section 4 describes the results of
our analysis and Section 5 presents related work.
Finally, Section 6 presents our conclusions and future work.
      </p>
      <p><sup>1</sup> http://lod-cloud.net/#history</p>
      <p><sup>2</sup> http://lov.okfn.org/dataset/lov/</p>
      <p><sup>3</sup> For the interface see http://svn.aksw.org/papers/2016/WWW_LODVader_DEMO/public.pdf</p>
      <p>[Figure 1: Overview of links at different levels of granularity, showing dataset ID<sub>1</sub>, subset S<sub>1</sub>, the distributions D<sub>1,1</sub> ... D<sub>1,j</sub>, D<sub>2,1</sub> ... D<sub>2,k</sub> and D<sub>n,1</sub> ... D<sub>n,m</sub>, the distribution-level linkset L<sub>real</sub> and the aggregated linksets L<sub>1</sub> to L<sub>4</sub>.]</p>
    </sec>
    <sec id="sec-2">
      <title>2. BACKGROUND</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Dataset Metadata Vocabularies</title>
      <p>
        In order to identify which resources should be streamed and
analyzed, this work relies on vocabularies such as DCAT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
VoID<sup>4</sup> and DataID [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These vocabularies are used to
represent metadata descriptions of datasets. They provide
information about multiple properties of a dataset, including
subsets and distributions. A subset is a distinct part of a
dataset that can be differentiated for a number of reasons,
such as differences in provenance, publication dates,
accessibility or language<sup>5</sup>. Distributions describe the specific files
or resources by which the datasets might be accessed or
acquired<sup>6</sup>.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Linkset Definition</title>
      <p>Linksets are RDF descriptions of relations between datasets
or distributions, represented by links. We adopted the DCAT
and VoID vocabularies to describe the number of links, as well
as source and target datasets. In order to clarify the
definition of the existing variables for a linkset, a brief explanation
is given.</p>
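      <list list-type="bullet">
        <list-item><p>ID: a dataset, described by void:Dataset or dcat:Dataset;</p></list-item>
        <list-item><p>S<sub>ID</sub>: the set of subsets, described by void:subset, of a given dataset ID;</p></list-item>
        <list-item><p>&lt;s, p, o&gt;: the RDF triple consisting of the subject s, predicate p and object o of a given relation;</p></list-item>
        <list-item><p>d<sub>n</sub>: the n-th distribution, consisting of a set of RDF triples;</p></list-item>
        <list-item><p>D<sub>ID</sub>: the set of distributions, described by dcat:distribution, of the dataset ID;</p></list-item>
        <list-item><p>D<sub>S<sub>ID</sub></sub>: the set of distributions of subset S of dataset ID;</p></list-item>
        <list-item><p>L<sub>d<sub>s</sub>→d<sub>t</sub></sub>: the set of existing links between two
distributions, having d<sub>s</sub> as source distribution and d<sub>t</sub> as target
distribution. We define that a link occurs from a
distribution d<sub>s</sub> to a distribution d<sub>t</sub> whenever d<sub>s</sub> contains
&lt;s<sub>s</sub>, p<sub>s</sub>, o<sub>s</sub>&gt; and d<sub>t</sub> contains &lt;s<sub>t</sub>, p<sub>t</sub>, o<sub>t</sub>&gt;, where
o<sub>s</sub> = s<sub>t</sub>. We then call the triple &lt;s<sub>s</sub>, p<sub>s</sub>, o<sub>s</sub>&gt; in the
source distribution a link (regardless of the property used)
and say that the distributions are linked with
each other (cf. Section 2.4). From this definition it
easily follows that linksets between distributions
(subsets or datasets) can be aggregated in a straightforward
manner. Consequently, a dataset ID<sub>s</sub> is linked to
another dataset ID<sub>t</sub> if a non-empty linkset exists from any
distribution in D<sub>S<sub>ID<sub>s</sub></sub></sub> to any distribution in D<sub>S<sub>ID<sub>t</sub></sub></sub>.</p></list-item>
      </list>
      <p><sup>4</sup> http://www.w3.org/TR/void/</p>
      <p><sup>5</sup> http://www.w3.org/TR/void/#subset</p>
      <p><sup>6</sup> http://www.w3.org/TR/vocab-dcat/#class-distribution</p>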
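      <p>The link definition above can be illustrated with a minimal Python sketch. The Triple type, the function names linkset and datasets_linked, and the example.org URIs are assumptions made for illustration only and are not taken from LODVader; exact in-memory sets stand in for the probabilistic structures used in the actual analysis.</p>
      <preformat><![CDATA[
```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object


def linkset(source: Iterable[Triple], target: Iterable[Triple]) -> List[Triple]:
    """Triples of the source distribution whose object is a subject in the target."""
    target_subjects = {t.s for t in target}
    # The predicate is ignored: any property can carry a link.
    return [t for t in source if t.o in target_subjects]


def datasets_linked(source_dists: List[List[Triple]], target_dists: List[List[Triple]]) -> bool:
    """True if at least one pair of distributions has a non-empty linkset."""
    return any(linkset(ds, dt) for ds in source_dists for dt in target_dists)


# Hypothetical example data.
ds = [
    Triple("http://a.example.org/r/1", "http://a.example.org/p/knows", "http://b.example.org/r/7"),
    Triple("http://a.example.org/r/1", "http://a.example.org/p/sees", "http://a.example.org/r/2"),
]
dt = [
    Triple("http://b.example.org/r/7", "http://b.example.org/p/label", "http://b.example.org/r/8"),
]

links = linkset(ds, dt)
print(len(links))  # number of links from ds to dt
```
]]></preformat>
      <p>Because a dataset-level link only requires one non-empty distribution-level linkset, aggregation reduces to a union (or, for the yes/no question, an any) over distribution pairs.</p>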
      <p>Furthermore, we define the following notions in order to
describe dead or unverified links. A dead link on the WWW is
generally associated with an HTTP 404 Not Found response
message. Analogously, we define "Not Found" between a
distribution and a dataset:</p>
      <p>NS<sub>n</sub>(uri): the namespace of a URI, where NS<sub>0</sub>
refers to the FQDN (incl. subdomain), NS<sub>n</sub> refers to the FQDN
plus the URI path of length n, and NS without a numeric index refers to the
FQDN plus the path up to and including the last '/' or
'#'. In this paper, we work with the latter variant only and write simply NS,
although investigating the other variants would be interesting.</p>
      <p>S<sub>NS<sub>S</sub></sub>(D): the set of NS(s<sub>t</sub>) for all the subjects in all
distributions of dataset D.</p>
      <p>A partial dead link &lt;s<sub>1</sub>, p<sub>1</sub>, o<sub>1</sub>&gt; between a
distribution d<sub>1</sub> and a dataset D exists if NS(o<sub>1</sub>) ∈ S<sub>NS<sub>S</sub></sub>(D)
and there is no triple t ∈ D with o<sub>1</sub> = s<sub>t</sub>. Note that this definition is
based on the assumption that namespaces are unique
to datasets. Given that there are several datasets with
applicable namespaces, a total dead link, or simply dead
link, means that the respective object is not found as a
subject in any (already indexed) dataset with
overlapping namespaces.</p>
      <p>An unverified link &lt;s<sub>1</sub>, p<sub>1</sub>, o<sub>1</sub>&gt; exists if NS(o<sub>1</sub>) cannot
be found in any indexed dataset, i.e. there are no
overlapping namespaces. As we are not investigating
HTTP resolution, we have to assume bona fide that
we just have not indexed the target dataset yet.</p>
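      <p>The three link states defined above (valid, dead, unverified) can be sketched as follows. This is an illustrative reading of the definitions, not the LODVader code: the helper names namespace and classify, the dictionary layout of the index and the example URIs are all assumptions, and the namespace helper presumes that the URI has a path component.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Set


def namespace(uri: str) -> str:
    """FQDN plus path up to and including the last '/' or '#'.

    Assumes the URI has a path; a bare 'http://host' is not handled.
    """
    cut = max(uri.rfind("/"), uri.rfind("#"))
    return uri[: cut + 1]


def classify(obj: str, indexed: Dict[str, Set[str]]) -> str:
    """Classify a link target against the subjects of the indexed datasets.

    `indexed` maps a (hypothetical) dataset name to its set of subject URIs.
    """
    ns = namespace(obj)
    # Datasets whose subject namespaces overlap with the object's namespace.
    candidates = [
        subjects
        for subjects in indexed.values()
        if any(namespace(s) == ns for s in subjects)
    ]
    if not candidates:
        return "unverified"  # namespace unknown: target not indexed yet
    if any(obj in subjects for subjects in candidates):
        return "valid"       # object found as subject
    return "dead"            # namespace known, but resource missing everywhere


# Hypothetical index with a single dataset.
indexed = {"A": {"http://a.example.org/r/1", "http://a.example.org/r/2"}}

print(classify("http://a.example.org/r/1", indexed))
print(classify("http://a.example.org/r/9", indexed))
print(classify("http://b.example.org/x", indexed))
```
]]></preformat>
      <p>The sketch returns "dead" only when no candidate dataset contains the object, which corresponds to the total dead link; a partial dead link would be reported per candidate dataset instead.</p>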
    </sec>
    <sec id="sec-5">
      <title>2.3 Link Granularity</title>
      <p>
        The LOD cloud diagram [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] assumes the Pay-Level Domain (PLD) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] as the basis for a dataset definition. It consequently
only depicts inter-dataset relations as links. LODVader also
offers visualisation and analysis of intra-dataset
relationships, for example between subsets and distributions,
featuring a higher link granularity. Figure 1 shows an overview
of links at different levels of granularity regarding a linkset
representation. Datasets are represented by ID<sub>n</sub>, subsets
are represented by S<sub>n</sub> and distributions are represented by
D<sub>n</sub>. L<sub>real</sub> is a linkset containing links between two
distributions, which are measured on the intersection of subjects
and objects (cf. Section 2.2). The linksets L<sub>1</sub> to L<sub>4</sub> can be
generated by calculating the union of the linksets between
all distributions of the respective subsets and datasets.
      </p>
    </sec>
    <sec id="sec-6">
      <title>2.4 Linking Predicates</title>
      <p>
        Common approaches for linking analysis rely on the
inspection of the predicates. owl:sameAs has well-defined formal
semantics and is the predicate which is closest to traditional
deduplication. For record linkage or object reconciliation
in the database area, counting owl:sameAs links exclusively
provides a very limited view of the Web of Data and does
not provide a reliable model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Several other properties have been proposed, with
rdfs:seeAlso and skos:{exact|close|broad|narrow|related}Match
being the most common. In our work,
we are tolerant and consider all predicates for linking. While
link direction is important for crawling (although DBpedia
is the largest authority [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], no backlinks are included), we
argue that linking properties are often either symmetric (and
highly unlikely to be asymmetric), or it is feasible to assume
that an inverse property exists or could easily be created, i.e.,
following a birthplace ↔ isBirthplaceOf pattern or simply
birthplace<sup>-1</sup>.
      </p>
      <p>So far, we have not encountered
predicates expressing negative links (e.g. notLinkedTo).</p>
      <p>Vocabulary Links. Another aspect of linking properties that
is often neglected is links to vocabularies and links between
vocabularies. In particular, the linkage via rdf:type has not
yet been visualized in a cloud diagram and is often not
included in link analysis.</p>
    </sec>
    <sec id="sec-7">
      <title>3. METHODOLOGY</title>
      <p>We parsed description files from Linked Open Vocabularies<sup>7</sup>,
DBpedia datasets and the LOD cloud, searching for
instances of dcat:Distribution, henceforth called source
distributions. The application then fetches the dcat:downloadURL
or void:dataDump object. Before the download of the source
distribution is started, it is checked whether the dataset has
already been imported into the system. If the dataset is
known, the system reads the Last-Modified date and
Content-Length in the HTTP header to verify whether the dataset
has changed. If there are modifications, the old
data is moved to an archive so that it can be used for
versioning. Once the streaming starts, we detect the
serialization type, decompress the stream if necessary and parse
the RDF triples. It is important to emphasize that, since
LODVader is publicly available, more and more datasets are
added and analyzed.</p>
      <p>Link discovery is performed on the fly for each
distribution streamed. For every triple, the Linking Analytics
module discards the predicate and takes only the subject
and the object as input (&lt;s, o&gt;). If the object is a literal
or a blank node, the tuple is discarded. As a final filtering
step, we reject tuples with malformed IRIs. The tuples that
pass the filtering step enter a processing pipeline:</p>
      <list list-type="order">
        <list-item><p>Tuple splitting. Subjects and objects of each tuple
are separated and saved in two queues. The queues
contain resources which will be compared with Bloom
filters (BFs).</p></list-item>
        <list-item><p>BF fetching. We extract the namespace of each
resource to compare and assign the resource to a
respective BF, which represents a target distribution. For
every namespace we encounter, we fetch all the existing
BFs that have been processed and stored in a memory cache.</p></list-item>
        <list-item><p>Link extraction. Objects and subjects of the source
distribution are compared with the in-memory BFs of
the target distributions. If an object of the source
distribution exists in the BF of the target distribution as
a subject, we count one link from the source
distribution to the target distribution. In the opposite
case, i.e. if a subject of the source distribution exists
in the BF of the target distribution as an object, we
count one link from the target distribution to the
source distribution. If an object falls within the namespaces of
a target dataset but is not found there as a subject, it is counted
as a dead link between the source distribution and the
target dataset (cf. Section 2.2).</p></list-item>
      </list>
      <p><sup>7</sup> http://lov.okfn.org/dataset/lov/</p>
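      <p>The link extraction step can be sketched with a small, self-contained Bloom filter. This is a simplified illustration under stated assumptions, not the LODVader implementation: the BloomFilter class, its size and hash count, the SHA-256 based hashing and the function count_links are all chosen here for the example, and namespace-based BF selection is omitted.</p>
      <preformat><![CDATA[
```python
import hashlib
from typing import Iterator, List, Tuple


class BloomFilter:
    """Minimal Bloom filter: no false negatives, rare false positives."""

    def __init__(self, size_bits: int = 1 << 16, hashes: int = 4) -> None:
        self.size = size_bits          # assumed to be a multiple of 8
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def count_links(pairs: List[Tuple[str, str]],
                target_subjects: BloomFilter,
                target_objects: BloomFilter) -> Tuple[int, int]:
    """Return (links source->target, links target->source) for <s, o> pairs."""
    outgoing = sum(1 for _, o in pairs if o in target_subjects)
    incoming = sum(1 for s, _ in pairs if s in target_objects)
    return outgoing, incoming


# Hypothetical target distribution, already indexed as two BFs.
target_subjects = BloomFilter()
target_subjects.add("http://b.example.org/r/7")
target_objects = BloomFilter()
target_objects.add("http://a.example.org/r/1")

# <s, o> tuples of the source distribution after filtering.
pairs = [
    ("http://a.example.org/r/1", "http://b.example.org/r/7"),
    ("http://a.example.org/r/2", "http://c.example.org/r/3"),
]
outgoing, incoming = count_links(pairs, target_subjects, target_objects)
print(outgoing, incoming)
```
]]></preformat>
      <p>Since a Bloom filter can report an element that was never added, counts obtained this way are upper bounds whose error depends on the filter size and the number of hash functions.</p>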
      <p>At the end of the pipeline, two sets of BFs are created: one
set containing all subjects and a second set containing all
objects of the source distribution. These BFs represent
the current distribution and may be used later when other
source distributions are streamed.</p>
      <p>It is important to stress that, although our model reads and
retrieves RDF data, it does not store any RDF. Our
implementation creates RDF on the fly, reading documents from
MongoDB and using Apache Jena to create RDF models.
All stored BFs have the same size (each BF describes 5000
resources), so that the time to query any resource from any
distribution has quasi-linear complexity. For big
distributions with more than 5000 triples, multiple BFs are
created. In addition, the BFs are not stored directly in the file
system; GridFS<sup>8</sup> is used to manage the BF files. More
detailed documentation of the implementation can
be found in the LODVader GitHub<sup>9</sup> repository.</p>
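      <p>The fixed-size partitioning described above can be sketched as follows. The function names partition, build_filters and contains are hypothetical, and a frozenset stands in for each Bloom filter so that the example stays short; only the capacity of 5000 resources per filter is taken from the text.</p>
      <preformat><![CDATA[
```python
from typing import FrozenSet, Iterator, List, Sequence

RESOURCES_PER_FILTER = 5000  # capacity per BF as stated in the text


def partition(resources: Sequence[str], size: int = RESOURCES_PER_FILTER) -> Iterator[Sequence[str]]:
    """Split the resources of one distribution into chunks of at most `size`."""
    for start in range(0, len(resources), size):
        yield resources[start:start + size]


def build_filters(resources: Sequence[str]) -> List[FrozenSet[str]]:
    # A frozenset stands in for a Bloom filter in this sketch.
    return [frozenset(chunk) for chunk in partition(resources)]


def contains(filters: List[FrozenSet[str]], resource: str) -> bool:
    """A resource belongs to the distribution if any of its filters reports it."""
    return any(resource in f for f in filters)


# Hypothetical distribution with 12000 subject URIs.
resources = [f"http://example.org/r/{i}" for i in range(12000)]
filters = build_filters(resources)
print(len(filters))  # 3
```
]]></preformat>
      <p>With equally sized filters, the cost of one membership query grows with the number of filters a distribution needs rather than with the number of triples it contains.</p>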
    </sec>
    <sec id="sec-8">
      <title>4. RESULTS</title>
      <p>In order to make a general analysis of the quantity and quality
of Linked Data datasets, we streamed all datasets found in
the metadata description file of the Linking Open Data
cloud diagram 2014<sup>10</sup>, the DBpedia Core<sup>11</sup> distributions and
all vocabularies found on Linked Open Vocabularies<sup>12</sup>. At
the time of writing, we had discovered<sup>13</sup> 185 million verified links
(out of 0.5 billion links in total) among 1408 datasets and 395
vocabularies, totaling more than 2.5 billion triples. These
numbers keep growing as more users provide good
metadata and submit their datasets to
our analysis.</p>
      <p><sup>8</sup> https://docs.mongodb.org/manual/core/gridfs/</p>
      <p><sup>9</sup> https://github.com/AKSW/LODVader</p>
      <p><sup>10</sup> http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/</p>
      <p><sup>11</sup> http://downloads.dbpedia.org/current/core/</p>
      <p><sup>12</sup> http://lov.okfn.org/dataset/lov/</p>
      <p><sup>13</sup> http://lodvader.aksw.org/#/stats</p>
      <sec id="sec-8-1">
        <title>Name</title>
        <p>[Table 1, Name column. Top five rows: Educational programs - SISVU; statistics.data.gov.uk; Farmers Markets Geo. Data (U.S.); VIVO Weill Cornell Medical College; VIVO WUSTL. Bottom five rows: eagle-i @ Dartmouth College; TaxonConcept Knowledge Base; eagle-i @ Montana State University; The Living LOD Cloud; Ontos News Portal.]</p>
        <p>Our result analysis consists of three steps. First, in order
to know whether a dataset is suitable to describe
a certain resource (e.g., subjects or objects), we extracted all
namespaces with their respective proportions in the datasets.
Next, we calculated the indegree and
outdegree per dataset, and finally, we calculated the indegree
and outdegree of dead links among datasets. Our metric for
indegree and outdegree is the number of datasets that
contain one or more links to or from the current dataset.</p>
        <p>Several datasets describe a single namespace; however, more
than 70% of the datasets describe two or more. Table 1 shows
the datasets with the biggest and smallest proportions of
described namespaces. The column "# NS" contains the
number of distinct namespaces for the dataset, and the last
column shows the proportion of the predominant
namespace. The top 5 rows show datasets with highly
predominant namespaces, and the last 5 rows show the datasets with
completely mixed namespaces.</p>
        <p>Table 4 and Table 5 show the top 5 datasets with dead
indegree links and the top 5 datasets with dead outdegree links.
Dead indegree means that external datasets link to
non-existing resources of a dataset. Dead outdegree refers to
datasets that link to external dead links. The indegree and
outdegree are aggregated at the dataset level, and the link count provides
the total number of dead links.</p>
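        <p>The degree metric described above can be sketched as follows. The function name degrees, the layout of link_counts and the dataset names are assumptions for illustration; ignoring links from a dataset to itself is also an assumption of this sketch, not something stated in the text.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def degrees(link_counts: Dict[Tuple[str, str], int]) -> Dict[str, Tuple[int, int]]:
    """Map each dataset to (indegree, outdegree).

    `link_counts` maps (source dataset, target dataset) to a number of links.
    A neighbour counts once, no matter how many links connect the pair.
    """
    out_neighbours: Dict[str, Set[str]] = defaultdict(set)
    in_neighbours: Dict[str, Set[str]] = defaultdict(set)
    for (src, tgt), n in link_counts.items():
        if n > 0 and src != tgt:  # assumption: self-links are ignored
            out_neighbours[src].add(tgt)
            in_neighbours[tgt].add(src)
    datasets = set(out_neighbours) | set(in_neighbours)
    return {d: (len(in_neighbours[d]), len(out_neighbours[d])) for d in datasets}


# Hypothetical link counts between three datasets.
link_counts = {("A", "B"): 10, ("A", "C"): 3, ("B", "C"): 0, ("C", "A"): 1}
result = degrees(link_counts)
print(result["A"])  # (1, 2)
```
]]></preformat>
        <p>Feeding the same function with counts of dead links only would yield the dead indegree and dead outdegree per dataset.</p>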
      </sec>
      <sec id="sec-8-2">
        <title>Source Dataset</title>
        <p>DBpedia Core
eagle-i @ Dartmouth College
eagle-i @ Uni. Alaska
eagle-i @ Charles R. Drew Uni.
TaxonConcept Knowledge Base</p>
      </sec>
      <sec id="sec-8-3">
        <title>Target Dataset</title>
        <p>The Living LOD Cloud
TaxonConcept Knowledge Base
VIVO Cornell
eagle-i @ Jackson State University
Traditional Korean Medicine Ont.</p>
      </sec>
      <sec id="sec-8-4">
        <title>Outdegree</title>
        <p>13
13
21
41
1</p>
      </sec>
      <sec id="sec-8-5">
        <title>Target Dataset</title>
        <p>The Media RDF Vocabulary
Document Availability Information Ont
VIVO Core Ontology
An Ontology for vCards
Conversion Ontology</p>
        <p>Finally, Figure 2 provides an overview of the total correct
links, dead links and unverified links. In total, we have
found 302,855,189 unverified links, 12,430,800 dead links and
172,254,731 valid links. The large number of unverified links is
due to the limited coverage of our index, which keeps
growing as new datasets are added. It is worth
noting, though, that 7.9% of the verified links are dead links.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>5. RELATED WORK</title>
      <p>
        Most link discovery (LD) frameworks can only determine
links based on owl:sameAs or equivalent instances. However,
RDF-AI [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a framework which takes two datasets as input
and generates as outcome a new dataset whose content
is a list of correspondences between equivalent resources of
the input datasets. The system is composed of five modules
      </p>
      <sec id="sec-9-1">
        <title>Figure 2: Share of valid links (35%), dead links (3%) and unverified links (62%)</title>
      </sec>
      <sec id="sec-9-2">
        <p>which allow pre-processing, matching, fusing, interlinking and
post-processing of RDF datasets.</p>
        <p>
          Due to the strong growth of the LOD cloud, there is an obvious
demand for LOD cloud analytical frameworks.
Some statistical information can be found together with the
LOD cloud diagram [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Unfortunately, this statistical
information is also static.
        </p>
        <p>
          Another good example is Aether [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. It supplies the user
with a variety of statistical information about datasets when
given a SPARQL endpoint address. It is even
possible to compare different SPARQL endpoints, which can
be useful if two different endpoints should be analyzed.
Although this framework supplies the user with rich
statistical information and pie charts, it is only designed for
comparing the content of two SPARQL endpoints.
LOD-Laundromat [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] provides a uniform way to publish and
clean datasets. Different statistical data are published, such as
duplicated triples, number of triples, dataset size and others.
The LOD-Laundromat contains over 38 billion triples;
however, it does not provide metadata
regarding dataset labels, names or titles, which makes visualizing
the whole graph a hard task.
        </p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>6. CONCLUSIONS AND FUTURE WORK</title>
      <p>This paper classified and evaluated links among more than
1,200 datasets w.r.t. dataset indegree and outdegree for
different types of links. We discovered a total of 0.5 billion
links, out of which 12.4M were dead and 302M could not be
verified. This suggests that around 7.9% of the verified
LOD links we indexed are dead. This number is based on
the current coverage of indexed datasets in our analysis.
Indexing new datasets can raise this number (if more dead links
are discovered) as well as lower it (if a dataset is indexed
that contains link targets). However, we have already invested
a lot of effort into discovering as many datasets as possible
and assume that an average linked data consumer would not
go to such lengths to retrieve data.</p>
      <p>
        In order to expand the coverage of our analysis, we expect to
work in collaboration with other approaches such as
LOD-Laundromat [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We believe that at least the number of
unverified links might be reduced as more datasets are
added.
      </p>
      <p>An area we would like to research is the identification of
authoritative namespaces for datasets. This would make it easier to
identify whether a resource is described in an authoritative dataset
or whether a dataset hijacks a namespace. This could provide ways
to further analyze the quality of links and would also help
to define best practices based on de-facto linking.</p>
      <p>Acknowledgement. This paper's research activities were
funded by grants from the FP7 &amp; H2020 EU projects LIDER
(GA-610782), ALIGNED (GA-644055) and FREME
(GA-644771), Smart Data Web (GA-01MD15010B) and the CAPES
foundation - Ministry of Education of Brazil (13204/13-0).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Beek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rietveld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bazoobandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wielemaker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Schlobach</surname>
          </string-name>
          .
          <article-title>LOD Laundromat: A uniform way of publishing other people's dirty data</article-title>
          .
          <source>In ISWC 2014, Lecture Notes in Computer Science</source>
          , pages
          <fpage>213</fpage>
          -
          <lpage>228</lpage>
          . Springer International Publishing,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brümmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Baron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ermilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Freudenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          .
          <article-title>DataID: Towards Semantically Rich Metadata for Complex Datasets</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Semantic Systems, SEM '14</source>
          , pages
          <fpage>84</fpage>
          -
          <lpage>91</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          .
          <article-title>State of the LOD Cloud</article-title>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fetterly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Manasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Najork</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wiener</surname>
          </string-name>
          .
          <article-title>A large-scale study of the evolution of web pages</article-title>
          .
          <source>In Proceedings of the 12th International Conference on World Wide Web, WWW '03</source>
          , pages
          <fpage>669</fpage>
          -
          <lpage>678</lpage>
          , New York, NY, USA,
          <year>2003</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Scharffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>RDF-AI: an architecture for RDF datasets matching, fusion and interlink</article-title>
          .
          <source>IJCAI</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Halpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCusker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Thompson</surname>
          </string-name>
          .
          <article-title>When owl:sameAs Isn't the Same: An Analysis of Identity in Linked Data</article-title>
          .
          <source>In ISWC</source>
          , pages
          <fpage>305</fpage>
          -
          <lpage>320</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lehmberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Meusel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Graph Structure in the Web: Aggregated by Pay-level Domain</article-title>
          .
          <source>In Proceedings of the 2014 ACM Conference on Web Science</source>
          ,
          <source>WebSci '14</source>
          , pages
          <fpage>119</fpage>
          -
          <lpage>128</lpage>
          , New York, NY, USA,
          <year>2014</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Maali</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Erickson</surname>
          </string-name>
          .
          <article-title>Data Catalog Vocabulary (DCAT). W3C recommendation, W3C</article-title>
          , Jan.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mäkelä</surname>
          </string-name>
          .
          <article-title>Aether - generating and viewing extended VoID statistical descriptions of RDF datasets</article-title>
          .
          <source>In Proceedings of the ESWC 2014 demo track</source>
          , Springer-Verlag,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>SalahEldeen</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Nelson</surname>
          </string-name>
          .
          <article-title>Losing my revolution: How many resources shared on social media have been lost?</article-title>
          <source>CoRR</source>
          , abs/1209.3026,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmachtenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>Adoption of the Linked Data Best Practices in Different Topical Domains</article-title>
          .
          <source>In ISWC 2014</source>
          , pages
          <fpage>245</fpage>
          -
          <lpage>260</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>