<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Profiling the Linked (Open) Data</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Milano-Bicocca</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The number of datasets published as Linked (Open) Data is constantly increasing, with roughly 1000 datasets as of April 2014. Despite this number, the datasets remain under-exploited because they lack comprehensive and up-to-date metadata. Metadata hold significant information, not only for understanding the data at hand but also for the cleansing and integration phases. Data profiling techniques can help generate metadata and statistics that describe the content of the datasets. However, existing techniques do not cover a wide range of statistics, and many challenges due to the heterogeneous nature of Linked Open Data are still to be overcome. This paper presents the doctoral research that tackles the problems related to Linked Open Data profiling. We present the proposed approach and report the initial results.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Open Data</kwd>
        <kwd>Profiling</kwd>
        <kwd>Data Quality</kwd>
        <kwd>Topical Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Linked Open Data cloud has grown from 12 datasets in 2007 to more than
1000 datasets as of April 2014 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], a number that is constantly increasing. Datasets to be published
need to follow a set of rules that make them simple to search and query [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The datasets should be published according to W3C standards in RDF<sup>1</sup> format and made available for querying through SPARQL<sup>2</sup> endpoints. Following these rules allows different data sources to be connected by typed links, which are useful for extracting new knowledge because linked datasets do not hold the same information. Even though Linked Open Data is considered a gold mine, it is still under-exploited, as understanding a large and unfamiliar RDF dataset remains a key challenge. Because comprehensive descriptive information is lacking, the consumption of these datasets is still low. Data profiling techniques support data consumption and data integration with statistics and useful metadata about the content of the datasets. While traditional profiling techniques solve many issues, they cannot be applied to heterogeneous data such as Linked Open Data. Data profiling techniques in the context of Linked Open Data are important for several tasks:
      </p>
      <p>1 http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/</p>
      <p>2 http://www.w3.org/TR/rdf-sparql-query/</p>
      <p>Complex schema discovery. Schema complexity makes databases difficult to understand and access. Schema summaries give users a concise overview of the entire schema despite its complexity.
      </p>
      <p>Ontology / schema integration. Ontologies published on the Web, even for datasets in similar domains, can differ. Data profiling techniques can help in understanding the overlap between ontologies and support the process of ontology creation, maintenance and integration.</p>
      <p>Big knowledge bases and landscape views. Data profiling techniques can help identify core knowledge patterns (KPs) which reveal a piece of knowledge in a domain of interest.</p>
      <p>Inspecting large datasets to find quality issues. Data profiling tools allow the inspection of large datasets to detect quality issues, by identifying cases that do not follow business rules, detecting outliers, residuals, etc.</p>
      <p>Data integration. To perform a data integration process, one should consider schema mapping, the process of discovering relationships between schemas. Profiling techniques can reveal mappings between classes and properties, helping the integration process.</p>
      <p>Entity summarization. Finding the features that best represent the topic(s) of a given dataset can help not only with the topical classification of the dataset but also with understanding the semantics of the information found in the data.</p>
      <p>Data visualization for summarization. Profiling techniques can support data visualization tools in visualizing large multidimensional datasets by displaying only a small and concise summary of the most relevant and important features, enhancing users' comprehension by allowing them to explore the data by zooming in or out of the provided summary.</p>
      <p>In this proposal we focus on profiling techniques that summarize the content of a dataset and reveal data quality problems. Moreover, we will propose profiling techniques combined with data mining algorithms to find useful and relevant features that summarize the content of datasets published as Linked Open Data, as well as techniques that reveal quality issues in the data. The dataset summarization can be used not only to decide whether a dataset is useful, but also to provide useful information to the cleansing and integration phase.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>Statistics and summaries can help to describe and understand large RDF data. Most existing profiling tools support traditional databases, which are homogeneous and have a well-defined schema. These techniques cannot be applied to Linked Open Data because of its heterogeneity and the lack of a well-defined schema. As discussed below, most existing techniques for profiling Linked Open Data are limited to a few statistics and summaries and cover only one task.</p>
      <p>
        Roomba [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a framework to automatically validate and generate descriptive dataset profiles. The extracted metadata are grouped into four categories (general, access, ownership or provenance) depending on the information they hold. After metadata extraction, validation and enrichment steps are performed. The metadata validation process identifies missing information and automatically corrects it when possible. As an outcome of the validation process, a report is produced which can be automatically sent to the dataset maintainer.
      </p>
      <p>
        The ExpLOD [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] tool summarizes a dataset based on a mechanism that combines text labels and bisimulation contractions. It considers four RDF usages that describe interactions between data and metadata, such as class and predicate instantiation and class and predicate usage, on which it creates RDF graphs. It also uses owl:sameAs links to calculate statistics about the interlinking between datasets. The ExpLOD summaries are extracted using SPARQL queries or algorithms such as partition refinement.
      </p>
      <p>
        RDFStats [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] generates statistics for datasets behind SPARQL endpoints and for RDF documents. It is built on the Jena Semantic Web Framework and can be executed as a stand-alone process; its statistics are important for optimizing SPARQL queries. These statistics include the number of anonymous subjects and different types of histograms: a URIHistogram for URI subjects, and histograms for each property and the associated range(s). It also provides methods to fetch the total number of instances for a given class or set of classes, and methods to obtain the URIs of instances.
      </p>
      <p>
        LODStats [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a profiling tool which can be used to obtain 32 different statistical criteria for datasets from the Data Hub. These statistics describe the dataset and its schema and include the number of triples, triples with blank nodes, labeled subjects, the number of owl:sameAs links, class and property usage, class hierarchy depth, cardinalities, etc. These statistics are then represented using the Vocabulary of Interlinked Datasets (VoID)<sup>3</sup> and the Data Cube Vocabulary<sup>4</sup>.
      </p>
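      <p>To make the kind of schema-level statistics listed above concrete, the following minimal Python sketch counts a few of them over an in-memory list of triples. It is only an illustration and not the implementation of LODStats: the function names, the prefixed-name notation and the toy triples are assumptions introduced here, and blank nodes are assumed to be written with the conventional "_:" prefix.</p>
      <preformat>
```python
from collections import Counter

RDF_TYPE = "rdf:type"
OWL_SAME_AS = "owl:sameAs"


def is_blank(term):
    """Blank nodes are assumed to carry the conventional '_:' prefix."""
    return term.startswith("_:")


def basic_statistics(triples):
    """Compute a few schema-level statistics over (subject, predicate, object) tuples."""
    class_usage = Counter()      # number of rdf:type assertions per class
    property_usage = Counter()   # number of triples per predicate
    blank_node_triples = 0
    same_as_links = 0
    total = 0
    for s, p, o in triples:
        total += 1
        property_usage[p] += 1
        if is_blank(s) or is_blank(o):
            blank_node_triples += 1
        if p == RDF_TYPE:
            class_usage[o] += 1
        elif p == OWL_SAME_AS:
            same_as_links += 1
    return {
        "triples": total,
        "blank_node_triples": blank_node_triples,
        "same_as_links": same_as_links,
        "class_usage": dict(class_usage),
        "property_usage": dict(property_usage),
    }


# Hypothetical toy dataset, using prefixed names instead of full URIs.
TRIPLES = [
    ("ex:Milan", "rdf:type", "ex:City"),
    ("ex:Rome", "rdf:type", "ex:City"),
    ("ex:Italy", "rdf:type", "ex:Country"),
    ("ex:Milan", "ex:locatedIn", "ex:Italy"),
    ("ex:Milan", "owl:sameAs", "dbr:Milan"),
    ("_:b0", "ex:mentions", "ex:Rome"),
]

stats = basic_statistics(TRIPLES)
print(stats)
```
      </preformat>
      <p>In a real setting the triples would come from an RDF parser or a SPARQL endpoint, and the resulting counts could then be serialized with a vocabulary such as VoID.</p>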
      <p>
        ProLOD [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a web-based tool which analyzes the object values of RDF triples and generates statistics on them, such as datatype and pattern distributions. In ProLOD, type detection is performed using regular expression rules, and normalized patterns are used to visualize huge numbers of different patterns. ProLOD also generates statistics on literal values and external links. ProLOD++<sup>5</sup>, an extension of ProLOD, is also a browser-based tool; it implements several algorithms to perform different profiling, mining and cleansing tasks. The profiling tasks include processes to find the frequencies and distributions of distinct subjects, predicates and objects, the ranges of the predicates, etc. ProLOD++ can also identify predicate combinations that contain only unique values as key candidates to distinctly identify entities. The mining tasks cover processes such as synonym and inverse predicate discovery and association rules on subjects, predicates and objects. It also performs some cleansing tasks, such as auto-completion of new facts for a given dataset, ontology alignment by identifying synonymous predicates, and identification of cases where the pattern usage is overspecified or underspecified.
      </p>
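      <p>The idea of regular-expression type detection and normalized patterns can be sketched as follows. This is a hedged illustration, not ProLOD's actual rule set: the three type rules, the normalization scheme (runs of letters become "a", runs of digits become "9") and the sample values are assumptions made for this example.</p>
      <preformat>
```python
import re
from collections import Counter

# Illustrative type rules (assumed here); a value gets the first type
# whose expression matches the whole string.
TYPE_RULES = [
    ("integer", re.compile(r"[+-]?\d+")),
    ("decimal", re.compile(r"[+-]?\d+\.\d+")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def detect_type(value):
    """Return the name of the first matching rule, or 'string' as a fallback."""
    for name, rule in TYPE_RULES:
        if rule.fullmatch(value):
            return name
    return "string"


def normalize_pattern(value):
    """Collapse runs of ASCII letters to 'a' and runs of digits to '9'."""
    pattern = re.sub(r"[A-Za-z]+", "a", value)
    return re.sub(r"\d+", "9", pattern)


# Hypothetical object values of one predicate.
VALUES = ["42", "7", "12.5", "2014-04-01", "Milan", "AB-123"]

type_distribution = Counter(detect_type(v) for v in VALUES)
pattern_distribution = Counter(normalize_pattern(v) for v in VALUES)
print(type_distribution)
print(pattern_distribution)
```
      </preformat>
      <p>Grouping values by normalized pattern keeps the number of distinct patterns small, which is what makes a pattern distribution readable for large datasets.</p>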
      <p>Profiling, as the activity of providing insights into the data, is not only about providing statistics such as value distributions and null values, but also refers to the process of finding and extracting information patterns in the data.</p>
      <p>3 http://www.w3.org/TR/void/</p>
      <p>4 http://www.w3.org/TR/vocab-data-cube/</p>
      <p>5 https://www.hpi.uni-potsdam.de/naumann/sites/prolod++/app.html</p>
      <p>
        In the area of schema summarization, Knowledge Patterns (KPs) can be
defined as templates to organise meaningful knowledge [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The approach in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
identifies an abstraction named dataset knowledge architecture that highlights how a dataset is organized and which core knowledge patterns (KPs) can be retrieved from that dataset. These KPs summarise the key features of one or more datasets, revealing a piece of knowledge in a certain domain of interest.
      </p>
      <p>
        Encyclopedic Knowledge Patterns (EKP) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are knowledge patterns introduced to extract core knowledge for entities of a certain type from Wikipedia page links. EKPs are extracted from the most representative classes describing a concept and contain abstractions of properties. The use of EKPs to support exploratory search is shown in Aemoo<sup>6</sup>, which enriches query results with relevant knowledge coming from different data sources on the Web [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        In order to understand complex datasets, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduces Statistical Knowledge Patterns (SKPs) to summarize key information about an ontology class, considering synonymity between two properties of a given class. An SKP is stored as an OWL ontology and contains information about axioms that are derived or not expressed in a reference ontology but can be promoted by applying statistical measures.
      </p>
      <p>
        As shown, current profiling tools provide schema-based statistics such as class/property usage and incoming/outgoing links, but none of the existing works focuses on summarizing the content of a dataset while also applying techniques to profile its quality. The author in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposes an approach to profile the Web of Data; in contrast, our proposed approach profiles Linked Data in terms of its quality and summarizes datasets in terms of their topics.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Research Plan</title>
      <p>The contribution of this PhD in the area of Linked Open Data profiling covers (i) generating new statistics that are not covered by state-of-the-art techniques, (ii) new algorithms to overcome the challenges of profiling LOD, and (iii) the development of a methodology on how to perform profiling tasks. In the following we give an overview of the methodology we want to follow in order to achieve these contributions.</p>
      <p>6 http://wit.istc.cnr.it/aemoo/</p>
      <p>New statistics for Linked Data Profiling</p>
      <p>
        While much effort has been made, as described in the state of the art, the generated statistics are limited to basic ones such as the number of triples, the number of classes and properties used in a dataset, the datatypes or the sameAs links used, etc. Datasets hold much more interesting information which might be hidden but could be useful for the consumer of the dataset. As data profiling refers to the activity of providing useful descriptive information, new techniques for extracting this hidden information should be developed. Our intent is to develop automatic approaches that generate new statistics and knowledge patterns to provide a dataset summary and inspect its quality. Different data mining techniques, such as association rule mining, can be used to discover and extract patterns and dependencies in the dataset. These patterns might provide useful information, especially for detecting errors and inconsistencies in spatial data (consistency quality dimension). Implementing different approaches for outlier detection, such as distance-, deviation- and depth-based methods or evolutionary techniques, could provide insight into abnormalities in the underlying data. Other techniques such as clustering, classification, aggregation, dimensionality reduction or spatial data summarization might help to provide concise and accurate dataset summarization and inspect the quality dimensions mentioned in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. We intend to further investigate the topical classification of Linked Open Data. The datasets published as LOD cover a wide range of topics, but they lack metadata describing their topical category, so users have difficulty deciding whether a dataset is relevant to their interests. For each dataset published as LOD, a label for the topical category was manually assigned
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Each dataset has only one topical label, while often two or more topics are needed to describe a dataset. The current topical classification of datasets in the LOD cloud is limited to eight categories, while a more fine-grained topical classification might provide more useful information.
      </p>
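      <p>As one possible reading of how association rule mining could expose dependencies and inconsistencies, the sketch below treats the set of predicates used by each subject as a transaction and derives rules of the form "subjects that use predicate A also use predicate B". Subjects that violate a high-confidence rule are candidates for a quality issue. The function names, the thresholds and the toy data are assumptions made for illustration; this is not the approach developed in this research.</p>
      <preformat>
```python
from collections import defaultdict
from itertools import permutations


def predicate_rules(triples, min_support=0.5, min_confidence=0.75):
    """Mine rules 'a implies b' over the predicate sets of the subjects."""
    transactions = defaultdict(set)
    for s, p, _ in triples:
        transactions[s].add(p)
    n = len(transactions)
    predicates = sorted({p for preds in transactions.values() for p in preds})
    rules = []
    for a, b in permutations(predicates, 2):
        with_a = [preds for preds in transactions.values() if a in preds]
        both = sum(1 for preds in with_a if b in preds)
        support = both / n                 # share of subjects using a and b
        confidence = both / len(with_a)    # share of a-subjects also using b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules, transactions


def violations(rule, transactions):
    """Subjects that use the antecedent predicate but lack the consequent."""
    a, b = rule[0], rule[1]
    return sorted(s for s, preds in transactions.items()
                  if a in preds and b not in preds)


# Hypothetical data: five cities, one of which has no country.
TRIPLES = []
for i in range(1, 6):
    city = "ex:c%d" % i
    TRIPLES.append((city, "ex:name", "name %d" % i))
    TRIPLES.append((city, "ex:population", str(1000 * i)))
    if i != 5:
        TRIPLES.append((city, "ex:country", "ex:Italy"))

rules, transactions = predicate_rules(TRIPLES)
suspects = violations(("ex:name", "ex:country"), transactions)
print(suspects)
```
      </preformat>
      <p>Enumerating all predicate pairs is quadratic in the number of predicates; for real datasets a frequent-itemset algorithm such as Apriori or FP-Growth would be the more usual choice.</p>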
      <p>Overcoming Profiling Challenges</p>
      <p>
        As another contribution of this research, we want to tackle the profiling
challenges described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Traditional profiling tasks cannot be applied to Linked Data because of its heterogeneity, which can appear in different forms. Syntactic heterogeneity concerns different formats or query languages. Schematic heterogeneity refers to the fact that Linked Open Data can be represented in different formats and stored in different storage architectures, and that the data encoding schemes may vary. Datasets published as LOD might also use different vocabularies to describe synonymous terms. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] refers to semantic heterogeneity as the discovery of semantic overlap of the data. Traditional data profiling tools cannot be used to profile Linked Open Data because they assume homogeneous data stored in a single repository, while Linked Open Data are neither homogeneous nor stored in a single repository. Moreover, as the number of published datasets increases, so does the need to adapt and optimise profiling techniques to support huge amounts of data. A good approach when dealing with large datasets is to improve profiling performance by running the calculation of statistics and the extraction of patterns in parallel. We also plan to adapt some data mining techniques to deal with high-dimensional data such as Linked Open Data.
      </p>
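      <p>The idea of computing statistics in parallel can be sketched by splitting the triples into chunks, counting each chunk independently and merging the partial results. The sketch below uses Python's standard thread pool purely to show the split-and-merge structure; the function names and chunk size are assumptions, and because of CPython's global interpreter lock a process pool or a distributed framework would be needed to obtain an actual speed-up on CPU-bound counting.</p>
      <preformat>
```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_predicates(chunk):
    """Partial statistic: predicate usage within one chunk of triples."""
    return Counter(p for _, p, _ in chunk)


def parallel_predicate_usage(triples, workers=4, chunk_size=2):
    """Split the triples into chunks, count each chunk, then merge."""
    chunks = [triples[i:i + chunk_size]
              for i in range(0, len(triples), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_predicates, chunks):
            total.update(partial)   # Counter.update adds the counts
    return total


# Hypothetical toy triples.
TRIPLES = [
    ("ex:a", "rdf:type", "ex:City"),
    ("ex:a", "ex:name", "A"),
    ("ex:b", "rdf:type", "ex:City"),
    ("ex:b", "ex:name", "B"),
    ("ex:b", "ex:population", "1000"),
]

usage = parallel_predicate_usage(TRIPLES)
print(usage)
```
      </preformat>
      <p>The pattern works for any statistic whose partial results can be merged associatively, such as counts, sums, minima and maxima.</p>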
      <p>Methodology to Profile Linked Open Data</p>
      <p>As another contribution of this research, we intend to develop a methodology on how to perform profiling tasks. This methodology would classify profiling tasks according to their purpose and provide guidelines for appropriately selecting the tasks needed by the user.</p>
    </sec>
    <sec id="sec-4">
      <title>Preliminary Results</title>
      <p>
        This PhD work is now in its second year. As a first step, we measured the value of Linked Open Data by profiling the data published as Open Data by the Italian Public Administrations. In this work we profiled the adoption of Linked Open Data best practices and local laws by the Italian Public Administration, calculating a compliance index that considers three quality dimensions of the published data: completeness, accuracy and timeliness [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        As mentioned in Sec. 3, the main contribution of this research is to provide new techniques for dataset summarization and new statistics about the data. ABSTAT<sup>7</sup> is a framework which can be used to summarise linked datasets and at the same time provide statistics about them. The summary consists of Abstract Knowledge Patterns (AKPs) of the form &lt;subjectType, predicate, objectType&gt;, which represent the occurrence of triples &lt;sub, pred, obj&gt; in the data such that subjectType is a minimal type of sub and objectType is a minimal type of obj. ABSTAT summaries can help users compare in which of two datasets a concept is described with richer and more diverse properties, and also help detect errors in the data such as missing datatypes or datatype diversity, etc. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. ABSTAT can also be used to fix the domain and range information for properties. Either the domain or the range is unspecified for 585 properties in the DBpedia Ontology, and AKPs can help determine at least one domain and one range for these properties. For example, for the property http://dbpedia.org/ontology/governmentType in DBpedia there is no information about the domain. With our approach we can derive 7 different AKPs, meaning that we can derive 7 domains for this property.
      </p>
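      <p>The definition of an AKP given above can be illustrated with a small sketch that computes minimal types from a subclass hierarchy and counts the resulting patterns. It is a simplified illustration under stated assumptions, not ABSTAT's implementation: the hierarchy is assumed to be acyclic, untyped resources fall back to owl:Thing, literals are not treated specially, and all names and data are hypothetical.</p>
      <preformat>
```python
from collections import Counter


def superclasses(cls, parents):
    """All proper ancestors of cls in the (assumed acyclic) subclass hierarchy."""
    seen = set()
    stack = list(parents.get(cls, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def minimal_types(types, parents):
    """Keep the asserted types that are not a superclass of another asserted type."""
    covered = set()
    for t in types:
        covered |= superclasses(t, parents)
    return sorted(t for t in types if t not in covered)


def abstract_knowledge_patterns(triples, types, parents, default="owl:Thing"):
    """Count (subjectType, predicate, objectType) patterns over minimal types."""
    akps = Counter()
    for s, p, o in triples:
        s_types = minimal_types(types.get(s, ()), parents) or [default]
        o_types = minimal_types(types.get(o, ()), parents) or [default]
        for st in s_types:
            for ot in o_types:
                akps[(st, p, ot)] += 1
    return akps


# Hypothetical hierarchy, type assertions and relational triples.
PARENTS = {
    "dbo:City": ["dbo:Place"],
    "dbo:Country": ["dbo:Place"],
    "dbo:Place": ["owl:Thing"],
}
TYPES = {
    "dbr:Milan": ["dbo:City", "dbo:Place"],
    "dbr:Rome": ["dbo:City"],
    "dbr:Italy": ["dbo:Country", "dbo:Place"],
    "dbr:Republic": ["dbo:GovernmentType"],
}
TRIPLES = [
    ("dbr:Milan", "dbo:country", "dbr:Italy"),
    ("dbr:Rome", "dbo:country", "dbr:Italy"),
    ("dbr:Italy", "dbo:governmentType", "dbr:Republic"),
]

akps = abstract_knowledge_patterns(TRIPLES, TYPES, PARENTS)
# Candidate domains of a property are the subject types of its AKPs.
domains = {st for (st, p, ot) in akps if p == "dbo:governmentType"}
print(akps)
print(domains)
```
      </preformat>
      <p>Reading the subject types of all AKPs for one property, as in the last lines, mirrors the way candidate domains are derived for properties whose domain is unspecified.</p>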
      <p>
        We further investigated one of the challenges still present in Linked Open Data datasets: topic classification. We built the first automatic approach to classify LOD datasets into the topical categories used by the LOD cloud diagram. For the classification we considered eight feature sets: vocabulary, class usage, property usage, local class names, local property names, text from rdfs:label, top-level domain, and in and out degree. Table 1 shows the results of training three classifiers (k-NN, Naive Bayes and Decision Tree) with three balancing approaches (no sampling, down sampling and up sampling) and two normalization techniques, binary occurrence and relative term occurrence, for each term or vocabulary. Our approach achieves an accuracy of 81.62% [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
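      <p>The two normalization techniques and the k-NN classifier mentioned above can be illustrated with a self-contained sketch. It is not the experimental pipeline behind the reported results: the cosine similarity, the value of k, the category labels and the toy vocabulary lists are all assumptions chosen for this example.</p>
      <preformat>
```python
import math
from collections import Counter


def binary_occurrence(terms):
    """1.0 for every term that occurs at least once."""
    return {t: 1.0 for t in set(terms)}


def relative_occurrence(terms):
    """Term count divided by the total number of term occurrences."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def knn_predict(query, training, k=3):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical datasets: vocabulary prefixes used, with a topical label.
DATASETS = [
    (["foaf", "sioc", "foaf"], "social_networking"),
    (["foaf", "sioc"], "social_networking"),
    (["geo", "wgs84", "geo"], "geographic"),
    (["geo", "wgs84"], "geographic"),
    (["bibo", "dcterms"], "publications"),
]

training = [(relative_occurrence(terms), label) for terms, label in DATASETS]
query_terms = ["foaf", "foaf", "dcterms"]
predicted = knn_predict(relative_occurrence(query_terms), training)
print(predicted)
```
      </preformat>
      <p>Swapping relative_occurrence for binary_occurrence when building both the training vectors and the query vector gives the second normalization variant with no other change to the code.</p>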
      <p>An in-depth literature study of the tools used to profile LOD has been undertaken. We analyzed existing tools in terms of the goal they are used for, techniques, input, output, approach, degree of automation, license, etc., with the aim of obtaining a complete view of the existing profiling approaches and techniques, which helps us determine new statistics or new techniques. This in-depth study will also help with the third contribution: classifying profiling tasks and creating a general methodology for each task depending on the use case.</p>
      <p>7 http://abstat.disco.unimib.it/</p>
    </sec>
    <sec id="sec-5">
      <title>Lessons Learned, Open Issues and Future Work</title>
      <p>The main contribution of this PhD work is to address the challenges mentioned in Sec. 3 and to build a framework for profiling Linked Open Data that gives insights into the data despite their heterogeneous nature. Evaluating the validity of the proposed approach or of the results achieved is very difficult because there is no gold standard in the field of LOD profiling, which makes comparison with other approaches very difficult. To address this issue, we want to further explore how these new statistics or summarizations improve the performance of current profiling techniques and tools, e.g. how profiling tasks can improve full-text search. As pattern discovery is not trivial, the validity of the proposed profiling techniques for summarising datasets will be evaluated by humans in terms of relatedness and informativeness. We intend to provide users with a list of statistics and ask them which, in their opinion, are most important to support the profiling of Linked Open Data. Evaluating the performance of profiling tasks is very difficult and remains an open issue on which I am currently working.</p>
      <p>The ABSTAT framework contributes to summarising Linked Open Data and detecting quality issues. We are working to enrich this framework with other statistics and to apply it to unstructured data such as microdata.</p>
      <p>Regarding the topical classification of LOD datasets, we will treat the problem as multi-label classification. As the datasets in the LOD cloud are unbalanced, a two-stage approach might help, while a classifier chain, which makes a prediction for one class after the other, could address the multi-label problem. So far, our experiments have not exploited RDF links beyond the in and out degree of datasets, so link-based classification techniques could be applied to further investigate the content of a dataset.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This research has been supported in part by FP7/2013-2015 COMSODE (under contract number FP7-ICT-611358). I would like to thank my supervisor Assoc. Prof. Andrea Maurino, my supervisor during my visiting period Prof. Dr. Christian Bizer, Asst. Prof. Matteo Palmonari and Dr. Anisa Rula for their invaluable suggestions, and also the anonymous reviewers for their helpful comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Assaf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Senart</surname>
          </string-name>
          .
          <article-title>Roomba: An extensible framework to validate and build dataset profiles</article-title>
          .
          <source>In The 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES '15) co-located with ESWC 2015</source>
          , Portoroz, Slovenia, May 31 - June 1, 2015, pages
          <fpage>32</fpage>
          -
          <lpage>46</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Demter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          .
          <article-title>LODStats - an extensible framework for high-performance dataset analytics</article-title>
          .
          <source>In 18th International Conference, EKAW 2012</source>
          , Galway City, Ireland, October 8-12,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked data - the story so far</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gentile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciravegna</surname>
          </string-name>
          .
          <article-title>Statistical knowledge patterns for characterising linked data</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Ontology and Semantic Web Patterns co-located with ISWC 2013</source>
          , Sydney, Australia, October 21
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Böhm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Abedjan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fenz</surname>
          </string-name>
          , T. Grutze,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hefenbrock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pohl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Sonnabend</surname>
          </string-name>
          .
          <article-title>Profiling linked open data with ProLOD</article-title>
          .
          <source>In Workshops Proceedings of the 26th ICDE 2010, March 1-6</source>
          ,
          <year>2010</year>
          , Long Beach, California, USA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          .
          <article-title>Towards a pattern science for the semantic web</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>1</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>61</fpage>
          -
          <lpage>68</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          .
          <article-title>Profiling the web of data</article-title>
          .
          <source>Proceedings of the 8th Ph. D. retreat of the HPI research school on service-oriented systems engineering, page 101</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Khatchadourian</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Consens</surname>
          </string-name>
          .
          <article-title>ExpLOD: Summary-based exploration of interlinking and RDF usage in the linked open data cloud</article-title>
          .
          <source>In ESWC</source>
          <year>2010</year>
          , Heraklion, Crete, Greece, May 30 - June 3,
          <year>2010</year>
          , pages
          <fpage>272</fpage>
          -
          <lpage>287</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Langegger</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Wöß</surname>
          </string-name>
          .
          <article-title>RDFStats - an extensible RDF statistics generator and library</article-title>
          .
          <source>In Database and Expert Systems Applications, DEXA, International Workshops</source>
          , Linz, Austria, August 31 - September 4
          ,
          <year>2009</year>
          , pages
          <fpage>79</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Meusel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Spahiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>Towards automatic topical classification of LOD datasets</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web, LDOW Workshop</source>
          ,
          <year>2015</year>
          , Florence, Italy, May 18-22,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          .
          <article-title>Data profiling revisited</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>42</volume>
          (
          <issue>4</issue>
          ):
          <fpage>40</fpage>
          –
          <lpage>49</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Ciancarini</surname>
          </string-name>
          .
          <article-title>Encyclopedic knowledge patterns from Wikipedia links</article-title>
          .
          <source>In The Semantic Web - ISWC 2011 Bonn, Germany, October 23-27</source>
          ,
          <year>2011</year>
          , Proceedings, Part I, pages
          <fpage>520</fpage>
          –
          <lpage>536</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Musetti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Ciancarini</surname>
          </string-name>
          .
          <article-title>Aemoo: exploring knowledge on the web</article-title>
          .
          <source>In Web Science 2013 (co-located with ECRC), WebSci '13</source>
          , Paris, France, May 2-4,
          <year>2013</year>
          , pages
          <fpage>272</fpage>
          –
          <lpage>275</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Porrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maurino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Spahiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Ferme</surname>
          </string-name>
          .
          <article-title>ABSTAT: Linked data summaries with abstraction and statistics</article-title>
          .
          <source>In European Semantic Web Conference 2015 (ESWC2015)</source>
          , Portoroz, Slovenia, 31st May - 4th June
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Aroyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Adamou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A. C.</given-names>
            <surname>Schopman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          .
          <article-title>Extracting core knowledge from linked data</article-title>
          .
          <source>In Proceedings of the COLD</source>
          <year>2011</year>
          , Bonn, Germany, October 23,
          <year>2011</year>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaveri</surname>
          </string-name>
          .
          <article-title>Methodology for assessment of linked data quality</article-title>
          .
          <source>In Proceedings of the 1st Workshop on Linked Data Quality co-located with 10th International Conference on Semantic Systems, LDQ@SEMANTiCS</source>
          <year>2014</year>
          , Leipzig, Germany, September 2nd,
          <year>2014</year>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmachtenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <article-title>Adoption of the linked data best practices in different topical domains</article-title>
          .
          <source>In The Semantic Web - ISWC 2014, Riva del Garda, Italy, October 19-23</source>
          ,
          <year>2014</year>
          , Proceedings, Part I, pages
          <fpage>245</fpage>
          –
          <lpage>260</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Viscusi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Spahiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maurino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Batini</surname>
          </string-name>
          .
          <article-title>Compliance with open government data policies: An empirical assessment of Italian local public administrations</article-title>
          .
          <source>Information Polity</source>
          ,
          <volume>19</volume>
          (
          <issue>3-4</issue>
          ):
          <fpage>263</fpage>
          –
          <lpage>275</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>