<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting core knowledge from Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valentina Presutti</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lora Aroyo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Adamou</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Balthasar Schopman</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aldo Gangemi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guus Schreiber</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alma Mater Studiorum Università di Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ISTC, National Research Council</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent research has shown the Linked Data cloud to be a potentially ideal basis for improving user experience when interacting with Web content across different applications and domains. Using the explicit knowledge of datasets, however, is neither sufficient nor straightforward. Dataset knowledge is often not uniformly organized, thus it is generally unknown how to query for it. To deal with these issues, we propose a dataset analysis approach based on knowledge patterns, and show how the recognition of patterns can support querying datasets even if their vocabularies are previously unknown. Finally, we discuss results from experimenting on three multimedia-related datasets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The constant expansion trend of Linked Data (LD) is broadening the
potential exploitation range of their datasets for improving search through related
data. Current research [
        <xref ref-type="bibr" rid="ref1 ref6">6, 1</xref>
        ] and established Web search firms like Google and
Powerset show the benefits of using explicit semantics and LD to refine search
results. However, using the explicit knowledge of each dataset efficiently can
be awkward and ineffective. Datasets typically cover diverse domains, do not
follow a unified way of organizing knowledge, and differ in size, granularity and
descriptiveness. To avoid burdensome, dataset-specific querying schemes, the
following are required: (1) measures and indicators that provide a landscape view
of a dataset; (2) a way to query a dataset even with no prior knowledge of its
vocabulary.
      </p>
      <p>We propose an approach to examine LD with these problems in mind. It
employs a strategy for inspecting datasets and identifying emerging knowledge
patterns (KPs). A key step of this method is the construction of a formal
logical architecture, or dataset knowledge architecture, which summarizes the key
features and figures of one or more datasets, thus addressing requirement (1).
This, in turn, relies on the notions of KPs and type-property paths. We identify
the central properties and types, i.e. those able to capture most of the knowledge
in a dataset, and extract KPs based on the central types. In other words, we
extract the dataset vocabulary and analyse the way the data are used in terms
of patterns. We also associate general-purpose measures, such as betweenness
and centrality, with the knowledge architecture components of a dataset for
performing empirical analysis. These notions and measures will be defined
throughout the paper. Using KPs and paths, we can provide prototypical ready-to-use
queries for core and concealed knowledge to emerge, thus addressing
requirement (2). Although the method applies to datasets whose logical structure is not
known a priori, it is meant to analyse LD for serendipitous knowledge. Unlike
a mere reverse-engineering exercise, our method discovers new knowledge about
datasets, such as their central types and properties and emerging patterns.</p>
      <p>The method was partly applied manually and our claims on it are observed
empirically, yet it can be generalised and fully automated, as the construction
of a dataset architecture and computation of its measures are all derived by
directly querying the data using metalevel constructs from RDF and OWL.</p>
      <p>The paper is organized as follows. In Section 2 we discuss the general
approach for data retrieval and analysis, together with our leading
hypotheses and basic definitions of recurring terms in our methodology. In Section 3
we describe the dataset knowledge architecture, the evaluation measures and an
overview of the datasets specifically considered in this analysis. In Section 4 we
present and discuss the results of our empirical study, including an example
query, knowledge pattern and dataset figures. After an overview of related work
in Section 5, we present our conclusions and future work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>Linked Data typically combine types and properties defined either in in-house
ontologies, or in widespread existing ones, e.g. FOAF<sup>4</sup>, DC<sup>5</sup> or GeoNames<sup>6</sup>. It
may occur, though, that in-house ontologies have not been formalized, or are not
disclosed. Even when the ontologies are available, they do not tell which relevant
part of them is actually used in a dataset, and what links are drawn (through
the data) between entities across various ontologies. Knowing these details about
a dataset is a pre-condition for evaluating its adequacy for being reused
in a certain context, for inspecting its content, for integrating it with other
(possibly legacy) knowledge; in other words, for using it. We hypothesize that
employing KPs for analysing and possibly authoring LD addresses this problem.
In this paper, we focus on querying a dataset when its vocabulary is previously
unknown, by proceeding as follows:
- we define a method and an ontology for analyzing a dataset and producing
a synthesis of it, i.e. a modular abstraction named dataset knowledge
architecture, that highlights how a dataset's knowledge is organized, and what its
core knowledge components (e.g. central types and associated KPs) are;
- we show how this general method and ontology can be exploited for
identifying principal KPs extracted from a dataset for producing prototypical
queries, through which we are able to retrieve a dataset's core knowledge.</p>
      <p>As another premise to our approach to be described in Section 2, we define
a few terms that are used throughout the remainder of this paper.</p>
      <p>A knowledge pattern (KP) for a type in an RDF graph includes: (i)
the properties by which instances of this type relate to other individuals; (ii)</p>
      <sec id="sec-2-1">
        <title><sup>4</sup> Friend-Of-A-Friend, http://xmlns.com/foaf/0.1/; <sup>5</sup> Dublin Core, http://purl.org/dc/elements/1.1/; <sup>6</sup> GeoNames, http://www.geonames.org/ontology/ontology_v2.2.1.rdf</title>
        <p>
          the types of such individuals for each property. A KP is an invariance across
observed data or objects that allows a formal or cognitive interpretation [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. A
KP embeds the key relations that describe a relevant piece of knowledge in a
certain domain of interest, similar to linguistic frames and cognitive schemata.
        </p>
        <p>A path is an ordered type-property sequence that can be traversed in an
RDF graph. Note the use of types in lieu of their instances, which instead denote
multiple occurrences of the same path. The length of a path is the number of
properties involved (possibly even with repetitions).</p>
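        <p>The relation between instance-level traversals and type-level paths can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a path is stored as a flat tuple alternating types and properties and that every instance has exactly one type; the names type_path, path_length, type_of and walks, as well as the toy instances, are hypothetical and not part of the knowledge architecture ontology.</p>
        <preformat>
```python
from collections import Counter


def type_path(walk, type_of):
    """Abstract an instance-level walk into a type-property path.

    `walk` alternates instances and properties, e.g. (i1, p1, i2).
    Instances sit at even indexes and are replaced by their type.
    """
    return tuple(type_of[x] if k % 2 == 0 else x for k, x in enumerate(walk))


def path_length(path):
    """The length of a path is the number of properties it involves."""
    return len(path) // 2


# Toy data (assumed): two artists, each of whom made one record.
type_of = {
    "artist1": "mo:MusicArtist",
    "artist2": "mo:MusicArtist",
    "rec1": "mo:Record",
    "rec2": "mo:Record",
}
walks = [
    ("artist1", "foaf:made", "rec1"),
    ("artist2", "foaf:made", "rec2"),
]

# Both walks collapse onto the same path, which therefore has 2 occurrences.
occurrences = Counter(type_path(w, type_of) for w in walks)
```
        </preformat>
        <p>In this toy setting the two walks are two occurrences of the single path (mo:MusicArtist, foaf:made, mo:Record), whose length is 1.</p>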
        <p>Our approach uses a strategy aimed at modelling, inspecting, and
summarizing Linked Data sets, thereby drawing what we call their knowledge architecture,
which relies on the notions of paths and KPs defined above. The application of
this approach is sketched in Figure 1, and can be synthesized as follows:
1. Gather property usage statistics for a chosen dataset and store them as the
ABox of the knowledge architecture ontology;
2. Query the dataset for extracting paths. Store all paths with length up to 4,
with their usage statistics, in the knowledge architecture dataset<sup>7</sup>;
3. Identify central types and central properties based on their frequencies in a
key position in paths, i.e. betweenness, and number of instantiations;
4. Extract emerging KPs based on the dataset's central types and properties;
5. Select clustering factors among central properties, i.e. those properties
occupying the same position in a set of paths, and construct path clusters.</p>
        <p>The following section illustrates how the components of a dataset knowledge
architecture are constructed for performing the steps of our method.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Method and datasets</title>
      <p>Our method described in Section 2 focuses on two main empirical results:
(1) building a knowledge architecture of a dataset able to summarize its key
features; and (2) extracting the central KPs of a dataset. In this Section we
focus on what a dataset knowledge architecture is and how to construct it.</p>
      <p><sup>7</sup> The choice of maximum path length 4 was dictated by computational boundaries
and the extreme redundancy empirically observed in longer paths (cf. Section 3).</p>
      <sec id="sec-3-1">
        <title>The knowledge architecture</title>
        <p>A dataset knowledge architecture is an ontology that expresses a dataset
vocabulary in a modular way. Its components are selected based on measures
that indicate their importance in capturing the core knowledge in a dataset. In
other words, it is an abstraction over an RDF graph, which offers a modular
view as opposed to the usual "class-property" view provided by vocabularies
and ontologies, since it is open to queries that are agnostic to the specific types and
properties used in the dataset. To populate it, we inspect a dataset for (i) the
types and properties it uses, (ii) its typical paths, i.e. type-property sequences,
and (iii) quantitative statistics about their usage. The knowledge architecture
schema is available online<sup>8</sup>. This formalism allows us to empirically analyse a
dataset architecture through SPARQL queries. Figure 2 depicts the main
entities defined by the dataset knowledge architecture ontology. With the help of
this ontology, we aim at deriving, in a bottom-up way, the ontology actually
employed for representing the data in a LD dataset, and at producing additional useful
knowledge about a dataset, e.g. its central types. We identify (i) the properties
used in a dataset's triples, and model them through the class Property; (ii) the
types (classes or literals) of the subject and object resources of such triples, and
model them through the class Type; and (iii) the typical paths that connect
triples in a dataset, and model them through the class Path.</p>
        <p>A Path is an ordered set <inline-formula><tex-math>\{T_1, p_1, \ldots, p_l, T_{l+1}\}</tex-math></inline-formula>, where <inline-formula><tex-math>T_i</tex-math></inline-formula> is a Type, <inline-formula><tex-math>p_i</tex-math></inline-formula> is a
property, and <inline-formula><tex-math>l</tex-math></inline-formula> is the path length. Each ordered subset <inline-formula><tex-math>\{T_i, p_i, T_{i+1}\}</tex-math></inline-formula> of a Path
of length <inline-formula><tex-math>l</tex-math></inline-formula> is called a PathElement, and is associated with its position <inline-formula><tex-math>i = 1, \ldots, l</tex-math></inline-formula>
in the path. For example, one DBTune Jamendo<sup>9</sup> instance of Path is:</p>
        <p>{mo:MusicArtist foaf:made mo:Record mo:available_as mo:Torrent}</p>
        <p>It has length 2, and is composed of the following PathElements:</p>
        <p>{mo:MusicArtist, foaf:made, mo:Record} (position 1)
{mo:Record, mo:available_as, mo:Torrent} (position 2)</p>
        <p>where mo: and foaf: are prefixes for the Music Ontology (i.e., http://purl.org/ontology/mo/)
and FOAF (i.e., http://xmlns.com/foaf/0.1/) namespaces.
We then define four properties describing PathElement: hasProperty, hasPosition,
hasPathElementObjectType and hasPathElementSubjectType. Each Path is
associated with an instance of PathOccurrencesInDataset, which indicates the
observed number of occurrences of that path in a Dataset.</p>
        <p><sup>8</sup> http://www.ontologydesignpatterns.org/ont/lod-analysis-properties.owl,
which imports the paths module as well.</p>
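        <p>The decomposition of a Path into positioned PathElements can be sketched in a few lines of Python. This is only an illustration under the assumption that a path is held as a flat tuple alternating types and properties; the function path_elements and its dictionary keys are hypothetical names that loosely mirror the properties hasPosition, hasPathElementSubjectType, hasProperty and hasPathElementObjectType.</p>
        <preformat>
```python
def path_elements(path):
    """Split a type-property path into its positioned path elements.

    `path` is a flat tuple (T1, p1, T2, ..., pl, Tl+1); positions start at 1.
    """
    length = len(path) // 2  # number of properties in the path
    return [
        {
            "position": i + 1,
            "subject_type": path[2 * i],
            "property": path[2 * i + 1],
            "object_type": path[2 * i + 2],
        }
        for i in range(length)
    ]


# The Jamendo example path of length 2 discussed in the text.
example = (
    "mo:MusicArtist", "foaf:made",
    "mo:Record", "mo:available_as",
    "mo:Torrent",
)
elements = path_elements(example)
```
        </preformat>
        <p>For the Jamendo example, elements holds two entries: (mo:MusicArtist, foaf:made, mo:Record) at position 1 and (mo:Record, mo:available_as, mo:Torrent) at position 2.</p>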
        <p>We also define the concepts CentralType and CentralProperty, which
identify the entities capturing most of the knowledge in a dataset. The type KnowledgePattern
is used for storing the emerging knowledge patterns.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Measures</title>
        <p>Table 1 illustrates the measures that we associate with the knowledge
architecture of a dataset, and which we analyse and interpret empirically to support our
concluding statements in Section 4. Measures from #Triples to #PathOcc hold
for a dataset as a whole, while the others are related to a type or property and
can be computed either for one dataset or across multiple datasets. Most of
the measures have been computed by combining SPARQL queries and software
scripting<sup>10</sup>.</p>
        <p>The measures for identifying the central types and properties of a dataset
are also shown. Type betweenness and Property betweenness are simplifications
of centrality measures used in graph theory. Although we do not model the
knowledge architecture of a dataset as a graph, its structure approximates one
through the notion of directed paths and based on the empirical observation
that all paths longer than 3 are composed of the observed shorter paths.</p>
        <p>We can then compute betweenness of types by counting the participation of
types as subjects in paths of length 2 at position 2, and betweenness of properties
by counting the participation of properties in paths of length 3 in position 2.</p>
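        <p>As a rough illustration of these two counts, the following Python sketch computes both betweenness values over a list of paths. It assumes that paths are flat tuples alternating types and properties and that each distinct path contributes one unit to the count; whether path occurrences should be weighted instead is not stated here, so this choice is an assumption. The function names and the toy paths are hypothetical.</p>
        <preformat>
```python
from collections import Counter


def type_betweenness(paths):
    """Count types acting as subject at position 2 in paths of length 2."""
    counts = Counter()
    for path in paths:
        if len(path) // 2 == 2:
            counts[path[2]] += 1  # index 2 holds T2, the subject of element 2
    return counts


def property_betweenness(paths):
    """Count properties at position 2 in paths of length 3."""
    counts = Counter()
    for path in paths:
        if len(path) // 2 == 3:
            counts[path[3]] += 1  # index 3 holds p2, the middle property
    return counts


# Toy paths (assumed for illustration, not measured values).
paths = [
    ("mo:MusicArtist", "foaf:made", "mo:Record", "mo:available_as", "mo:Torrent"),
    ("mo:MusicArtist", "foaf:made", "mo:Record", "mo:track", "mo:Track"),
    ("mo:MusicArtist", "foaf:made", "mo:Record", "mo:track", "mo:Track",
     "mo:available_as", "mo:Torrent"),
]
t_between = type_betweenness(paths)
p_between = property_betweenness(paths)
```
        </preformat>
        <p>On this toy input, mo:Record obtains a type betweenness of 2 and mo:track a property betweenness of 1; combining such values with instance counts would then follow the centrality criterion described in the text.</p>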
        <p>The combination of betweenness values and number of instances indicates
the value of centrality of a type or property for a dataset. Central types and
properties are able to capture most of the knowledge expressed in a dataset.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Datasets</title>
        <p>We initially examined several datasets including general-purpose, cross-domain
and multimedia domain-specific datasets. To ensure a statistically relevant sample
for our experiments, we selected datasets:
1. addressing a specific knowledge domain;</p>
        <sec id="sec-3-3-1">
          <title><sup>9</sup> DBTune Jamendo, http://dbtune.org/jamendo/; <sup>10</sup> Queries available at http://stlab.istc.cnr.it/stlab/LOD-Analysis-Statistics</title>
          <p>2. addressing the same specific domain, or a conceptually related one, hence
leading to possible cross-relations among them;
3. with different sizes;
4. that use third-party as well as in-house ontologies.</p>
          <p>We eventually chose three datasets related to the multimedia domain.</p>
          <p>Jamendo<sup>11</sup> is an online distributor of independent musical artists. The
represented data focus on record authorship, release and distribution over internet
channels. Being part of the DBTune service, its data representation relies on
the Music Ontology<sup>12</sup> and parts of the FOAF, Dublin Core, Event<sup>13</sup>, Timeline<sup>14</sup>
and Tags<sup>15</sup> ontologies. The indie nature of its hosted artists, who are scarcely
represented in other datasets, makes Jamendo a primary source for its content.</p>
          <p>John Peel Sessions (JPeel)<sup>16</sup> includes data related to live musical
performances for the John Peel Show aired on BBC Radio One, and the resulting record
releases. It is also a DBTune dataset, but being more event-focused, it mostly
reuses a different portion of the Music Ontology vocabulary than Jamendo does.</p>
          <p><sup>11</sup> Jamendo DBTune home, http://dbtune.org/jamendo</p>
          <p><sup>12</sup> The Music ontology, http://purl.org/ontology/mo/</p>
          <p><sup>13</sup> Event Ontology, http://purl.org/NET/c4dm/event.owl#</p>
          <p><sup>14</sup> Timeline ontology, http://purl.org/NET/c4dm/timeline.owl#</p>
          <p><sup>15</sup> Tag vocabulary, http://www.holygoat.co.uk/owl/redwood/0.1/tags/</p>
          <p><sup>16</sup> John Peel DBTune home, http://dbtune.org/bbc/peel</p>
          <p>LinkedMDB (LMDB)<sup>17</sup> is a triplified database of the film industry
domain. It encompasses the entities involved with film production and release,
plus additional metadata concerning ratings and events such as film festivals.
The LinkedMDB ontology is unpublished<sup>18</sup> and almost entirely in-house, with a
few exceptions such as FOAF and Dublin Core terms.</p>
          <p>These datasets address domain-specific knowledge, thus satisfying our
criterion 1. They also address criterion 2, as Jamendo and JPeel share the music
production domain (albeit with different data and perspectives), while LMDB
addresses the movie production domain, which is related to music e.g. through
soundtrack authorship. Table 2 shows how they differ in dimensions, thus
satisfying criterion 3. Finally, as for criterion 4, Jamendo and JPeel heavily reuse
external ontologies, while LMDB mainly uses a proprietary one. Additionally,
their vocabulary usage has little to no overlap.</p>
          <p>
            We excluded general-purpose and cross-domain datasets, e.g. DBPedia and
GeoNames, based on the already existing research experience on applying
patterns to them. Examples are [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], which addresses the application of patterns
to general-purpose datasets such as WordNet<sup>19</sup>, and [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], which applies patterns
to the Thesaurus of Geographic Names<sup>20</sup>, addressing geographical as well as
art-related knowledge. In other words, we opted to experiment on a different type
of resource in order to lay the basis, in our future work, for comparing our
method and results with existing approaches.
          </p>
          <p>For each dataset we computed the measures from Table 1. They provided us
with a means to objectively describe datasets according to our selection criteria.
According to the figures in Table 3, the three datasets are not very sparse, owing to their
high property usage values. This favours the identification of central types and
properties. Additionally, the datasets differ by several orders of magnitude in number
of paths and their occurrences, thus confirming their variety in size. The full list
of types and properties used in the three datasets is available online<sup>21</sup>.</p>
          <p><sup>17</sup> LinkedMDB home, http://www.linkedmdb.org/</p>
          <p><sup>18</sup> The base namespace http://data.linkedmdb.org/resource/movie/ would not
resolve as of August 2011, thus forcing us to assume an implicit schema for the dataset.</p>
          <p><sup>19</sup> WordNet, http://wordnet.princeton.edu/</p>
          <p><sup>20</sup> Geographic Names, http://www.getty.edu/research/tools/vocabularies/tgn/</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results and examples</title>
      <p>Based on the described entities and the associated measures (cf. Sections 3.2
and 3.3), we can compute each dataset's central types and properties and extract
their most representative Paths, with the aim of building prototypical queries.
These can be constructed by identifying a set of path elements that allow us to
retrieve the relevant knowledge about central types.</p>
      <p>We extract all emerging KPs by querying all distinct paths of length
1 and grouping them by their subject types. The KP of a type C (i.e., KP(C))
includes the type C, all the properties used for describing its instances, and
the object types connected to them. The list of KPs extracted from the three
datasets is published online<sup>22</sup>.</p>
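      <p>The grouping step behind KP extraction can be sketched as follows. This is a minimal Python illustration, assuming that the distinct paths of length 1 have already been retrieved as (subject type, property, object type) triples; the function extract_kps and the toy triples, including their object types, are hypothetical and do not reproduce the actual Jamendo figures.</p>
      <preformat>
```python
from collections import defaultdict


def extract_kps(length1_paths):
    """Group length-1 paths by subject type.

    Returns {type: {property: sorted list of object types}}, i.e. one
    knowledge pattern per subject type.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for subject_type, prop, object_type in length1_paths:
        grouped[subject_type][prop].add(object_type)
    return {
        t: {p: sorted(objs) for p, objs in props.items()}
        for t, props in grouped.items()
    }


# Toy length-1 paths (assumed for illustration only).
length1_paths = [
    ("mo:Track", "mo:available_as", "mo:Torrent"),
    ("mo:Track", "mo:available_as", "mo:ED2K"),
    ("mo:Track", "dc:title", "Literal"),
    ("mo:Record", "mo:track", "mo:Track"),
]
kps = extract_kps(length1_paths)
```
      </preformat>
      <p>Here kps["mo:Track"] plays the role of KP(mo:Track): it lists each property used with mo:Track instances together with the object types observed for it.</p>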
      <p>A prototypical query has to provide an effective summarization of a dataset's
knowledge about a central type. Such an effective summarization includes more
than the properties used for describing the type's instances; that is, we need a
way to identify an interesting neighborhood of the type's instances. To this aim,
we exploit the notion of central properties and their clusters of paths of length
3 (where the property is at position 2). Examples of clusters, the queries used
for retrieving them, and the full list of extracted KPs are available online<sup>23</sup>.</p>
      <p>We identify the types and properties through which most of the dataset
knowledge transits (i.e., central types and properties, respectively), and show
that they have a primary role for selecting KPs and paths for building
prototypical queries. We report here the statistics computed for these types and
properties in Jamendo<sup>24</sup>. Figures 4(a) and 4(b) show the metrics used for
identifying central types; that is, as our task is to build prototypical queries, we look
at types with both high betweenness and many individuals, thereby addressing
centrality and recall. In the case of Jamendo, we select mo:Playlist, mo:Track,
and mo:Signal. Figures 4(c) and 4(d) show the metric values for identifying central
properties, i.e. the number of triples that instantiate a property and the
betweenness of properties. By adopting the same policy as for central types, we select
mo:available_as, mo:published_as, and foaf:made.</p>
      <p>A prototypical query, which provides a meaningful summarization of a dataset's
knowledge about a central type, is built by combining KPs and central
properties' clusters of paths of length 3, by implementing the following algorithm:</p>
      <p><sup>21</sup> http://stlab.istc.cnr.it/stlab/LOD-Analysis-TypesAndProperties</p>
      <p><sup>22</sup> List of KPs extracted from Jamendo (10 KPs), JPeel (8 KPs), and LMDB (51 KPs),
http://stlab.istc.cnr.it/stlab/LOD-Analysis-EmergingKP</p>
      <p><sup>23</sup> Example clusters at http://stlab.istc.cnr.it/stlab/LOD-Analysis-Clusters</p>
      <p><sup>24</sup> Webpage http://stlab.istc.cnr.it/stlab/LOD-Analysis-Graphs contains the
complete charts; http://stlab.istc.cnr.it/stlab/LOD-Analysis-Statistics
shows the same data as tables, along with the queries for computing them.</p>
      <p>Let us exemplify the generation of a prototypical query, following the
algorithm above, for C = mo:Track, which is a central type in Jamendo. Based on
KP(mo:Track), PE is initialized with 4 paths, characterized by the
properties {dc:title, mo:available_as, mo:license, mo:track_number}<sup>25</sup>. Among
them, only mo:available_as is a central property in Jamendo, hence we pick
its cluster of paths (of length 3) in order to enrich the set PE that will be used
for building the prototypical query. From this cluster, we collect 3 additional
paths, as they are connected to mo:Track. Two of them identify incoming links
to mo:Track (i.e., mo:track and mo:published_as), and one identifies a link in
the neighborhood of mo:Track (i.e., dc:format), which lies at a 2-degree
distance from it in its knowledge graph. This additional link in the neighborhood
shows how this approach allows us to build queries that capture more
meaningful knowledge than a simple SPARQL DESCRIBE. An interesting investigation
that we have planned in our future work is to study the cognitive soundness of
these queries with respect to user interaction tasks.</p>
      <p>The result of the described procedure is the following query. Figure 3 shows
the graph that results when the query is applied to a specific instance of
mo:Track, in this case http://dbtune.org/jamendo/track/7593.</p>
      <preformat>
PREFIX mo: &lt;http://purl.org/ontology/mo/&gt;
PREFIX dc: &lt;http://purl.org/dc/elements/1.1/&gt;

CONSTRUCT {
  ?t a mo:Track . ?t dc:title ?t1 . ?t mo:available_as ?t2 . ?t2 dc:format ?f .
  ?t mo:license ?t3 . ?t mo:track_number ?t4 .
  ?s a mo:Signal . ?s mo:published_as ?t .
  ?r a mo:Record . ?r mo:track ?t .
}
# jamendo_dataset is a placeholder for the IRI of the Jamendo graph
FROM jamendo_dataset
WHERE {
  ?t a mo:Track .
  ?t dc:title ?t1 .
  # each branch adds one optional part of the neighborhood of ?t
  { { OPTIONAL { ?t mo:available_as ?t2 .
                 ?t2 dc:format ?f } } UNION
    { OPTIONAL { ?t mo:license ?t3 } } UNION
    { OPTIONAL { ?t mo:track_number ?t4 } } UNION
    { OPTIONAL { ?s a mo:Signal .
                 ?s mo:published_as ?t } } UNION
    { OPTIONAL { ?r a mo:Record .
                 ?r mo:track ?t } } }
}
</preformat>
      <p>These steps allow us to build a summary of a dataset, which supports the
retrieval of the most representative knowledge for its domain, since (i) paths of length
3 are enough for capturing all knowledge structures, (ii) central types capture the most
representative knowledge of the dataset, (iii) KPs convey the description of types,
and (iv) central properties link the most representative KPs of a dataset. This
analytic approach showed that we can summarize a dataset through a relatively
small knowledge architecture, thus limiting the impact of empirical analysis on
computational complexity. In our three experiments of Section 3.3, we built a
knowledge architecture dataset of 130,373 triples representing three datasets
whose combined size sums up to over 7 × 10<sup>6</sup> triples. The architecture is available
online<sup>26</sup>.</p>
      <p><sup>25</sup> For the sake of readability we omit the object types of the path elements.</p>
      <p>[Figure 4: Jamendo types and properties. Panels: (a) Number of individuals and (b) Type betweenness, for the types mo:MusicArtist, mo:Record, mo:Lyrics, tags2:Tag, mo:Torrent, mo:ED2K, time:Interval, mo:Signal, mo:Track and mo:Playlist; (c) Number of triples and (d) Property betweenness, for the properties foaf:maker, foaf:made, mo:track, mo:time, mo:published_as, tags2:taggedWithTag and mo:available_as.]</p>
      <p>
        There has been valuable research work on understanding the LD cloud
recently, which highlights different approaches, perspectives, and specific goals.
Some works focus on providing macroscopic analysis of LD as a whole, such as
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which analyzes the typical usage of the owl:sameAs standard property; [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
which identifies and generates relations between ontologies used in LD; [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which
discusses how LD would benefit from vocabulary alignments with foundational
ontologies; [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which identifies the most important vocabularies and classes
over large-scale distributed datasets.
      </p>
      <p>
        Works such as [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] focus mainly on query optimization.
      </p>
      <p><sup>26</sup> http://ontologydesignpatterns.org/ont/lod-analysis-properties-data.owl</p>
      <p>
        Other research efforts exploit LD for supporting user interaction with
content-intensive applications, such as [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], and [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] but do not try to provide a
summarization of the used RDF datasets.
      </p>
      <p>
        Finally, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] focus on providing a compact representation of RDF
datasets. In both cases, the main difference with our approach is the lack of
design perspective and conceptual analysis. In particular, the taxonomy of classes
and properties is not considered: they treat all classes and properties in the same
way, whereas we use the taxonomy for eliminating redundancies. Furthermore, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] do not
consider the notion of central types and properties (or analogous), while [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] does
not exploit semantic web technologies for storing the RDF dataset summaries,
which is instead a characteristic of our approach. Finally, our approach focuses on
identifying prototypical queries that convey meaningful conceptual organization
around a certain type based on the notions of centrality.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>We have shown how to summarize Linked Data sets by treating them as
sets of connected knowledge patterns, in order to identify their core knowledge
components. We have experimented on three datasets from the LD cloud, and
showed how to build prototypical queries for them even when the ontologies
that model them are unknown. We have planned, in our future work, to
compare ontologies explicitly published and used for a dataset with the knowledge
architecture that arises from our analysis.</p>
      <p>Our ongoing and future work focuses on extending our strategy, in order
to (i) demonstrate how aligning the emerging KPs of a dataset to general KPs
improves interoperability across datasets and detects incompatibility issues, (ii)
compare analysis data about different datasets, and (iii) improve user interaction
in searches for relevant content. We plan to improve the method by
performing additional analysis on an extensive coverage of the multimedia domain,
and subsequently to evaluate the cross-domain portability of our approach.</p>
      <p>Acknowledgements. This work has been part-funded by the European Commission under
grant agreement FP7-ICT-2007-3/ No. 231527 (IKS - Interactive Knowledge Stack).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stash</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorgels</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rutledge</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>CHIP demonstrator: Semantics-driven recommendations and museum tour generation</article-title>
          .
          <source>In: Semantic Web Challenge. CEUR Workshop Proceedings</source>
          , vol.
          <volume>295</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Basse</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gandon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirbel</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Dfs-based frequent graph pattern extraction to characterize the content of rdf triple stores</article-title>
          .
          <source>In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line</source>
          , Raleigh, NC, US (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kershenbaum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schonberg</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The summary ABox: Cutting ontologies down to size</article-title>
          . In:
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allemang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Preist</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwabe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mika</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uschold</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (eds.)
          <source>International Semantic Web Conference. Lecture Notes in Computer Science</source>
          , vol.
          <volume>4273</volume>
          , pp.
          <fpage>343</fpage>
          –
          <lpage>356</lpage>
          . Springer (
          <year>2006</year>
          ), http://dblp.uni-trier.de/db/conf/semweb/iswc2006.html#FokoueKMSS06
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Towards a pattern science for the semantic web</article-title>
          .
          <source>Semantic Web</source>
          <volume>1</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>61</fpage>
          –
          <lpage>68</lpage>
          (
          <year>2010</year>
          ), http://dblp.uni-trier.de/db/journals/semweb/semweb1.html#GangemiP10
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Halpin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCusker</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          :
          <article-title>When owl:sameAs isn't the same: An analysis of identity in Linked Data</article-title>
          .
          <source>In: 9th International Semantic Web Conference (ISWC2010)</source>
          (November
          <year>2010</year>
          ), http://data.semanticweb.org/conference/iswc/2010/paper/261
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Heim</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lohmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stegemann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>RelFinder: Revealing relationships in RDF knowledge bases</article-title>
          .
          <source>In: Proceedings of the 3rd International Conference on Semantic and Media Technologies (SAMT). Lecture Notes in Computer Science</source>
          , vol.
          <volume>5887</volume>
          , pp.
          <fpage>182</fpage>
          –
          <lpage>187</lpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeh</surname>
            ,
            <given-names>P.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          :
          <article-title>Linked Data is merely more data</article-title>
          .
          <source>In: AAAI Spring Symposium Linked Data Meets Artificial Intelligence</source>
          . pp.
          <fpage>82</fpage>
          –
          <lpage>86</lpage>
          .
          <publisher-name>AAAI Press</publisher-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Khatchadourian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consens</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          :
          <article-title>ExpLOD: Summary-based exploration of interlinking and RDF usage in the Linked Open Data cloud</article-title>
          . In:
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antoniou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hyvönen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ten Teije</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cabral</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tudorache</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (eds.)
          <source>ESWC (2). Lecture Notes in Computer Science</source>
          , vol.
          <volume>6089</volume>
          , pp.
          <fpage>272</fpage>
          –
          <lpage>287</lpage>
          . Springer (
          <year>2010</year>
          ), http://dblp.uni-trier.de/db/conf/esws/eswc2010-2.html#KhatchadourianC10
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Maduko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anyanwu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schliekelman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Graph summaries for subgraph frequency estimation</article-title>
          . In:
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koubarakis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 5th European Semantic Web Conference. LNCS</source>
          , Springer Verlag, Berlin, Heidelberg (June
          <year>2008</year>
          ), http://data.semanticweb.org/conference/eswc/2008/papers/330
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Malaise</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hollink</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gazendam</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The interaction between automatic annotation and query expansion: a retrieval experiment on a large cultural heritage archive</article-title>
          . In:
          <string-name>
            <surname>Bloehdorn</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grobelnik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mika</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          (eds.)
          <source>SemSearch. CEUR Workshop Proceedings</source>
          , vol.
          <volume>334</volume>
          , pp.
          <fpage>44</fpage>
          –
          <lpage>58</lpage>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Capturing emerging relations between schema ontologies on the Web of Data</article-title>
          .
          <source>In: First International Workshop on Consuming Linked Data (COLD2010)</source>
          (
          <year>2010</year>
          ), http://ceur-ws.org/Vol-665/NikolovEtAl_COLD2010.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Passant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raimond</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Combining social music and Semantic Web for music-related recommender systems</article-title>
          .
          <source>In: Social Data on the Web (SDoW2008)</source>
          (
          <year>2008</year>
          ), http://data.semanticweb.org/workshop/sdow/2008/paper/3
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
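          <string-name>
            <surname>Ge</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhiqiang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :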
          <article-title>Class association structure derived from linked objects</article-title>
          .
          <source>In: Proceedings of WebSci'09: Society On-Line</source>
          , Athens, Greece (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Stankovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Open Innovation and Semantic Web: Problem solver search on Linked Data</article-title>
          .
          <source>In: 9th International Semantic Web Conference (ISWC2010)</source>
          (November
          <year>2010</year>
          ), http://data.semanticweb.org/conference/iswc/2010/paper/439
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vdovjak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Houben</surname>
            ,
            <given-names>G.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broekstra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Index structures and algorithms for querying distributed RDF repositories</article-title>
          .
          <source>In: WWW</source>
          . pp.
          <fpage>631</fpage>
          –
          <lpage>639</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stash</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorgels</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rutledge</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreiber</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Recommendations based on semantically enriched museum collections</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <fpage>283</fpage>
          –
          <lpage>290</lpage>
          (
          <year>2008</year>
          ), http://www.sciencedirect.com/science/article/B758F-4TT7153-1/2/f1bd28cd4d79a0ff70d74439e3f5e3fc, Semantic Web Challenge 2006/2007
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stash</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hollink</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreiber</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Semantic relations for content-based recommendations</article-title>
          .
          <source>In: Proceedings of the fifth international conference on Knowledge capture</source>
          . pp.
          <fpage>209</fpage>
          –
          <lpage>210</lpage>
          . K-CAP '09,
          <publisher-name>ACM</publisher-name>
          , New York, NY, USA (
          <year>2009</year>
          ), http://doi.acm.org/10.1145/1597735.1597786
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>