=Paper=
{{Paper
|id=Vol-1743/paper12
|storemode=property
|title=Evaluating Class Assignment Semantic Redundancy on Linked Datasets
|pdfUrl=https://ceur-ws.org/Vol-1743/paper12.pdf
|volume=Vol-1743
|authors=Leandro Mendoza,Alicia Díaz
|dblpUrl=https://dblp.org/rec/conf/simbig/MendozaD16
}}
==Evaluating Class Assignment Semantic Redundancy on Linked Datasets==
<pdf width="1500px">https://ceur-ws.org/Vol-1743/paper12.pdf</pdf>
<pre>
    Evaluating Class Assignment Semantic Redundancy on Linked Datasets


                 Leandro Mendoza                                     Alicia Dı́az
                CONICET, Argentina                          LIFIA, Facultad de Informática,
            LIFIA, Facultad de Informática,                       UNLP, Argentina
                   UNLP, Argentina


                    Abstract                            dancy is related with the concept of extensional
                                                        conciseness which has been defined in (Zaveri et
     In this work we address the concept of se-         al., 2015) as “the case when the data does not
     mantic redundancy in linked datasets con-          contain redundant objects at instance level”. In
     sidering class assignment assertions. We           this scenario, redundancy means that a resource is
     discuss how redundancy can be evaluated            specified as member of a class when it is not nec-
     as well as the relationship between redun-         essary, either because the information is explicitly
     dancy and three class hierarchy aspects:           duplicated or because it can be derived from infor-
     the number of instances a class has, num-          mation that already exists. Current works that have
     ber of class descendants and class depth.          dealt with semantic redundancy on linked datasets
     Finally, we performed an evaluation on the         implement algorithms based on graph pattern dis-
     DBpedia dataset using SPARQL queries               covering techniques. In contrast, our work pro-
     for data redundancy checks.                        poses a simplified approach based on SPARQL
                                                        queries and considering class assignment asser-
1    Introduction
                                                        tions. We discuss how redundancy can be eval-
The amount of interlinked knowledge bases built         uated and perform an evaluation over the DBpe-
under Semantic Web technologies and following           dia dataset (Lehmann et al., 2015) in order to un-
the linked data (Heath and Bizer, 2011) princi-         derstand the relationship between redundancy and
ples has increased significantly last years. These      three class hierarchy aspects: the number of in-
knowledge bases (also known as linked datasets)         stances a class has, the class depth and its number
contain information that associates Web entities        of descendants. This approach may be useful for
(called resources) with well-defined semantics          linked data users who need to measure semantic
that specifies how these entities should be in-         data redundancy in a practical way, understand its
terpreted. In most linked datasets a substantial        origin and detect when it may be useful (e.g. to
amount of data corresponds to class assignment          improve performance) or when it can affect nega-
assertions, that is, information that specifies re-     tively the knowledge base (e.g. misuse of classes
sources (or individuals) as instances of certain        when typifying resources). The following sections
classes. In this sense, resources are typified us-      are organized as follows: sections 2 and 3 give
ing classes usually defined through ontologies and      some background definitions and related work, re-
organized into class hierarchies. Several differ-       spectively. Section 4 introduces the redundancy
ent ontologies can be combined to classify re-          definition adopted and discusses some of the alter-
sources within the same dataset giving rise to huge     natives to address it on linked datasets. Section 5
and complex interlinked structures that can suffer      shows the evaluation results. Finally, some con-
from data quality problems (Hogan et al., 2010).        clusions and further work are given in section 6.
Thus, the use of practical mechanisms to handle
knowledge conciseness becomes increasingly im-          2       Background
portant to improve the overall dataset quality and
                                                        In the linked data context, datasets are knowl-
the study of redundancy on class assignments as-
                                                        edge bases described using the RDF1 data model
sertions aims to contribute in this way. From a
data quality perspective, class assignment redun-           1
                                                                https://www.w3.org/RDF/


                                                  103
and published following the linked data princi-         analysis method to identify graph patterns that can
ples2 . These datasets are collection of assertions     be used to remove redundant triples and calculate
about resources specified following the “subject        the volume of semantic redundancy. In (Joshi et
predicate object” pattern. Assertions are RDF           al., 2013) authors employ frequent itemset (fre-
triples and resources may be anything identifi-         quent pattern) mining techniques to generate a
able by an HTTP URI. Knowledge representation           set of logical rules to compress RDF datasets and
mechanisms like RDFS3 and OWL4 extend RDF               then use these rules during decompression. Both
and allow datasets to be augmented with more ex-        works mention the idea of semantic compression
pressive semantics. For example, it is possible         by removing derivable knowledge. Regarding the
to describe ontologies by specifying classes and        use of SPARQL for quality assessment, (Fürber
relationships between them (e.g. “SoccerPlayer          and Hepp, 2010) and (Kontokostas et al., 2014)
rdfs:subClassOf Atlhete”) and to specify re-            use query templates to detect some quality prob-
sources as member of those classes (e.g. “Li-           lems but semantic redundancy is not included.
onel Messi rdf:type SoccerPlayer”). From an             Inspired on the ideas of these works, we use a
overall perspective, information contained in these     SPARQL query oriented approach to evaluate re-
datasets can be split into two levels: schema level     dundant class assignments and make this informa-
and instance level. Schema level refers to ter-         tion explicit to users.
minological knowledge (known as TBox), for ex-
ample, classes, properties and their relationships.     4   Redundant class assignments
On the other hand, instance level refers to asser-
tional knowledge (known as ABox), that is, propo-       As we know, linked datasets are basically sets
sitions about entities of a specific domain of in-      of RDF triples and knowledge is specified using
terest. An important type of assertional knowl-         mechanisms provided by RDFS and OWL, each
edge corresponds to class assignments, that is,         one with its own well-defined semantics (Hitzler et
RDF statements of the form “resource rdf:type           al., 2009). In this way, schema and instance level
class” used to specify resources as members of          assertions can be considered as propositions to for-
certain classes. The most common way to retrieve        mally describe the notion of derivable knowledge.
this information from linked datasets is through a      For example, the notation {p1 , p2 } |= {p3 , p4 }
SPARQL5 endpoint. These endpoints are web ser-          (where |= is called entailment relation) states that
vices that accept SPARQL queries and return in-         propositions p3 and p4 (also p1 and p2 ) are log-
formation that match with a given pattern. In this      ical consequences of propositions p1 and p2 ob-
work we will use this mechanism to detect redun-        tained under a certain set of rules (logic). Con-
dant class assignments.                                 sidering this, the concept of data redundancy can
                                                        be associated to what is known in mathematical
3   Related Work                                        logic literature as independence, that is, the abil-
                                                        ity to deduce a proposition from other proposi-
In the linked data literature, redundancy is re-        tions. Formally, given a logic L (semantics) and a
lated with the data quality dimension of concise-       set of propositions P , it is defined as independent
ness (Zaveri et al., 2015) and has been studied         if for all proposition pi 2 P does not hold that
and categorized from syntactic to semantic and          {P pi } |= pi . In this way, a non-independent
from schema to instance levels (Pan et al., 2014).      set of propositions can be considered redundant
From a syntactic perspective most of the existing       since it contains extra information that may not be
compression techniques focus on RDF serializa-          necessary because if it is removed from the initial
tion. On the other hand, from a semantic perspec-       set, it can be obtained from the remaining proposi-
tive, just a few works addressed redundancy. In         tions applying an inference mechanism and keep-
(Wu et al., 2014) authors propose a graph based         ing the same logical consequences. Similarly, an
  2                                                     independent set of propositions can be considered
    https://www.w3.org/DesignIssues/
LinkedData.html                                         non-redundant.
  3
    https://www.w3.org/TR/rdf-schema/                      As we mentioned in section 2, class assign-
  4
    https://www.w3.org/standards/techs/                 ments are instance level assertions (or proposi-
owl\#w3c\_all
  5
    https://www.w3.org/TR/                              tions) that specify resources as members of cer-
rdf-sparql-query/                                       tain classes. Thus, given a resource r, its class as-


                                                  104
signment set (CASr ) contains all the propositions            get all the ancestors (or depth) of a given class (as
that specify the classes to which r belongs. The              shows example of listing 3).
idea of the previous paragraph can be applied to
class assignments to define semantic redundancy                SELECT DISTINCT ?c
since the non-redundant class assignment set of                WHERE {
a resource r (N RCASr ) is the independent set of                ?c rdf:subClassOf* <class_URI>
CASr . Then, the redundant class assignment set                }
(RCASr ) can be considered as the difference of
those sets. In the following subsections, we dis-
                                                              Listing 2: SPARQL query to get class descendants
cuss some techniques that can be used to compute
N RCASr on linked datasets. Then, we will use
one of these techniques to perform our evaluation.
                                                               SELECT DISTINCT ?c
4.1 Using SPARQL queries                                       WHERE {
                                                                 <class_URI> rdf:subClassOf* ?c
Using SPARQL, a simple query can be imple-                     }
mented to get the N RCAS. For example, query
in listing 1 can be used to get the non-redundant
set of classes of a given resource (specified by re-           Listing 3: SPARQL query to get class ancestors
source URI).
                                                                 It is important to highlight that the performance
 SELECT DISTINCT ?c                                           of SPARQL queries depends on its implemen-
 WHERE {                                                      tation and the dataset size. Although complex
   <resource_URI> rdf:type ?c                                 SPARQL queries can become unacceptably slow
   FILTER regex(str(?c),"ont_URI","i")                        when working with large amounts of data, it is
   FILTER NOT EXISTS {
                                                              currently the most practical mechanism to access
     <resource_URI> rdf:type ?sc .
     FILTER regex(str(?sc),"ont_URI","i")
                                                              linked datasets.
     ?sc rdfs:subClassOf ?c }
                                                              4.2   Using graph based algorithm and
 }
                                                                    reasoners
                                                              Given a class hierarchy, a resource r and its CASr ,
Listing 1: SPARQL query example to get non-                   a way to compute redundancy is by interpreting
redundant class assignments                                   the class hierarchy as a directed acyclic graph “G”
                                                              in which each node is a class and each edge is
   Note that the mentioned query example consid-              the relation rdfs:subClassOf. A node “A”
ers only one ontology (filtered by ontology URI)              of “G” can be considered a class if there exist a
and does not implement any inference mechanism                triple with the form “A rdfs:subClassOf x”,
at instance or schema level. This means that the              “x rdfs:subClassOf A” or “x rdf:type
query will work while all class assignments and               A”. Then, a class “B” is subclass of a class “A”
relationships between the involved classes will be            if node “A” is reachable from node “B” in “G”,
specified explicitly on the dataset. If this is not the       that is, if exists a path between B and A in the
case and a transitive closure of sub/super classes            graph. Considering this, given a proposition set
is needed, it is necessary to implement an algo-              Q that specifies the classes to which an instance i
rithm that iterates recursively over these queries            belongs, a naive algorithm can be implemented to
until it gets the required classes. Using SPARQL              compute a non-redundant proposition set R: first
property paths (e.g. rdfs:subClassOf* or                      set R=Q, then for each element q in Q check if
rdfs:subClassOf+) it is possible to check                     there is a path from some of the remaining propo-
connectivity of two classes by an arbitrary length            sitions in Q to q, if so, q is deleted from R. Finally,
path (route through a graph between two graph                 the algorithm returns R which is then the non re-
nodes)6 . For example, it can be used to get all              dundant class assignments set of r. Removing
the classes that are descendant (or subclasses) of            redundancies can be associated with the problem
a given class (as shows example of listing 2) or to           known as transitive reduction (Aho et al., 1972)
   6
       https://www.w3.org/TR/sparql11-property-paths/         which has an unfortunate complexity class if it is


                                                        105
implemented naively. If the given graph is a fi-                 further analysis was done per class groups but
nite directed acyclic graph current approaches that              only considering the DBpedia class hierarchy in
solve this problem are close to the upper bound                  order to keep the number of classes manageable.
O(n2.3729 ) but if the graph has cycles the problem              For each class group, we analyzed the relationship
belongs to NP-hard class.                                        between redundant class assignments (RCA) and
   Another alternative to compute redundancy is                  three class hierarchy characteristics: class depth,
by using Semantic Web reasoners that have been                   class descendants and number of class assign-
implemented based on decidable fragments of                      ments per class.
RDFS and OWL semantics. These tools imple-
ment inference mechanisms that can be used to                    5.1   Overall redundancy evaluation
deduce if a resource belongs to a given class (in-               The first overall evaluation was made by retriev-
stance checking). The main advantage of using                    ing all resources that belong to some class of
reasoners is that the potential of the underling se-             the DBpedia ontology (6,729,604 resources of
mantics can be exploited (e.g. several ontologies                453 classes) and then we did the same with the
can be combined to get implicit knowledge). On                   YAGO ontology (2,886,306 resources of 369,144
the other hand, the disadvantage of using these                  classes). For each resource we compute its CAS,
tools in linked data scenario is its complexity: in-             its N RCAS and its RCAS (see section 4) con-
ference techniques work well for small examples                  sidering both ontologies separately. Information
with limited knowledge but they turn unacceptably                about resources and its CAS and N RCAS were
slow for large-scale datasets. Besides, when multi-              obtained through SPARQL queries (see section
ples ontologies are combined, inconsistencies can                4.1) and RCAS was obtained by computing the
arise affecting the inference process and hamper-                difference CAS        N RCAS. Results can be
ing the detection of redundant propositions.                     viewed in table of figure 1. Each element of
                                                                 each CAS was counted as a different class assign-
5       Evaluation                                               ment (nbCA column), each element of N RCA
To perform our evaluation we selected the En-                    was counted as a non-redundant class assignment
glish version of DBpedia7 and set up a local mir-                (nbNRCA column) and each element of RCA was
ror using a Virtuoso8 server (version 7.2) . The                 counted as a redundant class assignment (nbRCA
mechanisms implemented to compute redundant                      column). As we can see in chart of figure 1,
class assignment avoid the use of complex graph                  considering classes of the DBpedia ontology al-
based algorithms or RDFS/OWL reasoners and                       most half class assignments are redundant. On the
use a SPARQL query oriented approach (see sec-                   other hand, considering the YAGO ontology 80%
tion 4.1). Although resources in DBpedia are clas-               of class assignments are redundant. In the latter
sified using several classes of different schema we              case, the amount of class assignments is higher
only considered the DBpedia9 core and YAGO10                     and the amount of concepts in the class hierar-
ontologies because information about the involved                chy increases considerably. These results sug-
class hierarchies (subclass relationships) can be                gest a relationship between the number of classes,
obtained directly from queries through the dataset               class assignments and redundancy: as the number
SPARQL endpoint. DBpedia ontology is a shal-                     of classes and class assignments increases, so the
low cross-domain ontology that covers more than                  probability of redundancy.
600 classes and was created based on Wikipedia                   5.2   Redundancy and class depth
infoboxes. YAGO is a taxonomy used in the
                                                                 To analyze the relationship between redundant
YAGO knowledge base that currently covers more
                                                                 class assignments and class depth we categorized
than 350,000 classes. The evaluation is organized
                                                                 classes into groups from 0 to 6 according to their
in the next subsections as follows: we first per-
                                                                 depth in the DBpedia class hierarchy (the distance
formed an overall redundancy evaluation consid-
                                                                 from the root to that class) and then we count
ering DBpedia and YAGO ontologies and then a
                                                                 how many class assignments refer to those classes.
    7
     http://wiki.dbpedia.org/Downloads2015-04                    Classes with depth 0 are the most general and 6
    8
     http://virtuoso.openlinksw.com/
   9                                                             is the max depth found in the class hierarchy. A
     http://wiki.dbpedia.org/services-resources/ontology
  10
     https://www.mpi-inf.mpg.de/departments/databases-           class assignment refers to (or belongs to) a class C
and-information-systems/research/yago-naga/yago/                 if it is a triple of the form (resource rdf:type


                                                           106
                                                         column) of depth 0 (Depth column), 5,037,966
                                                         class assignments (nbCA column) that refer to
                                                         those classes and 3,940,920 of them are redundant
                                                         (nbRCA column). Examples of classes that belong
                                                         to that group are Agent, Place, Work, etc.


Figure 1: DBpedia and YAGO redundancy evalu-
ations


C). To compute the depth of a class we used a
SPARQL query to count the number of ancestors
(see section 4.1 listing 3). Results can be viewed
in table 1, chart of figure 2 shows the relationship
between the class depth and the percentage of re-        Figure 3: RCA distribution considering the class
dundant class assignments (%RCA) and chart of            depth
figure 3 shows how these redundant class assign-
ments are distributed.
     Depth    nbClasses   nbCA        nbRCA
     0        32          5,037,966   3,940,920          5.3   Redundancy and class descendants
     1        86          3,941,230   2,877,434
     2        110         5,385,831   1,156,327
     3        180         1,154,660   118,886            To analyze the relationship between redundant
     4        39          118,891     5,777              class assignments and class descendants we cat-
     5        5           5,777       0
     6        1           889         0
                                                         egorized classes into 10 groups according to the
                                                         number of descendants that they have. To compute
Table 1: Redundancy and class depth evaluation.          the descendants we used a SPARQL query to get
                                                         the subclasses of a given class (see section 4.1 list-
                                                         ing 2). Results are showed in table 2 and chart of
  As we can see on chart of figure 2, as the class
                                                         figure 4 shows the relationship between the num-
depth increases (more specific a class is), the num-
                                                         ber of class descendants and the percentage of re-
ber of redundant class assignments decreases.
                                                         dundant class assignments (%RCA). Chart of fig-
                                                         ure 5 shows how these redundant class assign-
                                                         ments are distributed. Classes that do not have
                                                         descendants (330 classes) are the most specific
                                                         and class assignments that belong to that group
                                                         are not redundant. As we can see in chart of fig-
                                                         ure 4, when the number of descendant per class
                                                         increases, the number of redundant class assign-
                                                         ments also increases. For example, group named
                                                         “1 to 5” refers to classes that have between 1 to
                                                         5 descendants (78 classes) and redundancy is rela-
          Figure 2: RCA vs class depth                   tively low. On the other hand, classes with several
                                                         descendants (e.g. class Agent) has a high level of
   Chart of figure 3 shows that more than 80% of         redundancy. As chart of figure 5 shows, only 3
redundant class assignments refer to more general        classes have more than 100 descendants (Agent,
classes (with less depth). For example, in table         Person and Place) and they concentrates the 65%
1 we can see that there are 32 classes (nbClasses        of redundant class assignments.


                                                   107
  nbDesc          nbClasses   nbCA        nbRCA
  0               330         3,250,543   0
  1 to 5          78          3,209,728   321,078
  6 to 10         16          948,187     524,262
  10 to 20        13          666,630     531,848
  21 to 30        8           773,316     751,488
  31 to 50        2           535,606     502,892
  51 to 70        3           809,171     784,103
  71 to 100       1           220,219     211,415
  101 to 200      2           2,860,585   2,117,098
  More than 200   1           5,231,844   4,472,258

Table 2: Redundancy and class descendants eval-
uation.


                                                            Figure 5: RCA distribution considering class de-
                                                            scendants


                                                            less than 10K class assignments. Besides, as
                                                            this number increases the number of classes in-
                                                            volved decreases but the percentage of redundant
                                                            class assignments increases. For example, classes
                                                            that have more than 500K class assignments (e.g.
       Figure 4: RCA vs class descendants                   Agent, Place, Person, etc.) concentrate most of
                                                            them (9,292,013) and 48% are redundant. We also
                                                            observe that the second group of most used classes
5.4 Redundancy and number of class
                                                            (between 200K and 500K class assignments) con-
    assignments
                                                            sists of about 8 classes with high levels of redun-
To analyze the relationship between redundant               dancy (between 70% and 95%).
class assignments and the number of class as-
signments per class we categorized classes into
10 groups according to the number of class as-
signments that refers to a class. Table 3 shows
the evaluation results and chart of figure 6 shows
the relationship between redundancy and the num-
ber of class assignments (or instances) per class.
Columns nbCA-acc and nbRCA-acc show the total
number of class assignments and redundant class
assignments in each group.

 nbCA             nbClasses   nbCA-acc     nbRCA-acc            Figure 6: RCA vs number of class assignments
 0 to 10K         315         599,261      101,861
 10K to 20K       35          484,931      120,512
 20K to 30K       34          588,845      267,460
 30K to 40K       25          384,601      220,270          6     Conclusions and future work
 40K to 30K       11          900,290      36,156
 100K to 200K     7           906,730      365,085          In this work we addressed the concept of seman-
 200K to 300K     5           1,274,010    1,222,844        tic redundancy considering class assignments as-
 300K to 400K     1           396,046      395,804          sertions in linked datasets. Based on a formal
 400K to 500K     2           934,843      681,445
 More than 500K   6           9,292,013    4,472,258        definition we discussed how redundant (and non-
                                                            redundant) class assignments sets can be detected.
Table 3: Redundancy and number of class assign-             Inspired in previous related work, we conducted
ments evaluation.                                           an evaluation over the English version of DBpedia
                                                            based on SPARQL queries. We analyzed the re-
  As we can see, most of classes (315) have                 lationship between redundancy and three class hi-


                                                      108
erarchy characteristics: the number of instances a         Christian Fürber and Martin Hepp. 2010. Using se-
class has, its depth and its number of descendants.          mantic web resources for data quality management.
                                                             In Proceedings of the 17th International Conference
   Regarding the evaluation results, they suggest
                                                             on Knowledge Engineering and Management by the
that there is a relationship between redundancy              Masses, EKAW’10, pages 211–225, Berlin, Heidel-
and depth of a class in a hierarchy: as more gen-            berg. Springer-Verlag.
eral a class is (less depth), more redundant class
                                                           Tom Heath and Christian Bizer. 2011. Linked Data:
assignments can be found that refer to that class.           Evolving the Web into a Global Data Space, vol-
In a similar way, as the number of descendants               ume 1 of Synthesis Lectures on the Semantic Web:
per class increases so does the number of redun-             Theory and Technology. Morgan Claypool, 1st ed.,
dant class assignments related to that class. Par-           html version edition, February.
ticularly, we noted that this also occurs in most          Pascal Hitzler, Markus Krötzsch, and Sebastian
populated classes (with more class assignments).             Rudolph. 2009. Foundations of Semantic Web Tech-
In this sense, datasets that use complex and large           nologies. Chapman & Hall/CRC.
class hierarchies to typify their resources in uncon-      Aidan Hogan, Andreas Harth, Alexandre Passant, Ste-
trolled environments (such as crowsourced gener-             fan Decker, and Axel Polleres. 2010. Weaving the
                                                             pedantic web. In Linked Data on the Web Workshop
ated content) may be more prone to class assign-
                                                             (LDOW2010) at WWW’2010, volume 628, pages
ments redundancy. Considering this, we can make              30–34. CEUR Workshop Proceedings.
the following observations:
                                                           Amit Krishna Joshi, Pascal Hitzler, and Guozhu Dong,
  • SPARQL queries as a mechanism to evalu-                 2013. The Semantic Web: Semantics and Big
                                                            Data: 10th International Conference, ESWC 2013,
    ate redundancy offer practical ways to imple-           Montpellier, France, May 26-30, 2013. Proceedings,
    ment quality checks and get some statistics             chapter Logical Linked Data Compression, pages
    of linked datasets. However, SPARQL in-                 170–184. Springer Berlin Heidelberg, Berlin, Hei-
    ference capabilities are limited: we can dis-           delberg.
    cover just some graph patterns using queries           Dimitris Kontokostas, Patrick Westphal, Sören Auer,
    but other implicit knowledge will be unreach-            Sebastian Hellmann, Jens Lehmann, Roland Cor-
    able since we can not exploit all the semantic           nelissen, and Amrapali Zaveri. 2014. Test-driven
                                                             evaluation of linked data quality. In Proceedings of
    capabilities or expressiveness of the used lan-          the 23rd International Conference on World Wide
    guages.                                                  Web, WWW ’14, pages 747–758, Republic and
                                                             Canton of Geneva, Switzerland. International World
  • Redundancy analysis can be used to detect                Wide Web Conferences Steering Committee.
    class assignments patterns and data publish-
                                                           Jens Lehmann, Robert Isele, Max Jakob, Anja
    ers behaviors. For example, in our evaluation             Jentzsch, Dimitris Kontokostas, Pablo N. Mendes,
    we detected that when a resource is assigned              Sebastian Hellmann, Mohamed Morsey, Patrick van
    to a very specific class, it is also assigned ex-         Kleef, Sören Auer, and Christian Bizer. 2015. DB-
    plicitly to the ancestors of that class. Discov-          pedia - a large-scale, multilingual knowledge base
                                                              extracted from wikipedia. Semantic Web Journal,
    ering these kinds of patterns may be useful to            6(2):167–195.
    improve the linked data generation process or
    even to understand how classes described in a          Jeff Z Pan, JM Gómez-Pérez, Yuan Ren, Honghan
                                                              Wu, and Man Zhu. 2014. Ssp: Compressing rdf
    given ontology are used on a specific dataset.            data by summarisation, serialisation and predictive
                                                              encoding. Technical report, Technical report, 07
   Future work will be focused on the development             2014. Available as http://www. kdrive-project. eu/re-
and assessment of semantic redundancy metrics                 sources.
that support our results on other linked datasets.         Honghan Wu, Boris Villazn-Terrazas, Jeff Z. Pan, and
Besides, we plan to study how redundancy can                 Jos Manul Gmez-Prez. 2014. How redundant is
affect other data quality dimensions particularly            it? - an empirical analysis on linked datasets. In
those related with semantic accuracy.                        Olaf Hartig, Aidan Hogan, and Juan Sequeda, edi-
                                                             tors, COLD, volume 1264 of CEUR Workshop Pro-
                                                             ceedings. CEUR-WS.org.
References                                                 Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ri-
                                                            cardo Pietrobon, Jens Lehmann, and Sören Auer.
A. V. Aho, M. R. Garey, and J. D. Ullman. 1972. The         2015. Quality assessment for linked data: A survey.
  transitive reduction of a directed graph. SIAM Jour-      Semantic Web Journal.
  nal on Computing, 1(2):131.


                                                     109

</pre>