=Paper= {{Paper |id=Vol-1481/paper6 |storemode=property |title=A Statistical Comparison of Current Knowledge Bases |pdfUrl=https://ceur-ws.org/Vol-1481/paper6.pdf |volume=Vol-1481 |dblpUrl=https://dblp.org/rec/conf/i-semantics/FarberR15 }} ==A Statistical Comparison of Current Knowledge Bases== https://ceur-ws.org/Vol-1481/paper6.pdf
       A Statistical Comparison of Current Knowledge Bases

                          Michael Färber                                               Achim Rettinger
                          Institute AIFB                                                Institute AIFB
              Karlsruhe Institute of Technology (KIT)                       Karlsruhe Institute of Technology (KIT)
                      Karlsruhe, Germany                                            Karlsruhe, Germany
                   michael.faerber@kit.edu                                            rettinger@kit.edu


ABSTRACT
In recent years, many knowledge bases (KBs) have been developed and used in real-world
applications. These include DBpedia, Wikidata, and YAGO, which all cover general knowledge
and therefore similar topics. In this poster, we present statistical measurements on these
KBs. Our experiments reveal that, despite the fact that these KBs cover the same domains to
a considerable extent, they differ from each other significantly w.r.t. their graph-based
structure and ontological aspects.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous;
D.2.8 [Software Engineering]: Metrics—complexity measures, performance measures

Keywords
Knowledge Bases, Knowledge Graphs, Statistics, Metrics

1. INTRODUCTION
In recent years, several knowledge bases (KBs) have been developed and have found their way
into industrial applications. Although KBs are widely used, to the best of our knowledge,
comparative studies on the statistical characteristics of KBs are very limited so far. This
is particularly true for the KBs DBpedia, Wikidata, and YAGO. These KBs are freely available
and do not cover a specific domain, but general knowledge. In this paper, we focus on these
KBs and exhibit their particularities w.r.t. their structural and ontological conditions.
Based on the fact that these KBs are, from a conceptual point of view, directed graphs
consisting of RDF triples,¹ we come up with simple graph-based and RDF-based metrics such as
in-degree, out-degree, and a variety of other metrics. Given the results of these metrics,
we can gain a better insight into the particularities of these current KBs and learn to
what extent they differ from each other.

Hence, our main contributions in this paper are:

   • We calculate a variety of statistical measurements on the widely used KBs DBpedia,
     Wikidata, and YAGO.

   • We give an analysis regarding these results.

   • We make our framework for statistical analysis of KBs available to the public² so that
     other KBs can be easily integrated.

The remainder of this paper is organized as follows: First we give an overview of related
work on semantic graph analysis. We then introduce the KBs which we selected for our
analysis, and provide details regarding the current versions of the KBs. We then present the
results of applying several graph-based and semantics-based metrics on the KB datasets in
question. After discussing particularities of our analysis in Section 3, we conclude in
Section 4.

2. COMPARISON OF KNOWLEDGE BASES
2.1 Related Work
Firstly, some work on the analysis of the graph structure of the (HTML) Web has been carried
out. Early studies of Web topology were already published in the 1990s (see, e.g., [2]). In
2000, Broder et al. [3] found that the structure of the Web can be modeled in the shape of
a bow tie. Rather recently, Donato et al. [4] developed some models which were brought into
accordance with their crawl dataset regarding some characteristics such as the power law
distribution for degree.

Secondly, related work has been carried out on the analysis of the Linked Open Data (LOD)
cloud:³ Rodriguez [9], for instance, analyzed the graph of data sources in the LOD cloud.
Among other things, he concluded that, despite the general assumption of the LOD cloud being
a crowded “ravel”, the LOD cloud can be disaggregated into a component around DBpedia and
another component around DBLP.⁴ Gueret et al. [6] confirmed that observation, but added a
third component around UniProt.⁵

Thirdly, a few analyses of single ontologies [11, 7, 5] were made, as we do in this paper:
Theoharis et al. [11] focus on power-law degree distributions. According to them,

¹ See http://www.w3.org/RDF/.
² The implementation of the framework is available for download at
  http://www.aifb.kit.edu/web/KB-Statistics.
³ See http://lod-cloud.net.
⁴ See http://dblp.uni-trier.de. DBLP contains bibliographical information and is not
  domain-independent.
⁵ See http://www.uniprot.org.


ontologies exhibit power-law degree distributions as soon as they have a sufficient number
of predicates or classes. In this paper, we also calculate degree distributions and examine
whether they follow a power law. Hoser et al. [7] applied social network analysis to the two
ontologies SWRC⁶ and Suggested Upper Merged Ontology (SUMO).⁷ According to the authors,
eigenvalue analysis provides deep insights into the structure and focus of the ontology. In
our work, in contrast, we do not take eigenvectors into consideration. In the context of
describing and evaluating a benchmark generator for Linked Data, Duan et al. [5] used
measurements such as indegree and number of distinct subjects/objects of specific KBs such
as DBpedia and YAGO (as of 2011). Their work is therefore the most closely related to ours.
Duan et al. found that there is a bad fit between the degree distribution of the Semantic
Web benchmark and curated Linked Data datasets. They propose a new metric called coherence
since the existing graph-based metrics do not make a point about the quality of a KB.
However, as we see in our experiments, this metric is not properly applicable to our KBs.

2.2 Overview of the Knowledge Bases
In the following, we briefly describe the different KBs which we analyze in the following
sections. We focus on these three KBs since they cover general, cross-domain knowledge and
similar topics.

   • DBpedia: DBpedia⁸ is the most popular and prominent KB in the LOD cloud [1]. Since the
     first public release in 2007, DBpedia has been updated roughly once a year.⁹ DBpedia is
     created from automatically extracted structured information contained in Wikipedia,
     such as infobox tables, categorization information, geo-coordinates, and external
     links. Due to its role as the hub of Linked Open Data, DBpedia contains many links to
     other datasets in the LOD cloud. DBpedia is used extensively in the Semantic Web
     research community, but is also relevant in commercial settings: companies such as the
     BBC [8] and the New York Times [10] use it to organize their content. In our
     experiments, we use the latest version of DBpedia, which is DBpedia 2014.¹⁰

   • Wikidata: Wikidata¹¹ started on October 30, 2012 as a project of Wikimedia Deutschland.
     The aim of the project is to provide data which can be used by any Wikimedia project,
     including Wikipedia. Wikidata does not only store facts, but also the corresponding
     sources, so that the validity of facts can be checked. Labels, aliases, and
     descriptions for entities in Wikidata are provided in more than 350 languages. Wikidata
     is a community effort, i.e., users collaboratively add and edit information. Also, the
     schema is maintained and extended based on community agreements. In the near future,
     Wikidata will grow due to the integration of Freebase data.¹² Our experiments on
     Wikidata are based on the Wikidata simple statements dataset from February 2015.¹³

   • YAGO: YAGO¹⁴ (Yet Another Great Ontology) has been developed at the Max Planck
     Institute for Computer Science in Saarbrücken since 2007. YAGO comprises information
     extracted from Wikipedia, WordNet,¹⁵ and GeoNames.¹⁶ As of March 24, 2015, YAGO3 is
     available, which we use in our experiments. Since the YAGO3 data set was not available
     in triple format at the time of the experiments, we transformed the available tsv files
     into the triple format.

2.3 Analysis of the Knowledge Bases
2.3.1 Number of Triples
Comparing the number of triples in the different KBs (see Figure 1a), we can see that YAGO
has many more triples than DBpedia or Wikidata. One reason for that might be that in the
case of YAGO (and Wikidata) only one dataset covering all languages was provided (containing
labels in different languages), while for DBpedia we could restrict the KB to the English
language. Wikidata is rather small, since the knowledge stored in Wikidata was not extracted
from one text corpus, as in the case of DBpedia, but created by users of the Wikidata
community.

2.3.2 Disk Space
As visible in Figure 1b and as expected, the measured disk space correlates directly with
the number of triples. Figure 1c shows the relative disk space. It is notable that Wikidata,
in addition to its relatively small number of triples, requires much less disk space per
triple than the other KBs. The reason for that is that Wikidata uses non-human-readable URIs
(such as http://wikidata.org/entity/Q1040) while the other KBs rely on human-readable URIs
(e.g., http://dbpedia.org/resource/Karlsruhe and http://yago.org/resource/Karlsruhe). In the
case of Wikidata, the human-readable labels for entities and properties are stored
separately.

2.3.3 Number of Distinct Subjects and Number of Distinct Objects
Comparing the number of distinct subjects across the KBs in question (see Figure 1d) and the
number of distinct objects (see Figure 1e), it becomes apparent that DBpedia has relatively
few distinct subjects, but more distinct objects. In other words: The set of resources with
outgoing edges is significantly smaller than the set of resources with incoming edges (ratio
1 : 1.6). YAGO, in contrast, has the opposite characteristic (ratio 21 : 1). Figures 1f and
1g show the ratio of the set of distinct subjects/objects w.r.t. the entire set of resources
in the KBs. Notably, in the case of YAGO, only relatively few resources are linked to.

⁶ See http://ontobroker.semanticweb.org/ontologies/swrc-onto-2001-12-11.oxml.
⁷ See http://www.ontologyportal.org.
⁸ See http://dbpedia.org.
⁹ There is also DBpedia live which is updated when Wikipedia is updated. See
  http://live.dbpedia.org.
¹⁰ See our website for a list of the dump files used in our experiments.
¹¹ See http://wikidata.org.
¹² See https://plus.google.com/u/0/109936836907132434202/posts/bu3z2wVqcQc
¹³ See http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150223/.
¹⁴ See http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/
¹⁵ See https://wordnet.princeton.edu.
¹⁶ See www.geonames.org.
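
The counts discussed in Sections 2.3.1 to 2.3.3 (number of triples, relative disk space,
distinct subjects and objects, and their ratio to the set of all resources) can be
illustrated with a minimal sketch. It is not the authors' framework: the function name
kb_statistics, the toy triples with the ex: prefix, and the disk size of 400 bytes are
hypothetical, and the sketch assumes that every term occurring as a subject or object counts
as a resource, which may differ from the definition used in the paper.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def kb_statistics(triples: Iterable[Triple], disk_bytes: int) -> dict:
    """Compute simple counts for a KB given as (subject, predicate, object) triples."""
    num_triples = 0
    subjects = set()
    objects = set()
    for s, _p, o in triples:
        num_triples += 1
        subjects.add(s)
        objects.add(o)
    # Assumption: the "entire set of resources" is the union of subjects and objects.
    resources = subjects | objects
    return {
        "triples": num_triples,
        "distinct_subjects": len(subjects),
        "distinct_objects": len(objects),
        "subject_resource_ratio": len(subjects) / len(resources) if resources else 0.0,
        "object_resource_ratio": len(objects) / len(resources) if resources else 0.0,
        "bytes_per_triple": disk_bytes / num_triples if num_triples else 0.0,
    }


# Hypothetical toy KB; the identifiers are made up for illustration only.
toy_kb = [
    ("ex:Karlsruhe", "ex:locatedIn", "ex:Germany"),
    ("ex:Karlsruhe", "ex:type", "ex:City"),
    ("ex:KIT", "ex:locatedIn", "ex:Karlsruhe"),
    ("ex:Berlin", "ex:locatedIn", "ex:Germany"),
]
stats = kb_statistics(toy_kb, disk_bytes=400)
print(stats)
```

Keeping the subjects and objects in sets mirrors the "distinct" in the section titles; for
KBs of the size discussed here, a streaming or disk-backed approach would likely be needed
instead of in-memory sets.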

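
Section 2.2 mentions that the YAGO3 tsv files were transformed into triple format. How this
was done is not described, so the following is only a sketch under stated assumptions: each
row is assumed to hold a fact id, subject, predicate, and object separated by tabs, with the
terms already written in a triple-compatible syntax. The function name tsv_to_triple_lines
and the sample rows are hypothetical, and the real file layout may differ.

```python
import csv
import io
from typing import Iterable, Iterator


def tsv_to_triple_lines(rows: Iterable[str]) -> Iterator[str]:
    """Turn tab-separated rows into 'subject predicate object .' lines.

    Assumed (hypothetical) column layout: fact id, subject, predicate, object.
    Rows with fewer than four columns are skipped.
    """
    # QUOTE_NONE keeps quotation marks of literals exactly as they appear in the row.
    reader = csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue
        _fact_id, subject, predicate, obj = row[:4]
        yield f"{subject} {predicate} {obj} ."


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "<id_1>\t<Karlsruhe>\t<isLocatedIn>\t<Germany>\n"
    "malformed row without tabs\n"
    "<id_2>\t<Karlsruhe>\trdfs:label\t\"Karlsruhe\"@de\n"
)
triple_lines = list(tsv_to_triple_lines(sample))
for line in triple_lines:
    print(line)
```

The sketch drops the fact id and performs no validation or escaping of terms; a real
conversion would need to check that each term is valid in the target triple syntax.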

[Figure 1: Bar charts comparing DBpedia 2014, Wikidata, and YAGO3. Panels: (a) Number of
triples; (b) Disk space used [Byte]; (c) Relative disk space [Byte per triple]; (d) Number
of distinct subjects; (e) Number of distinct objects; (f) Subject Resource Ratio [%];
(g) Object Resource Ratio [%]; (h) Number of distinct properties. Further panels: Number of
classes; Average number of instances per class; Average indegree; Average indegree with
literals.]

                             0                                                                                                                                0                                                                                                                                                0                                                                                                                        0
                                 DBpedia 2014           Wikidata                       YAGO3                                                                        DBpedia 2014           Wikidata                         YAGO3                                                                                  DBpedia 2014    Wikidata                           YAGO3                                                                  DBpedia 2014          Wikidata    YAGO3



     (i) Number of distinct classes                                                                       (j) Average number of in-                                                                                                                                                                                (k) Average indegree                                                 (l) Average indegree with liter-
                                                                                                          stances per class                                                                                                                                                                                                                                                             als
                                                                                 8
                                                                                               Indegree Distribution                                                                                                                Average outdegree                                                                                                       10
                                                                                                                                                                                                                                                                                                                                                                             Outdegree Distribution
                                                                                10                                                                                                                                    20                                                                                                                                   10
                                                                                                                                                                   DBPedia 2014                                                                                                                                                                                                                                                             DBPedia 2014
                                                                                                                                                                   Wikidata                                                                                                                                                                                 8
                                                                                                                                                                                                                                                                                                                                                                                                                                            Wikidata
                                                                                 6                                                                                 YAGO3                                                                                                                                                                                   10                                                                               YAGO3
                                                                                10                                                                                                                                    15
                                                                                                                                                                                                  Average outdegree
                                                              Number of nodes




                                                                                                                                                                                                                                                                                                                                         Number of nodes




                                                                                                                                                                                                                                                                                                                                                            6
                                                                                                                                                                                                                                                                                                                                                           10
                                                                                 4
                                                                                10                                                                                                                                    10
                                                                                                                                                                                                                                                                                                                                                            4
                                                                                                                                                                                                                                                                                                                                                           10
                                                                                 2
                                                                                10                                                                                                                                    5                                                                                                                                     2
                                                                                                                                                                                                                                                                                                                                                           10

                                                                                 0                                                                                                                                                                                                                                                                          0
                                                                                10 0            2         4                                                           6              8                                0                                                                                                                                    10    0             2        4                                                         6           8
                                                                                  10       10           10                                                          10             10                                      DBpedia 2014   Wikidata                                                                  YAGO3                                       10           10       10                                                     10             10
                                                                                                     Indegree                                                                                                                                                                                                                                                                      Outdegree


                                                                                (m) Indegree distribution                                                                                                                  (n) Average outdegree                                                                                                           (o) Outdegree distribution

                                                                                                Figure 1: Statistics for the three KBs DBpedia, Wikidata, and YAGO.

 2.3.4   Number of Distinct Properties
From our analysis regarding the number of distinct properties (see Figure 1h)
we can derive that the used Wikidata RDF version contains only around 1,323
distinct properties. The reason is that properties are carefully introduced by
the Wikidata community and go through an extensive discussion process before
they are released for usage. DBpedia contains many properties. However, they
are very heterogeneous, and the non-mapping-based properties¹⁷ (i.e.,
properties which were extracted not based on human-defined mappings, but solely
as they appeared in the infoboxes in Wikipedia) are often very noisy.¹⁸ A
similar situation holds for YAGO.

 2.3.5   Number of Distinct Classes
For calculating the number of distinct classes (see Figure 1i), we iterated
over all instances contained in the KB datasets and took the objects of the
relation rdf:type.¹⁹ Although DBpedia often assigns several classes per
instance according to this class-assignment method, we only retrieved 526
distinct classes. The small number in the case of Wikidata can be justified
again by the community approach of Wikidata. YAGO has an astonishing number of
distinct classes since YAGO is mainly an ontology, i.e., it contains
class-based information such as the classes of the WordNet taxonomy. This last
fact becomes apparent in Figure 1j, where the average number of instances per
class is visualized.

¹⁷ I.e., properties having the URI prefix http://dbpedia.org/property/.
¹⁸ There are, for instance, 53,930 triples with the property
   http://dbpedia.org/property/s in DBpedia 2014, which obviously has no
   meaning.
¹⁹ Standing for http://www.w3.org/1999/02/22-rdf-syntax-ns#type.

 2.3.6   Indegree
Comparing the average indegree (defined as the average number of inlinks per
node; see Figure 1k) where no triples with
literals (values) in the object position were considered with the average
indegree where triples with literals were considered in addition (see
Figure 1l), we can see that in general (i.e., for all KBs) the average indegree
with literals is much lower than the average indegree where no literals were
counted. The indegree for DBpedia and Wikidata is roughly the same. One reason
might be that a considerable amount of Wikidata was taken from Wikipedia. It
can be assumed that YAGO has a higher average indegree than DBpedia and
Wikidata, since YAGO comprises many different ontologies.
The indegree distribution diagram (see Figure 1m) shows an almost ideal
logarithmic decrease in the number of nodes with increasing indegree for all
considered KBs. This is especially interesting since all KBs were created in
different ways: automatically extracted from Wikipedia (DBpedia), partly
created by the community (Wikidata), or composed of several sources which were
used partly automatically, partly manually (YAGO). In the light of the figure
we can also confirm that the power law is still applicable to the indegree
distribution of semantic graphs such as the considered KBs.

 2.3.7   Outdegree
Considering the average outdegree for each KB (defined as the average number of
outlinks per node; see Figure 1n), we can see that nodes in the DBpedia
knowledge graph have the highest number of outgoing links on average. Wikidata
currently contains some domains of knowledge which are represented very densely
(such as persons), while other domains are rarely covered yet. On average,
however, Wikidata performs similarly to YAGO w.r.t. the average outdegree of
nodes.²⁰
The outdegree distribution of the KBs (see Figure 1o) suggests, as in the case
of the indegree distribution, a power law distribution. However, if the
outdegree is low, the power law distribution is broken. This confirms the
theory of [11], which states that a sufficient number of predicates or classes
is necessary for observing a power law distribution.

²⁰ The outlier where the outdegree is 10⁸ can be traced back to the fact that
   Wikidata contains many blank nodes with a high outdegree.

3.   LESSONS LEARNED
According to Theoharis et al. [11], ontologies exhibit power law degree
distributions as soon as they have a sufficient number of predicates or
classes. Based on our experiments, we can confirm that for the KBs we
considered.
Duan et al. [5] stated that “traditional” graph analysis metrics such as the
degree or the number of classes are not suitable when KBs should be compared.
Given our experimental results, we can confirm that to a certain extent. Duan
et al. proposed a new metric called coherence metric, where the “filling
degree” of all entities of the different classes is calculated and aggregated.
This might be a good indicator; however, the calculation for our KBs is tricky,
since we often do not know the set of possible properties an entity of a
specific class is able to have. Iterating over all existing properties of
entities of this class is problematic since the KBs are often very noisy
(different properties carry the same meaning, different object types are used
for the same property, etc.) and the considered KBs may contain multiple
classes per instance.

4.   CONCLUSIONS
A measurement of what current knowledge bases such as DBpedia, Wikidata, and
YAGO look like and how they are structured is to a large extent missing. In
this paper, we presented a (freely available) framework for statistical
analysis of KBs into which any KB in triple format can easily be integrated. We
calculated a variety of statistical measurements on the KBs DBpedia, Wikidata,
and YAGO, since they all cover general knowledge and are used in many
applications. Our investigations revealed that the considered KBs perform very
differently w.r.t. the presented metrics.

Acknowledgement
This work was carried out with the support of the German Federal Ministry of
Education and Research (BMBF) within the Software Campus project SUITE (Grant
01IS12051).

5.   REFERENCES
 [1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives.
     DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the 6th ISWC
     and 2nd ASWC, pages 722–735. Springer, 2007.
 [2] T. Bray. Measuring the Web. In Proceedings of the Fifth International
     World Wide Web Conference on Computer Networks and ISDN Systems, pages
     993–1005. Elsevier Science Publishers B. V., 1996.
 [3] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata,
     A. Tomkins, and J. Wiener. Graph structure in the web. Computer networks,
     33(1):309–320, 2000.
 [4] D. Donato, L. Laura, S. Leonardi, and S. Millozzi. The Web As a Graph: How
     Far We Are. ACM Trans. Internet Technol., 7(1), Feb. 2007.
 [5] S. Duan, A. Kementsietsidis, K. Srinivas, and O. Udrea. Apples and
     Oranges: A Comparison of RDF Benchmarks and Real RDF Datasets. In
     Proceedings of the 2011 ACM SIGMOD, pages 145–156, New York, NY, USA,
     2011. ACM.
 [6] C. Guéret, S. Wang, and S. Schlobach. The Web of Data is a Complex System
     – First Insight into Its Multi-Scale Network Properties. In Proceedings of
     the European Conference on Complex Systems, pages 1–12, 2010.
 [7] B. Hoser, A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Semantic
     Network Analysis of Ontologies. In Y. Sure and J. Domingue, editors, The
     Semantic Web: Research and Applications, pages 514–529. Springer Berlin
     Heidelberg, 2006.
 [8] G. Kobilarov et al. Media Meets Semantic Web – How the BBC Uses DBpedia
     and Linked Data to Make Connections. In Proceedings of the 6th ESWC, pages
     723–737, Berlin, Heidelberg, 2009. Springer.
 [9] M. A. Rodriguez. A graph analysis of the Linked Data cloud. arXiv preprint
     arXiv:0903.0194, 2009.
[10] E. Sandhaus. Semantic Technology at the New York Times: Lessons Learned
     and Future Directions. In Proceedings of the 9th ISWC, pages 355–355,
     Berlin, Heidelberg, 2010. Springer-Verlag.
[11] Y. Theoharis, Y. Tzitzikas, D. Kotzinos, and V. Christophides. On Graph
     Features of Semantic Web Schemas. IEEE Trans. on Knowl. and Data Eng.,
     20(5):692–702, 2008.
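
As a rough illustration of the measurements described in Sections 2.3.5 to
2.3.7, the following Python sketch computes the number of distinct classes, the
average number of instances per class, the average indegree and outdegree (with
and without literals), and the degree distribution from a list of triples. It
is not the framework used for the experiments. The triple representation, all
function names, and the toy data are assumptions made for illustration. In
particular, the text does not state which nodes enter the denominator of the
average degree; the sketch assumes that only nodes with at least one inlink
(or outlink, respectively) are counted.

```python
from collections import Counter, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed input format (hypothetical, not taken from the paper's framework):
# an iterable of (subject, predicate, object, object_is_literal) tuples.


def class_statistics(triples):
    """Return (number of distinct classes, average instances per class)."""
    instances = defaultdict(set)
    for s, p, o, is_literal in triples:
        # A class is any non-literal object of an rdf:type triple.
        if p == RDF_TYPE and not is_literal:
            instances[o].add(s)
    if not instances:
        return 0, 0.0
    total = sum(len(members) for members in instances.values())
    return len(instances), total / len(instances)


def degree_counts(triples, include_literals=False):
    """Count inlinks and outlinks per node; literal objects are optional."""
    indegree, outdegree = Counter(), Counter()
    for s, p, o, is_literal in triples:
        if is_literal and not include_literals:
            continue
        # Assumption: equal literal values are treated as one node; they are
        # tagged so that they never collide with a URI of the same spelling.
        target = ("literal", o) if is_literal else o
        outdegree[s] += 1
        indegree[target] += 1
    return indegree, outdegree


def average_degree(degree_counter):
    """Average over the nodes that have at least one such link (assumption)."""
    if not degree_counter:
        return 0.0
    return sum(degree_counter.values()) / len(degree_counter)


def degree_distribution(degree_counter):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(degree_counter.values())


# Toy data with hypothetical example.org URIs.
EX = "http://example.org/"
toy_triples = [
    (EX + "Karlsruhe", RDF_TYPE, EX + "City", False),
    (EX + "Berlin", RDF_TYPE, EX + "City", False),
    (EX + "Germany", RDF_TYPE, EX + "Country", False),
    (EX + "Karlsruhe", EX + "country", EX + "Germany", False),
    (EX + "Berlin", EX + "country", EX + "Germany", False),
    (EX + "Berlin", EX + "label", "Berlin", True),
]

indeg, outdeg = degree_counts(toy_triples)
indeg_lit, outdeg_lit = degree_counts(toy_triples, include_literals=True)
print(class_statistics(toy_triples))
print(average_degree(indeg), average_degree(indeg_lit))
print(average_degree(outdeg), average_degree(outdeg_lit))
print(sorted(degree_distribution(indeg).items()))
```

Plotting the items of `degree_distribution` on two logarithmic axes would give
a diagram of the kind shown in Figures 1m and 1o. For KBs of realistic size the
triples would have to be streamed from the dump files rather than held in a
list.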

