KBMetrics – A Multi-purpose Tool for Measuring the Quality of Linked Open Data Sets

Tong Ruan, Xu Dong, Yang Li, and Haofen Wang
East China University of Science & Technology, Shanghai, 200237, China
ruantong@ecust.edu.cn, dongxu0220@qq.com, marine1ly@163.com, whfcarter@ecust.edu.cn

Abstract. While several quality assessment tools focus on evaluating the quality of Linked Open Data (LOD), most fail to meet the diverse quality assessment requirements that arise from the users' perspective. In this demo, we categorize quality assessment requirements into three layers: understanding the characteristics of data sets, comparing groups of data sets, and selecting data sets according to user-defined usage scenarios. We have designed KBMetrics to cover these assessment purposes. The tool not only incorporates different kinds of metrics to characterize a data set, but also adopts ontology alignment mechanisms for comparison purposes. Most importantly, end users can define usage contexts to adapt the assessment to different usage scenarios. Both the quality assessment processes and the findings on the evaluated data sets show the effectiveness of our tool.

1 Introduction

In recent years, an increasing number of semantic data sources have been published on the Web, and there is great demand for knowledge about the quality of these data sets. Several tools target quality assessment tasks. For example, with Flemming's tool (in German, http://linkeddata.informatik.hu-berlin.de/LDSrcAss/datenquelle.php), users can obtain an overall quality value for a data set after interactively entering parameters for a list of metrics. TripleCheckMate (http://aksw.org/Projects/TripleCheckMate.html) is a crowdsourcing quality assessment tool focusing on correctness evaluation of DBpedia. However, these currently available tools fail to meet the diverse requirements of quality assessment. In this paper, we classify the goals of quality assessment into three layers, as shown in the pyramid on the left of Figure 1.

Fig. 1. Functions in KBMetrics and their relation to the purposes of quality assessment.

– Understand the characteristics of data sets. Many metrics exist for evaluating specific aspects of LOD quality, including data size, data complexity, and data consistency. The metrics that matter vary with the data set. For example, web-scale extracted data sets such as DBpedia and YAGO are prone to errors, so the Correctness Ratio metric deserves investigation. On the other hand, for domain-oriented, human-constructed data sets published as LOD, for example Drugbank, the number of instances in the domain may be of greater importance.
– Compare different data sets. The quality of a data set can be better understood by comparing its metric values with those of other data sets. For example, users may not know what an instance count of 100,000 means, but they can easily see that one data set is larger than another. Furthermore, comparisons become more meaningful if they are carried out under the same or similar conditions.
For instance, it is better to compare Drugbank and DBpedia on the drug-related domain rather than on all the domains defined in DBpedia. In that case, calculating the metrics on the overlapping instances or overlapping domains is fairer and more reasonable.
– Select suitable data sets. The ultimate goal of quality assessment is to help end users determine which data sets are "fit for use" for their data usage requirements. Traditionally, data quality is conceived of as "fitness for use in a certain application or use case". For example, as mentioned at http://ldq.semanticmultimedia.org/cfp, "DBpedia currently can be appropriate for a simple end-user application but could never be used in the medical domain for treatment decisions". However, the questions of how to define such "usage contexts" and how to link them to the quality assessment process have not been well investigated in the literature. To the best of our knowledge, no existing tool lets users adapt the quality assessment process to their usage scenarios.

In this demonstration, we present KBMetrics, a multi-purpose tool for the quality evaluation of Linked Open Data sets. The tool supports the three purposes mentioned above. We also apply the corresponding evaluation processes to DBpedia and YAGO.

2 Modules in KBMetrics

The relationship between the functions in KBMetrics and the three purposes mentioned above is shown in Figure 1.

– Understanding: The understanding purpose maps to the Metrics Calculation module. Users can select metrics, calculate metrics, visualize metric results, and compare/analyze the results, as shown in Figure 1. The tool has 12 built-in metrics in five categories; the details of the metrics and their calculation methods can be found at http://kbeval.nlp-bigdatalab.com/docs/doc.pdf. The tool supports not only machine-computable metrics such as data size but also human-evaluated metrics such as correctness. We store data in Jena and pose SPARQL queries to compute the machine-computable metrics, and we provide a dedicated process for human-evaluated metrics. This process includes sampling a sub-data set to reduce human effort, assigning tasks to more than three evaluators to reduce individual subjective bias, resolving inconsistencies between evaluators, and calculating the result. Currently, the tool supports two sampling methods: random sampling and sampling based on the Wilson score interval (http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval).
– Comparing: If end users want to calculate metrics on overlapping instances or overlapping domains, an additional schema alignment and instance matching module is provided. Both schema alignment and instance matching belong to the scope of ontology alignment, which has been studied for years, and the ontology alignment community provides mature tools. Our module therefore mainly provides interfaces to integrate the results of a third-party ontology alignment tool (in our case, PARIS). The results are represented as triples with the predicates owl:sameAs, owl:equivalentClass, or owl:equivalentProperty.
– Selecting: A pre-processing step that filters the data according to users' requirements is supported by the Context Calculation module. Four steps are required to fulfill context calculation (a minimal sketch of steps b) to d) is given after this list): a) Define the Context: Users can input their data requirements based on their usage scenarios through GUI interfaces. We support various types of contexts, e.g. the Domains Context (such as cities or organizations), the Property Context (such as populations of cities), and the Property Constraint Context (such as the presidents of the USA or the 100 biggest cities in the world). b) Context Transformation: The context definitions from the UI are translated into executable SPARQL queries. The queries may differ across data sets because of vocabulary differences. c) Context Execution: The queries are executed on the target data sets. d) Store Data: The results under the contexts, namely the sub-data sets, are also stored in Jena. Users may then perform metrics evaluation on the sub-data sets.
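The following is a minimal sketch of how steps b) to d) of the Selecting module could look on top of recent Apache Jena releases. The class name, the input file, and the use of a simple domain context (dbo:President) are illustrative assumptions; the query generation that KBMetrics actually performs is not shown here.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;

public class ContextPipelineSketch {
    public static void main(String[] args) {
        // Load the target data set into a Jena model (file name is hypothetical).
        Model data = ModelFactory.createDefaultModel();
        RDFDataMgr.read(data, "dbpedia-subset.ttl");

        // b) Context Transformation: a domain context rendered as a SPARQL query.
        String query =
            "PREFIX dbo: <http://dbpedia.org/ontology/> " +
            "CONSTRUCT { ?s ?p ?o } WHERE { ?s a dbo:President ; ?p ?o . }";

        // c) Context Execution: run the query against the data set.
        QueryExecution exec = QueryExecutionFactory.create(query, data);
        Model subDataSet = exec.execConstruct();
        exec.close();

        // d) Store Data: the sub-data set is kept (here, only in memory) so that
        //    metrics can later be evaluated on it.
        System.out.println(subDataSet.size() + " triples in the sub-data set");
    }
}
```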
3 Demonstration

Our demonstration contains three typical scenarios. A recorded video of KBMetrics can be downloaded at http://kbeval.nlp-bigdatalab.com/iswc2015.wmv, and the system can be accessed online at http://kbeval.nlp-bigdatalab.com/v12/.

Fig. 2. Compare DBpedia with YAGO in KBMetrics.
Fig. 3. Context Definition and Execution in KBMetrics.

Evaluate A Single Data Set. In this demonstration, we first select DBpedia as the target KB and metrics such as Data Size and Degree of Network. We find that the 2014 version of DBpedia has 4,465,631 instances and 68,112,887 facts. We then select the Correctness metric, and a GUI interface appears that lets users choose a sampling method and the related parameters. After we choose the default parameters, the system randomly selects 423 samples from DBpedia according to sampling theory, and we assign the evaluation tasks to different evaluators. After every evaluator has assessed the assigned data items, the system reports the final correctness ratio, 0.81. However, we have no idea whether 0.81 is good or bad, or whether 4,465,631 instances is large or small. We therefore add YAGO (version YAGO2s) for comparison.

Compare Multiple Data Sets. Figure 2 shows that DBpedia is richer than YAGO in data size: the number of instances in YAGO is half that in DBpedia, and the number of facts in YAGO is about a tenth of that in DBpedia. The overlapped metrics show that DBpedia contains almost all of the instances in YAGO. The average number of facts per overlapped instance in YAGO is 3, which is about the same as the average over all YAGO instances. In DBpedia, however, the average number of facts per overlapped instance is slightly smaller than the average over all instances, so the distribution of facts over the overlapped instances in DBpedia differs from that over the whole data set. The Degree of Network shows that the connections between DBpedia instances are richer than those between YAGO instances. On the other hand, the correctness of YAGO from our evaluation is 0.91, which is higher than that of DBpedia.
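The reported correctness ratios (0.81 for DBpedia and 0.91 for YAGO) are point estimates computed from samples. As a rough illustration of the Wilson score interval mentioned in Section 2, the sketch below computes 95% confidence bounds for such sampled ratios; the sample size of 423 is taken from the DBpedia run above, and reusing the same size for YAGO is an assumption made only for illustration.

```java
// Wilson score interval for a sampled correctness ratio; z = 1.96 corresponds
// to a 95% confidence level.
public class WilsonIntervalSketch {
    static double[] wilson(double ratio, int n, double z) {
        double denom = 1 + z * z / n;
        double center = (ratio + z * z / (2.0 * n)) / denom;
        double margin = (z / denom)
                * Math.sqrt(ratio * (1 - ratio) / n + z * z / (4.0 * n * n));
        return new double[] { center - margin, center + margin };
    }

    public static void main(String[] args) {
        double[] dbpedia = wilson(0.81, 423, 1.96);
        double[] yago = wilson(0.91, 423, 1.96); // sample size assumed equal
        System.out.printf("DBpedia correctness: [%.3f, %.3f]%n", dbpedia[0], dbpedia[1]);
        System.out.printf("YAGO correctness:    [%.3f, %.3f]%n", yago[0], yago[1]);
    }
}
```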
Select Data Sets on User Context. From the above, we might conclude that DBpedia is richer than YAGO. However, this does not hold under specific user contexts. For example, suppose we want to survey Presidents of the United States who have more than two children. In DBpedia, we set the domain to "President" and add two constraints, on "nationality" and on the number of "children". In YAGO, we directly set the domain to "Presidents of the United States", since YAGO has a richer taxonomy. After adding the constraint on the number of "hasChild" values in YAGO, 16 presidents are returned, as shown in Figure 3. By contrast, DBpedia returns only 2 presidents. The reason is that, although DBpedia contains all of those instances found in YAGO, many of them do not carry the "President" type. Furthermore, DBpedia has many properties denoting the same relationship, for instance "country" and "nationality", and it does not consolidate these relationships.
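To make these vocabulary differences concrete, the sketch below shows roughly how the same context could be phrased against the two KBs. The class and property URIs are assumptions drawn from the public DBpedia and YAGO vocabularies, not the queries KBMetrics actually generates.

```java
// Roughly how the "US presidents with more than two children" context could be
// expressed against each KB (all URIs are illustrative assumptions).
public class PresidentContextSketch {
    static final String YAGO_QUERY =
        "PREFIX yago: <http://yago-knowledge.org/resource/> " +
        "SELECT ?p (COUNT(?c) AS ?children) WHERE { " +
        "  ?p a yago:wikicategory_Presidents_of_the_United_States ; " +
        "     yago:hasChild ?c . " +
        "} GROUP BY ?p HAVING (COUNT(?c) > 2)";

    static final String DBPEDIA_QUERY =
        "PREFIX dbo: <http://dbpedia.org/ontology/> " +
        "PREFIX dbr: <http://dbpedia.org/resource/> " +
        "SELECT ?p (COUNT(?c) AS ?children) WHERE { " +
        "  ?p a dbo:President ; " +
        // DBpedia does not consolidate equivalent properties, so a constraint on
        // dbo:nationality may miss presidents described only with dbo:country.
        "     dbo:nationality dbr:United_States ; " +
        "     dbo:child ?c . " +
        "} GROUP BY ?p HAVING (COUNT(?c) > 2)";

    public static void main(String[] args) {
        System.out.println(YAGO_QUERY);
        System.out.println(DBPEDIA_QUERY);
    }
}
```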