=Paper= {{Paper |id=Vol-2065/paper13 |storemode=property |title=KBQ - A Tool for Knowledge Base Quality Assessment Using Evolution Analysis |pdfUrl=https://ceur-ws.org/Vol-2065/paper13.pdf |volume=Vol-2065 |authors=Mohammad Rashid,Giuseppe Rizzo,Nandana Mihindukulasooriya,Marco Torchiano,Oscar Corcho |dblpUrl=https://dblp.org/rec/conf/kcap/Rashid0MTC17 }} ==KBQ - A Tool for Knowledge Base Quality Assessment Using Evolution Analysis== https://ceur-ws.org/Vol-2065/paper13.pdf
KBQ - A Tool for Knowledge Base Quality Assessment Using Evolution Analysis

Mohammad Rashid, Politecnico di Torino, Italy, mohammad.rashid@polito.it
Giuseppe Rizzo, Istituto Superiore Mario Boella, Italy, giuseppe.rizzo@ismb.it
Nandana Mihindukulasooriya, Universidad Politécnica de Madrid, Spain, nmihindu@fi.upm.es
Marco Torchiano, Politecnico di Torino, Italy, marco.torchiano@polito.it
Oscar Corcho, Universidad Politécnica de Madrid, Spain, ocorcho@fi.upm.es

ABSTRACT
Knowledge bases are becoming essential components for tasks that require automation with some degree of intelligence. It is crucial to establish automatic and timely checks to ensure high quality of the knowledge base content (i.e., entities, types, and relations). In this paper, we present KBQ, a tool that automates the detection and reporting of quality issues for evolving knowledge bases. KBQ analyzes the evolution of a KB by measuring the frequency of change, the change patterns, the change impact, and the causes of changes of resources and properties. Data collection and profiling tasks are performed using Loupe, an online tool for linked data profiling. We describe KBQ in action on two different use cases and report the benefits that it introduced. KBQ is published as an open source project, and a demo is available at http://datascience.ismb.it/shiny/KBQ/.

CCS CONCEPTS
• Information systems → Data cleaning; • Computing methodologies → Knowledge representation and reasoning;

KEYWORDS
Knowledge Base, Linked Data, Quality Assessment, Quality Issues, Evolution Analysis

1 INTRODUCTION
In recent years, much effort has been devoted to sharing Knowledge Bases (KBs) in the Linked Open Data (LOD) cloud1. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata have chosen the RDF data model2 to represent their data due to its capabilities for semantically rich knowledge representation. RDF KBs are evolving, since both data instances and schemas are updated, extended, revised, and refactored to cover more and more topical domains [8]. In particular, entities evolve as new data is added, old data is removed, and links to entities are updated or deleted. Within this context, data quality for evolving KBs remains a critical aspect for obtaining the trust of users. Data quality, in general, relates to the perception of "fitness for use" in a given context [22]. Manual quality assessment and representation of large KBs is neither feasible nor sustainable [6]. On the other hand, continuously and automatically assessing the quality of a knowledge base is a challenging task, as data is derived from many autonomous, evolving, and increasingly large data providers.
   Various tools have been developed for linked data quality assessment based on manual, semi-automatic, and automated approaches. For example, TripleCheckMate3 is a crowdsourced quality assessment tool focusing on the correctness of DBpedia resources. RDFUnit [9] is a tool centered around the definition of data quality integrity constraints. Flemming's [4] data quality assessment tool calculates data quality scores based on manual user input for data sources. Debattista et al. describe a conceptual methodology for assessing Linked Datasets, proposing Luzzu [1], a framework for Linked Data quality assessment. Although these tools provide appropriate data quality assessment, less attention has been given to the evolution aspects of a KB. In particular, these tools do not consider the impact of KB evolution, such as capturing the changes that indicate an abnormal situation, or changes that the curator wants to highlight because they are useful for a specific domain [15].
   One of the common preliminary tasks for data quality assessment is to perform a detailed data analysis. Data profiling is one of the most widely used techniques for data analysis [16]. Data profiling is defined as the process of examining data to collect statistics and provide relevant metadata about the data [14]. Based on data profiling, we can thoroughly examine and understand each KB, its structure, and its properties before usage. Evolution analysis using dynamic features helps to understand the changes applied to an entire KB or parts of it. In general, the dynamic features of a dataset give insights into how it behaves and evolves over a certain period [15]. Ellefi et al. [2] explored dynamic features for data profiling, considering the use cases presented by Käfer et al. [8]. They present dynamic features in multiple dimensions regarding KB update behavior, such as frequency of change, change patterns, change impact, and causes of change.
   In this paper, we present KBQ, a tool for KB quality assessment using evolution analysis. One of the core ideas in this work is to use dynamic features from data profiling results to analyze KB evolution. Our quality assessment approach is based on two main areas: (1) evolution of resources and (2) impact of the unwanted

1 http://lod-cloud.net
2 https://www.w3.org/RDF/
3 http://aksw.org/Projects/TripleCheckMate.html

K-CAP2017 Workshops and Tutorials Proceedings, 2017
©Copyright held by the owner/author(s).


removal of resources in a KB. In particular, based on the detected changes between various releases, we aim to analyze and validate quality issues in the KBs.
   The ISO/IEC 25012 [7] standard defines data quality as the degree to which a set of characteristics of data fulfills requirements. Data quality issues are the specific problem instances that we can identify based on quality characteristics and that prevent data from being regarded as high-quality [10]. More specifically, quality characteristics are abstract definitions indicating quality issues. In this work we explore two main quality issues, namely lack of persistency and lack of completeness:
   Lack of Persistency relates to resources that were present in a previous KB release but then disappeared. In particular, we look into problems due to the unexpected removal of information.
   Lack of Completeness refers to problems due to incomplete resources present in a knowledge base; this happens due to systematic errors in the data extraction and integration processes.
   In KBQ, quality assessment is performed through four quality characteristics: persistency, historical persistency, completeness, and KB growth. We use basic statistics (i.e., counts and diffs) of entities, types, and relations over the triples extracted from various releases as measurement functions.
   KBQ builds upon the data collection and profiling functionalities of Loupe [13], an online system that automatically inspects and extracts statistics about the entities, vocabularies used (classes and properties), and frequent triple patterns of a KB. We created a set of APIs4 for periodic snapshot generation and for maintaining scheduled tasks for automatic and timely quality assessment. In this paper, we describe KBQ in action with the lode:Event5 entity in the 3cixty [23] KB and the dbo:Place6 entity in the DBpedia [11] KB, reporting the benefits introduced to the corresponding projects.

2 EVOLUTION-BASED QUALITY CHARACTERISTICS
Data quality is a cross-disciplinary and multidimensional concept. According to Pipino et al. [20], depending on the context, quality can involve both subjective perceptions and objective measurements. Our quality measurement functions are based on dynamic features from data profiling results. The quality indicators are weighted values, which give the freedom to define multiple degrees of importance [4]. In our approach, the quality indicators are based on the changes observed at the statistical level, in terms of the variation of the absolute and relative frequency counts of entities and predicates between pairs of KB releases. We formalized each quality indicator value in the range [0, 1]. We considered four quality characteristics for quality assessment tasks, namely Persistency, Historical Persistency, Completeness, and KB growth.

2.1 Persistency
Knowledge bases contain information about different real-world objects or concepts, commonly referred to as entities. In general, quality issues regarding the unexpected removal of information from the current version may impact the stability of the KB. The persistency characteristic helps to understand the stability feature. Ellefi et al. [2] present the stability feature as an aggregation measure of dataset dynamics. It helps to understand to what extent the performed updates impact the overall state of the knowledge base. In particular, it provides insights into whether there are any missing resources in the last KB release.
   Quality Indicator: Persistency is a class-specific measure whose measurement function is based on the entity count difference between two KB releases. We compute a persistency value of 0 if the entity count of the last version is lower than that of the previous version, and 1 otherwise. A value of 1 implies that no persistency issue is present in the class; a value of 0 implies that persistency issues were found in the class.

2.2 Historical Persistency
Historical persistency is a derived measure based on the persistency characteristic. It measures the lifespan of an entity type. Ellefi et al. [2] present the lifespan feature based on the degree of changes. The degree of changes captures the impact of changes observed on an entire dataset or parts of it. Also, the lifespan represents the period during which a certain entity is available. In particular, this value gives an overview of the persistency issues present in an entity type over all releases. It helps data curators to decide which knowledge base release can be used for future data management tasks.
   Quality Indicator: The historical persistency measure evaluates persistency over the history of the KB and is computed as the average of the persistency measures over all releases. A high percentage indicates an estimation of fewer issues, while a lower percentage entails more issues present across the KB releases.

2.3 Completeness
This measure focuses on the removal of information as a negative effect of KB evolution. Zaveri et al. [25] refer to completeness as the degree to which all required information is present in a particular dataset. They present the completeness characteristic based on the following four aspects: i) Schema completeness, the degree to which the classes and properties of an ontology are represented, which can thus be called "ontology completeness"; ii) Property completeness, a measure of the missing values for a specific property; iii) Population completeness, the percentage of all real-world objects of a particular type that are represented in the dataset; and iv) Interlinking completeness, which has to be considered especially in Linked Data and refers to the degree to which instances in the dataset are interlinked. We consider the property completeness aspect for KB evolution.
   Quality Indicator: The basic measure we use is the difference between the frequencies of the properties of a class between two KB releases. In particular, if the instance count of a property present in the class is lower than in the previous release, we assume there is a completeness issue. We assign a value of 1 if no completeness issue is present, while a value of 0 indicates that a completeness issue was found. Also, at the class level, we compute the percentage of completeness as the number of completeness issues divided by the total number of properties.

4 The source code is available at https://github.com/rifat963/KBDataObservatory
5 http://linkedevents.org/ontology/Event
6 http://dbpedia.org/ontology/Place
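The persistency indicator of Section 2.1 reduces to a comparison of the last two entity counts. KBQ itself is implemented in R; the following Python fragment is an illustrative sketch only, and the function name and list-of-counts input format are our assumptions, not KBQ's actual API:

```python
def persistency(entity_counts):
    """Persistency indicator (Section 2.1): compare the entity counts of
    the last two KB releases. Returns 1 (no issue) if the last count is
    not lower than the previous one, else 0 (entities disappeared)."""
    if len(entity_counts) < 2:
        raise ValueError("persistency needs at least two releases")
    return 1 if entity_counts[-1] >= entity_counts[-2] else 0

# e.g. instance counts of a class across three releases
print(persistency([100, 120, 150]))  # 1: no persistency issue
print(persistency([100, 150, 120]))  # 0: persistency issue in the class
```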

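The historical persistency measure of Section 2.2 averages the pairwise persistency values over the whole release history. A hedged Python sketch under the same caveats (illustrative, not KBQ's R implementation):

```python
def historical_persistency(entity_counts):
    """Historical persistency (Section 2.2): percentage of
    release-to-release transitions with no persistency issue,
    i.e. where the entity count did not drop."""
    transitions = list(zip(entity_counts, entity_counts[1:]))
    ok = sum(1 for prev, cur in transitions if cur >= prev)
    return 100.0 * ok / len(transitions)

# 9 releases -> 8 transitions; a single drop (15 -> 14) yields 87.5%,
# the same kind of value reported for lode:Event in Section 4.1
print(historical_persistency([10, 11, 12, 13, 14, 15, 14, 15, 16]))  # 87.5
```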

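The property completeness indicator of Section 2.3 compares per-property instance counts between two releases. A minimal Python sketch (the dict input format and the property names are illustrative assumptions):

```python
def property_completeness(prev_freq, cur_freq):
    """Completeness indicator (Section 2.3): per property, 1 if its
    instance count did not drop between two releases, else 0."""
    return {p: (0 if cur_freq.get(p, 0) < count else 1)
            for p, count in prev_freq.items()}

prev = {"dbo:populationTotal": 56109, "dbo:areaTotal": 40000}
cur = {"dbo:populationTotal": 55387, "dbo:areaTotal": 40250}
issues = property_completeness(prev, cur)
print(issues)  # {'dbo:populationTotal': 0, 'dbo:areaTotal': 1}

# class-level percentage of properties flagged with a completeness issue
pct_issues = 100.0 * sum(1 for v in issues.values() if v == 0) / len(issues)
print(pct_issues)  # 50.0
```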
2.4 KB growth
In this measure, we explore the aspect of KB growth by measuring the growth level of KB resources (instances) over the different releases. Ellefi et al. [2] present the growth rate feature as the level of growth of a dataset in terms of data instances. In particular, KB growth explores the change patterns of a knowledge base. Change patterns help to understand the existence and kinds of categories of updates or change behavior. This can help to understand whether the changes present in the KB follow an upward or downward trend. We assume that, if the schema remains consistent, a downward trend at the last release may indicate a potential problem in the data extraction process.
   Quality Indicator: We use a simple linear regression model to predict the KB growth level of resources. It is a class-specific measure whose measurement function is based on the entity counts from all the KB releases. Using the difference between the observed and predicted entity count values at the last KB release, we can detect the trend in the KB growth level. We evaluate a normalized distance based on the entity type's residual value divided by the mean residual value, and use this normalized distance between the observed and predicted entity count values to measure KB growth. In particular, if the normalized distance is greater than 1, the KB may have unexpected growth with unwanted entities; otherwise, the KB remains stable.

Table 1: Quality Indicators

Persistency. Indicator: persistency measure value in [0,1]. Interpretation: a value of 1 implies no persistency issue is present in the class; a value of 0 indicates persistency issues were found in the class.

Historical Persistency. Indicator: percentage (%) of historical persistency. Interpretation: a high percentage presents an estimation of fewer issues; a lower percentage entails more issues present across KB releases.

Completeness. Indicator: list of properties with completeness measures, weighted value in [0,1]. Interpretation: a value of 1 implies no completeness issue is present in the property; a value of 0 indicates completeness issues were found in the property. Indicator: percentage (%) of completeness. Interpretation: a high percentage presents an estimation of fewer issues; a lower percentage entails more issues in the KB release.

KB growth. Indicator: KB growth measure value in [0,1]. Interpretation: a value of 1 implies no unexpected growth in the class; a value of 0 indicates that unexpected growth may have happened in the current version of the class.
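The KB growth indicator of Section 2.4 fits a simple linear regression to the entity counts over all releases and compares the residual at the last release against the mean residual. A stdlib-only Python sketch (illustrative, not KBQ's R implementation; the guard for a perfect fit is our own addition, not described in the paper):

```python
def kb_growth(entity_counts):
    """KB growth indicator (Section 2.4): fit entity count ~ release index
    by least squares, compute the normalized distance of the last release
    (its absolute residual divided by the mean absolute residual), and
    return 0 if that distance exceeds 1 (possible unexpected growth),
    else 1 (stable), following Table 1."""
    n = len(entity_counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(entity_counts) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, entity_counts))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    residuals = [abs(y - (intercept + slope * x))
                 for x, y in zip(xs, entity_counts)]
    mean_residual = sum(residuals) / n
    if mean_residual == 0:  # perfect linear fit: trivially stable
        return 1
    return 0 if residuals[-1] / mean_residual > 1 else 1

print(kb_growth([10, 20, 30, 40, 50]))   # 1: growth follows the trend
print(kb_growth([10, 20, 30, 40, 100]))  # 0: last release deviates strongly
```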

3 ARCHITECTURE OVERVIEW
KBQ is composed of four modules, illustrated in Fig. 1. We implemented KBQ using the R statistical package, which we share as open source in order to foster the reproducibility of the experiments7. The modules are explained in detail below.
   Collect: generates knowledge base (KB) snapshots and sets up timely schedulers. It supports (i) the collection of KB summary statistics via a dedicated SPARQL endpoint; this component is built on top of Loupe [13]; and (ii) the collection of periodic KB snapshots that are accessible through a SPARQL endpoint and saved in CSV files. We name each CSV file based on the entity type. In particular, we use a SPARQL endpoint as input and save the results extracted from the SPARQL endpoint into CSV files.
   Analyze: performs quality profiling based on a particular entity type and generates a quality problem report. We build an intermediary data structure by grouping sets of resources and predicates for an entity type based on KB releases, to speed up the execution of the measurement functions. We use the values of the quality measures as indicators of quality issues. In Table 1, we present the quality indicators used in our tool. This module allows saving the analyses to an HTML file.
   Visualize: is composed of two modules: (i) a list of quality assessment results and (ii) a dataset catalogue. The visualization of quality assessment results is embedded with the analysis module based on the four quality characteristics. This allows any user to access quality measures by selecting a specific characteristic. It also allows class-faceted exploration along the various KB releases.
   Validate: extracts, inspects, and allows manual annotation of quality issues. A user can extract properties with quality issues after performing a quality profiling that consists of: (i) Incomplete properties: visualize a list of properties with completeness quality issues for validation. (ii) Instances: quality profiling is done based on summary statistics. To extract the missing instances of a property, the instance extraction component performs a comparison between the lists of instances from the last two versions. (iii) Inspections: after the instance extraction is done, a user can select any instance for inspection and reporting. We present instance inspection based on data sources. In particular, validation is performed by inspecting the missing instances and manually evaluating the cause of quality issues through data source inspections. (iv) Report: a user can report whether the instance is a true positive (the subject presents an issue, and an actual problem was detected) or a false positive (the item presents a possible issue, but no actual problem is found), and can also comment on specific issues. Finally, a user can save the validation report as an HTML file.

4 USE CASES: KBQ IN ACTION
We present KBQ in action for the 3cixty KB [23] and the Spanish DBpedia KB [11]. We selected these two KBs according to: i) popularity and representativeness in their domain: DBpedia for the encyclopedic domain, 3cixty for the tourist and cultural domain; ii) heterogeneity in terms of the content being hosted; and iii) diversity in the update strategy: incremental and usually in batches for DBpedia, continuous updates for 3cixty. A recorded video of KBQ in action for these two use cases is available at https://youtu.be/F02l7ImOZV8.

4.1 3cixty KB quality assessment
The 3cixty KB is continuously changing, with frequent (daily) updates. We target the lode:Event class for quality profiling. Using KBQ, we manually collected 9 snapshots from 2016-03-11 to 2016-09-09. In addition, we collected daily snapshots from 2017-07-19 to 2017-09-27 using the scheduler. Overall, this yields the following results: (1) For both manually saved snapshots and scheduler-generated ones, the

7 https://github.com/KBQ/KBQ




                                                    Figure 1: High level architecture of the KBQ tool.


Persistency measure value of 1 indicates no missing entities in the last version of the lode:Event class. (2) A historical persistency value of 87.5% for the manually saved snapshots estimates that little variation is present over all releases; persistency issues are present only between releases 2016-06-16 and 2016-09-09. Also, using the scheduler, a measure value of 85.7% estimates that small variation is present, with a persistency issue only between 2017-07-22 and 2017-07-23. (3) The completeness measures for the manually saved snapshots on the last release of 2017-07-19 detected two properties with quality issues. (4) KB growth monitors the dynamics of knowledge base changes. For the manually saved snapshots of lode:Event, a value of 0 indicates higher growth than expected in the last release. Furthermore, we validated the quality using the validation module.
   In Figure 2 we present the persistency quality assessment results for the lode:Event class. Finally, we save the quality profiling results in an HTML file (example of a generated report8).

4.2 Spanish DBpedia quality assessment
The Spanish DBpedia has less frequent (monthly or yearly) updates. We target the dbo:Place class for quality profiling. We collected summary statistics for 11 different releases of the Spanish DBpedia. The quality profiling results for the dbo:Place class are: (1) A persistency value of 1 indicates no missing entities in the last version. (2) A historical persistency of 100% indicates consistent growth across releases. (3) Completeness: in version 201610 of DBpedia we detected 9 properties with quality issues. (4) The KB growth of dbo:Place is equal to 0, indicating higher growth than expected in the last release. In Figure 3 we present the persistency quality assessment results for the dbo:Place class. Finally, we save the quality profiling results in an HTML file (example of a generated report9).

4.3 Discussion
We identified a set of properties with quality issues using the evolution-based quality characteristics from the Spanish DBpedia KB and the 3cixty KB. Furthermore, we evaluated the results of the quality analysis using a manual validation approach. We performed the manual validation based on the detected missing properties of the dbo:Place class. For a selected property, the validation module collects all instances present in the last two releases. For example, we selected the property dbo:prefijoTelefóicoNombre10 to be manually validated. We used KBQ to collect all the instances (56109 and 55387) from the two releases (201604 and 201610). The validation module performed a set-disjoint operation between the two triple sets to identify the triples missing from the 201610 release. From the set-disjoint operation we found a total of 1982 instances missing from the 201610 version. In order to inspect the missing instances, we randomly selected a subset of 200 instances for evaluation. From the manual evaluation, we identified dbr:Morante11, which is available in the 201604 release but not found in the 201610 release of DBpedia. In general, these instances are auto-generated from Wikipedia infobox keys. To further validate them, we tracked the Wikipedia page from which the statement was extracted into the DBpedia KB. We checked the source Wikipedia page using foaf:primaryTopic about Morante12. In the Wikipedia page, prefijoTelefónicoNombre is present as a Wikipedia infobox key. In the Spanish DBpedia update from version 201604 to version 201610, this data instance was removed from the property dbo:prefijoTelefóicoNombre. Therefore, these instances are present in the Wikipedia infobox as keys but missing in the DBpedia 201610 release. This example shows the validity of the completeness issue present in the 201610 release of DBpedia for the property dbo:prefijoTelefóicoNombre.

8 http://datascience.ismb.it/shiny/2017-07-21-QualityProblemReport.html
9 http://datascience.ismb.it/shiny/2017-07-21--http---dbpedia.org-ontology-Place-.html
10 http://es.dbpedia.org/property/prefijoTelefóicoNombre
11 http://es.dbpedia.org/page/Morante
12 https://es.wikipedia.org/wiki/Morante
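The validation step described in Section 4.3 boils down to a set difference between the instance sets of the last two releases. A minimal Python sketch (illustrative; besides dbr:Morante, the resource names are invented for the example):

```python
def missing_instances(older_release, newer_release):
    """Set-disjoint check (Section 4.3): instances present in the older
    release but absent from the newer one, i.e. candidates for
    completeness issues to be inspected manually."""
    return sorted(set(older_release) - set(newer_release))

release_201604 = {"dbr:Morante", "dbr:Sevilla", "dbr:Granada"}
release_201610 = {"dbr:Sevilla", "dbr:Granada"}
print(missing_instances(release_201604, release_201610))  # ['dbr:Morante']
```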




                                                     Figure 2: 3cixty lode:Event class.




Figure 3: DBpedia dbo:Place class.


5 RELATED WORK
The research activities related to our approach fall into two main research areas: (i) change detection in Linked Datasets and (ii) Linked Open Data quality assessment.
   Change Detection in Linked Data: There are various features of dataset dynamics which must be considered to achieve a comprehensive overview of how linked data changes evolve on the Web [8]. Issues in curated RDF(S) KBs have been addressed by Papavasileiou et al. [17]. They introduce a high-level language of changes with formal detection and application semantics, as well as a corresponding change detection algorithm that satisfies these needs for RDF(S) KBs. Ellefi et al. [2] present a comprehensive overview of RDF dataset profiling features, methods, tools, and vocabularies. They present dataset profiling in a taxonomy and illustrate the links between dataset profiling and feature extraction approaches. Recently, Yannis et al. [21] proposed a framework that detects changes between versions. It enables easy and efficient navigation among versions, automated processing, and analysis of changes. They also include cross-snapshot queries (spanning across different versions), as well as queries involving changes in both the schema and the instances. Zablith et al. [24] conducted extensive work at the ontology level on the detection, representation, and management of changes. Pernelle et al. [19] present an approach that allows detecting and representing elementary and complex changes that can be detected only at the data level.
   Linked Data Quality Assessment: Regarding automated LOD quality assessment, Fleischhacker et al. [3] proposed a two-fold approach that relies on unsupervised outlier detection to identify numerical errors in the objects of RDF triples. A probabilistic framework presented by Li et al. [12] predicts arithmetic relations (equal, greater than, less than) among multiple RDF predicates to detect inconsistencies in numerical and date values. Based on the statistical distribution of predicates and objects in RDF datasets, Paulheim et al. [18] presented two algorithms, SDType and SDValidate. SDType predicts the classes of RDF resources, thus completing missing values of rdf:type properties. SDValidate detects incorrect links between resources within a dataset. The framework SWIQA
K-CAP2017 Workshops and Tutorials Proceedings, 2017                                                                                                       Mohammad et al.


proposed by Furber and Hepp [5] can be applied for detecting accu-                         [4] Annika Flemming. 2010. Quality characteristics of linked data publishing data-
racy quality issues including incorrect object values, datatypes, and                          sources. Master’s thesis, Humboldt-Universität of Berlin (2010).
                                                                                           [5] Christian Fürber and Martin Hepp. 2011. Swiqa-a semantic web information
literals. These solutions are tailored to detect very specific errors                          quality assessment framework.. In ECIS, Vol. 15. 19.
in RDF triples. However, in the current state of the art, less focus                       [6] Christophe Guéret, Paul Groth, Claus Stadler, and Jens Lehmann. 2012. Assessing
                                                                                               linked data mappings using network measures. The Semantic Web: Research and
has been given toward understanding knowledge base resource                                    Applications (2012), 87–102.
changes over time to detect anomalies over various releases.                               [7] ISO/IEC. 2008. 25012:2008 – Software engineering – Software product Quality
                                                                                               Requirements and Evaluation (SQuaRE) – Data quality model. Technical Report.
                                                                                               ISO/IEC. http://iso25000.com/index.php/en/iso-25000-standards/iso-25012
6    LIMITATIONS                                                                           [8] Tobias Káfer, Ahmed Abdelrahman, Júrgen Umbrich, Patrick OâĂŹByrne, and
We have identified the following two limitations, such as:                                     Aidan Hogan. 2013. Observing linked data dynamics. In Extended Semantic Web
                                                                                               Conference. Springer, 213–227.
   First, in this tool we detect changes between two KB releases                           [9] Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens
only based on summary statistics. In particular, we applied coarse-                            Lehmann, Roland Cornelissen, and Amrapali Zaveri. 2014. Test-driven evaluation
                                                                                               of linked data quality. In Proceedings of the 23rd international conference on World
grained analysis to capture any quality issues for evolving KB.                                Wide Web. ACM, 747–758.
Although coarse-grained analysis cannot capture all possible quality                      [10] Nuno Laranjeiro, Seyma Nur Soydemir, and Jorge Bernardino. 2015. A survey on
issues, it helps to identify common quality issues such as systematic                          data quality: classifying poor data. In Dependable Computing (PRDC), 2015 IEEE
                                                                                               21st Pacific Rim International Symposium on. IEEE, 179–188.
errors in data extraction and integration processes.                                      [11] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas,
   Second, in KBQ we introduce a manual validation module where                                Pablo N Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören
we aim to keep track of the detected quality issues by using true or                           Auer, et al. 2015. DBpedia–a large-scale, multilingual knowledge base extracted
                                                                                               from Wikipedia. Semantic Web 6, 2 (2015), 167–195.
false annotations. We aim to use this manually annotated result as                        [12] Huiying Li, Yuanyuan Li, Feifei Xu, and Xinyu Zhong. 2015. Probabilistic error
a gold standard for future quality assessment tasks. Furthermore,                              detecting in numerical linked data. In International Conference on Database and
                                                                                               Expert Systems Applications. Springer, 61–75.
we envision that an automatic schema validation using integrity                           [13] Nandana Mihindukulasooriya, María Poveda-Villalón, Raúl García-Castro, and
constraints could be helpful for the validation process.                                       Asunción Gómez-Pérez. 2015. Loupe-An Online Tool for Inspecting Datasets
                                                                                               in the Linked Data Cloud.. In International Semantic Web Conference (Posters &
                                                                                               Demos).
7    CONCLUSIONS AND FUTURE WORK                                                          [14] Felix Naumann. 2014. Data Profiling Revisited. SIGMOD Rec. 42, 4 (Feb. 2014),
                                                                                               40–49. https://doi.org/10.1145/2590989.2590995
The main motivations for the work presented in this paper is rooted                       [15] Chifumi Nishioka and Ansgar Scherp. 2016. Information-theoretic Analysis of
in the concepts of Linked data dynamics13 on the one side and                                  Entity Dynamics on the Linked Open Data Cloud. In PROFILES@ ESWC.
knowledge base quality on the other side. The focus of this work is                       [16] Jack E. Olson. 2003. Data Quality: The Accuracy Dimension (1st ed.). Morgan
                                                                                               Kaufmann Publishers Inc., San Francisco, CA, USA.
to automate the timely process of quality issue detection without                         [17] Vicky Papavasileiou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, and
user intervention based on evolution analyses. More specifically,                              Vassilis Christophides. 2013. High-level change detection in RDF (S) KBs. ACM
we explored the idea of monitoring KB changes as the premise of                                Transactions on Database Systems (TODS) 38, 1 (2013), 1.
                                                                                          [18] Heiko Paulheim and Christian Bizer. 2014. Improving the quality of linked
this work. We design and develop KBQ tool for Knowledge Base                                   data using statistical distributions. International Journal on Semantic Web and
quality assessment using evolution analysis. We present four qual-                             Information Systems (IJSWIS) 10, 2 (2014), 63–86.
                                                                                          [19] Nathalie Pernelle, Fatiha Saïs, Daniel Mercier, and Sujeeban Thuraisamy. 2016.
ity evolution based quality characteristics persistency, historical                            RDF data evolution: efficient detection and semantic representation of changes.
persistency, completeness and KB growth. KBQ is also knowledge                                 In Semantic Systems-SEMANTiCS2016. 4–pages.
base agnostic and we demonstrated its usage for two different use                         [20] Leo L. Pipino, Yang W. Lee, and Richard Y. Wang. 2002. Data Quality Assessment.
                                                                                               Commun. ACM 45, 4 (April 2002), 211–218. https://doi.org/10.1145/505248.506010
cases, namely 3cixty and Spanish DBpedia. In particular, in this                          [21] Yannis Roussakis, Ioannis Chrysakis, Kostas Stefanidis, Giorgos Flouris, and
work we explored the benefits of aggregated measures using quality                             Yannis Stavrakas. 2015. A flexible framework for understanding the dynamics
profiling.                                                                                     of evolving RDF datasets. In International Semantic Web Conference. Springer,
                                                                                               495–512.
   As future work, we plan to add automatic error annotations of the                      [22] Giri Kumar Tayi and Donald P Ballou. 1998. Examining data quality. Commun.
properties with quality issues. We also plan to extend our validation                          ACM 41, 2 (1998), 54–57.
                                                                                          [23] Raphael Troncy, Giuseppe Rizzo, Anthony Jameson, Oscar Corcho, Julien Plu,
approach for automatic snapshots generation and publishing in a                                Enrico Palumbo, Juan Carlos Ballesteros Hermida, Adrian Spirescu, Kai-Dominik
triple format.                                                                                 Kuhn, Catalin Barbu, et al. 2017. 3cixty: Building comprehensive knowledge
                                                                                               bases for city exploration. Web Semantics: Science, Services and Agents on the
                                                                                               World Wide Web (2017).
ACKNOWLEDGMENTS                                                                           [24] Fouad Zablith, Grigoris Antoniou, Mathieu d’Aquin, Giorgos Flouris, Haridimos
This work was partially funded by the Spanish government with                                  Kondylakis, Enrico Motta, Dimitris Plexousakis, and Marta Sabou. 2015. Ontology
                                                                                               evolution: a process-centric survey. The knowledge engineering review 30, 1 (2015),
the BES-2014-068449 grant and Datos 4.0 project.                                               45–75.
                                                                                          [25] Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann,
                                                                                               and Sören Auer. 2016. Quality assessment for linked data: A survey. Semantic
REFERENCES                                                                                     Web 7, 1 (2016), 63–93.
 [1] Jeremy Debattista, SÓren Auer, and Christoph Lange. 2016. LuzzuA Methodol-
     ogy and Framework for Linked Data Quality Assessment. Journal of Data and
     Information Quality (JDIQ) 8, 1 (2016), 4.
 [2] Mohamed Ben Ellefi, Zohra Bellahsene, J Breslin, Elena Demidova, Stefan Dietze,
     Julian Szymanski, and Konstantin Todorov. 2017. Rdf dataset profiling-a survey
     of features, methods, vocabularies and applications. Semantic Web (2017).
 [3] Daniel Fleischhacker, Heiko Paulheim, Volha Bryl, Johanna Völker, and Christian
     Bizer. 2014. Detecting errors in numerical linked data using cross-checked outlier
     detection. In International Semantic Web Conference. Springer, 357–372.

13 https://www.w3.org/wiki/DatasetDynamics