=Paper=
{{Paper
|id=Vol-2065/paper13
|storemode=property
|title=KBQ - A Tool for Knowledge Base Quality Assessment Using Evolution Analysis
|pdfUrl=https://ceur-ws.org/Vol-2065/paper13.pdf
|volume=Vol-2065
|authors=Mohammad Rashid,Giuseppe Rizzo,Nandana Mihindukulasooriya,Marco Torchiano,Oscar Corcho
|dblpUrl=https://dblp.org/rec/conf/kcap/Rashid0MTC17
}}
==KBQ - A Tool for Knowledge Base Quality Assessment Using Evolution Analysis==
Mohammad Rashid, Politecnico di Torino, Italy (mohammad.rashid@polito.it)
Giuseppe Rizzo, Istituto Superiore Mario Boella, Italy (giuseppe.rizzo@ismb.it)
Nandana Mihindukulasooriya, Universidad Politécnica de Madrid, Spain (nmihindu@fi.upm.es)
Marco Torchiano, Politecnico di Torino, Italy (marco.torchiano@polito.it)
Oscar Corcho, Universidad Politécnica de Madrid, Spain (ocorcho@fi.upm.es)

K-CAP2017 Workshops and Tutorials Proceedings, 2017. ©Copyright held by the owner/author(s).

ABSTRACT

Knowledge bases are becoming essential components for tasks that require automation with some degree of intelligence. It is crucial to establish automatic and timely checks to ensure a high level of quality of the knowledge base content (i.e., entities, types, and relations). In this paper, we present KBQ, a tool that automates the detection and reporting of quality issues for evolving knowledge bases. KBQ analyzes the evolution of a KB by measuring the frequency of change, the change pattern, the change impact, and the causes of changes of resources and properties. Data collection and profiling tasks are performed using Loupe, an online tool for linked data profiling. We describe KBQ in action on two different use cases, and we report the benefits that it introduced. KBQ is published as an open source project, and a demo is available at http://datascience.ismb.it/shiny/KBQ/.

CCS CONCEPTS

• Information systems → Data cleaning; • Computing methodologies → Knowledge representation and reasoning;

KEYWORDS

Knowledge Base, Linked Data, Quality Assessment, Quality Issues, Evolution Analysis

1 INTRODUCTION

In recent years, much effort has gone into sharing Knowledge Bases (KB) in the Linked Open Data (LOD) cloud (http://lod-cloud.net). Popular knowledge bases such as DBpedia, YAGO2, and Wikidata have chosen the RDF data model (https://www.w3.org/RDF/) to represent their data due to its capabilities for semantically rich knowledge representation. RDF KBs evolve since both data instances and schemas are updated, extended, revised, and refactored, covering more and more topical domains [8]. In particular, entities evolve as new data is added, old data is removed, and links to entities are updated or deleted. Within this context, data quality for evolving KBs remains a critical aspect for obtaining the trust of users. Data quality, in general, relates to the perception of the "fitness for use" in a given context [22]. Manual quality assessment and representation of large KBs is neither feasible nor sustainable [6]. On the other hand, continuously and automatically assessing the quality of a knowledge base is a challenging task, as data is derived from many autonomous, evolving, and increasingly large data providers.

Various tools have been developed for linked data quality assessment based on manual, semi-automatic, and automated approaches. For example, TripleCheckMate (http://aksw.org/Projects/TripleCheckMate.html) is a crowdsourced quality assessment tool focusing on the correctness of DBpedia resources. RDFUnit [9] is a tool centered around the definition of data quality integrity constraints. Flemming's [4] data quality assessment tool calculates data quality scores based on manual user input for data sources. Debattista et al. describe a conceptual methodology for assessing Linked Datasets, proposing Luzzu [1], a framework for Linked Data quality assessment. Although these tools enable an appropriate data quality assessment, less focus has been given to the evolution aspects of a KB. In particular, these tools do not consider the impact of KB evolution, such as capturing changes that indicate an abnormal situation, or changes that the curator wants to highlight because they are useful for a specific domain [15].

One common preliminary task for data quality assessment is to perform a detailed data analysis. Data profiling is one of the most widely used techniques for data analysis [16]. Data profiling is defined as the process of examining data to collect statistics and provide relevant metadata about the data [14]. Based on data profiling, we can thoroughly examine and understand each KB, its structure, and its properties before usage. Evolution analysis using dynamic features helps to understand the changes applied to an entire KB or parts of it. In general, the dynamic features of a dataset give insights into how it behaves and evolves over a certain period [15]. Ellefi et al. [2] explored dynamic features for data profiling considering the use cases presented by Käfer et al. [8]. They present dynamic features in multiple dimensions regarding KB update behavior, such as frequency of change, change patterns, change impact, and causes of change.

In this paper, we present KBQ, a tool for KB quality assessment using evolution analysis. One of the core ideas in this work is to use dynamic features from data profiling results for analyzing KB evolution. Our quality assessment approach is based on two main areas: (1) the evolution of resources, and (2) the impact of the unwanted removal of resources in a KB. In particular, based on the changes detected between various releases, we aim to analyze and validate quality issues in the KBs.

The ISO/IEC 25012 [7] standard defines data quality as the degree to which a set of characteristics of data fulfills requirements. Data quality issues are the specific problem instances, identified on the basis of quality characteristics, that prevent data from being regarded as high-quality [10]. More specifically, quality characteristics are abstract definitions indicating quality issues. In this work we explored two main quality issues, namely lack of persistency and lack of completeness:

Lack of Persistency relates to resources that were present in a previous KB release but then disappeared. In particular, it looks into problems due to the unexpected removal of information.

Lack of Completeness refers to problems due to incomplete resources present in a knowledge base; this happens due to systematic errors in the data extraction and integration processes.

In KBQ, quality assessment is performed through four quality characteristics: persistency, historical persistency, completeness, and KB growth. As measurement functions, we use basic statistics (i.e., counts and diffs) of entities, types, and relations over the triples extracted from the various releases.

KBQ builds upon the data collection and profiling functionalities of Loupe [13], an online system that automatically inspects and extracts statistics about the entities, vocabularies used (classes and properties), and frequent triple patterns of a KB. We created a set of APIs (source code available at https://github.com/rifat963/KBDataObservatory) for periodic snapshot generation and for maintaining scheduled tasks for automatic and timely quality assessment. In this paper, we describe KBQ in action with the lode:Event (http://linkedevents.org/ontology/Event) entity in the 3cixty [23] KB and the dbo:Place (http://dbpedia.org/ontology/Place) entity in the DBpedia [11] KB, reporting the benefits introduced to the corresponding projects.

2 EVOLUTION-BASED QUALITY CHARACTERISTICS

Data quality is a cross-disciplinary and multidimensional concept. According to Pipino et al. [20], depending on the context, quality can involve both subjective perceptions and objective measurements. Our quality measurement functions are based on dynamic features from data profiling results. The quality indicators are weighted values, which give the freedom to define multiple degrees of importance [4]. In our approach, the quality indicators are based on the changes present at the statistical level, in terms of the variation of the absolute and relative frequency counts of entities and predicates between pairs of KB releases. We formalized each quality indicator value in the range [0, 1]. We considered four quality characteristics for the quality assessment tasks, namely Persistency, Historical Persistency, Completeness, and KB growth.

2.1 Persistency

Knowledge bases contain information about different real-world objects or concepts, commonly referred to as entities. In general, quality issues arise from the unexpected removal of information: deletions in the current version may impact the stability of the KB. The persistency characteristic helps to understand this stability. Ellefi et al. [2] present stability as an aggregation measure of dataset dynamics. It helps to understand to what extent a performed update impacts the overall state of the knowledge base. In particular, it provides insights into whether there are any missing resources in the last KB release.

Quality Indicator: Persistency is a class-specific measure, and its measurement function is based on the entity count difference between two KB releases. We compute a persistency value of 0 if the entity count of the last version is lower than that of the previous version, and 1 otherwise. A value of 1 implies that no persistency issue is present in the class; a value of 0 implies that persistency issues were found in the class.

2.2 Historical Persistency

Historical persistency is a measure derived from the persistency characteristic. It measures the lifespan of an entity type. Ellefi et al. [2] present the lifespan feature based on the degree of change, which captures the impact of the changes observed on an entire dataset or parts of it. The lifespan also represents the period in which a certain entity is available. In particular, this value gives an overview of the persistency issues present in an entity type over all releases. It helps data curators decide which knowledge base release can be used for future data management tasks.

Quality Indicator: The historical persistency measure evaluates persistency over the history of the KB and is computed as the average of the persistency measures over all releases. A high percentage indicates an estimation of fewer issues; a lower percentage entails more issues present in the KB releases.

2.3 Completeness

This measure focuses on the removal of information as a negative effect of KB evolution. Zaveri et al. [25] refer to completeness as the degree to which all required information is present in a particular dataset. They present the completeness characteristic based on the following four aspects: i) Schema completeness, the degree to which the classes and properties of an ontology are represented, which can thus be called "ontology completeness"; ii) Property completeness, a measure of the missing values for a specific property; iii) Population completeness, the percentage of all real-world objects of a particular type that are represented in the dataset; and iv) Interlinking completeness, which has to be considered especially in Linked Data and refers to the degree to which instances in the dataset are interlinked. We considered the property completeness aspect for KB evolution.

Quality Indicator: The basic measure we use is the difference in the frequency of a class's properties between two KB releases. In particular, if the instance count of a property present in the class is lower than in the previous release, we assume there is a completeness issue. We assign a value of 1 if no completeness issue is present, and a value of 0 if a completeness issue is present. Also, at the class level, we compute the percentage of completeness based on the number of completeness issues divided by the total number of properties.

2.4 KB growth

In this measure, we explore the aspect of KB growth by measuring the growth level of KB resources (instances) over the different releases. Ellefi et al. [2] present the growth rate feature as the level of growth of a dataset in terms of data instances. In particular, KB growth explores the change patterns of a knowledge base. Change patterns help to understand the existence and kinds of categories of updates or change behavior. They can help to understand whether the changes present in the KB follow an upward or downward trend. We assume that, if the schema remains consistent, a downward trend at the last release may indicate a potential problem in the data extraction process.

Quality Indicator: We use a simple linear regression model to predict the growth level of a class's resources. It is a class-specific measure, and its measurement function is based on the entity counts from all KB releases. Using the difference between the observed and predicted entity count values at the last KB release, we can detect the trend in the KB growth level. We evaluate a normalized distance based on the entity type's residual value divided by the mean residual value, and we use this normalized distance between observed and predicted entity counts to measure KB growth. In particular, if the normalized distance is greater than 1, the KB may have grown unexpectedly with unwanted entities; otherwise, the KB remains stable.

Table 1: Quality Indicators
- Persistency. Indicator: persistency measure value in [0,1]. Interpretation: a value of 1 implies no persistency issue present in the class; a value of 0 indicates persistency issues found in the class.
- Historical Persistency. Indicator: percentage (%) of historical persistency. Interpretation: a high % presents an estimation of fewer issues, and a lower % entails more issues present in the KB releases.
- Completeness. Indicator: list of properties with completeness measure values in [0,1], plus the percentage (%) of completeness. Interpretation: a value of 1 implies no completeness issue present in the property, and a value of 0 indicates completeness issues found in the property; a high % presents an estimation of fewer issues, and a lower % entails more issues in the KB release.
- KB growth. Indicator: KB growth measure value in [0,1]. Interpretation: a value of 1 implies no unexpected growth present in the class; a value of 0 indicates that unexpected growth may have happened in the current version of the class.

3 ARCHITECTURE OVERVIEW

KBQ is composed of four modules, illustrated in Fig. 1. We implemented KBQ using the R statistical package, and we share it as open source in order to foster the reproducibility of the experiments (https://github.com/KBQ/KBQ). The modules are explained in detail below.

Figure 1: High level architecture of the KBQ tool.

Collect: generates knowledge base (KB) snapshots and sets up timely schedulers. It supports (i) the collection of KB summary statistics via a dedicated SPARQL endpoint; this component is built on top of Loupe [13]; and (ii) the collection of periodic KB snapshots, accessible through a SPARQL endpoint, saved in CSV files. We name each CSV file based on the entity type. In particular, we use a SPARQL endpoint as input and save the results extracted from it into CSV files.

Analyze: performs quality profiling based on a particular entity type and generates a quality problem report. We build an intermediary data structure by grouping sets of resources and predicates for an entity type across KB releases, to speed up the execution of the measurement functions. We use the values of the quality measures as indicators of the quality issues. In Table 1, we present the quality indicators used in our tool. This module allows saving the analyses to an HTML file.

Visualize: is composed of two modules: (i) a list of quality assessment results, and (ii) a dataset catalogue. The visualization of the quality assessment results is embedded with the analysis module, based on the four quality characteristics. This allows any user to access quality measures by selecting a specific characteristic. It also allows class-faceted exploration along the various KB releases.

Validate: extracts, inspects, and allows manual annotation of quality issues. A user can extract properties with quality issues after performing a quality profiling that consists of: (i) Incomplete properties: visualize a list of properties with completeness quality issues for validation. (ii) Instances: quality profiling is done based on summary statistics; to extract the missing instances of a property, the instance extraction component compares the lists of instances from the last two versions. (iii) Inspections: after the instance extraction is done, a user can select each instance for inspection and reporting. We base instance inspection on the data sources; in particular, validation is performed by inspecting the missing instances and manually evaluating the cause of the quality issues through data source inspections. (iv) Report: a user can report whether an instance is a true positive (the subject presents an issue, and an actual problem was detected) or a false positive (the item presents a possible issue, but no actual problem is found), and can also comment on specific issues. Finally, a user can save the validation report as an HTML file.

4 USE CASES: KBQ IN ACTION

We present KBQ in action for the 3cixty KB [23] and the Spanish DBpedia KB [11]. We selected these two KBs according to: i) popularity and representativeness in their domain: DBpedia for the encyclopedic domain, 3cixty for the tourist and cultural domain; ii) heterogeneity in terms of the content being hosted; and iii) diversity in the update strategy: incremental and usually in batch for DBpedia, continuous updates for 3cixty. A recorded video of KBQ in action for these two use cases is available at https://youtu.be/F02l7ImOZV8.

4.1 3cixty KB quality assessment

The 3cixty KB is continuously changing, with frequent (daily) updates. We target the lode:Event class for quality profiling. Using KBQ, we manually collected 9 snapshots from 2016-03-11 to 2016-09-09. In addition, we collected daily snapshots from 2017-07-19 to 2017-09-27 using the scheduler. Overall, this yields the following results. (1) For both the manually saved snapshots and the scheduler-generated ones, the persistency measure value of 1 indicates no missing entities in the last version of the lode:Event class. (2) The historical persistency value of 87.5% for the manually saved snapshots suggests little variation over all releases; persistency issues are present only between releases 2016-06-16 and 2016-09-09. Similarly, for the scheduler-generated snapshots, the measure value of 85.7% indicates small variation, with a persistency issue present only between 2017-07-22 and 2017-07-23. (3) The completeness measures for the manually saved snapshots detected two properties with quality issues in the last release of 2017-07-19. (4) KB growth monitors the dynamics of knowledge base changes; for the manually saved snapshots of lode:Event, the value of 0 indicates higher growth than expected in the last release. Furthermore, we validated the quality using the validation module. In Figure 2 we present the persistency quality assessment results for the lode:Event class. Finally, we save the quality profiling results in an HTML file (an example of a generated report is available at http://datascience.ismb.it/shiny/2017-07-21-QualityProblemReport.html).

Figure 2: 3cixty lode:Event class.

4.2 Spanish DBpedia quality assessment

The Spanish DBpedia has less frequent (monthly or yearly) updates. We target the dbo:Place class for quality profiling. We collected summary statistics of 11 different releases of the Spanish DBpedia. The quality profiling results for the dbo:Place class are: (1) the persistency value of 1 indicates no missing entities in the last version; (2) the historical persistency of 100% indicates consistent growth across releases; (3) completeness: in version 201610 of DBpedia we detected 9 properties with quality issues; (4) the KB growth of dbo:Place is equal to 0, indicating higher growth than expected in the last release. In Figure 3 we present the persistency quality assessment results for the dbo:Place class. Finally, we save the quality profiling results in an HTML file (an example of a generated report is available at http://datascience.ismb.it/shiny/2017-07-21--http---dbpedia.org-ontology-Place-.html).

Figure 3: DBpedia dbo:place class.

4.3 Discussion

We identified a set of properties with quality issues using evolution-based quality characteristics from the Spanish DBpedia KB and the 3cixty KB. Furthermore, we evaluated the results of the quality analysis using a manual validation approach. We performed the manual validation based on the missing properties detected for the dbo:Place class. For a selected property, the validation module collects all the instances present in the last two releases. For example, we selected the property dbo:prefijoTelefónicoNombre (http://es.dbpedia.org/property/prefijoTelefónicoNombre) for manual validation. We used KBQ to collect all the instances (56109 and 55387) from the two releases (201604 and 201610). The validation module performed a set difference operation between the two triple sets to identify the triples missing from the 201610 release; this found a total of 1982 instances missing from the 201610 version. In order to inspect the missing instances, we randomly selected a subset of 200 instances for evaluation. From the manual evaluation, we identified dbr:Morante (http://es.dbpedia.org/page/Morante), which is available in the 201604 release but not found in the 201610 release of DBpedia. In general, these instances are auto-generated from Wikipedia infobox keys. To further validate them, we tracked the Wikipedia page from which the statement was extracted into the DBpedia KB. We checked the source Wikipedia page about Morante using foaf:primaryTopic (https://es.wikipedia.org/wiki/Morante). In the Wikipedia page, prefijoTelefónicoNombre is present as a Wikipedia infobox key. In the Spanish DBpedia update from version 201604 to version 201610, this data instance was removed from the property dbo:prefijoTelefónicoNombre. Therefore, these instances are present in the Wikipedia infobox as keys but missing in the DBpedia 201610 release. This example shows the validity of the completeness issue present in the 201610 release of DBpedia for the property dbo:prefijoTelefónicoNombre.

5 RELATED WORK

The research activities related to our approach fall into two main research areas: (i) change detection in linked datasets, and (ii) Linked Open Data quality assessment.

Change Detection in Linked Data: There are various features of dataset dynamics which must be considered to achieve a comprehensive overview of how linked data changes evolve on the Web [8]. Issues in curated RDF(S) KBs have been addressed by Papavasileiou et al. [17]. They introduce a high-level language of changes with formal detection and application semantics, as well as a corresponding change detection algorithm, which satisfies these needs for RDF(S) KBs. Ellefi et al. [2] present a comprehensive overview of RDF dataset profiling features, methods, tools, and vocabularies. They organize dataset profiling in a taxonomy and illustrate the links between dataset profiling and feature extraction approaches. Recently, Roussakis et al. [21] proposed a framework for detecting changes between versions. It enables easy and efficient navigation among versions, and automated processing and analysis of changes. They also include cross-snapshot queries (spanning different versions), as well as queries involving changes in both schema and instances. Zablith et al. [24] conducted extensive work at the ontology level on the detection, representation, and management of changes. Pernelle et al. [19] present an approach for detecting and representing elementary and complex changes that can be detected only at the data level.

Linked Data Quality Assessment: Regarding automated LOD quality assessment, Fleischhacker et al. [3] proposed a two-fold approach that relies on unsupervised outlier detection to identify numerical errors in the objects of RDF triples. A probabilistic framework presented by Li et al. [12] predicts arithmetic relations (equal, greater than, less than) among multiple RDF predicates to detect inconsistencies in numerical and date values. Based on the statistical distribution of predicates and objects in RDF datasets, Paulheim and Bizer [18] presented two algorithms, SDType and SDValidate. SDType predicts the classes of RDF resources, thus completing missing values of rdf:type properties; SDValidate detects incorrect links between resources within a dataset. The SWIQA framework proposed by Fürber and Hepp [5] can be applied for detecting accuracy quality issues, including incorrect object values, datatypes, and literals. These solutions are tailored to detect very specific errors in RDF triples. However, in the current state of the art, less focus has been given to understanding knowledge base resource changes over time in order to detect anomalies over various releases.

6 LIMITATIONS

We have identified the following two limitations. First, in this tool we detect changes between two KB releases only on the basis of summary statistics. In particular, we apply a coarse-grained analysis to capture quality issues in an evolving KB. Although a coarse-grained analysis cannot capture all possible quality issues, it helps to identify common quality issues such as systematic errors in the data extraction and integration processes.

Second, in KBQ we introduce a manual validation module with which we aim to keep track of the detected quality issues using true/false annotations. We aim to use this manually annotated result as a gold standard for future quality assessment tasks. Furthermore, we envision that automatic schema validation using integrity constraints could be helpful for the validation process.

7 CONCLUSIONS AND FUTURE WORK

The main motivation for the work presented in this paper is rooted in the concepts of Linked Data dynamics (https://www.w3.org/wiki/DatasetDynamics) on the one side, and knowledge base quality on the other. The focus of this work is to automate the timely process of quality issue detection, without user intervention, based on evolution analyses. More specifically, we explored the idea of monitoring KB changes as the premise of this work. We designed and developed the KBQ tool for knowledge base quality assessment using evolution analysis. We presented four evolution-based quality characteristics: persistency, historical persistency, completeness, and KB growth. KBQ is knowledge base agnostic, and we demonstrated its usage on two different use cases, namely 3cixty and the Spanish DBpedia. In particular, in this work we explored the benefits of aggregated measures using quality profiling.

As future work, we plan to add automatic error annotation of the properties with quality issues. We also plan to extend our validation approach towards automatic snapshot generation and publishing in a triple format.

ACKNOWLEDGMENTS

This work was partially funded by the Spanish government with the BES-2014-068449 grant and the Datos 4.0 project.

REFERENCES

[1] Jeremy Debattista, Sören Auer, and Christoph Lange. 2016. Luzzu: A Methodology and Framework for Linked Data Quality Assessment. Journal of Data and Information Quality (JDIQ) 8, 1 (2016), 4.
[2] Mohamed Ben Ellefi, Zohra Bellahsene, John Breslin, Elena Demidova, Stefan Dietze, Julian Szymanski, and Konstantin Todorov. 2017. RDF dataset profiling: a survey of features, methods, vocabularies and applications. Semantic Web (2017).
[3] Daniel Fleischhacker, Heiko Paulheim, Volha Bryl, Johanna Völker, and Christian Bizer. 2014. Detecting errors in numerical linked data using cross-checked outlier detection. In International Semantic Web Conference. Springer, 357–372.
[4] Annika Flemming. 2010. Quality characteristics of linked data publishing datasources. Master's thesis, Humboldt-Universität zu Berlin (2010).
[5] Christian Fürber and Martin Hepp. 2011. SWIQA: a semantic web information quality assessment framework. In ECIS, Vol. 15. 19.
[6] Christophe Guéret, Paul Groth, Claus Stadler, and Jens Lehmann. 2012. Assessing linked data mappings using network measures. The Semantic Web: Research and Applications (2012), 87–102.
[7] ISO/IEC. 2008. 25012:2008 – Software engineering – Software product Quality Requirements and Evaluation (SQuaRE) – Data quality model. Technical Report. ISO/IEC. http://iso25000.com/index.php/en/iso-25000-standards/iso-25012
[8] Tobias Käfer, Ahmed Abdelrahman, Jürgen Umbrich, Patrick O'Byrne, and Aidan Hogan. 2013. Observing linked data dynamics. In Extended Semantic Web Conference. Springer, 213–227.
[9] Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali Zaveri. 2014. Test-driven evaluation of linked data quality. In Proceedings of the 23rd International Conference on World Wide Web. ACM, 747–758.
[10] Nuno Laranjeiro, Seyma Nur Soydemir, and Jorge Bernardino. 2015. A survey on data quality: classifying poor data. In 2015 IEEE 21st Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE, 179–188.
[11] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören Auer, et al. 2015. DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6, 2 (2015), 167–195.
[12] Huiying Li, Yuanyuan Li, Feifei Xu, and Xinyu Zhong. 2015. Probabilistic error detecting in numerical linked data. In International Conference on Database and Expert Systems Applications. Springer, 61–75.
[13] Nandana Mihindukulasooriya, María Poveda-Villalón, Raúl García-Castro, and Asunción Gómez-Pérez. 2015. Loupe: An Online Tool for Inspecting Datasets in the Linked Data Cloud. In International Semantic Web Conference (Posters & Demos).
[14] Felix Naumann. 2014. Data Profiling Revisited. SIGMOD Rec. 42, 4 (Feb. 2014), 40–49. https://doi.org/10.1145/2590989.2590995
[15] Chifumi Nishioka and Ansgar Scherp. 2016. Information-theoretic Analysis of Entity Dynamics on the Linked Open Data Cloud. In PROFILES@ESWC.
[16] Jack E. Olson. 2003. Data Quality: The Accuracy Dimension (1st ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
[17] Vicky Papavasileiou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, and Vassilis Christophides. 2013. High-level change detection in RDF(S) KBs. ACM Transactions on Database Systems (TODS) 38, 1 (2013), 1.
[18] Heiko Paulheim and Christian Bizer. 2014. Improving the quality of linked data using statistical distributions. International Journal on Semantic Web and Information Systems (IJSWIS) 10, 2 (2014), 63–86.
[19] Nathalie Pernelle, Fatiha Saïs, Daniel Mercier, and Sujeeban Thuraisamy. 2016. RDF data evolution: efficient detection and semantic representation of changes. In Semantic Systems (SEMANTiCS 2016). 4 pages.
[20] Leo L. Pipino, Yang W. Lee, and Richard Y. Wang. 2002. Data Quality Assessment. Commun. ACM 45, 4 (April 2002), 211–218. https://doi.org/10.1145/505248.506010
[21] Yannis Roussakis, Ioannis Chrysakis, Kostas Stefanidis, Giorgos Flouris, and Yannis Stavrakas. 2015. A flexible framework for understanding the dynamics of evolving RDF datasets. In International Semantic Web Conference. Springer, 495–512.
[22] Giri Kumar Tayi and Donald P. Ballou. 1998. Examining data quality. Commun. ACM 41, 2 (1998), 54–57.
[23] Raphaël Troncy, Giuseppe Rizzo, Anthony Jameson, Oscar Corcho, Julien Plu, Enrico Palumbo, Juan Carlos Ballesteros Hermida, Adrian Spirescu, Kai-Dominik Kuhn, Catalin Barbu, et al. 2017. 3cixty: Building comprehensive knowledge bases for city exploration. Web Semantics: Science, Services and Agents on the World Wide Web (2017).
[24] Fouad Zablith, Grigoris Antoniou, Mathieu d'Aquin, Giorgos Flouris, Haridimos Kondylakis, Enrico Motta, Dimitris Plexousakis, and Marta Sabou. 2015. Ontology evolution: a process-centric survey. The Knowledge Engineering Review 30, 1 (2015), 45–75.
[25] Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, and Sören Auer. 2016. Quality assessment for linked data: A survey. Semantic Web 7, 1 (2016), 63–93.
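KBQ itself is implemented in R, but the persistency and historical persistency indicators of Sections 2.1 and 2.2 can be illustrated in a few lines of Python. This is a sketch under our own naming, with made-up entity counts, and not the authors' code:

```python
def persistency(prev_count, last_count):
    """Section 2.1 indicator: 0 if the entity count of the last release
    is lower than the previous one (entities disappeared), else 1."""
    return 0 if last_count < prev_count else 1

def historical_persistency(counts):
    """Section 2.2 indicator: average persistency over all consecutive
    release pairs, as a percentage. Assumes at least two releases."""
    pairs = list(zip(counts, counts[1:]))
    return 100.0 * sum(persistency(a, b) for a, b in pairs) / len(pairs)

# Hypothetical entity counts of one class over four releases;
# the dip 120 -> 115 is the only persistency issue.
releases = [100, 120, 115, 130]
score = historical_persistency(releases)  # 2 of the 3 transitions are clean
```

The paper's interpretation follows directly: a high historical persistency percentage (here roughly two thirds) means few releases lost entities.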
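The property completeness indicator of Section 2.3 compares per-property instance counts between two releases. A hedged Python sketch (our function name, invented property frequencies; the class-level percentage is interpreted as the share of issue-free properties, so that a high percentage means fewer issues, as in Table 1):

```python
def completeness(prev_freq, last_freq):
    """Section 2.3 indicator: per property, 1 = no issue, 0 = the
    property's instance count dropped since the previous release.
    Also returns the class-level completeness percentage."""
    flags = {p: 0 if last_freq.get(p, 0) < c else 1
             for p, c in prev_freq.items()}
    pct = 100.0 * sum(flags.values()) / len(flags)
    return flags, pct

# Invented frequencies for two properties of a class in two releases.
flags, pct = completeness({"dbo:name": 100, "dbo:areaTotal": 50},
                          {"dbo:name": 100, "dbo:areaTotal": 30})
# dbo:areaTotal lost instances, so it is flagged with 0.
```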
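The KB growth indicator of Section 2.4 fits a linear trend to the per-release entity counts and checks the normalized residual of the last release. A self-contained sketch with ordinary least squares written out by hand (all counts invented; KBQ's actual R implementation may differ in detail):

```python
def kb_growth(counts):
    """Section 2.4 indicator: 1 = the last release follows the fitted
    linear trend (stable), 0 = its normalized residual distance
    exceeds 1 (possible unexpected growth). One count per release."""
    n = len(counts)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(counts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, counts)) / sxx
    intercept = my - slope * mx
    residuals = [abs(y - (intercept + slope * x)) for x, y in zip(xs, counts)]
    mean_res = sum(residuals) / n
    # Normalized distance: last residual relative to the mean residual.
    nd = residuals[-1] / mean_res if mean_res > 0 else 0.0
    return 1 if nd <= 1 else 0

stable = kb_growth([10, 20, 30, 40])    # perfectly linear -> 1
suspect = kb_growth([10, 20, 30, 80])   # jump at the last release -> 0
```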
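Finally, the validation step of Section 4.3 reduces to a set difference between the instance sets of two releases. In Python, with illustrative resource names rather than the actual 3cixty or DBpedia data:

```python
# Instances of a property in two hypothetical releases.
release_201604 = {"dbr:Morante", "dbr:A", "dbr:B"}
release_201610 = {"dbr:A", "dbr:B"}

# Instances present in the older release but absent from the newer one
# are the candidates for manual inspection.
missing = release_201604 - release_201610
```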