Towards maintainable constraint validation and repair for taxonomies: The PoolParty approach

Christian Mader (Semantic Web Company, Austria, c.mader@semantic-web.at)
Monika Solanki (Department of Computer Science, University of Oxford, UK, monika.solanki@cs.ox.ac.uk)

Abstract. The specification and validation of constraints in semantically annotated datasets has been receiving increasing attention in the last few years. In this paper we present requirements for using such approaches in an industrial setting. We underpin these requirements with concrete examples of data constraints, as encountered during the development and operation of the PoolParty Thesaurus Server. We show how these constraints can be expressed in SPARQL and propose a SHACL-based approach that combines data validation specification with repair strategies. Based on an implementation that is driven by practical use cases, we show that this approach is able to validate and repair datasets that are provided by customers of PoolParty as well as those openly available on the Web.

1 Introduction

Semantic Web Company (SWC, https://www.semantic-web.at/) provides one of the leading commercial taxonomy management applications, the PoolParty Thesaurus Server (PPT). In recent years, it has evolved into an authoring tool for knowledge graphs that make use of standard schemas such as SKOS (http://www.w3.org/2004/02/skos/), DCTERMS (http://dublincore.org/documents/dcmi-terms/), or FOAF (http://www.foaf-project.org/).

Taxonomists use PPT to integrate a variety of models, schemata, ontologies and vocabularies into their knowledge bases. From a data-centric perspective, the taxonomy import functionality is one of the most crucial utilities provided in PPT for interacting with third-party datasets. PoolParty supports simple import of existing lists, spreadsheets or taxonomies residing on a local drive or on the Web. This leverages the value of existing taxonomies and makes knowledge organization systems part of the Semantic Web.

One of the challenges in combining varied data sources is to ensure that these data mashups conform at any time to a set of quality heuristics. This is needed because applications such as PPT, which consume data from external sources including the Web, rely on data processing algorithms that expect a defined set of input data. For instance, PPT accesses DBpedia subject categories whose hierarchical structure frequently changes, so that it violates the requirements of PoolParty's internal data model. Furthermore, as taxonomies transition from "simple" thesauri to fully-fledged ontologies with rich semantics, users of PPT want to impose custom semantics and restrictions on the datasets they develop. Validating data against a set of data-driven rules or constraints (in this paper, rules and constraints are used interchangeably) that ensure the desired data quality is therefore a critical task that needs to be made part of any workflow involving PPT, both from an internal developer and an external stakeholder perspective. Keeping validation rules or data consistency constraints aligned with changes in PPT's data processing logic requires additional effort in terms of development resources, so data constraints must be easily maintainable.
Good quality data in combination with means for easy data constraint management can therefore (1) ensure that Linked Data consuming applications work as expected, (2) make sure that the datasets which customers develop fulfil the intended usage scenarios, and (3) relieve SWC of some of the additional overheads in terms of resource allocation for data-related debugging issues. In this paper, we present the results of an evaluation carried out to identify the constraint violations that typically occur when datasets are imported into PPT. We focus on (semi-)automated detection and repair of constraint violations (e.g., missing or inconsistent data) in datasets, both imported as well as developed within PPT.

The paper is structured as follows: Section 3 presents the requirements analysis undertaken to assess the need for constraint validation. Section 4 discusses our dataset selection methodology. Section 5 provides an overview of the formalisation of constraints. Section 6 presents our constraint validation and repair techniques. Section 7 presents our evaluation methodology and discusses the results of our evaluation. Section 2 discusses related work and, finally, Section 8 presents our conclusions and future work.

2 Related Work

RuleML (http://www.ruleml.org/) is a family of rule languages serialised in XML. RuleML [2] covers a wide spectrum of rules, from deliberation and derivation to transformation and reaction rules. SWRL (http://www.w3.org/Submission/SWRL/) is an expressive rule language that can be used to increase the amount of knowledge encoded in OWL ontologies. RIF (http://www.w3.org/TR/rif-overview/) is a W3C standard designed for interchanging rules between different rule systems. OWL-RL (http://www.w3.org/TR/owl2-profiles/#OWL_2_RL) is a syntactic subset of OWL 2 which is amenable to implementation using rule-based technologies. Provisions for the specification of rules have also been made in Semantic Web frameworks and triple stores, e.g., Jena (http://jena.sourceforge.net/inference/#rules) or OWLIM (https://confluence.ontotext.com/display/OWLIMv43/OWLIM-SE+Reasoner#OWLIM-SEReasoner-RuleBasedInference).

The use of SPARQL as a rule language based on its CONSTRUCT keyword has long been advocated [9]. Specialised extensions of SPARQL for the rule-based processing of events, such as EP-SPARQL [1], have also been proposed. Another approach that exploits SPARQL for data validation is RDFUnit [6], which uses a test-driven methodology applicable either directly to Linked Datasets, or indirectly by evaluating the quality of mapping files that in turn are used to generate Linked Datasets [3]. A pragmatic approach towards dataset analysis is taken by qSKOS [7], a tool for evaluating potential quality problems ("quality issues") in datasets that use SKOS. SPIN (SPARQL Inferencing Notation) is a SPARQL-based rule and constraint language for the Semantic Web. SPIN links class definitions with SPARQL queries using the SPIN modelling vocabulary to capture constraints and rules. The use of SPARQL and SPIN to identify data quality problems on the Semantic Web has been proposed in [4]. SHACL (Shapes Constraint Language, http://www.w3.org/TR/shacl/) is a specification under development by the W3C RDF Data Shapes Working Group. We highlight the advantages of SHACL as used in our proposed approach in Section 5. Hansen et al. [5] introduce Validata, an online tool that tests conformance of RDF datasets against specifications written in the Shape Expression language (ShEx) [10]. In this paper we follow a similar approach, but build on SHACL for constraint specification.
In contrast to ShEx, SHACL allows constraint definitions to be specified as RDF datasets, thus leveraging the potential of Linked Data such as easy reuse and extension by, e.g., attaching repair strategies as we suggest in this paper. Work on repairing datasets that use the SKOS schema has been done by Suominen et al. [11, 12]. They introduce Skosify, a tool that can automatically resolve potential quality issues such as hierarchical circularities or overlapping labels. The resolution logic is built into Skosify, and the code must be changed if the treatment of detected issues is to be adapted or extended.

While the approaches reported above can be used to implement validation of RDF datasets, most of them require the creation of sophisticated rule bases to check against each quality dimension separately. This is hard to maintain and requires additional knowledge of the rule language syntax (ShEx) or templating capabilities (RDFUnit). We therefore focus on an approach using SHACL because of its capability to abstract from the application of basic rules in favour of a higher-level description of a graph's structure.

3 Requirements Analysis

As part of our requirements analysis, we identified several areas in PPT where constraint violations could potentially occur:

3.1 Data Consistency Constraints

PPT uses a triple store to persist taxonomy information. Changes to these datasets are currently performed in two different ways. First, by executing atomic "actions", which encapsulate triple changes (i.e., additions and removals). Checks whether an action contains only valid triples, i.e., triples that do not violate the store's data consistency, are currently scattered across the code and sometimes performed multiple times, making them hard to maintain. Second, there is a "legacy way" of adding and deleting triples directly from the store, with no constraint validation at all. We therefore need to:

R1 Provide a mechanism to specify data constraints in a formal way, fit for usage with the various PPT software components as well as easily understandable and changeable also by end-users.

PPT components impose constraints on the data they process. It is most critical that the core PPT constraints are met at any time, otherwise the application may fail. Therefore, in this paper, most of the exemplary constraints cover these core constraints. Currently, PPT fails to display data, or, in the worst case, produces error messages if it encounters data not conforming to these constraints. SWC works towards a constraint-based system which would be able to tell users why data validation has failed and suggest ways in which they can address the problem(s). This results in the requirements:

R2 Identify and analyse customer datasets that are imported into PPT and are a source of constraint violations.

R3 Provide a validation mechanism to check for constraint violations and evaluate it against the selected datasets.

3.2 Requirements for Constraint Resolution

In the semantic space, constraint violations can be extremely difficult to understand. This is because they may come from a number of different sources or result from combining data from these sources, for example when merging two ontologies as is done, e.g., when importing datasets.
Users of PPT would greatly benefit from tools which could help them to understand more easily what the cause of constraint conflicts is and what pathways are available to resolve them. We need a method for formulating actions for responding to constraint violations in a generic way, leading to requirement:

R4 Combine formal data constraint definitions with reusable repair strategies that can be easily applied by end-users in a (semi-)automatic way.

4 Dataset and Constraint Selection

In order to investigate the extent to which consistency constraint violations constitute a problem, we performed an analysis on a selection of datasets created by SWC, created by customers of SWC, or publicly available on the Web. We complement the datasets with a selection of consistency constraints. In this section we address requirement R2.

Regarding dataset selection, SWC maintains a list of datasets that are requested by customers for use with PPT, either as the basis of a new taxonomy or as target vocabularies for mapping and alignment. For some of these datasets, SWC implemented methods (e.g., using scripting languages such as Perl, or altering the RDF graph of the original vocabulary with SPARQL) to convert them into taxonomies that make use of the SKOS schema and are suitable for import into PPT. The list of datasets also contains taxonomies that have been created by third parties using undocumented conversion methods. Furthermore, SWC also identified datasets for which it is currently unknown whether they can be used with PPT without conversion. We reviewed the list of datasets and assigned each dataset to one of the following groups:

– SWC-generated: Datasets for which a conversion to a PPT-compatible taxonomy has been performed by SWC (10 datasets).
– Custom-generated: Datasets for which a conversion to a PPT-compatible taxonomy has been performed by third-party institutions (9 datasets).
– Web: Datasets that use SKOS, but for which it is currently unknown whether they are compatible with PPT (7 datasets).

Overall, this procedure led to the identification of 26 datasets, which we downloaded either from SWC-internal file servers or, if no version prepared for PPT existed, from the dataset's website.

Regarding the selection of constraints, we identified 16 data validation constraints which must be fulfilled by a dataset so that it can be used properly with PPT. There is currently no well-defined application schema; we therefore inferred these constraints from the application logic and from experience with importing customer-provided datasets into PPT. We selected six constraints which only address the parts of a dataset that use elements of SKOS. Other validation constraints required by different PPT components also rely on project metadata, such as the default language of the project or the projects linked to the taxonomy; these are the subject of our future work.

5 Constraint Specification

In this section we address requirements R1 and R4 by (i) specifying the constraints we identified through the method described in Section 4 and (ii) introducing an approach to interweave validation constraints with repair strategies.

5.1 Data Validation Constraints

We present exemplary data consistency constraints which should be met in order for PPT to function as expected. Some of these constraints have already been defined [6–8], some are specific to PPT.
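For readability, the SPARQL listings below omit prefix declarations; they assume the standard SKOS and OWL namespace bindings. The following two lines are not part of the original listings and are added here only so that the queries are self-contained:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>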
BidirectionalRelationsHierarchical (br): PPT's algorithms for taxonomy display and processing rely on having both directions of a hierarchical relation materialized as triples. This constraint is also contained in the catalogue of checks performed by RDFUnit [6].

    SELECT DISTINCT ?resource WHERE {
      ?someRes ?p ?resource .
      ?p owl:inverseOf ?pInv .
      FILTER NOT EXISTS { ?resource ?pInv ?someRes }
    }

ConceptTypeAssertion (cta): PPT expects that for each concept, membership in the skos:Concept class is explicitly asserted, because no RDFS inferencing is performed by default.

    SELECT DISTINCT ?resource WHERE {
      ?resource skos:broader|skos:narrower ?otherRes .
      FILTER NOT EXISTS { ?resource a skos:Concept }
    }

HierarchicalConsistency (hc): PPT manages taxonomies as tree-like graphs where each concept must have a parent resource.

    SELECT DISTINCT ?resource WHERE {
      ?resource a skos:Concept .
      FILTER NOT EXISTS {
        ?resource (skos:broader|^skos:narrower)*/skos:topConceptOf ?parent .
        ?parent a skos:ConceptScheme .
      }
    }

LabelAmbiguities (lam): Distinct resources that are labelled identically may constitute a quality problem (e.g., duplicate or redundant concepts). Meeting this constraint is not mandatory, but violations hint at potential improvements to the taxonomy [7].

    SELECT DISTINCT ?resource WHERE {
      ?resource ?labelProp1 ?label .
      ?otherRes ?labelProp2 ?label .
      FILTER (?labelProp1 IN (skos:prefLabel, skos:altLabel, skos:hiddenLabel))
      FILTER (?labelProp2 IN (skos:prefLabel, skos:altLabel, skos:hiddenLabel))
      FILTER (?resource != ?otherRes)
    }

UniquePreferredLabels (upl): As pointed out by the SKOS reference documentation [8], a concept must not have two distinct preferred labels in the same language. Since PPT closely follows the SKOS specification, this constraint must be met in order to ensure data consistency.

    SELECT DISTINCT ?resource {
      ?resource skos:prefLabel ?pl1 .
      ?resource skos:prefLabel ?pl2 .
      FILTER ((?pl1 != ?pl2) && (LANG(?pl1) = LANG(?pl2)))
    }

5.2 Validation using SHACL

Having one SPARQL query for every constraint is hard to read and maintain. In order to overcome these limitations, we encapsulate our constraints within SHACL declarations, as SHACL provides the following advantages over a direct SPARQL representation:

– Abstraction: We use SHACL as a kind of "abstraction layer" over SPARQL, as we run the validations directly over the data in our triple store.
– Declarative: SHACL provides a declarative mechanism, with various scoping options, to associate constraints with nodes in the data graph.
– Composition: SHACL provides a way to define a kind of "specification" to which an RDF graph has to conform. It is therefore possible to address multiple consistency constraints in one RDF document.

As highlighted earlier in Section 2, it is worth noting that SPARQL and SHACL are meant to complement rather than compete with each other. The following example shows how the constraint upl can be expressed in SHACL.

    ppts:ConceptShape
        a sh:Shape ;
        sh:scopeClass skos:Concept ;
        sh:property [
            a sh:PropertyConstraint ;
            sh:predicate skos:prefLabel ;
            sh:minCount 1 ;
            sh:minLength 1 ;
            sh:datatype rdf:langString ;
            sh:uniqueLang true
        ] .

5.3 Integration of Repair Strategies

To interweave validation constraints with repair strategies, we developed a basic vocabulary.
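Of this vocabulary, only the property rs:strategy and the strategy type rs:AddInverseStrategy appear in the shape shown next. The following is a minimal sketch of what the underlying terms could look like; the rs: namespace IRI and the class hierarchy are our assumptions and are not stated in the paper:

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    # The rs: namespace IRI below is hypothetical.
    @prefix rs:   <http://example.org/ns/repair-strategy#> .

    # Generic superclass for repair strategies (our assumption).
    rs:RepairStrategy a rdfs:Class .

    # Strategy type used in the example below: adds the missing
    # reciprocal statement of a hierarchical relation.
    rs:AddInverseStrategy a rdfs:Class ;
        rdfs:subClassOf rs:RepairStrategy .

    # Property attaching a repair strategy to a SHACL constraint.
    rs:strategy a rdf:Property ;
        rdfs:range rs:RepairStrategy .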
Below, we show an exemplary shape expressing the constraint that resources which are assigned another resource via the skos:broader property must also have an incoming link asserted via the skos:narrower property. If this constraint is violated, the repair strategy rs:AddInverseStrategy should be executed, which adds the missing reciprocal relation. We provide details on the implementation in Section 6.

    ppts:ConceptHavingBroader
        a sh:Shape ;
        sh:scope [
            a sh:Scope ;
            a sh:PropertyScope ;
            sh:predicate skos:broader
        ] ;
        sh:inverseProperty [
            a sh:InversePropertyConstraint ;
            sh:predicate skos:narrower ;
            sh:minCount 1 ;
            rs:strategy [ a rs:AddInverseStrategy ]
        ] .

The type of the repair strategy defined in the example above refers to a resolution algorithm built into our framework. This is just a convention; repair strategies can, e.g., also be stated as SPARQL INSERT operations that are parameterized and executed (a sketch of such an operation is given at the end of Section 6).

6 Implementation

In this section we address requirement R3. To detect the datasets that violate the data validation constraints introduced in Section 5.1, we implemented a Java application (https://github.com/cmader/dataconsistency) that takes as input a dataset file and outputs a report containing the resources that cause the violation of a specific constraint. Constraint validation is done in multiple steps: First, a Sesame (http://rdf4j.org/) in-memory repository is created. It is initialized with both the SKOS data model (retrieved from http://www.w3.org/2009/08/skos-reference/skos.rdf) and the dataset that is being validated. For each constraint, the implementation executes the corresponding SPARQL query and assembles the violation report.

Our implementation for repairing datasets is based on SHACL and the implementation provided by Holger Knublauch (https://github.com/TopQuadrant/shacl). We created a Java tool which takes as input the dataset that should be validated and repaired, as well as its constraints formalized according to the SHACL specification (http://w3c.github.io/data-shapes/shacl/). The tool then creates a Jena (http://jena.apache.org/documentation/rdf/) RDF model containing this dataset, the shapes formalization and the SHACL schema. Based on these, validation is performed and an RDF report containing the validation errors is created.

For dataset repair, we again create an in-memory Sesame repository containing the validation report, the SKOS schema, the dataset's constraints (in SHACL) and the dataset itself. Based on the set of constraint violations, our implementation instantiates the repair strategies that have been specified in the provided constraint definitions. Each repair strategy returns a triple changeset, i.e., sets of RDF statements that should be added or removed. This changeset is then applied to the repository, and the dataset can be considered repaired.

Our methodology for identifying and repairing constraint violations is relevant to a multitude of Linked Data consuming and processing applications: they must ensure that the data they process meets certain requirements imposed by their closed-world environment, i.e., their business logic. Since the technologies we use in our implementation are based on SPARQL, scalability of constraint checking largely depends on the efficiency of the RDF store used. When using SHACL for constraint specification, runtime depends on the efficiency of the SHACL implementation, which, according to the SHACL specification, is not bound to SPARQL.
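To make the SPARQL-based alternative mentioned in Section 5.3 concrete, the following sketch shows how a repair corresponding to rs:AddInverseStrategy could be stated as a SPARQL update that materialises missing reciprocal hierarchical statements. It is our illustration of the idea under the semantics described above, not the code executed by the built-in Java strategy:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    # Add the missing skos:narrower statement for every skos:broader
    # assertion that lacks its reciprocal, and vice versa.
    INSERT { ?parent skos:narrower ?child }
    WHERE {
      ?child skos:broader ?parent .
      FILTER NOT EXISTS { ?parent skos:narrower ?child }
    } ;
    INSERT { ?child skos:broader ?parent }
    WHERE {
      ?parent skos:narrower ?child .
      FILTER NOT EXISTS { ?child skos:broader ?parent }
    }

Publishing such updates alongside the SHACL shapes would keep validation and repair logic in one maintainable, Linked Data friendly form.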
7 Evaluation

In this section, we provide results of our implementation of the SPARQL-based constraint validation introduced in Section 5.1, as well as of a SHACL-based constraint validation and repair using one exemplary constraint definition and repair strategy (introduced in Section 5.3).

7.1 SPARQL-based Dataset Validation Results

Table 1 shows an overview of the observed constraint violations in the datasets we identified. For each data consistency constraint, we provide the number of datasets that violate this constraint. As expected, violations of the constraint cta never occurred in datasets that were converted into PPT taxonomies. Interestingly, violations of br, hc, and lam occur also in these converted taxonomies. The reason for this is that the datasets have been extracted from earlier versions of PPT or were created using conversion scripts designed for these earlier versions. Violations of only one constraint, upl, were never observed in any of the datasets. The reason might be that upl is not a PPT-specific constraint but is specified in the SKOS reference documentation and is therefore respected by most dataset providers.

We can observe that datasets converted for PPT as well as datasets on the Web violate consistency constraints. Therefore, during the development process of an application, it is crucial to maintain and evolve data consistency constraint definitions in the same way as the application's code evolves. The data (constraint) and software development lifecycles must be aligned with each other, and there is a need for approaches, such as the one proposed in this paper, that aim to support this goal.

Table 1. Number of datasets violating a specific constraint.

Constraint                               SWC-generated  Custom-generated  Web
ConceptTypeAssertion (cta)                           0                 0    2
BidirectionalRelationsHierarchical (br)              6                 5    6
HierarchicalConsistency (hc)                         4                 3    5
DctermsCreatorLiteral (dcl)                          0                 0    1
LabelAmbiguities (lam)                               8                 9    4
UniquePreferredLabels (upl)                          0                 0    0

Figure 1 illustrates how the constraints we expressed as SPARQL queries (Section 5.1) perform in relation to the size (number of statements) of the validated datasets. To maintain readability, we omitted 10 datasets that contain fewer than 50,000 statements. For each of the remaining datasets, we computed the arithmetic mean of the time the six constraint checks take. We can observe that there is no correlation between a dataset's size and the time needed to perform the constraint validation. The difference in validation runtime is instead determined by the structure of the dataset. The constraint lam, for instance, compares different combinations of labels of multiple concepts, depending on their type (preferred, alternative, or hidden label). If some concepts lack one or more of these label types, the number of combinations drops significantly, resulting in a lower validation time. This applies in a similar way to other constraints.

Fig. 1. Dataset Validation Performance

7.2 Dataset Repair Results using SHACL

To find out how our approach performs in a practical setting, we implemented a repair strategy and applied it to the datasets we retrieved from the Web. The repair strategy we chose addresses a special case of the constraint br introduced in Section 5.1.
It complements the relation skos:broader between two resources R1 and R2 if the dataset asserts skos:narrower between R2 and R1, and vice versa. Parts of the SHACL definition of the corresponding data shape have been described in Section 5.3.

Table 2 lists the selected datasets, ordered by the number of statements they contain. For each dataset we also list the time needed for loading it into the memory store for validation, as well as the time needed for validation against the SHACL definitions. For those datasets which failed validation, we provide the time needed for setting up the repair environment (Repair Setup Time) and the time needed for performing the actual repair (Repair Time), i.e., creating the required RDF triple changeset. We also provide the number of statements that were automatically added by the implemented repair strategy. The reason why, compared to Table 1, fewer datasets are affected by the constraint (no added statements) is that the validation and repair constraint only takes into account the two relations skos:broader and skos:narrower, and not other relations defined as owl:inverseOf each other (Section 5.1).

Our results indicate that SHACL-based constraint verification and repair is feasible for datasets on the Web. Our current implementation is not optimal in terms of runtime performance, as it uses different RDF frameworks, causing overhead in tooling setup. In an optimized version, we will therefore be able to eliminate the time for setting up the repair environment. However, this evaluation showed us that dataset validation with the SHACL implementation we used performs and scales well even for larger datasets.

Table 2. Repair strategy execution performance.

Dataset             Total       Loading    Validation  Repair Setup  Repair     Added
                    Statements  Time (ms)  Time (ms)   Time (ms)     Time (ms)  Statements
gemet-skoscore.rdf       32596       5192       11986             0          0           0
stw.ttl                 108967       5923       14735             0          0           0
psh-skos.rdf            119136       9052       18762             0          0           0
thesoz 0 93.rdf         427737      21654       14625         12576        283           1
eurovoc skos.rdf       2828910     112836       23366         60520      35112        6922
agrovoc 2015           4739996     295628       43316        179526     567247       33656
esco skos.rdf          6420749     245587       20277             0          0           0

8 Conclusions

In this paper we have presented requirements towards establishing data consistency management in an industrial setting for the PPT application. We introduced examples of data consistency constraints and showed how these constraints can be formalized using SPARQL. We also demonstrated how SHACL can be used to express them in a more declarative and concise way. Furthermore, we introduced an approach to interweave SHACL-based data consistency specification with repair strategies and reported how such an approach performs when used for repairing datasets that are available on the Web. The checking and repair approach generalizes to any kind of dataset that is to be consumed by an application imposing a certain structure on the data.

We found that validation of datasets generated by (and for usage with) PPT, provided either by SWC or by customers, as well as of datasets from the Web, can be done with reasonable performance. Furthermore, we believe that integrating repair strategies with data constraint specifications helps in building a unified, maintainable model for expressing an application's requirements for reliably processing data from the Web. Such a model is also crucial for keeping compatibility between software versions and can play a pivotal role in harmonizing data and software development processes.
In our next steps, we will build on the foundations introduced in this paper. We will investigate methods for defining reusable repair strategies for constraint violations. The vision is that repair strategies, just as constraint definitions, may be published on the Web as Linked Data for anyone to adopt and extend. As constraint violation repair will, in some cases, require user input, we will also identify ways to generate user interfaces and establish best practices for repairing a large number of constraint violations of the same kind.

Acknowledgements

This work was supported by grants from the EU's H2020 Programme ALIGNED (GA 644055).

References

1. D. Anicic et al. EP-SPARQL: A unified language for event processing and stream reasoning. In Proceedings of the 20th International Conference on World Wide Web, WWW '11. ACM, 2011.
2. H. Boley, A. Paschke, and O. Shafiq. RuleML 1.0: The overarching specification of web rules. In Semantic Web Rules - International Symposium, RuleML 2010, Washington, DC, USA, October 21-23, 2010. Proceedings, pages 162–178, 2010.
3. A. Dimou, D. Kontokostas, M. Freudenberg, R. Verborgh, J. Lehmann, E. Mannens, S. Hellmann, and R. Van de Walle. Assessing and refining mappings to RDF to improve dataset quality, pages 133–149. Springer International Publishing, Cham, 2015.
4. C. Fürber and M. Hepp. Using SPARQL and SPIN for data quality management on the Semantic Web. In Business Information Systems, 13th International Conference, BIS 2010, Berlin, Germany, May 3-5, 2010. Proceedings, Lecture Notes in Business Information Processing. Springer, 2010.
5. J. B. Hansen, A. Beveridge, R. Farmer, L. Gehrmann, A. J. Gray, S. Khutan, T. Robertson, and J. Val. Validata: An online tool for testing RDF data conformance.
6. D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen, and A. Zaveri. Test-driven evaluation of linked data quality. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14, pages 747–758. International World Wide Web Conferences Steering Committee, 2014.
7. C. Mader, B. Haslhofer, and A. Isaac. Finding quality issues in SKOS vocabularies. In TPDL 2012, Theory and Practice of Digital Libraries, Germany, May 2012.
8. A. Miles and S. Bechhofer. SKOS Simple Knowledge Organization System Reference. WWW Consortium, Working Draft WD-skos-reference-20080829, August 2008.
9. A. Polleres. From SPARQL to rules (and back). In Proceedings of the 16th International Conference on World Wide Web, WWW '07. ACM, 2007.
10. E. Prud'hommeaux, J. E. Labra Gayo, and H. Solbrig. Shape Expressions: An RDF validation and transformation language. In Proceedings of the 10th International Conference on Semantic Systems, pages 32–40. ACM, 2014.
11. O. Suominen and E. Hyvönen. Improving the quality of SKOS vocabularies with Skosify. In Knowledge Engineering and Knowledge Management, pages 383–397. Springer, 2012.
12. O. Suominen and C. Mader. Assessing and improving the quality of SKOS vocabularies. Journal on Data Semantics, 3(1):47–73, 2014.