
The uComp Protege Plugin for Crowdsourcing Ontology Validation

Florian Hanika

florian.hanika@wu.ac.at

Gerhard Wohlgenannt

gerhard.wohlgenannt@wu.ac.at

Marta Sabou

marta.sabou@modul.ac.at

MODUL University Vienna

The validation of ontologies by domain experts is expensive. Crowdsourcing has been shown to be a viable alternative for many knowledge acquisition tasks. We present a Protege plugin and a workflow for outsourcing a number of ontology validation tasks to Games with a Purpose and paid micro-task crowdsourcing.

Keywords: Protege plugin, ontology engineering, crowdsourcing, human computation

1 Introduction

Protege^3 is a well-known free and open-source platform for ontology engineering. Protege can be extended with plugins using the Protege Development Kit. We present a plugin for crowdsourcing ontology engineering tasks, as well as the underlying technologies and workflows. More specifically, the plugin supports outsourcing some typical ontology validation tasks (see Section 2.2) to Games with a Purpose (GWAP) and paid-for crowdsourcing. The research question our work focuses on is how to integrate ontology engineering processes with human computation (HC): which tasks can be outsourced, how outsourcing affects the quality of the ontological elements, and how to provide tool support for HC. This paper concentrates on the integration process and tool support. As manual ontology construction by domain experts is expensive and cumbersome, HC helps to decrease cost and increase scalability by distributing jobs to multiple workers.

2 The uComp Protege Plugin

The uComp Protege Plugin allows the validation of certain parts of an ontology, which makes it useful in any setting where the quality of an ontology is questionable, for example if an ontology was generated automatically with ontology learning methods, or if a third-party ontology needs to be evaluated before use. This section covers the uComp API, and the uComp Protege plugin (functionality and installation).

3 protege.stanford.edu

2.1 The uComp API

The Protege plugin sends all validation tasks to the uComp HC API. Depending on the settings, the API further delegates the tasks to a GWAP or to CrowdFlower^4, a platform for paid micro-task crowdsourcing. The uComp API^5 currently supports classification tasks (other task types are under development). The API user can create new HC jobs, cancel jobs, and collect results from the service. All communication is done via HTTP and JSON.
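As a rough illustration of what a JSON request body for creating a classification job might look like, consider the following Python sketch. It is not the documented uComp API format: the function name and every field name (`api_key`, `task_type`, `question`, `units`, `judgments_per_unit`, `channel`) are hypothetical and only mirror the concepts named in the text.

```python
import json


def build_classification_job(api_key, question, labels, judgments_per_unit, use_gwap):
    """Build the JSON body of a hypothetical 'create job' request.

    All field names are illustrative assumptions, not the real uComp API.
    """
    job = {
        "api_key": api_key,
        "task_type": "classification",
        "question": question,
        # One unit per ontology element that the crowd should judge.
        "units": [{"id": i, "label": label} for i, label in enumerate(labels)],
        "judgments_per_unit": judgments_per_unit,
        # The API delegates either to the GWAP or to CrowdFlower.
        "channel": "gwap" if use_gwap else "crowdflower",
    }
    return json.dumps(job, sort_keys=True)


body = build_classification_job(
    api_key="abcdefghijklmnopqrst",
    question="Is this class relevant for the domain Finance?",
    labels=["bond", "dollar"],
    judgments_per_unit=5,
    use_gwap=False,
)
print(body)
```

In such a design the client only serializes the job description; sending it over HTTP, cancelling the job, and polling for results would be separate requests against the service.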

2.2 The plugin

The plugin supports the validation of various parts of an ontology: relevance of classes, subClassOf relations, domain and range axioms, instanceOf relations, etc. The general usage pattern is as follows: the user selects the respective part of the ontology, provides some information for the crowdworkers, and submits the job. As soon as available, the results are presented to the user.

For the sake of brevity, we describe only the Class Relevance Check and the SubClass Relation Validation in some detail. The other task types follow a very similar pattern.

Class relevance check

The Class Relevance Check helps to decide whether a given class (or a set of classes) is relevant for the given domain, based on the class label. Figure 1 shows an example class relevance check for the class bond. After selecting a class, the user can enter an ontology domain (here: Finance) to validate against and give additional advice to the crowdworkers. Furthermore, the user can choose between the GWAP and CrowdFlower for validation. If CrowdFlower is selected, the expected cost of the job can be calculated. The validate subtree option validates not only the current class but also all its subclasses (recursively). To validate the whole ontology in one go, the user selects the root class (Thing) and marks the validate subtree option. When available, the results of the HC task are presented in a textbox. In Figure 1 only one judgment was collected: the crowdworker stated that the class bond is relevant for the domain.

4 www.crowdflower.com

5 tinyurl.com/mkarmk9

Validation of SubClass Relations

With this component, a user can ask the crowd whether a subClass relation exists between a given class and its superclasses.

Similar to the class relevance check, users can set the ontology domain and choose CrowdFlower or the GWAP ("uComp-Quiz"). In Figure 2 the subClass relation between dollar and currency is evaluated. Before a job is sent to CrowdFlower, its expected cost can be calculated as the number of units (elements to evaluate) multiplied by the number of judgments per unit and the payment per judgment.
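The cost estimate described above is a simple product, which the following Python sketch spells out. The function name and the example figures are assumptions for illustration; the text does not state the currency unit, so the result is in whatever unit the payment per judgment is given in.

```python
def expected_cost(num_units, judgments_per_unit, payment_per_judgment):
    """Expected job cost: units x judgments per unit x payment per judgment."""
    if min(num_units, judgments_per_unit, payment_per_judgment) < 0:
        raise ValueError("all inputs must be non-negative")
    return num_units * judgments_per_unit * payment_per_judgment


# Hypothetical example: 20 subClass relations, 5 judgments each,
# and a payment of 2 per judgment.
print(expected_cost(20, 5, 2))  # -> 200
```

With the validate subtree option, the number of units grows with the size of the selected subtree, so the estimate scales linearly with the number of classes or relations submitted.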

2.3 Installation and Configuration

As the uComp plugin is part of the official Protege repository, it can easily be installed from within Protege via File → Check for plugins → Downloads. To configure and use the plugin, the user needs to create a file named ucomp_api_settings.txt in the folder .Protege. The file contains the uComp API key^6, the number of judgments per unit that will be collected, and the payment per judgment (if using CrowdFlower), for example:

abcdefghijklmnopqrst,5,2

Detailed information about the functionality, usage and installation of the plugin is provided with the plugin documentation.

6 For API requests see tinyurl.com/mkarmk9
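To make the layout of the settings line concrete, here is a minimal Python sketch that parses the three comma-separated fields in the order given above. The names `UCompSettings` and `parse_settings` are hypothetical, and treating the payment as an integer is an assumption based only on the example value; the plugin itself may read the file differently.

```python
from typing import NamedTuple


class UCompSettings(NamedTuple):
    api_key: str
    judgments_per_unit: int
    payment_per_judgment: int  # assumed to be an integer, as in the example


def parse_settings(line):
    """Parse one 'key,judgments,payment' settings line."""
    parts = [part.strip() for part in line.strip().split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated fields")
    api_key, judgments, payment = parts
    return UCompSettings(api_key, int(judgments), int(payment))


settings = parse_settings("abcdefghijklmnopqrst,5,2")
print(settings.judgments_per_unit, settings.payment_per_judgment)  # -> 5 2
```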

3 Related Work

Human computation outsources computing steps to humans, typically for problems computers cannot solve (yet). Together with altruism, fun (as in GWAPs) and monetary incentives are central ways to motivate humans to participate. Early work in the field of GWAPs was done by von Ahn [1]. Games have been used successfully, for example, in ontology alignment [6] and to verify class definitions [3]. Micro-task crowdsourcing has recently become very popular in knowledge acquisition and natural language processing, and has also been integrated into the popular NLP framework GATE [2]. A number of studies show that crowdworkers provide results of a quality similar to that of domain experts [4, 5].

4 Conclusions

In this paper we introduce a Protege plugin for validating ontological elements, and its integration into a human computation workflow. The plugin delegates validation tasks to a GWAP or to CrowdFlower and displays the results to the user. Future work includes an extensive evaluation of various aspects: HC workflows in ontology engineering, quality of crowdsourcing results, and the usability of the plugin itself.

Acknowledgments. The work presented was developed within project uComp, which receives the funding support of EPSRC EP/K017896/1, FWF 1097-N23, and ANR-12-CHRI-0003-03, in the framework of the CHIST-ERA ERA-NET.

References

1. von Ahn, L.: Games With a Purpose. Computer 39(6), 92–94 (2006)

2. Bontcheva, K., Roberts, I., Derczynski, L., Rout, D.: The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy. In: Proc. of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL). ACL (2014)

3. Markotschi, T., Voelker, J.: Guess What?! Human Intelligence for Mining Linked Data. In: Proceedings of the Workshop on Knowledge Injection into and Extraction from Linked Data (KIELD) at the International Conference on Knowledge Engineering and Knowledge Management (EKAW) (2010)

4. Noy, N.F., Mortensen, J., Musen, M.A., Alexander, P.R.: Mechanical Turk As an Ontology Engineer?: Using Microtasks As a Component of an Ontology-engineering Workflow. In: Proc. 5th ACM WebSci Conf., pp. 262–271. WebSci '13, ACM (2013)

5. Sabou, M., Bontcheva, K., Scharl, A., Föls, M.: Games with a Purpose or Mechanised Labour?: A Comparative Study. In: Proc. of the 13th Int. Conf. on Knowledge Management and Knowledge Technologies, pp. 1–8. i-Know '13, ACM (2013)

6. Siorpaes, K., Hepp, M.: Games with a Purpose for the Semantic Web. IEEE Intelligent Systems 23(3), 50–60 (2008)