Executing Evaluations over Semantic Technologies using the SEALS Platform

Miguel Esteban-Gutiérrez, Raúl García-Castro, Asunción Gómez-Pérez
Ontology Engineering Group, Departamento de Inteligencia Artificial. Facultad de Informática, Universidad Politécnica de Madrid, Spain
{mesteban,rgarcia,asun}@fi.upm.es

Abstract. The SEALS European project aims to develop an infrastructure for the evaluation of semantic technologies. This paper presents in detail the approach followed to automate the execution of these evaluations in the infrastructure. To materialize this approach, we have defined the entities managed by the infrastructure and their life cycle, the process followed to execute evaluations, the management of the computing resources that form the execution infrastructure, and how tools can be integrated with the infrastructure.

1 Introduction

The SEALS European project is developing an infrastructure for the evaluation of semantic technologies, named the SEALS Platform, that will offer independent computational and data resources for the evaluation of these technologies. With the SEALS Platform, users will be able to define and execute evaluations on their own; the platform will also support the organization and execution of evaluation campaigns, i.e., worldwide activities in which a set of tools is evaluated according to a certain evaluation specification and using common test data.

One of the challenges in the development of this platform is to cope with heterogeneous semantic technologies and with the different evaluations that could be performed over them. On the one hand, this requires reconciling heterogeneity at the technical level, where we need to execute evaluations by uniformly accessing semantic technologies with different hardware and software requirements. On the other hand, at the information level we need to achieve a common understanding of all the entities that participate in the evaluation process.
This paper presents how the execution of evaluations over semantic technologies is performed in the SEALS Platform and the mechanisms defined and developed to achieve it. These mechanisms range from the definition of the materials needed during the evaluation, together with the format in which they have to be provided so they can be used within the platform, to the provision of an automated way of managing the tools to be evaluated, the evaluations to be performed over the tools, and the computing infrastructure where the evaluations will be carried out.

This paper is structured as follows. Section 2 introduces our approach to automatically manage evaluations over semantic technologies. Section 3 gives an overview of the architecture of the SEALS Platform and section 4 presents the entities managed in the platform and their life cycles. Section 5 describes the process followed to execute an evaluation using the platform. Section 6 explains the execution infrastructure used in the SEALS Platform, that is, how the computing resources used in the platform are managed and what the life cycle of such resources is. Section 7 describes how to integrate tools with the SEALS Platform so they can be used in evaluations. Finally, section 8 provides some conclusions of this work.

Proceedings of the International Workshop on Evaluation of Semantic Technologies (IWEST 2010). Shanghai, China. November 8, 2010.

2 The SEALS Approach to Software Evaluation

As illustrated in Figure 1, in any evaluation a given set of tools is exercised, following the workflow defined by a given evaluation plan and using particular test data. As an outcome of this process, a set of evaluation results is produced.

Fig. 1. Main entities in a software evaluation scenario.

This idea of software evaluation is largely inspired by the notion of evaluation module as defined by the ISO/IEC 14598 standard on software product evaluation [1].
The general process of executing a software evaluation comprises:

1. To prepare all the evaluation materials, having them in the appropriate format and accessible to be used in the evaluation.
2. To prepare and configure the evaluation infrastructure, that is, to set up the computing resources where the tools under examination will be executed during the evaluation process. In this context, the infrastructure refers to both the hardware and software dimensions of a computing resource.
3. To deploy the tools to be evaluated in the evaluation infrastructure. This requires installing the tool in a suitable computing resource and configuring the tool appropriately, paying special attention to the integration of the tool with required third party applications in the evaluation infrastructure.
4. To define the evaluation plan, identifying the tasks that have to be carried out and the order in which each of them is to be performed. For each task it is necessary to define the set of input test data as well as the expected results. The plan must also identify the control conditions that allow progressing between the tasks, which can be summarized as the set of success/failure conditions for each of the tasks and also the pre- and post-conditions for the execution of a particular task.
5. To execute the evaluation plan to obtain the evaluation results.

Clearly, all these steps could be performed manually. However, the manual evaluation of software does not scale when one tool has to be evaluated several times over time, when different tools have to be evaluated following the same evaluation plan, or when different tools have to be evaluated following different evaluation plans, which is our goal.

The SEALS Platform aims to automate most of the software evaluation process. To this end we need to have:

– All the materials needed in the evaluations represented in a machine-processable format and described with metadata to allow their discovery and access.
– Automated mechanisms to prepare and configure the infrastructure required for executing tools with different hardware and software requirements.
– Automated mechanisms to install tools in the evaluation infrastructure and to configure them.
– Formal and explicit evaluation plans provided as machine-processable specifications, also known as evaluation descriptions.
– Means to automatically execute such evaluation plan descriptions, which includes the interaction with the tools.

3 Overview of the SEALS Platform

The architecture of the SEALS Platform comprises a number of components, shown in Figure 2, each of which is described below.

– SEALS Portal. The SEALS Portal provides a web user interface for interacting with the SEALS Platform. Thus, the portal will be used by the users for the management of the entities in the SEALS Platform, as well as for requesting the execution of evaluations. The portal will leverage the SEALS Service Manager for carrying out the users' requests.
– SEALS Service Manager. The SEALS Service Manager is the core module of the platform and is responsible for coordinating the other platform components and for maintaining consistency within the platform. This component exposes a series of services that provide programmatic interfaces for the SEALS Platform. Thus, apart from the SEALS Portal, the services offered may also be used by third party software agents.
– SEALS Repositories. These repositories manage the entities used in the platform (i.e., test data, tools, evaluation descriptions, and results).
– Runtime Evaluation Service. The Runtime Evaluation Service is used to automatically evaluate a certain tool according to a particular evaluation description and using some specific test data.
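As a purely illustrative sketch of the kind of machine-processable evaluation description mentioned above (SEALS does not prescribe this concrete syntax; all element names and identifiers here are ours), such a description would bind tools and test data to a workflow of tasks with explicit control conditions:

```xml
<!-- Hypothetical evaluation description; the element names and the
     seals:// identifiers are illustrative, not a format defined by SEALS. -->
<evaluation-description id="example-matching-evaluation">
  <inputs>
    <tool ref="seals://tools/example-matcher/1.0"/>
    <test-data ref="seals://test-data/example-benchmark/2010"/>
  </inputs>
  <workflow>
    <task id="align" tool-operation="align">
      <precondition>tool deployed and started</precondition>
      <success-condition>an alignment is produced for every test case</success-condition>
    </task>
    <task id="interpret" after="align" interpreter="precision-recall"/>
  </workflow>
  <outputs>
    <raw-results from="align"/>
    <interpreted-results from="interpret"/>
  </outputs>
</evaluation-description>
```

Whatever the concrete syntax, the point is that tasks, their ordering, their input test data, and their success/failure conditions are all explicit, so the platform can enact the plan without human intervention.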
Fig. 2. Architecture of the SEALS Platform.

4 The SEALS Evaluation Entities

The high-level classification of software evaluation entities presented in section 2 can be further refined as needed. For example, in the context of SEALS, tools are classified into different types of semantic technologies according to their functional scope, namely, ontology engineering tools, ontology storage systems, ontology matching tools, etc.

Similarly, it is also possible to distinguish different types of test data: persistent test data (those whose contents are stored in and physically managed by the evaluation platform), external test data (those whose contents reside outside the evaluation platform and whose life cycle is not controlled by it), and synthetic test data generators (pieces of software that can generate synthetic test data on demand according to some particular configuration).

In accordance with the approach followed in the IEEE 1061 standard for a software quality metrics methodology [2], evaluation results are classified according to their provenance, differentiating raw results (those evaluation results directly generated by tools) from interpreted results (those generated from other evaluation results).

Besides, our entities include not only the results obtained in the evaluation but also any contextual information related to such evaluation, a need also acknowledged by other authors [3].
To this end, we also represent the information required for automating the execution of an evaluation description in the platform; this information, together with the other entities presented, allows obtaining traceable and reproducible evaluation results.

Finally, another type of entity is the evaluation campaign, which represents the information needed to support the organization and running of campaigns for the evaluation of different (types of) participating tools. An evaluation campaign contains one or more evaluation scenarios, which include the evaluation description and test data to be used in the evaluation and the tools to evaluate.

Each of the abovementioned entities is composed of two different elements. The first is the data that define the entity itself, for instance, a set of ontologies serialized in RDF/XML in the case of a persistent test data set. The second is the description of the entity, that is, the set of metadata that characterizes the entity (both generally and specifically) and enables the provision of the discovery mechanisms required for entity integration, consumption, and administration by the evaluation platform; in the previous example, the metadata would be the information about the purpose of the persistent test data set, which evaluations may use it, and who may access the test data set.

Different entities have different life cycles in the evaluation platform. Next, we describe the life cycles of the most relevant entities.

4.1 Life Cycle of Artifacts and Artifact Versions

Tools, test data, and evaluation descriptions are defined in the platform as artifacts, which are collections of artifact versions; for example, a particular tool can have a number of different tool versions that evolve over time. Figure 3 shows the state diagrams for artifacts and artifact versions, including the possible states, the operations that alter the state, and, in dotted arrows, the operations that retrieve the entity information (data and metadata).
It can be observed that, once registered in the platform, artifacts can always be retrieved and have a single state until they are deregistered. On the other hand, artifact versions have two states, published and unpublished; in the former state artifact versions can only be retrieved, and in the latter they can only be updated. Hence, evaluations can only be performed using fixed (i.e., published) artifact versions.

Fig. 3. Life cycle of artifacts (left) and artifact versions (right).

Evaluation results (raw results and interpretations) are defined as artifacts with no version information. Besides, once registered they cannot be updated.

4.2 Life Cycle of Execution Requests

Evaluation descriptions are processed by the evaluation platform through execution requests. An execution request encapsulates the execution needs that a particular user has at some point in time, i.e., the evaluation description to be executed, the tools to be evaluated, the test data to be used, etc.

During its life cycle, an execution request transits among eight different states, as shown in Figure 4. The starting state of an execution request is "pending", which is entered whenever a new execution request is created.

Fig. 4. Life cycle of an execution request.

At this point, the execution request can be updated, removed, or submitted for execution.
Whereas the first operation does not change the state of the execution request, the other two do change it: on the one hand, when the execution request is removed, the state transits to the "removed" state, a state in which no further operations are possible (that is, the state of the execution request will not be changed beyond this point); on the other hand, when the execution request is submitted for execution, the state transits to the "inactive" state. Beyond this point, the execution request shall not be further modified.

While the execution request is inactive, two possible courses of action can take place: it can be cancelled or it can start being processed. In the first case, the state transits to the "cancelled" state, a state in which, again, no further operations are possible. The latter case takes place once the execution requirements of the execution request are fully satisfied; then, the state transits to the "processing" state.

Once the execution request is being processed, three possible outcomes may occur: (1) the execution request may be completed successfully, and thus the state transits to the "completed" state; (2) some failure might prevent completing the execution of the execution request, causing the state to transit to the "failed" state; (3) it is also possible that the processing of the execution request is aborted (e.g., due to an abnormal duration time), thus forcing the state to transit to "aborted". Regardless of the course of action, no further operations over the execution request will be carried out.

As can be seen, execution requests are not disposed of by the evaluation platform. On the contrary, regardless of the execution request's internal state, its information is available to the user at any time, providing a complete and historical view of the evaluation activities over time.
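The eight-state life cycle just described can be captured as a small state machine. The following sketch encodes exactly the transitions of Figure 4; the class, enum, and method names are ours, not the platform's actual API:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Sketch of the execution-request life cycle (Figure 4).
// Names are illustrative, not the actual SEALS API.
class ExecutionRequestLifeCycle {

    enum State { PENDING, REMOVED, INACTIVE, CANCELLED, PROCESSING, ABORTED, COMPLETED, FAILED }

    // Allowed transitions; terminal states have no outgoing edges.
    private static final Map<State, Set<State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        // "update" keeps the request in PENDING; "remove" and "submit" leave it
        TRANSITIONS.put(State.PENDING, EnumSet.of(State.PENDING, State.REMOVED, State.INACTIVE));
        TRANSITIONS.put(State.INACTIVE, EnumSet.of(State.CANCELLED, State.PROCESSING));
        TRANSITIONS.put(State.PROCESSING, EnumSet.of(State.COMPLETED, State.FAILED, State.ABORTED));
        TRANSITIONS.put(State.REMOVED, EnumSet.noneOf(State.class));
        TRANSITIONS.put(State.CANCELLED, EnumSet.noneOf(State.class));
        TRANSITIONS.put(State.ABORTED, EnumSet.noneOf(State.class));
        TRANSITIONS.put(State.COMPLETED, EnumSet.noneOf(State.class));
        TRANSITIONS.put(State.FAILED, EnumSet.noneOf(State.class));
    }

    private State state = State.PENDING; // every request starts as "pending"

    public State state() { return state; }

    public void transit(State target) {
        if (!TRANSITIONS.get(state).contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }
}
```

Note that the request object itself survives every terminal state: only its transitions are exhausted, which is what makes the historical view described above possible.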
5 Processing Evaluations

Processing an evaluation execution request consists of executing the evaluation description associated with that request. The process required for accomplishing this task is carried out in four stages, as shown in Figure 5, namely, execution request analysis, execution environment preparation, evaluation description execution, and execution environment clean-up. The following subsections cover each of the stages of this process.

Fig. 5. Stages for the processing of an execution request.

5.1 Execution Request Analysis

In this first stage, the Runtime Evaluation Service analyses the execution request in order to guarantee that it can be processed, and prepares all the information that is required for driving the rest of the evaluation process.

Among other things, the analysis includes: (1) checking the evaluation description, i.e., validating the syntax of the evaluation description and checking that the workflow described is well-defined; (2) checking whether the execution request arguments satisfy the evaluation description contract, i.e., verifying the availability and type of the specified entities.

If any of these verifications fails (syntax, semantics, or resources), the stage will fail and thus the processing of the execution request will fail. Otherwise, the stage will complete successfully and trigger the next stage.

5.2 Execution Environment Preparation

Once the Runtime Evaluation Service has checked that the execution request may be safely executed, it is time to prepare the execution environment, that is, the set of computing resources where the tools to be exercised during the execution of the evaluation description will be physically run.
In this context, a computing resource is any network-accessible computing appliance that shall be used for the on-demand execution of tools, and that exposes a series of mechanisms for its remote administration and usage. Examples of this are desktop PCs, workstations, servers, and even virtual machines running atop the previously mentioned appliances.

Since each tool may have its own computing requirements (i.e., a particular operating system or a particular third party application) and the computing resources available will be limited, computing resources have to be reused. To enable the usage of the computing resources, the SEALS Service Manager provides the means for tracking the availability of resources, as well as for their lease and release; this requires providing the means for describing the characteristics of these resources so that it is easy to choose those which best fit the execution of a given tool. To enable the reuse of these shared resources, the Runtime Evaluation Service will be in charge of preparing the computing resources according to the requirements of the tools under evaluation, and this will be carried out in two steps. First, the Runtime Evaluation Service will request from the SEALS Service Manager the computing resources that will be needed for executing the tools involved in the evaluation description (see section 6). Then, once the computing resources have been acquired, the Runtime Evaluation Service will have to deploy in them the tools to be used during the execution of the evaluation description, as well as any third party application required by the tools.

5.3 Evaluation Description Execution

In this stage the Runtime Evaluation Service enacts the workflow defined in the evaluation description, following the defined flow of control and executing the activities specified within the workflow.
The execution of these activities is composed of one or more of the following steps, depending on the specific activity and the current state of execution:

1. The first step is to stage in all the data to be used in the activity, in other words, to make the data involved available in the computing resource where the activity will be executed.
2. Once the data is available, it is time to execute the particular activity. The activity can imply invoking a tool's functionality or the interpretation of raw results by means of specific software artifacts, the interpreters.
3. Regardless of the specific activity executed, the next step consists of storing the results obtained in the Results Repository. These results will be raw results if the activity executed was the invocation of a tool's functionality, and interpreted results otherwise.
4. Finally, any data that is not going to be further used in the computing resource should be deleted and the storage space thus freed. This final step will occur even if any previous step has failed to complete.

If any of these steps fails, whatever the cause, the evaluation description execution stage will fail, and thus the processing of the execution request will fail to complete. Otherwise, the stage will complete successfully and trigger the execution environment clean-up stage.

5.4 Execution Environment Clean-up

In this last stage, the Runtime Evaluation Service will ensure that the shared computing resources can be reused after processing the execution request. This stage is carried out in two steps.

The first step in the clean-up consists of removing the tools that have been previously deployed in each of the computing resources acquired, with the objective of leaving each computing resource in the same state as it was before deploying the tools.
Finally, after ensuring that the computing resources used are in the same state as they were when acquired, the Runtime Evaluation Service will release them, acknowledging this to the SEALS Service Manager, which will then be able to lease these computing resources again for further reuse.

6 Evaluation Infrastructure

The purpose of the SEALS Platform is the automated evaluation of semantic tools, with the Runtime Evaluation Service being the component in charge of carrying out this process. Thus, this service is responsible for executing the tools that are under evaluation. SEALS deliverable D9.1 [4] defines the way in which this component will interact with them.

In order to be executed, a tool may require using particular third party applications or tools, from now on referred to as modules. In this respect, the Runtime Evaluation Service identifies two types of runtime dependencies: internal dependencies and external dependencies. The former type refers to those modules provided in the tool's package, e.g., a given third party library. The latter type refers to those modules not provided in the tool's package and that the platform must therefore provide in the execution environment, e.g., a DBMS.

This division is aimed at solving the "deployment" issue. This way, technology providers can include in the tool's package those modules that can be deployed without user intervention, and rely on the platform for providing those modules whose deployment is much more complex or requires user intervention. The SEALS Platform will have to publish which modules are provided in the execution environment, so that technology providers are informed about what they can use and can act accordingly, either requesting the addition of new modules or implementing the means for deploying the module dynamically.

Regardless of the type of dependency, the configuration of the runtime dependencies of a tool (tool wrapper to platform and vice versa) will be carried out via the package descriptor.
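To make the internal/external distinction concrete, a package descriptor could declare both kinds of dependencies along the following lines. This is a sketch only: the element names are ours, not the concrete descriptor syntax defined by SEALS, and the module and class names are hypothetical.

```xml
<!-- Hypothetical package descriptor; element, module, and class names
     are illustrative, not the actual SEALS descriptor format. -->
<tool-package name="example-matcher" version="1.0">
  <!-- tells the Execution Worker how the tool wrapper is implemented -->
  <wrapper type="java" entry-point="org.example.MatcherWrapper"/>
  <dependencies>
    <!-- internal: shipped inside the bundle, deployed without user intervention -->
    <internal path="lib/third-party-library.jar"/>
    <!-- external: must be provided by the platform in the execution environment -->
    <external module="example-dbms" version="5.1"/>
  </dependencies>
</tool-package>
```

With such a declaration, the platform can deploy the internal modules mechanically and check that the requested external modules are among those it publishes for the execution environment.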
6.1 Management of Computing Resources

To execute different types of tools under variable circumstances, it is necessary to provide different computing resources, which shall cope with several key aspects of the execution environment: operating system, computing power, memory size, storage size, and execution supporting modules. The SEALS Service Manager needs to manage these computing resources, leasing them as required by the Runtime Evaluation Service.

Our approach for managing computing resources has been to provide an array of virtual machines that are allocated to different physical computing resources. Each of these virtual machines will have a different set of characteristics and will be used exclusively for executing a single tool at a time.

The advantages of this approach are the following. First, thanks to virtualization the usage of the physical computing resources is maximized, as any physical computing resource can execute any required virtual machine snapshot, thus enabling the scalability of the solution. Second, the exclusive usage of physical computing resources enables speeding up tool execution, reliably comparing performance metrics, and coping with an increasing number of features required by technology providers, since a pool of specific virtual machine snapshots can be used. Third, an allocation policy is not required, as physical computing resource sharing is not allowed. And fourth, there is no single point of failure, as each virtual computing resource runs in its own virtualized environment.

Nevertheless, this approach still has some drawbacks. First, performance is still an issue; the need for a virtual machine manager mediating between the execution environment and the physical computing resource is still a factor to take into account. Second, using the physical computing resources exclusively does not prevent resource underutilization completely.
Finally, this additional mediation layer is still a risk from the architectural standpoint.

6.2 Life Cycle of Computing Resources

As presented above, the computing resources provisioned by the platform consist of virtual machines running atop a collection of physical computing resources. Each virtual machine will run a particular snapshot of a given virtual machine image. These images define both the hardware and software characteristics of the computing resource, since they define the physical resources to be used when running the virtual machine (processors, memory, disk, etc.) as well as the operating system together with the preinstalled applications. Next, we provide an overview of the life cycle of computing resources:

– Provisioning computing resources. To provision computing resources, the SEALS Service Manager needs to deal with two different entities: physical computing resources and virtual machine images. Thus, the SEALS Service Manager will maintain registries for both entities; it will identify the virtual machine image that suits the requirements of the Runtime Evaluation Service and will allocate it to any of the available physical computing resources that fulfil the hardware requirements of the virtual machine image. Once the virtual machine image has been allocated to a physical computing resource, the SEALS Service Manager will start a virtual machine using the image in the physical computing resource. To do so, the physical computing resource exposes a series of management capabilities that allow deploying a virtual machine in it and controlling its life cycle. Finally, after deploying the virtual machine, the SEALS Service Manager will hand over its control to the Runtime Evaluation Service.
– Consuming computing resources. To consume computing resources, the Runtime Evaluation Service needs mechanisms for acquiring these computing resources from the SEALS Service Manager.
To this end, the Runtime Evaluation Service shall specify the characteristics of the computing resources needed and a time window for using the requested computing resources. Once the SEALS Service Manager leases the matching computing resources, the Runtime Evaluation Service will use an entry point service for discovering the services deployed on each computing resource, which shall enable the usage of the computing resource during the processing of an execution request.
– Decommissioning computing resources. Finally, whenever the Runtime Evaluation Service is done with a computing resource, it will release it. After being acknowledged, the SEALS Service Manager will stop the associated virtual machine and deallocate the virtual machine image from the physical computing resource. In case the maximum wall-clock limit is reached and the Runtime Evaluation Service has not released the computing resource, the SEALS Service Manager will decommission the computing resource and acknowledge the decommissioning to the Runtime Evaluation Service.

7 Integrating Tools with the SEALS Platform

The Execution Worker is the component of the Runtime Evaluation Service in charge of executing the tools used when running evaluation descriptions. In order to be usable by the Execution Worker, tools must meet certain integration criteria regarding the capabilities that have to be exposed to the platform.

From a functional point of view, the Execution Worker provides a plug-in framework that allows the dynamic usage of a priori unknown tools; this framework is based on an extensible interface hierarchy which defines the set of operations that the Execution Worker requires for using a certain type of tool. Thus, in order for a tool to be usable by the Execution Worker, it needs to provide an entry point which implements the interfaces required according to the particular nature of the tool.
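As a purely illustrative sketch of such an interface hierarchy (the interface, class, and method names are ours, not the actual SEALS API defined in D9.1), a wrapper for, say, an ontology matching tool might look like this:

```java
// Illustrative plug-in interface hierarchy for tool wrappers.
// All names are hypothetical, not the actual SEALS API.

// Management capabilities: integrate the tool and control its life cycle.
interface ToolWrapper {
    void deploy();    // install and configure the tool in the computing resource
    void start();
    void stop();
    void undeploy();  // leave the resource as it was before deployment
}

// Invocation capabilities are added per tool type by extending the hierarchy.
interface OntologyMatchingTool extends ToolWrapper {
    // e.g. align two ontologies, given by their URLs, and return the alignment
    String align(String sourceOntologyUrl, String targetOntologyUrl);
}

// A trivial wrapper illustrating the contract.
class DummyMatcher implements OntologyMatchingTool {
    private boolean running;
    public void deploy() { /* unpack binaries, wire up dependencies */ }
    public void start() { running = true; }
    public void stop() { running = false; }
    public void undeploy() { /* remove binaries, restore resource state */ }
    public String align(String source, String target) {
        if (!running) throw new IllegalStateException("tool not started");
        return "alignment(" + source + ", " + target + ")";
    }
}
```

In this sketch, DummyMatcher would be the entry point the Execution Worker loads and drives: deploy and start before the evaluation, align during it, stop and undeploy in the clean-up stage.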
This entry point, hereafter referred to as the tool wrapper, is thus in charge of linking the tool to the platform or, from the opposite point of view, of decoupling the Execution Worker from the tool itself.

The tool wrapper provides two kinds of capabilities, namely the tool management and the tool invocation capabilities. On the one hand, the management capabilities include the mechanisms that allow the integration of the tool in the evaluation infrastructure as well as the control of the life cycle of the tool itself. In particular, the management capabilities provide the means for deploying and undeploying a tool, and for starting and stopping it. On the other hand, the invocation capabilities provide the mechanisms for invoking the particular functionalities that have to be provided by each particular type of tool. For the time being, the tool wrapper can be implemented in two different ways: using a set of shell scripts, or using Java applications.

However, not just any implementation of the abovementioned capabilities can be used by the Execution Worker. In order to be consumed, the implementation has to be provided in the form of a tool package bundle. The bundle consists of a ZIP file with a given directory structure that includes the binaries of the tool itself, the binaries of other applications required when running the tool, and a package descriptor, which instructs the Execution Worker about how the tool wrapper is implemented and which its dependencies are, so that the evaluation infrastructure can be properly set up.

8 Conclusions

This paper presents the approach followed in the SEALS Platform for the automated evaluation of different semantic technologies according to different evaluations. This approach is based on the usage of machine-processable evaluation descriptions that define the evaluation plan without any ambiguity and identify when the tools have to be executed, how they are executed, with which test data, and which results are to be obtained and stored. Also, by means of the metadata about all the entities involved in the evaluation process, it is possible not only to discover the entities that may participate in an evaluation but also to validate the entities involved in a particular execution.

Acknowledgements

This work has been supported by the SEALS European project (FP7-238975).

References

1. ISO/IEC: ISO/IEC 14598-6: Software product evaluation – Part 6: Documentation of evaluation modules. (2001)
2. IEEE: IEEE 1061-1998. IEEE Standard for a Software Quality Metrics Methodology. (1998)
3. Kitchenham, B.A., Hughes, R.T., Linkman, S.G.: Modeling software measurement data. IEEE Trans. Softw. Eng. 27 (2001) 788–804
4. Esteban-Gutiérrez, M.: Design of the architecture and interfaces of the Runtime Evaluation Service. Technical report, SEALS Project (2009)