<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of Ontologies and Ontology-based tools</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Workshop Organizers:</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denny Vrandecic</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raúl García-Castro</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asunción Gómez Pérez</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>York Sure</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhisheng Huang</string-name>
        </contrib>
      </contrib-group>
      <fpage>31</fpage>
      <lpage>72</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Workshop 6</p>
    </sec>
    <sec id="sec-2">
      <title>ISWC 2007 Sponsor</title>
    </sec>
    <sec id="sec-3">
      <title>Emerald Sponsor</title>
    </sec>
    <sec id="sec-4">
      <title>Gold Sponsor</title>
    </sec>
    <sec id="sec-5">
      <title>Silver Sponsor</title>
    </sec>
    <sec id="sec-6">
      <title>We would like to express our special thanks to all sponsors</title>
    </sec>
    <sec id="sec-7">
      <title>ISWC 2007 Organizing Committee</title>
      <sec id="sec-7-1">
        <title>General Chairs</title>
      </sec>
      <sec id="sec-7-2">
        <title>Riichiro Mizoguchi (Osaka University, Japan)</title>
      </sec>
      <sec id="sec-7-3">
        <title>Guus Schreiber (Free University Amsterdam, Netherlands)</title>
      </sec>
      <sec id="sec-7-4">
        <title>Local Chair</title>
      </sec>
      <sec id="sec-7-5">
        <title>Sung-Kook Han (Wonkwang University, Korea)</title>
      </sec>
      <sec id="sec-7-6">
        <title>Program Chairs</title>
      </sec>
      <sec id="sec-7-7">
        <title>Karl Aberer (EPFL, Switzerland)</title>
      </sec>
      <sec id="sec-7-8">
        <title>Key-Sun Choi (Korea Advanced Institute of Science and Technology)</title>
      </sec>
      <sec id="sec-7-9">
        <title>Natasha Noy (Stanford University, USA)</title>
      </sec>
      <sec id="sec-7-10">
        <title>Workshop Chairs</title>
      </sec>
      <sec id="sec-7-11">
        <title>Harith Alani (University of Southampton, United Kingdom)</title>
      </sec>
      <sec id="sec-7-12">
        <title>Geert-Jan Houben (Vrije Universiteit Brussel, Belgium)</title>
      </sec>
      <sec id="sec-7-13">
        <title>Tutorial Chairs</title>
      </sec>
      <sec id="sec-7-14">
        <title>John Domingue (Knowledge Media Institute, The Open University)</title>
      </sec>
      <sec id="sec-7-15">
        <title>David Martin (SRI, USA)</title>
      </sec>
      <sec id="sec-7-16">
        <title>Semantic Web in Use Chairs</title>
      </sec>
      <sec id="sec-7-17">
        <title>Dean Allemang (TopQuadrant, USA)</title>
      </sec>
      <sec id="sec-7-18">
        <title>Kyung-Il Lee (Saltlux Inc., Korea)</title>
      </sec>
      <sec id="sec-7-19">
        <title>Lyndon Nixon (Free University Berlin, Germany)</title>
      </sec>
      <sec id="sec-7-20">
        <title>Semantic Web Challenge Chairs</title>
      </sec>
      <sec id="sec-7-21">
        <title>Jennifer Golbeck (University of Maryland, USA)</title>
      </sec>
      <sec id="sec-7-22">
        <title>Peter Mika (Yahoo! Research Barcelona, Spain)</title>
      </sec>
      <sec id="sec-7-23">
        <title>Poster &amp; Demos Chairs</title>
      </sec>
      <sec id="sec-7-24">
        <title>Young-Tack Park (Soongsil University, Korea)</title>
      </sec>
      <sec id="sec-7-25">
        <title>Mike Dean (BBN, USA)</title>
      </sec>
      <sec id="sec-7-26">
        <title>Doctoral Consortium Chair</title>
      </sec>
      <sec id="sec-7-27">
        <title>Diana Maynard (University of Sheffield, United Kingdom)</title>
      </sec>
      <sec id="sec-7-28">
        <title>Sponsor Chairs</title>
      </sec>
      <sec id="sec-7-29">
        <title>Young-Sik Jeong (Wonkwang University, Korea)</title>
      </sec>
      <sec id="sec-7-30">
        <title>York Sure (University of Karlsruhe, Germany)</title>
      </sec>
      <sec id="sec-7-31">
        <title>Exhibition Chairs</title>
      </sec>
      <sec id="sec-7-32">
        <title>Myung-Hwan Koo (Korea Telecom, Korea)</title>
      </sec>
      <sec id="sec-7-33">
        <title>Noboru Shimizu (Keio Research Institute, Japan)</title>
      </sec>
      <sec id="sec-7-34">
        <title>Publicity Chair: Masahiro Hori (Kansai University, Japan)</title>
      </sec>
      <sec id="sec-7-35">
        <title>Proceedings Chair: Philippe Cudré-Mauroux (EPFL, Switzerland)</title>
      </sec>
      <sec id="sec-7-36">
        <title>Metadata Chairs</title>
      </sec>
      <sec id="sec-7-37">
        <title>Tom Heath (KMi, The Open University, UK)</title>
      </sec>
      <sec id="sec-7-38">
        <title>Knud Möller (DERI, National University of Ireland, Galway)</title>
        <sec id="sec-7-38-1">
          <title>EON 2007 Organizing Committee</title>
          <p>Raúl García-Castro (Universidad Politécnica de Madrid, Spain)
Denny Vrandecic (AIFB, Universität Karlsruhe (TH), Germany)
Asunción Gómez-Pérez (Universidad Politécnica de Madrid, Spain)
York Sure (EON series inventor, SAP Research, Germany)
Zhisheng Huang (Vrije Universiteit Amsterdam, The Netherlands)</p>
        </sec>
        <sec id="sec-7-38-2">
          <title>EON 2007 Program Committee</title>
          <p>Mathieu D'Aquin, Claudio Baldassarre, Laurian Gridinoc, Sofia Angeletou, Marta Sabou, Enrico Motta:
Characterizing Knowledge on the Semantic Web with Watson (page 1)</p>
          <p>Paul Buitelaar, Thomas Eigner:
Evaluating Ontology Search (page 11)</p>
          <p>Ameet Chitnis, Abir Qasem, Jeff Heflin:
Benchmarking Reasoners for Multi-Ontology Applications (page 21)</p>
          <p>Sourish Dasgupta, Deendayal Dinakarpandian, Yugyung Lee:
A Panoramic Approach to Integrated Evaluation of Ontologies in the Semantic Web (page 31)</p>
          <p>Willem Van Hage, Antoine Isaac, Zharko Aleksovski:
Sample Evaluation of Ontology-Matching Systems (page 41)</p>
          <p>Yuangui Lei, Andriy Nikolov:
Detecting Quality Problems in Semantic Metadata without the Presence of a Gold Standard (page 51)</p>
          <p>Vojtech Svatek, Ondrej Svab:
Tracking Name Patterns in OWL Ontologies</p>
          <p>Characterizing Knowledge on the Semantic Web with Watson
Mathieu d’Aquin, Claudio Baldassarre, Laurian Gridinoc, Sofia Angeletou, Marta Sabou, and Enrico Motta</p>
          <p>Knowledge Media Institute (KMi), The Open University, United Kingdom
{m.daquin,c.baldassarre,l.gridinoc,s.angeletou,r.m.sabou,e.motta}@open.ac.uk
Abstract. Watson is a gateway to the Semantic Web: it collects,
analyzes and gives access to ontologies and semantic data available online
with the objective of supporting their dynamic exploitation by semantic
applications. We report on the analysis of 25 500 ontologies and
semantic documents collected by Watson, giving an account about the way
semantic technologies are used to publish knowledge on the Web, about
the characteristics of the published knowledge, and about the networked
aspects of the Semantic Web. Our main conclusions are (1) that the
Semantic Web is characterized by a large number of small, lightweight
ontologies and a small number of large-scale, heavyweight ontologies,
and (2) that important efforts still need to be spent on improving the
published ontologies (coverage of different topic domains, connectedness
of the semantic data, etc.) and the tools that produce and manipulate
them.
</p>
          <p>
            1 Introduction
The vision of a Semantic Web, “an extension of the current Web in which
information is given well-defined meaning, better enabling computers and people to
work in cooperation” [
            <xref ref-type="bibr" rid="ref17 ref3">3</xref>
            ], is becoming more and more a reality. Technologies like
RDF and OWL, which allow ontologies and information to be represented in a formal,
machine-understandable way, are now well established. More importantly, the
amount of knowledge published on the Semantic Web – i.e., the number of
ontologies and semantic documents available online – is rapidly increasing, reaching
the critical mass required to enable the vision of a truly large scale, distributed
and heterogeneous web of knowledge.
          </p>
          <p>
            In a previous paper [
            <xref ref-type="bibr" rid="ref18 ref4">4</xref>
            ], we presented the design and architecture of Watson,
a gateway to the Semantic Web. Watson is a tool and an infrastructure that
automatically collects, analyses and indexes ontologies and semantic data
available online in order to provide efficient access to this knowledge for Semantic
Web users and applications. Besides enabling the exploitation of the Semantic
Web, Watson can be seen as a research platform supporting the exploration of
the Semantic Web to better understand its characteristics. This paper reports
on the use of this infrastructure to provide quantitative indications about the
way semantic technologies are used to publish knowledge on the Web, about the
characteristics of the knowledge available online, and about the way ontologies
and semantic documents are networked together.
(This work was funded by the Open Knowledge and NeOn projects sponsored under
EC grant numbers IST-FF6-027253 and IST-FF6-027595.)
          </p>
          <p>
            A number of researchers have already produced analyses of the Semantic
Web landscape. For example, [
            <xref ref-type="bibr" rid="ref20 ref6">6</xref>
            ] presents an analysis of 1 300 ontologies looking
in particular at the way ontology language primitives are used, and at the
distribution of ontologies into the three OWL species (confirming results already
obtained in [
            <xref ref-type="bibr" rid="ref16 ref2">2</xref>
            ]). In [
            <xref ref-type="bibr" rid="ref19 ref5">5</xref>
            ], the authors of Swoogle present an analysis of the semantic
documents collected by Swoogle. The forthcoming section shows complementary
results to the ones presented in both these studies, based on a set of almost
25 500 semantic documents collected by Watson. In particular, in comparison
with [
            <xref ref-type="bibr" rid="ref19 ref5">5</xref>
            ] that focuses on the Web aspects of the Semantic Web (number of files,
provenance in terms of website and internet domain, RDF(S) primitive usage,
etc.), we consider a more “Semantic Web” centric view, by providing an insight
on characteristics like the expressiveness of the employed ontology languages, the
structural and domain-related coverage characteristics of semantic documents,
and their interconnections in a knowledge network.
          </p>
          <p>2 Characterizing Knowledge on the Semantic Web with Watson
Below, we report on some of the results that have been obtained by collecting,
validating and analyzing online ontologies and semantic documents. We focus on
three main aspects in this study: the usage of semantic technologies to publish
knowledge on the Web (Section 2.1), the characteristics of the knowledge
published (Section 2.2) and the connectedness of semantic documents (Section 2.3).</p>
          <p>Different sources are used by the Watson crawler to discover ontologies and
semantic data (Google, Swoogle1, PingTheSemanticWeb.com2, etc.). Once
located and retrieved, these documents are filtered to keep only valid RDF-based
documents (using Jena3 as a parser). In addition, we have chosen to exclude
RSS and FOAF files from the analysis. The main reason to exclude these
documents is that RSS and FOAF together represent more than 5 times the number
of other RDF documents in our collection. Since these two vocabularies are
dedicated to specific applications, we believe that they would have introduced a bias
in our characterization and that they should therefore be studied separately. We
consider here a set of almost 25 500 semantic documents collected by Watson.
1 http://swoogle.umbc.edu/
2 http://pingthesemanticweb.com/
3 http://jena.sourceforge.net/
</p>
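          <p>As an illustration only (and not the actual Watson pipeline, which relies on Jena), the following sketch shows how such a filtering step could be approximated with the Python rdflib library; the FOAF/RSS test used here, based on the core foaf:Person and rss:item classes, is a simplifying assumption.</p>
          <preformat>
# Sketch of the document-filtering step described above (illustration only).
from rdflib import Graph, URIRef
from rdflib.namespace import RDF

FOAF_NS = "http://xmlns.com/foaf/0.1/"
RSS_NS = "http://purl.org/rss/1.0/"

def keep_document(path):
    """Return True if the file parses as RDF and is neither a FOAF nor an RSS file."""
    graph = Graph()
    try:
        graph.parse(path)                      # rdflib guesses the syntax
    except Exception:
        return False                           # not a valid RDF-based document
    # Heuristic assumption: FOAF/RSS documents instantiate these core classes.
    is_foaf_or_rss = any(graph.subjects(RDF.type, URIRef(FOAF_NS + "Person"))) or \
                     any(graph.subjects(RDF.type, URIRef(RSS_NS + "item")))
    return not is_foaf_or_rss
</preformat>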
          <p>2.1 Usage of Semantic Technologies
Semantic technologies such as OWL and RDF are now well established and
commonly used by many developers. In this section, we look at the details of
how the features provided by Semantic Web languages are exploited to describe
ontologies and semantic data on the Web.</p>
          <p>[Figure 1. (a) Number of documents detected for each representation language (RDF, RDF-S, OWL, DAML+OIL; group sizes 22200, 6200, 1700 and 1500) and their overlaps; (b) proportion of OWL documents in each species: OWL Full 81%, OWL Lite 13%, OWL DL 6%.]</p>
          <p>Representation Languages. Watson implements a simple, but restrictive
language detection mechanism. It is restrictive in the sense that it considers a
document to employ a particular language only if this document actually
instantiates an entity of the language vocabulary (any kind of description for RDF,
a class for RDF-S, and a class or a property for OWL and DAML+OIL).
Figure 1(a) provides a visualization of the results of this language detection
mechanism applied on the entire set of semantic documents collected by Watson. A
simple conclusion that can be drawn from this diagram is that, while the
majority of these documents are exclusively considering factual data in RDF, amongst
the ontology representation languages (RDF-S, OWL and DAML+OIL), OWL
seems to have been adopted as the standard. Another element worth considering
is the overlap between these languages. Indeed, our detection
mechanism only considers a document to employ two different languages if it actually
declares entities in both languages. For example, a document would be
considered as being written in both RDF-S and OWL if it contains the definition of
an owl:Class or an owl:Property, together with the definition of an rdfs:Class.
According to this definition, the use of RDF-S properties like rdfs:label is not
sufficient to consider the document as being written in RDF-S. Combining
entities from two different meta-models, like for example OWL and RDF-S, can
be problematic for the tools that manipulate the ontology (in particular, the
inference mechanisms can become undecidable). These considerations have been
taken into account in the design of OWL. As a consequence, unlike DAML+OIL
documents, most of the OWL documents only employ OWL as an ontology
language, leading to cleaner and more exploitable ontologies (see Figure 1(a)).</p>
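          <p>The detection rule described above can be illustrated with a short sketch (our own approximation using the Python rdflib library, not Watson's implementation): a document counts as RDF-S, OWL or DAML+OIL only if it instantiates a class (or, for OWL and DAML+OIL, a property) of that vocabulary.</p>
          <preformat>
# Rough sketch of the language-detection rule described above (illustration only).
from rdflib import Graph, URIRef
from rdflib.namespace import RDF, RDFS, OWL

DAML = "http://www.daml.org/2001/03/daml+oil#"

def detect_languages(path):
    g = Graph()
    g.parse(path)
    langs = set()
    if len(g):                                  # any description at all counts as RDF
        langs.add("RDF")
    if any(g.subjects(RDF.type, RDFS.Class)):   # declares an rdfs:Class
        langs.add("RDF-S")
    owl_types = [OWL.Class, OWL.ObjectProperty, OWL.DatatypeProperty]
    if any(any(g.subjects(RDF.type, t)) for t in owl_types):
        langs.add("OWL")
    daml_types = [URIRef(DAML + "Class"), URIRef(DAML + "ObjectProperty"),
                  URIRef(DAML + "DatatypeProperty")]
    if any(any(g.subjects(RDF.type, t)) for t in daml_types):
        langs.add("DAML+OIL")
    return langs
</preformat>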
          <p>
            OWL is divided into three sub-languages, OWL Lite, OWL DL, and OWL Full,
that represent different (increasing) levels of complexity. In this respect, the
results obtained on the proportion of OWL documents of the three species are
surprising (see Figure 1(b)): a large majority of the OWL ontologies are OWL Full.
This confirms the results obtained by Wang et al. in [
            <xref ref-type="bibr" rid="ref20 ref6">6</xref>
            ] on a set of 1 300
ontologies. The explanation provided in [
            <xref ref-type="bibr" rid="ref20 ref6">6</xref>
            ] is that most ontologies fall into the
OWL Full category because of simple syntactic mistakes. This intuition, namely that
documents are classified as OWL Full not because they actually use the
expressive power of this sub-language but because of such mistakes, is confirmed in the
next paragraph, which looks at the expressiveness employed by ontologies.
          </p>
          <p>Expressiveness. The Pellet reasoner4 provides a mechanism to detect the
level of expressiveness of the language employed in an ontology in terms of
description logics (DLs). DLs are named according to the constructs they provide
to describe entities, and so, to their expressive power. For example, the DL of
OWL Lite is ALCR+HIF(D), meaning for example that it allows the
description of inverse relations (I) and of limited cardinality restrictions (F).</p>
          <p>Table 1. Most frequent DL expressiveness detected (number of documents):
All documents: AL(D) 21375 (84%), AL 2455 (10%), ALH(D) 293 (1%), ALCF(D) 105 (&lt;1%), ALH 102 (&lt;1%), ALC 101 (&lt;1%)
OWL documents: AL(D) 3644 (59%), AL 1406 (23%), ALCF(D) 105 (1.5%), ALC 94 (1.5%), ALH(D) 54 (&lt;1%), ALCOF(D) 43 (&lt;1%)
OWL Full documents: AL(D) 3365 (78%), AL 281 (6.5%), ALCF(D) 68 (1.5%), ALH(D) 44 (1%), ALCOF(D) 28 (&lt;1%), ALC 27 (&lt;1%)</p>
          <p>Using this mechanism allows us to assess the complexity of semantic
documents, i.e., how they employ the expressive power provided by ontology
representation languages. Indeed, the analysis presented in Table 1 shows that the
advanced features provided by the ontology representation languages are rarely
used. AL is the smallest DL language that can be detected by Pellet. Only
adding the use of datatypes (D) and of hierarchies of properties (H) to AL is
sufficient to cover 95% of the semantic documents. It is worth mentioning that
these two elements are both features of RDF-S.
4 http://www.mindswap.org/2003/pellet/</p>
          <p>Looking at the results for OWL and OWL Full ontologies (second and third
parts of Table 1), it appears that the division of OWL into Lite, DL and Full,
which is based on complexity and implementation cost, is not
reflected in practice. Indeed, the fact that most OWL Full ontologies employ only
very simple features confirms the intuition expressed in the previous paragraph:
while these ontologies would get the disadvantages of using OWL Full, they do
not actually exploit its expressiveness. Moreover, while one of the most popular
features of OWL, the possibility to build enumerated classes (O), is only
permitted in OWL DL, transitive and functional properties (R+), which are features
of OWL Lite, are rarely used.5
</p>
          <p>2.2 Structural and Topic Coverage Characteristics of Knowledge on
the Semantic Web
One important aspect to consider for the exploitation of the Semantic Web
concerns the characteristics of the semantic documents in terms of structure
and topic coverage. In this section, we report on the analysis of these aspects
from the data provided by the Watson repository with the objective of helping
users and developers know what they can expect from the current state of
the Semantic Web.</p>
          <p>[Figure 2. Distribution of the collected semantic documents by number of entities (histogram bins from fewer than 10 to more than 1000 entities); panel (a) covers all entities, panels (b) and (c) show the same distribution for ontological knowledge and factual data.]</p>
          <p>Size. As already mentioned, Watson has collected almost 25 500 distinct
semantic documents (by distinct we mean that if the same file appears several
times, it is counted only once, see Section 2.3). Within these documents, about
1.1 million distinct entities (i.e. classes, properties, and individuals having
different URIs) have been extracted.
5 Considering only features not handled by RDF-S (i.e. excluding ALH(D)), O is the
third most used feature of OWL with 236 ontologies, after C (748) and F (598),
while R+ is last with only 31 ontologies.</p>
          <p>An interesting piece of information that can be extracted from this analysis is that
ontologies on the Semantic Web are generally very small. Indeed, the
average number of entities in semantic documents is around 43, which is far closer to
the minimum size of semantic documents (1 entity) than to the largest one (more
than 28 000 entities). Looking more in detail, it can be seen that the Semantic
Web is in fact characterized by a large number of very small documents, and a
small number of very large ones (see Figure 2(a)). It is worth mentioning that,
as shown in Figures 2(b) and 2(c), this observation is valid for both ontological
knowledge and factual data.</p>
          <p>[Table 2. Global measures computed over the Watson repository: total numbers of classes, properties, individuals, domain relations, sub-class relations and instance relations, and the average P-density (properties per class), H-density (super-classes per class) and I-density (instances per class).]</p>
          <p>
            Density. One way to estimate the richness of the representation in semantic
documents is to rely on the notion of density. Extending the definition
provided by [
            <xref ref-type="bibr" rid="ref1 ref15">1</xref>
            ], we consider the density of a semantic entity to be related to its
interconnection with other entities. Accordingly, different notions of density are
considered: the number of properties for each class (P-density), the number of
super-classes for each class (H-density), and the number of instances for each
class (I-density). In the case of P-Density, a class is considered to possess a
property if it is declared as the domain of this property. It is worth mentioning
that none of these measures takes inheritance into consideration: only directly
stated relations are counted. Computing these measures on the whole Watson
repository (see Table 2) allows us to conclude that, on average, ontology classes
are described in a lightweight way (this correlates with the results obtained in
the previous section concerning the expressiveness of the employed language).
More precisely, the P-density and H-density measures tend to be low on average,
in particular if compared to their maximum (17 and 47 respectively). Moreover,
ontologies often contain a few “central”, richly described
classes. This characteristic cannot be captured by simply looking at the average
density of the collected entities. Therefore, we looked at the maximum density
within one ontology (i.e. the density of the densest class in the ontology). The
average maximum P-density in ontologies that contain domain relations is still
low (1.1), meaning that, in most cases, classes possess at most one
property, if any. Similar results are obtained for H-density (1.2 average maximum
H-density in ontologies having sub-class relations).
          </p>
          <p>Another straightforward conclusion here is that the amount of instance data
is much bigger than the amount of ontological knowledge in the collected
semantic documents. It is expected that the Semantic Web as a whole would be built
on a similar ratio of classes, properties and individuals, requiring ontology-based
tools to handle large repositories of instances.</p>
          <p>
            Topic Coverage. Understanding the topic coverage of the Semantic Web,
i.e. how ontologies and semantic documents relate to generic topic domains like
health or business, is of particular importance for the development of semantic
applications. Indeed, even if it has already been demonstrated that the Semantic
Web is rapidly growing [
            <xref ref-type="bibr" rid="ref19 ref5">5</xref>
            ], we cannot assume that this increase of the amount
of online knowledge has been achieved in the same way for every application
domain.
          </p>
          <p>The Watson analysis task includes a mechanism that categorizes ontologies
into the 16 top groups of DMOZ6. Each category is described by a set of weighted
terms, corresponding to the names of its sub-categories in DMOZ. The weight
w(t) = (1/l(t)) × (1/f(t)) of a term t is calculated using the level l(t) of the corresponding
sub-category in DMOZ and the number of times f(t) the term is used as a
subcategory name. In this way, a term is considered a good descriptor for
the category (has a high weight) if it is high in the corresponding sub-hierarchy
and if it is rarely used to describe other categories. The level of coverage of a
given ontology for a given category then corresponds to the sum of the weights of
the terms that match (using a simple lexical comparison) entities in the ontology.
6 http://dmoz.org/</p>
          <p>[Figure 3. Relative coverage of the 16 DMOZ top-level topics (Computers, Society, Arts, Business, Shopping, Kids and Teens, Games, Regional, Recreation, etc.) by the collected semantic documents.]</p>
          <p>This simple mechanism allows us to compute a rough overview of the
relative coverage of these 16 high-level topics on the Semantic Web. Among the
semantic documents collected by Watson, almost 7 000 have been associated with
one or several topics (i.e., have a non-null level of coverage for some topics). Figure 3
describes the relative coverage of the 16 considered topics. In this figure, the y
axis corresponds to the sum of the levels of coverage of all ontologies for the
considered topic. The actual numbers here are not particularly significant, as we
are more interested in the differences in the level of coverage for different topics.
As expected, it can be seen that, while some topics are relatively well covered
(e.g. computers, society, business), others are almost absent from the collected
semantic documents (home, adult, news). Also, when comparing these results to
the distribution of web documents within the DMOZ hierarchy, it is interesting
to find that, according to this categorization, the coverage of these topics on the
“classical Web” is also rather unbalanced (with categories varying from 31 294
to 1 107 135 documents), but that the order of the topics according to coverage
is very different (computers, for example, is the 6th category in coverage).</p>
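          <p>A minimal sketch of the weighting and coverage computation described above is given below; it is our own illustration, and the data structures and the lowercased lexical matching are assumptions rather than the actual Watson implementation.</p>
          <preformat>
# Sketch of the DMOZ-based topic coverage score described above (illustration only).

def term_weights(category_terms):
    """category_terms: list of (term, level, frequency) tuples for one DMOZ top
    category, where level is the depth of the sub-category and frequency the
    number of times the term is used as a sub-category name."""
    return {term: (1.0 / level) * (1.0 / freq) for term, level, freq in category_terms}

def coverage(ontology_labels, weights):
    """Sum of the weights of the terms that lexically match an entity label."""
    labels = {label.lower() for label in ontology_labels}
    return sum(w for term, w in weights.items() if term.lower() in labels)

# Hypothetical usage with made-up numbers:
# weights = term_weights([("software", 2, 3), ("hardware", 2, 1)])
# score = coverage(["Software", "Person"], weights)
</preformat>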
          <p>Finally, by looking at the level of coverage of each ontology, the power law
distribution that has been found for other characteristics (size, expressiveness)
also applies here: a few semantic documents have a high level of coverage, often
with respect to several topics, whereas the large majority have a very low level
of coverage, with respect to one or two topics only.
</p>
          <p>2.3 The Knowledge Network
While the Web can be seen as a network of documents connected by hyperlinks,
the Semantic Web is a network of ontologies and semantic data. This aspect
also needs to be analyzed, looking at the semantic relations linking semantic
documents together.</p>
          <p>Connectedness. Semantic documents and ontologies are connected through
references to their respective namespaces. While the average number of
references to external namespaces in the documents collected by Watson seems
surprisingly high (6.5), it is interesting to see that the most referenced
namespaces are very often hosted under the same few domains (w3.org, stanford.edu,
ontoworld.org, etc.)7 This seems to indicate that a small number of large, dense
“nodes” tend to provide the major part of the knowledge that is reused.</p>
          <p>Another element of importance when considering the inter-connection
between online semantic data is whether the URIs used to describe entities are
dereferenceable, i.e., whether the namespaces to which they belong correspond to
an actual location (a reachable URL) from which descriptions of the entities can
be retrieved. Several applications, like Tabulator8 or the Semantic Web Client
Library 9 are indeed based on this assumption: that the Semantic Web can be
traversed through dereferenceable URIs. However, among the semantic documents
that explicitly declare their namespace, only about 30% correspond to actual
locations of semantic documents, which means that these applications can only
access a restricted part of the Semantic Web.</p>
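          <p>Checking whether a namespace is dereferenceable amounts to requesting it over HTTP and asking for an RDF serialization. A minimal sketch of such a check, using the Python requests library, is shown below (our own illustration; the 30% figure above comes from the Watson analysis, not from this code).</p>
          <preformat>
# Sketch of a dereferenceability check for a namespace URI (illustration only).
import requests

def is_dereferenceable(namespace_uri, timeout=10):
    """True if the namespace URI resolves to a document served as RDF."""
    try:
        response = requests.get(
            namespace_uri,
            headers={"Accept": "application/rdf+xml, text/turtle"},
            allow_redirects=True,
            timeout=timeout,
        )
    except requests.RequestException:
        return False
    content_type = response.headers.get("Content-Type", "")
    return response.ok and ("rdf" in content_type or "turtle" in content_type)
</preformat>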
          <p>
            Redundancy. As in any large-scale distributed environment, redundancy is
inevitable on the Semantic Web and actually contributes to its robustness: it
is useful for an application to know that the semantic resources it uses can be
retrieved from alternative locations in case the one it relies on becomes
unreachable. As already mentioned, the 25 500 documents collected by Watson
are distinct, meaning that if the same file is discovered several times, it is only
stored and analyzed once, even if Watson keeps track of all its locations.
On average, every semantic document collected by Watson can be found in 1.27
locations, meaning that around 32 350 URLs actually address semantic data or
ontologies. Ignoring this simple phenomenon, as is the case for example
with the analysis described in [
            <xref ref-type="bibr" rid="ref19 ref5">5</xref>
            ], would have introduced an important bias in
our analysis.
7 It is important to remark here that the references to the namespaces of the
representation languages, such as RDF and OWL, were not counted.
8 http://www.w3.org/2005/ajar/tab
9 http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/semwebclient/
          </p>
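          <p>Detecting that the same document is available at several locations can be approximated by hashing the retrieved content; the sketch below (our own illustration, not Watson's mechanism) groups URLs whose content is byte-identical.</p>
          <preformat>
# Sketch of grouping URLs that serve the same document (illustration only).
import hashlib
from collections import defaultdict

def group_duplicate_locations(url_to_content):
    """url_to_content: dict mapping a URL to the raw bytes retrieved from it.
    Returns a dict mapping a content hash to the list of URLs serving it."""
    locations = defaultdict(list)
    for url, content in url_to_content.items():
        digest = hashlib.sha1(content).hexdigest()
        locations[digest].append(url)
    return locations

# Documents stored once but reachable from several locations correspond to the
# entries whose URL list contains more than one element.
</preformat>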
          <p>At a more fine-grained level, descriptions of entities can also be distributed
and do not necessarily exist in a single file. Pieces of information about the same
entity, identified by its URI, can be physically declared at different locations.
Indeed, among the entities collected by Watson, about 12% (approximately
150 000) are described in more than one place.</p>
          <p>URI duplication. In theory, if two documents are identified by the same
URI, they are supposed to contribute to the same ontology, i.e. the entities
declared in these documents are intended to belong to the same conceptual model.
This criterion is consistent with the distributed nature of the Semantic Web in
the sense that ontologies can be physically distributed among several files, on
different servers. However, even if this situation appears rarely (only 60 URIs
of documents are “non unique”), in most cases, semantic documents that are
identified by the same URI are not intended to be considered together. We can
distinguish different situations leading to this problem:
Default URI of the ontology editor. http://a.com/ontology is the URI
of 20 documents that do not seem to have any relation with each other, and
that are certainly not meant to be considered together in the same ontology.
The reason for this URI to be so popular is that it is the default namespace
attributed to ontologies edited using (some versions of) the OWL
Plugin of the Protégé editor10. Systematically asking the ontology developer
to give an identifier to the edited ontology, as is done for example in the
SWOOP editor11, could avoid this problem.</p>
          <p>Mistaken use of well known namespaces. The second most commonly
shared URI in the Watson repository is http://www.w3.org/2002/07/owl,
which is the URI of the OWL schema. The namespaces of RDF, RDF Schema,
and of other well known vocabularies are also often duplicated. Using these
namespaces as URIs for ontologies is (in most cases) a mistake that could
be avoided by checking, prior to giving an identifier to an ontology, whether this
identifier has already been used in another ontology.</p>
          <p>Different versions of the same ontology. A third common reason for which
different semantic documents share the same URI is in situations where an
ontology evolves to a new version, keeping the same URI (e.g., http://
lsdis.cs.uga.edu/proj/semdis/testbed/). As it is the same ontology, it
seems natural to keep the same URI, but in practice, this can cause problems
in cases where different versions co-exist and are used at the same
time. This leads to a need for recommendations of good practice on the
identification of ontologies that would take into account the evolution of
the ontologies, while keeping different versions clearly separated.
10 http://protege.stanford.edu/
11 http://www.mindswap.org/2004/SWOOP/</p>
          <p>3 Conclusion
The main motivation behind Watson is that the Semantic Web requires efficient
infrastructures and access mechanisms to support the development of a new
kind of application, able to dynamically exploit the knowledge available online.
We believe that a better understanding of the current practices concerning the
fundamental characteristics of the Semantic Web is required. In this paper, we
have reported on the analysis of the 25 500 distinct semantic documents collected
by Watson, giving an account about the way semantic technologies are used
to publish knowledge on the Web, about the characteristics of the published
knowledge, and about some of the networked aspects of the Semantic Web. Our
main conclusions are (1) that the Semantic Web is characterized by a large number
of small, lightweight ontologies and a small number of large-scale, large-coverage
and heavyweight ontologies, and (2) that important efforts still need to be spent
on improving published ontologies (coverage of different domains, connectedness
of the semantic data, etc.) and the tools that produce and manipulate them.</p>
          <p>Many other aspects and elements could have been analyzed, and the research
work presented here can be seen as a first step towards a more complete
characterization of the Semantic Web. In particular, we only considered the
characterization of the current state of the Semantic Web, analyzing a snapshot of the
online semantic documents that represent the Watson repository. In the future,
we plan to also consider the dynamics of the Semantic Web, looking at how the
considered characteristics evolve over time.</p>
          <p>Evaluating Ontology Search
Paul Buitelaar, Thomas Eigner
German Research Center for Artificial Intelligence (DFKI GmbH)
Language Technology Lab &amp; Competence Center Semantic Web</p>
          <p>Stuhlsatzenhausweg 3
Saarbrücken, Germany</p>
          <p>paulb@dfki.de
Abstract. As more and more ontologies are being published on the Semantic
Web, selecting the most appropriate ontology will become an increasingly
important subtask in Semantic Web applications. Here we present an approach towards
ontology search in the context of OntoSelect, a dynamic web-based ontology
library. In OntoSelect, ontologies can be searched by keyword or by document. In
keyword-based search only the keyword(s) provided by the user will be used for
the search. In document-based search the user provides either a URL for a web
document that represents a specific topic, or simply a keyword denoting the topic,
which is then automatically linked to a corresponding Wikipedia page
from which a linguistically/statistically derived set of the most relevant keywords is
extracted and used for the search. In this paper we describe an experiment in
evaluating the document-based ontology search strategy, based on an evaluation
data set that we constructed specifically for this task.</p>
          <p>1 Introduction
A central task in the Semantic Web effort is the semantic annotation or knowledge
markup of data (textual or multimedia documents, structured data, etc.) with semantic
metadata as defined by one or more ontologies. The added semantic metadata allow
automatic processes (agents, web services, etc.) to interpret the underlying data in a
unique and formally specified way, thereby enabling autonomous information
processing. As ontology-based semantic metadata are in fact class descriptions, the annotated
data can be extracted as instances for these classes. Hence, another way of looking at
ontology-based semantic annotation is as ontology population.</p>
          <p>Most current work in ontology-based semantic annotation assumes ontologies
that are typically developed specifically for the task at hand. Instead, a more realistic
approach would be to access an ontology library and to select one or more
appropriate ontologies. Although the large-scale development and publishing of ontologies is
still only in its beginning phase, many are already available. Selecting the most
appropriate ontology (or a combination of complementary ontologies) will therefore be an
increasingly important subtask of Semantic Web applications.</p>
          <p>
            Until very recently the solution to this problem was supposed to be handled by
foundational ontology libraries [
            <xref ref-type="bibr" rid="ref1 ref15 ref16 ref2">1,2</xref>
            ]. However, in recent years, dynamic web-based
ontology libraries and ontology search engines like OntoKhoj [
            <xref ref-type="bibr" rid="ref17 ref3">3</xref>
            ], OntoSelect [
            <xref ref-type="bibr" rid="ref18 ref4">4</xref>
            ],
SWOOGLE [
            <xref ref-type="bibr" rid="ref19 ref5">5</xref>
            ] and Watson [
            <xref ref-type="bibr" rid="ref20 ref6">6</xref>
            ] have been developed that enable a more data-driven
approach to ontology search and retrieval.
          </p>
          <p>In OntoSelect, ontologies can be searched by keyword or by document. In
keyword-based search only the keyword(s) provided by the user are used for the search. In
document-based search the user provides either a URL for a web document that
represents a specific topic, or simply a keyword denoting the topic, which is
then automatically linked to a corresponding Wikipedia page from which a
linguistically/statistically derived set of the most relevant keywords is extracted and used for
the search. In this paper we describe an experiment in evaluating the document-based
ontology search strategy, based on an evaluation data set that we constructed specifically
for this task.</p>
          <p>The remainder of the paper is structured as follows. Section 2 gives a brief overview
of the content and functionality of the OntoSelect ontology library. Section 3 presents a
detailed overview of the ontology search algorithm and scoring method used. Section 4
presents the evaluation benchmark, experiments and results. Finally, section 5 presents
some conclusions and gives an outlook on future work.</p>
          <p>2 The OntoSelect Ontology Library
OntoSelect is a dynamic web-based ontology library that collects, analyzes and
organizes ontologies published on the Semantic Web. OntoSelect allows browsing of
ontologies according to size (number of classes, properties), representation format (DAML,
RDFS, OWL), connectedness (score over the number of included and referring
ontologies) and human languages used for class- and object property-labels. OntoSelect
further includes an ontology search functionality as described above and discussed in
more detail in the following sections.</p>
          <p>OntoSelect uses the Google API to find published ontologies on the web in the
following formats: DAML, OWL and RDFS. Jena is used for reading and analyzing the
ontologies. In the case of OWL, OntoSelect also determines its type (Full, DL, Lite)
and indexes this information accordingly. Each class and object property defined by the
ontology is indexed with reference to the ontology in which it occurs. Correspondingly,
each label is indexed with reference to the corresponding ontology, class or object
property, the human language of the label (if available), and a normalized label name, e.g.
TaxiDriver is normalized to “taxi driver”. Object properties are handled similarly to
classes, except that information on their type (functional, transitive, symmetric) is also
indexed. Finally, a separate index is built up in which we keep track of the distribution
of labels over all of the collected ontologies. In this way, a ranked list of frequently used
labels can be maintained and browsed by the user.</p>
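          <p>The label normalization mentioned above (e.g. TaxiDriver to “taxi driver”) is essentially a camel-case split; a minimal sketch of such a normalizer is shown below (our own illustration, not the OntoSelect code; it also handles underscores).</p>
          <preformat>
# Sketch of the label normalization step described above (illustration only).
import re

def normalize_label(label):
    """Split camel-case and underscores and lowercase the result,
    e.g. 'TaxiDriver' becomes 'taxi driver'."""
    label = label.replace("_", " ")
    # insert a space before an upper-case letter that follows a lower-case letter or digit
    label = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return " ".join(label.lower().split())

# normalize_label("TaxiDriver") yields "taxi driver"
</preformat>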
          <p>3 Ontology Search</p>
          <p>3.1 Ontology Search Measures and Criteria
The ontology search problem is a very recent topic of research, which only originated
with the growing availability of ontologies on the web.</p>
          <p>Fig. 1. Browsing ontologies in OntoSelect</p>
          <p>A web-based ontology, defined by representation languages such as OWL or RDFS, is in many respects just another
web document that can be indexed, stored and retrieved. On the other hand, an
ontology is a highly structured document with possibly explicit semantic links to other
ontologies. The OntoSelect approach is based on both observations, ranking
ontologies by coverage, i.e. the overlap between query terms and index terms; by structure,
i.e. the ratio of class vs. property definitions; and by connectedness, i.e. the level of
integration between ontologies.</p>
          <p>
            Other approaches have similarly stressed the importance of such measures, e.g. [
            <xref ref-type="bibr" rid="ref21 ref7">7</xref>
            ]
describe the “Class Match”, “Density”, “Semantic Similarity” and “Betweenness”
measures. The Class Match and Density measures correspond roughly to our coverage and
structure measure, whereas the Semantic Similarity and Betweenness measure the
semantic weight of query terms relative to the different ontologies that are to be ranked.
These last two measures are based on the assumption that ontologies are well-structured
with equal semantic balance throughout all constitutive parts, which is unfortunately
seldom the case; we therefore do not take such measures into account.
          </p>
          <p>
            Another set of measures or rather criteria for ontology search has been proposed
by [
            <xref ref-type="bibr" rid="ref22 ref8">8</xref>
            ]. The focus here is more on the application of found ontologies and therefore
includes such criteria as: ‘modularization’ (can retrieved ontologies be split up in useful
modules); ‘returning ontology combinations’ (can retrieved ontologies be used in
combination); ‘dealing with instances’ (do retrieved ontologies include instances as well as
classes/properties).
          </p>
          <p>These criteria are desirable but are currently not central to the OntoSelect approach
and to this paper. Our focus is rather on providing data-driven methods for finding the
best matching ontology for a given topic and on providing a proper evaluation of these
methods.
</p>
          <p>3.2 Ontology Search in OntoSelect
Ontology ranking in OntoSelect is based on a combined measure of coverage, structure
and connectedness of ontologies as discussed above. Further, OntoSelect provides
automatic support in ontology ranking relative to a web document instead of just one or
more keyword(s). Obviously this allows for a much more fine-grained ontology search
process.</p>
          <p>For a given document as search query, OntoSelect first extracts all textual data and
analyses this with linguistic tools (i.e. ‘part-of-speech tagger’ and ‘morphological
analysis’) to extract and normalize all nouns in the text as these can be expected to represent
ontology classes rather than verbs, adjectives, etc. The frequencies of these nouns in the
query document are then compared with their frequencies in a reference corpus -
consisting of a large collection of text documents on many different topics and covering a
large section of the English language - to estimate the relevance for each noun based on
how often it is expected to appear in a more general text of the same size. Chi-square is
used to estimate this relevance score (see also Coverage score below). Only the top 20
nouns are used further in the search process as extracted keywords.</p>
          <p>To calculate the relevance of available ontologies in OntoSelect, the set of 20
extracted keywords is used to compute three separate scores (coverage, structure,
connectedness) and a combined score as described below:
Coverage: How many of the terms in the document are covered by the labels in the
ontology?
To estimate the coverage score, OntoSelect iterates over all ontologies containing
at least one label (either the original label name or the normalized label name)
occurring in the top 20 keyword list of the search document. For each label occurring
in the document, OntoSelect computes its relevance, with which the coverage score
of an ontology O is calculated.</p>
          <p>QD = query document; KW(QD) = set of keywords extracted from QD;
OL(O) = set of labels of ontology O; RefC = reference corpus;
QD_k and RefC_k = frequency of keyword k in QD and in RefC.
Exp_k = RefC_k × |QD| / |RefC|
χ²(k) = (QD_k − Exp_k)² / Exp_k
coverage(O, QD) = Σ_{k ∈ KW(QD) ∩ OL(O)} χ²(k)</p>
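          <p>A minimal sketch of this coverage computation is given below; it is our own illustration, written against the reconstruction of the formula above, and assumes that keyword extraction and label indexing have already been performed.</p>
          <preformat>
# Sketch of the chi-square based coverage score described above (illustration only).

def chi_square(k, qd_freq, refc_freq, qd_size, refc_size):
    """Relevance of keyword k, comparing its frequency in the query document
    with its expected frequency derived from the reference corpus."""
    expected = refc_freq[k] * qd_size / refc_size
    observed = qd_freq[k]
    return (observed - expected) ** 2 / expected

def coverage(ontology_labels, qd_freq, refc_freq, qd_size, refc_size):
    """Sum the relevance of every extracted keyword that matches an ontology label."""
    matching = set(qd_freq).intersection(ontology_labels)
    return sum(chi_square(k, qd_freq, refc_freq, qd_size, refc_size) for k in matching)
</preformat>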
          <p>Detecting Quality Problems in Semantic Metadata without the Presence of a Gold Standard</p>
          <p>
            Yuangui Lei, Andriy Nikolov, Victoria Uren, and Enrico Motta
Knowledge Media Institute (KMi), The Open University, Milton Keynes,
{y.lei, a.nikolov, v.s.uren, e.motta}@open.ac.uk
Abstract. Detecting quality problems in semantic metadata is crucial
for ensuring a high quality semantic web. Current approaches are
primarily focused on the algorithms used in semantic metadata generation
rather than on the data themselves. They typically require the presence
of a gold standard and are not suitable for assessing the quality of
semantic metadata. This paper proposes a novel approach, which exploits
a range of knowledge sources including both domain and background
knowledge to support semantic metadata evaluation without the need of
a gold standard. We have conducted a set of preliminary experiments,
which show promising results.
1 Introduction
Because poor quality data can destroy the effectiveness of semantic web
technology by preventing applications from producing accurate results, detecting
quality problems in semantic metadata is crucial for ensuring a high quality
semantic web. State-of-the-art approaches are primarily focused on the assessment of
algorithms used in data generation rather than on the data themselves.
Examples include the GATE evaluation model [
            <xref ref-type="bibr" rid="ref17 ref3">3</xref>
            ], the learning accuracy (LA) metric
model [
            <xref ref-type="bibr" rid="ref16 ref2">2</xref>
            ], and the balanced distance metric (BDM) model [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ].
          </p>
          <p>
            As pointed out by [
            <xref ref-type="bibr" rid="ref19 ref5">5</xref>
            ], semantic metadata evaluation differs significantly
from the evaluation of metadata generation algorithms. In particular, the gold standard based
approaches that are often used in algorithm evaluation are not suitable for two
main reasons. First, it is simply not feasible to obtain gold standards from all the
data sources involved, especially when the semantic metadata are large-scale.
Second, the gold standard based approaches are not applicable to dynamic
evaluation, where the process needs to take place on the fly without prior knowledge
about data sources.
          </p>
          <p>The approach proposed in this work addresses this issue by exploiting a range
of available knowledge sources. In particular, two types of knowledge source are
used. One is the knowledge sources that are available in the problem domain,
including ontologies. The other type is background knowledge, which includes
knowledge sources that are available globally for all applications, e.g., knowledge
sources published on the (Semantic) Web. A set of preliminary experiments has
been conducted, which indicates promising results.</p>
          <p>The rest of the paper is organized as follows. We begin in Section 2 by
describing the motivation of this work in the context of a use scenario. We then
present an overview of the approach in Section 3. Next in Section 4 and Section
5, we describe how to exploit each type of knowledge to support the evaluation
task. We then describe the settings and the results of the experiments we carried
out in this work in Section 6. Finally, we conclude with the key contributions
and future work in Section 7.
</p>
          <p>2 Motivating Scenario: Ensuring High Quality for Semantic Metadata Acquisition</p>
          <p>
This work was motivated by our work on building a Semantic Web (SW) portal
for KMi that would provide integrated access to resources about various
aspects of the academic life of our lab1. The relevant data is spread over several
different data sources such as departmental databases, knowledge bases and
HTML pages. In particular, KMi has an electronic newsletter2, which describes
events of significance to the KMi members. New entries keep being added to
the archive.</p>
          <p>There are two essential activities involved in the portal: i)
extracting named entities (e.g., people, organizations, projects, etc.) from news stories
in an automatic manner and ii) verifying the derived data to ensure that only
high-quality data proceeds to the semantic metadata repository. Both
activities take place dynamically on a continuous basis whenever new information
becomes available. In particular, the involved data source is unknown to the
portal prior to the metadata acquisition process. Hence, traditional gold
standard based evaluation approaches are not applicable, as pre-constructing gold
standards is simply not possible.</p>
          <p>Please note that although it is drawn from the context of semantic metadata
acquisition, the scenario also applies to generic semantic web applications, where
evaluation often needs to be performed in an automated manner in order to filter
out poor quality data dynamically whenever intermediate results are produced.
</p>
          <p>
            3 An Overview of the Proposed Approach
The goal of the proposed approach is to automatically detect data deficiencies in
semantic metadata without having to construct gold standard data sets. It was
inspired by our previous work ASDI [
            <xref ref-type="bibr" rid="ref23 ref9">9</xref>
            ], which employs different types of
knowledge sources to verify semantic metadata. We extend this method towards a more
powerful mechanism to support the checking of data quality by exploiting more
types of knowledge sources and by addressing more types of data deficiencies.
1 http://semanticweb.kmi.open.ac.uk
2 http://kmi.open.ac.uk/news
          </p>
          <p>Fig. 1. An Overview of the Proposed Evaluation Approach</p>
          <p>Figure 1 shows an overview of the proposed approach. In the following
subsections, we first describe the deficiencies addressed by the proposed approach.
We then clarify the knowledge sources used in detecting quality problems.</p>
          <p>
            3.1 Data Deficiencies Addressed
To clarify, we define semantic metadata as RDF triples that describe the meaning
of data sources (i.e. semantic annotations) or denote real world objects (e.g.,
projects and publications) using the specified ontologies. In our previous work,
we have developed a quality framework, called SemEval [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], which has identified
a set of important data deficiencies that occur in semantic metadata, including:
– Incomplete annotation, which defines the situation where the mapping
from the objects described in the data source to the instances contained in
the semantic metadata set is not exhaustive.
– Inconsistent annotation, which denotes the situation where entities are
inconsistent with the underlying ontologies. For example, an organization
ontology may define that there should be only one director for an
organization. The inconsistency problem occurs when there are two directors in the
semantic metadata set.
– Duplicate annotation, which describes the deficiency in which there is
more than one instance referring to the same object. An example situation
is that the person Clara Mancini is annotated as two different instances, for
example Clara Mancini and Clara.
– Ambiguous annotation, which expresses the situation where an instance
of the semantic metadata set can be mapped back to more than one real
world object. One example would be the instance John (of the class Person)
in the context where there are several people described in the same document
who have that name.
– Inaccurate annotation, which defines the situation where the object
described by the source has been correctly picked up but not accurately
described. An extreme scenario in this category is mis-classification, where
the data object has been successfully picked up and been associated with a
wrong class. An example would be the Organization instance Sun
Microsystems marked as a person.
– Spurious annotation, which defines the deficiency where there is no object
to be mapped back to for an instance. For example, the string “Today”
annotated as a person.
          </p>
          <p>The proposed approach is designed to address all these data deficiencies
except the first one. This is because the approach concentrates especially on
the quality status of the semantic annotations that are already contained in the
given semantic metadata set.
3.2 Knowledge Sources
As shown in Figure 1, two types of knowledge sources are exploited to support
the evaluation task, namely domain knowledge and background knowledge.</p>
          <p>Domain knowledge. Three types of knowledge sources are often available
in the problem domain: i) domain ontologies, which model the problem domain
and offer rules and constraints for detecting conflicts and inconsistencies
contained in the evaluated data set; ii) semantic data repositories, which contain
facts of the problem domain that can be looked upon to examine problems like
inaccuracy, ambiguity, and inconsistency; and iii) lexical resources, which
contain domain specific lexicons that can be used to link the evaluated semantic
metadata with specific domain entities. As will be detailed in Section 4,
domain knowledge is employed to detect inconsistency, duplicate, ambiguous and
inaccurate annotation problems.</p>
          <p>Background knowledge. The knowledge sources that fall into this category
include: i) online ontologies and data repositories, ii) online textual resources,
and iii) general lexical resources. The first two types of knowledge sources are
exploited to detect possible deficiencies that might be associated with those
entities that are not included in the problem domain (i.e., those entities that do
not have matches). General lexical resources, on the other hand, are employed
to expand queries when finding matches of the evaluated entity.</p>
          <p>Compared to domain knowledge, one characteristic of background knowledge
is that it is generic and is available to all applications. Another important feature
of the knowledge, especially the first two types of background knowledge, is that
they are less trustworthy than domain specific knowledge as the (semantic) web
is an open environment where anyone can contribute. Corresponding to the two
types of knowledge sources exploited, the deficiency detection process comprises
two major steps, which are described in the following sections.
</p>
          <p>4 Detecting Data Deficiencies Using Domain Knowledge
The tasks involved in this step are centered around the detection of four types
of quality problems that are common to semantic metadata, namely
inconsistent, duplicate, ambiguous, and inaccurate problems. The process starts with the
detection of inconsistencies that may exist between the evaluated semantic
metadata entity and the data contained in the specified semantic data repositories. It
then investigates the duplicate problem using the same annotation context. The
third task involved is detecting ambiguous and inaccurate problems by querying
the available semantic data repositories.</p>
          <p>Detecting inconsistencies. Please note that we are only interested in data
inconsistencies at the ABox level. Such inconsistencies may be caused by
disjointness axioms or the violation of property restrictions. First, disjointness leads to
inconsistency when the same individual belongs to two disjoint classes at the
same time. For example, the annotation “Ms Windows is a Person” is
inconsistent with the statement that defines it as a technology, as the two classes
are disjoint with each other. Second, violation of property restrictions (e.g.,
domain/range restriction, cardinality restriction) also causes inconsistencies. For
example, if the ontology defines that there should be only one director in an
organization, there is an inconsistency if two people are classified as director.</p>
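          <p>As a simple illustration of the first kind of check (disjointness violations at the ABox level), the sketch below flags individuals typed with two classes that are declared disjoint; it is our own rdflib-based approximation, not the Pellet/MISO-based diagnosis described next.</p>
          <preformat>
# Sketch of a disjointness check over ABox data (illustration only).
from rdflib.namespace import RDF, OWL

def disjointness_violations(graph):
    """Return (individual, class1, class2) tuples where the individual is typed
    with two classes that are declared owl:disjointWith each other."""
    disjoint = set()
    for c1, c2 in graph.subject_objects(OWL.disjointWith):
        disjoint.add((c1, c2))
        disjoint.add((c2, c1))
    violations = []
    for individual in set(graph.subjects(RDF.type, None)):
        types = list(graph.objects(individual, RDF.type))
        for i, c1 in enumerate(types):
            for c2 in types[i + 1:]:
                if (c1, c2) in disjoint:
                    violations.append((individual, c1, c2))
    return violations
</preformat>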
          <p>
            To achieve the task of inconsistency detection, we employ ontology diagnosis
techniques. Each inconsistency is represented by a so-called minimal
inconsistent subontology (MISO) [
            <xref ref-type="bibr" rid="ref21 ref7">7</xref>
            ], which includes all statements and axioms that
contribute to the conflict. An OWL-reasoner with explanation capability is able
to return a MISO for the first inconsistency found in the data set. The process
starts with locating a single inconsistency using the Pellet OWL reasoner [
            <xref ref-type="bibr" rid="ref22 ref8">8</xref>
            ]. It
then discovers all the inconsistencies by using Reiter’s hitting set tree algorithm
[
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], which builds a complete consistent tree by removing each ABox axiom from
the MISO one by one. Please see [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] for details.
          </p>
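          <p>A compact sketch of this enumeration step is given below. Here find_miso stands in
for a call to an OWL reasoner with explanation support (such as Pellet) that returns one
minimal inconsistent subontology for a given set of ABox axioms as a frozenset, or None when
the set is consistent; the function name and data representation are assumptions, and [12]
should be consulted for the actual hitting set tree algorithm.</p>
          <preformat>
def all_misos(axioms, find_miso):
    """Enumerate minimal inconsistent subontologies (MISOs) in the style of
    Reiter's hitting set tree: find one MISO, then branch by removing each of
    its axioms in turn and repeat on the reduced axiom sets."""
    found = []
    frontier = [frozenset(axioms)]
    seen = set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        miso = find_miso(current)          # one conflict, or None if consistent
        if miso is None:
            continue                       # this branch is already consistent
        if miso not in found:
            found.append(miso)
        for axiom in miso:                 # hitting-set branching on ABox axioms
            frontier.append(current - {axiom})
    return found
          </preformat>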
          <p>Detecting duplicate problems. This task is achieved by seeking matches of the
evaluated entity within the same annotation context, i.e., within the values of
the same property of the same instance that contains the evaluated entity. For
example, when evaluating the annotation (story x, mentions-person, enrico), the
proposed approach examines other person entities mentioned in the same story
for detecting the duplicate problem. Domain-specific lexicons are used in the
process (e.g., the string “OU” stands for “Open University”) to handle domain-specific
abbreviations and terms.</p>
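          <p>The duplicate check itself can be pictured as the small sketch below; the lexicon
entries and helper names are illustrative, and the actual system relies on its own suite of
string matching mechanisms.</p>
          <preformat>
# Illustrative domain lexicon of abbreviations (the "OU" entry is the example above).
DOMAIN_LEXICON = {"ou": "open university", "kmi": "knowledge media institute"}

def normalise(label):
    """Lower-case the label and expand known domain-specific abbreviations."""
    key = label.strip().lower()
    return DOMAIN_LEXICON.get(key, key)

def duplicates_in_context(evaluated, context_values):
    """Return other values of the same property (the annotation context) that
    collapse to the same normalised form as the evaluated entity."""
    target = normalise(evaluated)
    return [v for v in context_values if v != evaluated and normalise(v) == target]

# duplicates_in_context("Open University", ["OU", "Enrico Motta"]) yields ["OU"]
          </preformat>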
          <p>
            Detecting ambiguous and inaccurate problems. This task is fulfilled by
querying the available data repositories. When there is more than one match found, the
evaluated entity is considered to be ambiguous, as its meaning (i.e., the mapping
to real world data objects) is not clear. For example, in the case of evaluating the
person entity “John”, there is more than one match found in the KMi domain
repository. The meaning of the instance needs to be disambiguated. In the
situation where there is an inexact match, the entity is computed as inaccurate. As to
the third possibility where there is no match found, the proposed approach turns
to background knowledge to carry out further investigation. We used SemSearch
[
            <xref ref-type="bibr" rid="ref10 ref24">10</xref>
            ], a semantic search engine, to query the available data repositories, and a
suite of string matching mechanisms to refine the matching result.
          </p>
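          <p>The decision rule described above can be summarised roughly as follows; the match
representation (a list of dictionaries with an exact-match flag) is an assumption, since the
actual system delegates querying to SemSearch and refines the results with string matchers.</p>
          <preformat>
def assess_matches(matches):
    """Classify the evaluated entity from the matches found in the domain
    repositories: several exact matches mean ambiguity, a single exact match is
    fine, only inexact matches mean inaccuracy, and no match at all defers the
    decision to the background knowledge step."""
    exact = [m for m in matches if m["exact"]]
    if len(exact) >= 2:
        return "ambiguous"
    if len(exact) == 1:
        return "ok"
    if matches:
        return "inaccurate"
    return "check-background-knowledge"
          </preformat>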
          <p>5 Checking Entities Using Background Knowledge
There are three possibilities when matches cannot be found for the evaluated
entity in the problem domain. One is that the entity is correct but not included
in the problem domain (e.g., IBM, BBC, and W3C with respect to the KMi
domain). The second possibility is mis-classification, where the entity is wrongly
classified, e.g., “Sun Microsystems” as a “person”. The third one is spurious
annotation, in which the entity is erroneous, e.g., “today” as a “person”. Hence,
this step focuses on detecting two types of quality problems: mis-classification
and spurious annotation.</p>
          <p>The task is achieved by computing possible classifications using knowledge
sources published on the (semantic) web. The process begins by querying the
semantic web. If satisfactory evidence cannot be derived, the approach then
turns to textual resources available on the web (i.e., the general web) for further
investigation. If both attempts fail, the system considers the evaluated entity
spurious.</p>
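          <p>The cascade just described can be sketched as follows; the two query functions stand
in for tools such as WATSON and PANKOW (introduced below) and are assumed to return a
possibly empty list of candidate classifications.</p>
          <preformat>
def derive_classifications(entity, query_semantic_web, query_general_web):
    """Try online ontologies first, then textual resources on the general web;
    an empty result from both sources means the annotation is treated as
    spurious by the caller."""
    for query in (query_semantic_web, query_general_web):
        classifications = query(entity)
        if classifications:
            return classifications
    return []   # no evidence at all: the evaluated entity is considered spurious
          </preformat>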
          <p>
            We used i) WATSON [
            <xref ref-type="bibr" rid="ref18 ref4">4</xref>
            ], a semantic search tool developed in our lab, to seek
classifications of the evaluated term from the semantic web; and ii) PANKOW
[
            <xref ref-type="bibr" rid="ref1 ref15">1</xref>
            ], a pattern-based term classification tool, to derive possible classifications
from the general web. Detecting mis-classification problems is achieved by
comparing the derived classifications (e.g., company and organization in the case
of evaluating the annotation “Sun Microsystems as person”) to the type of the
evaluated entity (which is the class person in the example) by exploring domain
ontologies and general lexical resources like WordNet [
            <xref ref-type="bibr" rid="ref20 ref6">6</xref>
            ]. In particular, the
disjointness of classes is used to support the detection of the problem. General
lexical resources are also exploited to compute the semantic similarities of the
classifications.
          </p>
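          <p>One possible shape of this comparison is sketched below using NLTK's WordNet
interface as the general lexical resource; the disjointness set, the choice of path
similarity and the threshold are illustrative assumptions rather than the exact procedure
used in the system.</p>
          <preformat>
from nltk.corpus import wordnet as wn

def compatible_types(derived, asserted, disjoint_pairs):
    """Decide whether a derived classification (e.g. 'company') is compatible
    with the asserted class (e.g. 'person'), using declared class disjointness
    and a WordNet-based semantic similarity between the two class names."""
    if frozenset((derived, asserted)) in disjoint_pairs:
        return False                       # explicitly disjoint classes
    s1 = wn.synsets(derived, pos=wn.NOUN)
    s2 = wn.synsets(asserted, pos=wn.NOUN)
    if not s1 or not s2:
        return False
    best = max(a.path_similarity(b) or 0.0 for a in s1 for b in s2)
    return best >= 0.25                    # illustrative threshold

# e.g. compatible_types("organization", "person", {frozenset(("person", "organization"))})
# returns False via the disjointness axiom, as in the "Sun Microsystems as person" example
          </preformat>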
          <p>
            6 Experiments
In this work we have carried out three preliminary experiments, which investigate
the performance of the proposed approach in the KMi domain. In the following
subsections, we first describe the settings and the methods of the experiments.
We then discuss the results of the experiments.</p>
          <p>The experimental data were collected from the previous experiment carried out
in ASDI [
            <xref ref-type="bibr" rid="ref23 ref9">9</xref>
            ], in which we randomly chose 36 news stories from the KMi news
archive3 and constructed a gold standard annotation collection by asking several
KMi researchers to manually mark them up in terms of person, organization and
projects. We used the semantic metadata set that was automatically gathered
from the chosen news stories by the named entity recognition tool ESpotter [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]
as the data set that needs evaluation. We then experimented with this semantic
metadata set using a gold standard based approach and the proposed approach.
          </p>
          <p>In order to get a better idea of the performance of the proposed approach when
employing different types of knowledge sources, we conducted three experiments:
the first experiment used the constructed gold standard annotation collection;
the second one used domain knowledge sources; and the third experiment used
both domain knowledge and background knowledge. In particular, for the
purpose of minimizing the influences that may be caused by other factors such as
human intervention, we developed automatic evaluation mechanisms for both
the gold standard based approach and the proposed approach, which use the
same matching mechanism. Table 1 shows the results, with each cell presenting
the total number of the corresponding data deficiencies (i.e., the row) found in the
data set with respect to the extracted entity type (or the sum over all types).
</p>
          <p>6.2 Discussion
Assessing the performance of the proposed approach is difficult, as it largely
depends on three factors, including i) whether it is possible to get hold of good data
repositories that cover most facts of the problem domain, ii) whether the relevant
topics have gained good publicity on the (semantic) web, and iii) whether the
background knowledge itself is of good quality and trustworthy. Here we
compare the results of the different experiments in the hope of finding some clues about
its performance.</p>
          <p>Comparing the proposed approach with the gold standard based approach. As
shown in the table, the performances on detecting duplicate, ambiguous and
inaccurate problems are quite close. This is because, like gold standard
annotations, the KMi domain knowledge repositories cover all the facts (including
people, projects, organizations) that are contained in the domain. On the other
hand, there are two major differences between the gold standard based approach
and the proposed approach.</p>
          <p>One major difference is that, in contrast with the gold standard based
approach, the proposed approach is able to detect inconsistent annotations but
with no support for the incomplete annotation problem. This is because the
proposed approach deliberately includes domain ontologies as a type of
knowledge source and does not have knowledge of the full set of annotations of the
data source.</p>
          <p>Another major difference lies in the detection of the spurious annotation
problem. More specifically, there is a big difference between the first experiment
and the second one. This is mainly caused by the fact that many entities
extracted from the news stories are not included in the domain knowledge (e.g.,
“IBM”, “BBC”, “W3C”), and thus are not covered by the second
experiment. But they are contained in the manually constructed gold standard.</p>
          <p>There is also a significant difference between the first experiment and the
third one with respect to the detection of spurious annotations. Further
investigation reveals two problems. One is that the gold standard data set is not perfect.
Some entities were correctly picked up by the extraction tool but are not included in it.
“EU Commissioner Reding as a person” is such an example. The other problem
is that background knowledge can sometimes lead to false conclusions. On the
one hand, some spurious annotations are computed as correct, due to the
difficulties in distinguishing different senses of the same word in different contexts.
For example, “international workshop” as an instance of the class Organization
is computed as correct, whereas the meaning of the word organization when
associated with this term is quite different from the meaning of the class in the
KMi domain ontology. On the other hand, false alarms are sometimes produced
due to the lack of publicity of the evaluated entity in the background knowledge.
For example, in the KMi SW portal, the person instance Marco Ramoni is
computed as spurious, as not enough evidence could be gathered to draw a positive
conclusion.</p>
          <p>Comparing the performance of the approach between using and without using
background knowledge. With 12 inconsistencies discovered and 58 spurious
problems cleared among the 80 spurious problems detected in the second experiment,
the use of background knowledge has proven to be effective in problem
detection in the KMi domain. This is mainly because the relevant entities that are
contained in the chosen news stories collection have gained fairly good publicity.
“Sun Microsystem”, “BBC” and “W3C” are such examples. As such,
classifications can be easily drawn from the (Semantic) web to support the deficiency
detection task. However, as described above, we have also observed that several
false results have been produced by the proposed approach.</p>
          <p>In summary, the results of the experiments indicate that the proposed
approach works reasonably well for the KMi domain, considering that no human
effort is required. In particular, domain knowledge proves useful in
detecting those problems that are highly relevant to the problem domain, such as
ambiguous and inaccurate annotation problems. Background knowledge, on
the other hand, is quite useful for investigating those entities that fall outside the problem domain.
</p>
          <p>7 Conclusions and Future Work
The key contribution of this paper is the proposed approach, which, in
contrast with existing approaches that typically focus on the evaluation of semantic
metadata generation algorithms, pays special attention to the quality evaluation
of semantic metadata themselves. It addresses the major drawback that current
approaches suffer from when applied to data evaluation, namely the need for gold
standards, by exploiting a range of knowledge sources.</p>
          <p>In particular, two types of knowledge source are used. One is the knowledge
sources that are available in the problem domain, including domain ontologies,
domain specific data repositories and domain lexical resources. They are used
to detect quality problems of those semantic metadata that are contained in
the problem domain, including data inconsistencies, duplicate, ambiguous and
inaccurate problems. The other type is background knowledge, which includes
ontologies and data repositories published on the semantic web, online textual
resources, and general lexical resources. It is mainly used to detect quality
problems that are associated with those data that are not contained in the problem
domain, including mis-classification and spurious annotations.</p>
          <p>We have conducted three preliminary experiments examining the
performance of the proposed approach, with each focusing on the use of different
types of knowledge sources. The study shows encouraging results. We are,
however, aware of a number of issues associated with the proposed approach. For
example, real-time response is crucial for dynamic evaluation, which takes place
at run time. How to speed up the evaluation process is an issue that needs to be
investigated in the future. Another important issue is the impact of the
trustworthiness of different types of knowledge on the evaluation.</p>
          <p>Acknowledgements
This work was funded by the X-Media project (www.x-media-project.org)
sponsored by the European Commission as part of the Information Society
Technologies (IST) programme under EC grant number IST-FP6-026978.</p>
          <p>Tracking Name Patterns in OWL Ontologies</p>
          <p>Vojtěch Svátek, Ondřej Šváb
University of Economics, Prague, Dept. of Information and Knowledge Engineering,
Winston Churchill Sq. 4, 130 67 Praha 3, Prague, Czech Republic</p>
          <p>svatek@vse.cz, svabo@vse.cz
Abstract. Analysis of concept naming in OWL ontologies with set-theoretic
semantics could serve as partial means for understanding their conceptual structure,
detecting modelling errors and assessing their quality. We carried out experiments
on three existing ontologies from public repositories, concerning the consistency
of very simple name patterns—subclass name being a certain kind of parent class
name extension, while considering thesaurus relationships. Several probable
taxonomic errors were identified in this way.</p>
          <p>1 Introduction
Concept names in semantic web (OWL) ontologies with set-theoretic semantics are
sometimes viewed as secondary information. Indeed, for logic-based reasoners, which
are assumed to be the main consumers of these ontologies, even completely cryptic
URLs can serve well. Experience however shows that even in ontologies primarily
intended for machine consumption, the naming policy is almost never completely
arbitrary. It is important for ontology developers (and maintainers, adopters etc.) to be
able to see the semantic structure of a large part of the ontology at once, and ontology
editors normally use base concept names (local URLs) and not additional linguistic
labels within their taxonomy view. At the same time, while inspecting possibly complex
OWL axioms, self-explaining concept names (even independent of their context in the
taxonomy) are extremely helpful.</p>
          <p>This leads us to the hypothesis that concept naming in OWL ontologies can (at least
in some cases) be a useful means for analysing their conceptual structure, detecting
modelling errors and assessing their quality. Obviously, a ‘true’ evaluation of concept
naming in specialised domain ontologies requires deep knowledge of the domain. We
however assume that even in specialised ontologies, the ‘seed’ terms often belong to
generic vocabulary and the domain specialisation is rather achieved via adding syntactic
attributes (such as adjectives or nouns in apposition), leading to multi-word terms. The
class-subclass pairs would then often be characterised by the presence of a common
token (or sequence of tokens) on some particular position; we can see this as a simple
(atomic) name pattern. Although the instances of such a pattern only
represent a fraction of all subclass relationships1, in large- and medium-sized ontologies
this may suffice for partial assessment of the consistency of naming, as part of ontology
quality evaluation.
1 Based on our preliminary analysis, we estimate this fraction to float around 50%, depending
on domain specificity and other factors.</p>
          <p>Atomic name patterns can then be woven into more complex pattern structures
with their own semantics. The deeper understanding of the structure of an ontology
thus acquired can help in e.g. mapping it properly to other ontologies.</p>
          <p>The paper is structured as follows. Section 2 sets up the token-level background
for our name patterns. Section 3 explains the name patterns themselves. Section 4
describes the initial experiments on three ontologies and attempts to interpret their
results. Section 5 surveys some related work. Finally, section 6 summarises the paper and
outlines some future work.
</p>
          <p>2 Matching Tokens in OWL Concept Names
All name patterns we consider in this work are built upon the sub-string relationships
between pairs of concept names, at token level. The token-level relationship can in
general have the nature of prefix, postfix or infix, possibly adjusted with some connective.
For example, the name ‘WrittenDocument’ can be extended via prefix to
‘HandWrittenDocument’ or via infix to ‘WrittenLegalDocument’. A postfix extension could be
e.g. ‘WrittenDocumentTemplate’, which, however, unlike the previous ones, would not
be adequate for a subclass of ‘WrittenDocument’, as the main term (distinguished by
its placement as end token) has changed. An adequate postfix extension for a subclass
would in turn be e.g. ‘WrittenDocumentWithComments’; here however the postfix has
the form of prepositional construction appended to the main term (thus preserved).</p>
          <p>Tokenisation is, for ‘technical’ items such as OWL concept names, usually
assumed to rely on the presence of one of a few delimiters, in particular: underscore
(Concept_name), hyphen (Concept-name) and change of lowercase letter to uppercase
(ConceptName), which is most parsimonious and therefore most frequent. Although the
semantics of these delimiters could in principle differ (especially the hyphen is likely to
be used for more specific purposes than the remaining two, on some occasions), we will
treat them as equivalent for the sake of simplicity. We will also ignore sub-string
relationships without an explicit token boundary (i.e. between two single-word expressions),
assuming that they often deviate from a proper subclass relationship (as in ‘fly’ vs.
‘butterfly’, or even worse e.g. ‘stake’ vs. ‘mistake’).</p>
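          <p>A tokeniser along these lines might look as follows; the regular expression and its
treatment of acronym runs (as in ‘ATOMission’) are illustrative choices and only one possible
reading of the conventions described above.</p>
          <preformat>
import re

def tokenise(name):
    """Split an OWL concept name into tokens at underscores, hyphens and
    lower-to-uppercase changes, e.g. 'HandWrittenDocument' gives
    ['Hand', 'Written', 'Document'] and 'ATOMission' gives ['ATO', 'Mission']."""
    tokens = []
    for part in re.split(r"[_\-]+", name):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+", part))
    return [t for t in tokens if t]
          </preformat>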
          <p>The mentioned token-level structures then have to be tracked over the ontology
structure (for simplicity let us only consider taxonomic paths). This could lead to an
inventory of naming patterns, some of which we considered in our start-up analysis
presented below. The most obvious naming pattern is of course the one already
mentioned: a subclass name being token-level extension of its parent class. Such patterns
can already be assigned some status wrt. ontology content evaluation and possible
refactoring. Although the ‘token analysis’ approach used is admittedly quite naive from the
NLP point of view, we believe that, due to the restricted nature of concept names in
ontologies, we would not need much more for covering the majority of multi-word names
in real-world ontologies.
Let us now outline a few, still rather vague, initial hypotheses concerning the
interpretation of name patterns.</p>
          <p>The first one, concerning subclassing, is central in our initial investigation:
Hypothesis 1 If the main term in the name of a class and the main term in the
multi-word name of its immediate subclass do not correspond2, then it is likely that there is a
conceptual incoherence.</p>
          <p>The hypothesis anticipates that ontology designers should not often, while subclassing,
substantially change the meaning of the main term in the name, as the main term is
likely to denote the conceptual type of the underlying real-world entity, and they are
obliged to keep the set-theoretic consistency (all instances of the subclass also have to
be instances of the parent class). They may however subclass a multi-word name with
a rather specific single-word name.</p>
          <p>The second hypothesis is closely related:
Hypothesis 2 If the two main terms from Hypothesis 1 only correspond via some
long-range terminological link, then it is likely that there is a shift to a more specific domain
with its own terminology.</p>
          <p>This hypothesis might help suggesting points for breaking large monolithic ontologies
into more and less specific parts.</p>
          <p>We also formulated two hypotheses that involve more extensive graph structures of
the taxonomy.</p>
          <p>Hypothesis 3 Concepts with the same main term in their names should not occur in
separate taxonomy paths.</p>
          <p>In other words, if there are several partial taxonomies with the same main term, they
are candidates for merger.</p>
          <p>Hypothesis 4 If two taxonomy paths exist such that one contains a class X and its
subclass Y, and the other contains a class Z and its subclass W, such that the name of
X is token-level extension of the name of Z, with different main term, and the name of
Y is token-level extension of the name of W, with different main term, then both paths
should be linked with some property and the name pattern should probably apply for
the descendants of Y and W as well.</p>
          <p>This amounts to identification of ‘parallel’ taxonomies of related (but conceptually
different) entities, which may also be quite important e.g. in ontology refactoring as well
as mapping.</p>
          <p>
            In the experiments below we only systematically compare Hypothesis 1 to our
findings. We however occasionally mention the other three hypotheses where relevant.
2 The specification of ‘correspondence’ is discussed in section 4.1.</p>
          <p>In the initial, manual3, phase of our experiments, we restricted the analysis to 3 small- to
medium-sized ontologies we picked from public repositories. Their choice was
more-or-less ‘random’; we however avoided ontologies that appear as mere (converted) ad hoc
taxonomies without the assumption of set-theoretic semantics, as well as ‘toy’ models
designed for demonstrating DL reasoning (such as ‘pizzas’ or ‘mad cows’), which are
actually quite common in such repositories, cf. [
            <xref ref-type="bibr" rid="ref23 ref9">9</xref>
            ].
          </p>
          <p>4.1 Settings
In designing the experiments, there were numerous choices, especially concerning:
1. What patterns to follow
2. Whether to only consider the own structure of the ontology or also that of imported
ontologies such as upper-level ones (namely, SUMO, in two out of the three cases)
3. Whether to require for fulfilling the patterns that the main term should be identical
in the parent class and subclass, or also allow hyponymy/synonymy.</p>
          <p>For the first issue we eventually decided to only consider two concrete patterns. The
first is the presence of a common end token; note that this covers all cases of prefix
and infix extension. The second (which proved much rarer) is the postfix extension
starting with the ‘of’ preposition.</p>
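          <p>With a tokenise helper like the one sketched in the previous section, the two
patterns could be checked roughly as follows; the helper name and the case-insensitive
comparison are assumptions.</p>
          <preformat>
def pattern_compliant(parent, child):
    """Check the two atomic patterns used in the experiments: (i) the
    multi-token subclass name shares the parent name's end token (covering
    prefix and infix extensions), or (ii) the subclass name extends the full
    parent name with a postfix starting with the 'of' preposition."""
    p = [t.lower() for t in tokenise(parent)]
    c = [t.lower() for t in tokenise(child)]
    if len(c) >= 2 and p and p[-1] == c[-1]:
        return True                        # common end token
    if len(c) >= len(p) + 1 and c[:len(p)] == p and c[len(p)] == "of":
        return True                        # 'of' postfix extension
    return False

# pattern_compliant('WrittenDocument', 'HandWrittenDocument') is True,
# pattern_compliant('WrittenDocument', 'WrittenDocumentTemplate') is False
          </preformat>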
          <p>For the second issue we decided to restrict the analysis to the current ontology only
(i.e. both members of the evaluated concept pairs had to be from the current ontology),
but including concepts from imported ontologies that belong to the same domain (or
mean only very slight domain generalisation). The rationale is that we did not intend to
evaluate the way the concepts from the current ontology are grafted on the upper-level
ontology, but only the design of the current ontology proper.</p>
          <p>For the third issue, we decided to use WordNet4, with the assumption that a general
thesaurus is likely to contain the main terms of multi-word domain terms. However, we
separately counted and listed the cases where the pattern compliance was established
via WordNet only. We did not use WordNet for single-token subclass terms5; we rather
excluded them from the analysis.</p>
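          <p>One way to realise this thesaurus fallback with NLTK's WordNet interface is sketched
below; accepting any noun sense and hyponymy of arbitrary depth is a simplifying assumption,
and only the main (end) terms of the compared names were checked in the experiments.</p>
          <preformat>
from nltk.corpus import wordnet as wn

def terms_correspond(parent_term, child_term):
    """The main terms correspond if they share a WordNet noun sense or if some
    sense of the child term is a hyponym of some sense of the parent term."""
    parent_syns = set(wn.synsets(parent_term, pos=wn.NOUN))
    child_syns = set(wn.synsets(child_term, pos=wn.NOUN))
    if parent_syns.intersection(child_syns):
        return True                               # synonymous senses
    for p in parent_syns:
        hyponyms = set(p.closure(lambda s: s.hyponyms()))
        if child_syns.intersection(hyponyms):
            return True                           # child is a (transitive) hyponym
    return False

# terms_correspond('operation', 'mission') is expected to hold, matching the
# ATOMission / CompositeAirOperations case discussed in section 4.2
          </preformat>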
          <p>The results of the analysis amount to the simple statistics of:
1. Class-subclass pairs where (one of the two considered) name patterns hold directly.
2. Class-subclass pairs where a name pattern holds via WordNet only.
3. Class-subclass pairs where name patterns don’t hold even via WordNet, but we
nevertheless assessed the subclass relationship as correct.
4. Class-subclass pairs where name patterns don’t hold even via WordNet, and we
assessed the subclass relationship as incorrect (at least at the level of class names).
3 For examining the ontologies, we simply unfolded their taxonomies in Protégé.
4 http://wordnet.princeton.edu/
5 Our main focus is on specialised domain ontologies, whose single-token terms are likely to
either miss in standard lexical databases or exhibit a meaning shift there.
In the tables below, the cases 2, 3 and 4 are explicitly listed and commented. Three
symbolic labels were added for better overview: one marks a correct relationship, which
contradicts our Hypothesis 1; another marks an incorrect relationship, which conforms to our
Hypothesis 1; finally, ⊗ means that the main terms correspond via thesaurus, i.e. Hypothesis 1 does not apply6.</p>
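          <p>Putting the previous sketches together, the assignment of a class-subclass pair to
one of the four cases could look roughly as follows; the manual correctness judgement enters
as a parameter, since cases 3 and 4 were decided by inspection.</p>
          <preformat>
def classify_pair(parent, child, judged_correct):
    """Assign a class-subclass pair to cases 1-4 above, reusing the tokenise,
    pattern_compliant and terms_correspond sketches; 'judged_correct' is the
    manual assessment used only for pattern-non-compliant pairs."""
    if pattern_compliant(parent, child):
        return 1                                  # pattern holds directly
    if terms_correspond(tokenise(parent)[-1].lower(), tokenise(child)[-1].lower()):
        return 2                                  # holds via WordNet only
    return 3 if judged_correct else 4             # 'false alarm' vs. 'true alarm'
          </preformat>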
          <p>The number of cases 3 (‘false positives’) and 4 (‘true positives’) can be viewed
as evaluation measures for our envisaged method of conceptual error detection. There
could potentially be ‘false positives’ even among the cases 2 (and theoretically even
among the cases 1) due to homonymy of terms; we however did not clearly identify any
such case. The accuracy of our approach can thus be simply established as the ratio of
the number of cases 4 vs. the number of cases 3+4.
</p>
          <p>4.2 ATO Mission Models Ontology
This US-based military ontology (ATO probably stands for ‘Air Tasking Order’), which
we picked from the DAML repository7, is an ideal example of a highly specific ontology
rich in multi-token names; there are very few single-token ones, and none of these is
involved as subclass in one of the subclass relationships. The ontology contains 86
classes (aside from classes inherited from imported ontologies), and there are 116
immediate subclass relationships8 (including some multiple inheritance). Of them, 95 comply
with the name patterns, and 21 don’t. Table 1 lists and comments the subclass
relationships that break the name pattern. We assume (see the table) that the majority of
non-compliance cases (11, i.e. 52%) are modelling errors 9; some others (5, i.e. 24%)
are not strict non-compliance cases, as a relationship between the names could be determined
using WordNet, and only a few (5, i.e. 24%) seem to be ‘false alarms’. In addition, the
ontology contains some portions relevant to Hypotheses 3 (e.g. some ‘missions’ placed
beyond the main ‘mission’ taxonomy and under some other concepts) and 4 (e.g.
parallel taxonomies for ‘missions’ and ‘mission plans’).
</p>
          <p>4.3
This ontology (also from the DAML repository) is relatively smaller and less
domain-specific; it contains 53 classes (aside from classes inherited from imported ontologies), and
there are 27 immediate subclass relationships (including some multiple inheritance). Of
the subclass relationships, 11 comply with the name patterns and 13 don’t; finally, 3
involve a single-token subclass, thus being irrelevant for our method. Table 2 lists and
comments the subclass relationships that break the name patterns.</p>
          <p>4.4
This ontology, picked from the OntoSelect10 repository, contains 71 classes. It has no
explicit imports, but largely borrows from SUMO at higher levels of the taxonomy. It
is rather heterogeneous (with respect to its relatively tiny size), but contains clusters
of related concepts, where name patterns can be identified. The overall quality of the
ontology does not seem to be very high, as it contains many clear modelling errors, such
as apparent instances formalised as classes. The outcomes of analysis are in Table 3.
6 But Hypothesis 2 might do if the correspondence is ‘long range’ only.
7 http://www.daml.org/ontologies/
8 Here we also considered relationships such that the superclass belonged to the imported but
tightly thematically linked ATO ontology.
9 Or, possibly, artifacts of the DAML→OWL conversion.
10 http://olp.dfki.de/ontoselect/
          </p>
          <p>Table 1. Name pattern breaks in the ATO Mission Models ontology. Columns: Superclass | Subclass/es | Comment.
AirspaceControlMeasure | AirCorridor, TimingReferencePoint, DropZone, CompositeAirOperationsRoute | Subclassing indeed looks misleading. A ‘measure’ can be setting up e.g. a corridor, but not the corridor itself.
AirStation | AirTankerCellAirspace | Rather evokes part-of relationship but hard to judge w/o domain expertise.
ATOMission | AircraftRepositioning | By the available comment, means AircraftRepositioningMission. However, ‘repositioning’ looks like acceptable term, though not hyponym of ‘mission’ in WordNet.
ATOMission | CompositeAirOperations | ⊗ ‘Mission’ is direct hyponym of ‘operation’ in WordNet. Note however the misuse of plural form.
ATOMissionPlan | IndividualLocationReconnaissanceRequestMission, MissileWeaponAttackMission | The ‘Plan’ token erroneously missing. The remaining 19 sibling subclasses do have it.
CommandAndControlProcess | AirborneElementsTheaterAirControlSystemMission | Subclass clearly misplaced, ‘mission’ concept non contiguous.
CommandAndControlProcess | ForwardAirControl | Probably means ForwardAirControlProcess.
CommandAndControlProcess | FlightFollowing | ⊗ ‘Following’ could be seen as process (it is hyponym of ‘processing’ in WordNet). Hypothesis 2 might apply.
ConstraintChecking | RouteValidation | Specialisation to subdomain; ‘validation’ should be closely related to ‘checking’ but surprisingly is not in WordNet.
ControlAgency | ForwardAirControllerAirborne | A tricky case: the end token in subclass is actually an attribute of the true entity (‘controller’). Furthermore, although the relationship between ‘agency’ and ‘controller’ is not intuitive, it might be OK in the domain context.
ForwardAirControl | AirborneBattleDirection | ⊗ ‘Direction’ is direct subclass of ‘control’ in WordNet.
GroundTheaterAirControlSystem | ControlAndReportingCenter, ControlAndReportingElement | Though the relationship between the end tokens is not intuitive, it looks OK in the domain context.
IntelligenceAcquisition | AirborneEarlyWarning | Rather looks like two subsequent processes: warning is preceded by intelligence acquisition. However the end token ‘acquisition’ bears little meaning by itself.
ModernMilitaryMissile | ArmyTacticalMissileSystem | A system (i.e. group) of missiles, possibly including a launcher, is probably not a subclass of ‘missile’.
PrepositionedMaterielTask | GroundStationTankerMission | ⊗ ‘Mission’ is close hyponym of ‘task’ in WordNet.
SupportingTask | GroundStationTankerMission | ⊗ As above.</p>
          <p>Table 2. Name pattern breaks in the second (DAML) ontology; the superclasses involved are JudicialOrganization, OverseasArea, PoliticalParty, SuffrageLaw (twice), HumanAttribute, HumanBloodGroup, LandArea, Region, TeamSport, WaterSport, CombatSport, and ContentBearingObject (subclass NaturalLanguage).</p>
          <p>
Table 4 shows the overall figures. The results are obviously most promising for the ATO
Mission Models ontology, which is the most domain-specific of the three. In general, the
proportion of multi-word names seems to decrease with the growing generality of the
ontology (EuroCitizen being the most general of the three). The accuracy of
‘inconsistency alarms’, if they were properly implemented, could be acceptable for human
inspection and evaluation of the ontology. However, perhaps with the exception of ATO
Mission Models, the coverage of our simple approach is still too small to guarantee
substantial ‘cleaning’ of taxonomic errors.
Table 4 reports, for each of the three ontologies: subclass relationships with a multi-token
subclass; pattern-compliant pairs (identical); pattern-compliant pairs (WordNet); pattern-non-compliant,
incorrect (‘true alarm’); pattern-non-compliant, correct (‘false alarm’); the pattern proportion
(w/o use of WordNet); and the accuracy of ‘alarm’.</p>
          <p>
Our research is to some degree similar to projects aiming at converting shallow models
such as thesauri or directory headings to more structured and conceptually clean
ontologies [
            <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19 ref2 ref20 ref3 ref4 ref5 ref6">2–6</xref>
            ]. The main difference lies in our assumption that the ontologies in question
are already intended to bear set-theoretical semantics, and that the ‘inconsistencies’ in
naming patterns are due to either sloppy naming (possibly just reflecting shortcut
terminology used by domain practitioners) or more serious modelling errors, rather than
being an inherent feature of (shallow) models.
          </p>
          <p>
            On the other hand, the research in ‘true’ OWL ontology evaluation and refactoring
has typically been focused on their logical aspects [
            <xref ref-type="bibr" rid="ref1 ref10 ref15 ref24">1, 10</xref>
            ]. Our research is, in a way,
parallel to theirs. We aim at similar long-term goals, such as detecting potential modelling
inconsistencies or making implicit structures explicit. We however focus on a different
aspect of ontologies: the naming policy. Due to the subtler nature of consistency or
implicit structures in these realms (usually requiring some degree of acquaintance with the
domain), the conclusions of name pattern analysis probably have to be more cautious
than those resulting from logic-based analysis.
          </p>
          <p>Conclusions and Future Work
We presented a simple method of tracking name patterns (based on token-level
extensions) over OWL ontology taxonomies, which could help detect some errors with
respect to their set-theoretic interpretation. Initial experiments on three ontologies from
public repositories indicated that the method has some potential, although the
performance will probably largely vary from one ontology to another, especially with respect
to their domain specificity.</p>
          <p>
            There are various directions in which our current work ought to be extended. First
of all, the so far manual process of pattern (non-compliance) detection used in the very
first experiments should be replaced by an automatic one. We also plan to reuse
experience from popular NLP-oriented methods of ontology ‘reconstruction’ from shallow
models, such as those described in [
            <xref ref-type="bibr" rid="ref17 ref3">3</xref>
            ] or [
            <xref ref-type="bibr" rid="ref19 ref5">5</xref>
            ]. Consequently, we should, analogously
to those approaches, adopt at least a simple formal model. Furthermore, concept names
used as identifiers are obviously not the only lexical items available in ontologies. Future
(especially, more automated) analysis should pay similar attention to additional,
potentially even multi-lingual lexical labels (based on rdfs:label) and comments, which
may help reveal if the identifier name is just a shortcut of the ‘real’ underlying concept
name. In addition to class names, property naming (in connection with their domain
and range) should also be followed, e.g. as drafted in [
            <xref ref-type="bibr" rid="ref21 ref7">7</xref>
            ]. In the long term, we perceive it as
important to combine the analysis of naming patterns with the analysis of logical
patterns, in the sense of ‘guessing’ the modeller’s original intention that got distorted due
to the representational limitations of OWL. Our closely related interest is also the use
of discovered patterns for mapping between ontologies. We already started to test the
behaviour of some well-known (string-based and graph-based) ontology mapping
methods with respect to naming patterns present in ontologies, using synthetic ontology-like
models [
            <xref ref-type="bibr" rid="ref22 ref8">8</xref>
            ]. In the future, the analysis of (naming and other) patterns would be used as
a pre-processing step to mapping.
          </p>
          <p>Acknowledgements
The research was partially supported by the IGA VSE grants no.12/06 “Integration of
approaches to ontological engineering: design patterns, mapping and mining”, no.20/07
“Combination and comparison of ontology mapping methods and systems”, and by the
Knowledge Web Network of Excellence (IST FP6-507482).</p>
          <p>References</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Handschuh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          .
          <article-title>Towards the Self-Annotating Web</article-title>
          .
          <source>In Proceedings of the 13th International World Wide Web Conference</source>
          , pages
          <fpage>462</fpage>
          -
          <lpage>471</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tane</surname>
          </string-name>
          .
          <article-title>Acquisition of Taxonomies from Text: FCA meets NLP</article-title>
          .
          <source>In Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining</source>
          , pages
          <fpage>10</fpage>
          -
          <lpage>17</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tablan</surname>
          </string-name>
          .
          <article-title>GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications</article-title>
          .
          <source>In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL02)</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. M.
          <string-name>
            <surname>d'Aquin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Sabou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Dzbor</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Baldassarre</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Gridinoc</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Angeletou</surname>
            , and
            <given-names>E. Motta.</given-names>
          </string-name>
          <article-title>WATSON: A Gateway for the Semantic Web</article-title>
          .
          <source>In 4th European Semantic Web Conference (ESWC'07)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>K.</given-names>
            <surname>Dellschaft</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          .
          <article-title>Strategies for the Evaluation of Ontology Learning</article-title>
          . In P. Buitelaar and P. Cimiano, editors,
          <article-title>Bridging the Gap between Text and Knowledge - Selected Contributions to Ontology Learning and Population from Text</article-title>
          . IOS Press,
          <year>December 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          . WORDNET:
          <article-title>An Electronic Lexical Database</article-title>
          . MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sure</surname>
          </string-name>
          .
          <article-title>A framework for handling inconsistency in changing ontologies</article-title>
          .
          <source>In Proceedings of the International Semantic Web Conference (ISWC2005)</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyanpur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , E. Sirin, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Grau</surname>
          </string-name>
          .
          <article-title>Repairing Unsatisfiable Concepts in OWL Ontologies</article-title>
          .
          <source>In 3rd European Semantic Web Conference</source>
          , pages
          <fpage>170</fpage>
          -
          <lpage>184</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Uren</surname>
          </string-name>
          , and
          <string-name>
            <surname>E. Motta.</surname>
          </string-name>
          <article-title>An Infrastructure for Acquiring High Quality Semantic Metadata</article-title>
          .
          <source>In Proceedings of the 3rd European Semantic Web Conference</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Uren</surname>
          </string-name>
          , and
          <string-name>
            <surname>E. Motta.</surname>
          </string-name>
          <article-title>SemSearch: A Search Engine for the Semantic Web</article-title>
          .
          <source>In Proceedings of the 15th International Conference on Knowledge Engineering and Knowledge Management(EKAW2006)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Peters</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Metrics for Evaluation of Ontology-based Information Extraction</article-title>
          .
          <source>In Proceedings of the 4th International Workshop on Evaluation of Ontologies on the Web</source>
          , Edinburgh, UK, May
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>R.</given-names>
            <surname>Reiter</surname>
          </string-name>
          .
          <article-title>A theory of diagnosis from first principles</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>57</fpage>
          -
          <lpage>95</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Uren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          .
          <article-title>SemEval: A Framework for Evaluating Semantic Metadata</article-title>
          .
          <source>In The Fifth International Conference on Knowledge Capture (KCAP</source>
          <year>2007</year>
          ),
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>J. Zhu</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Uren</surname>
            , and
            <given-names>E. Motta.</given-names>
          </string-name>
          <article-title>ESpotter: Adaptive Named Entity Recognition for Web Browsing</article-title>
          .
          <source>In Proceedings of the Professional Knowledge Management Conference</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baumeister</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seipel</surname>
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Smelly Owls - Design Anomalies in Ontologies</article-title>
          .
          <source>In: Proc. FLAIRS</source>
          <year>2005</year>
          ,
          <volume>215</volume>
          -
          <fpage>220</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          2.
          <string-name>
            <surname>Giunchiglia</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marchese</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaihrayeu</surname>
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Encoding Classifications into Lightweight Ontologies</article-title>
          .
          <source>In: Proc. ESWC</source>
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hepp</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Bruijn</surname>
          </string-name>
          J.:
          <article-title>GenTax: A Generic Methodology for Deriving OWL and RDF-S Ontologies from Hierarchical Classifications, Thesauri, and Inconsistent Taxonomies</article-title>
          .
          <source>In: Proc. ESWC</source>
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kavalec</surname>
            <given-names>M.</given-names>
          </string-name>
          , Svátek V.
          <article-title>: Information Extraction and Ontology Learning Guided by Web Directory</article-title>
          . In: ECAI Workshop on NLP and
          <article-title>ML for ontology engineering</article-title>
          .
          <source>Lyon</source>
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          5.
          <string-name>
            <surname>Magnini</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serafini</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Speranza</surname>
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Making Explicit the Hidden Semantics of Hierarchical Classifications</article-title>
          .
          <source>In: Proc. AI*IA</source>
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          6.
          <string-name>
            <surname>Serafini</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanobini</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sceffer</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouquet</surname>
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Matching Hierarchical Classifications with Attributes</article-title>
          .
          <source>In: Proc. ESWC</source>
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          7. Svátek, V.:
          <article-title>Design Patterns for Semantic Web Ontologies: Motivation and Discussion</article-title>
          .
          <source>In: 7th Conf. on Business Information Systems (BIS-04)</source>
          , Poznan,
          <year>April 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          8. Šváb
          <string-name>
            <surname>O.</surname>
          </string-name>
          ,
          <article-title>Svátek V.: In Vitro Study of Mapping Method Interactions in a Name Pattern Landscape</article-title>
          .
          <article-title>Accepted to the Ontology Matching (OM-07</article-title>
          ) workshop at ISWC 2007, Busan, Korea.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          9.
          <string-name>
            <surname>Tempich</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Volz</surname>
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Towards a benchmark for Semantic Web reasoners - an analysis of the DAML ontology library</article-title>
          .
          <source>In: EON Workshop at ISWC</source>
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          10.
          <string-name>
            <surname>Vrandecic</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sure</surname>
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>How to Design Better Ontology Metrics</article-title>
          .
          <source>In: Proc. ESWC</source>
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>