Introduction

Integration of information from di erent ontologies

Julia Dmitrieva

Fons J. Verbeek

fverbeek@liacs.nl 0 0 Universiteit Leiden, Leiden Institute of Advanced Computer Science , Niels Bohrweg 1, 2333 CA Leiden , The Netherlands

In this work we describe our approach directed to the integration of information from di erent ontologies. We propose a method to extract and integrate modules from ontologies. These modules are generated as subgraphs from original ontologies on the basis of a term de ned by the user. The module extraction is related to the view traversal method [1]. The di erence is that we not only extract modules from OWL ontologies but use these modules for further integration. The small modules are integrated to one graph structure on basis of mappings. These mappings are based on the similarity between concepts in di erent ontologies. With these mappings we introduce merged nodes in the integrated graph which connect concepts from di erent ontologies. The integrated graph can be considered as a view or as a query answer and contains information collected from di erent ontologies. The method can also be useful to check the correctness of the mappings, because the user can see whether the merged concept is correctly placed in the hierarchy.

information integration ontology integration ontology mapping module extraction ontology visualization

Introduction

In order to nd similar concepts we use the string similarity. For each concept we extract the following characteristics which are used for the calculation of the similarity.

Label The label of the concept. This is in most cases the name of the concept. Description Description of the concept. This is an annotation property, where a description of the meaning of the concept is given in a human language.

At this moment, we make only use of English language in our prototype. ID This is a concept identi er. In some ontologies this is a unique string automatically generated during the serialization of the ontology. In other ontologies the concept ID can be the same as the name of the concept. Because this property depends on how an ontology is serialized, a little weight is assigned to it during the calculation of the similarity.

The comparison of concepts from di erent ontologies is done on the basis of Levenstein distance algorithm [ 2 ]. By this mapping the concepts from ontology O1 (seed ontology ) are mapped to the concepts from other ontologies O2; O3; : : : ; On. 3

Modules extraction: Graph-based Approach

We introduce a graph-based approach for module extraction. Although we are aware that our approach can lead to information loss, it is easy to implement and is su cient for testing our prototype.

In our previous work [ 3 ] we have explained how to represent an ontology as a graph structure. In this work we elaborate further on this approach. The most important di erence and di culty that we have to cope with during a module extraction is that the interesting term is matched not by one but by multiple concepts in the ontology. It means that we have to redesign our graph extraction algorithm, and apply it for multiple nodes.

The backbone of the graph that represents an ontology module is a spanning tree. Nodes in the spanning tree represent classes in the ontology and edges represent hierarchical relations as well as other kinds of relations between classes. These relations are extracted with the help of the reasoner [ 4 ]. Inferring that two classes C and E are related to each other via R requires extraction of properties from axioms of the concept. Then the reasoner can be asked whether the concept C u :9R:E is satis able, if not then it is necessary for the class C to have the property R with the ller E.

In order to extract a module we need to collect multiple graphs created on the basis of concepts matched by the interesting term. Let S0 be a set of the concepts which are matched during the term search procedure. This procedure is implemented as the comparison of label, comment and id of each concept from the ontology with the user query. We are constructing a graph G that corresponds to the extracted module, where Go is the set of nodes, and Ge is the set of edges. The algorithm for module extraction works as follows: 1. Initialize G0 S0 2. G0 G0 [ P0; where P0 is a set of parents of S0 . 3. G0 G0 [ S1; where S1 contains the siblings of the nodes in S0, if the parent of ci 2 S0 has 2 or more children in S0. 4. G0 G0 [ P1; where P1 contains the parents of nodes in G0; if these parents have 2 or more children in G0. 5. Ge [eij ; where eij is an edge between nodes ci 2 G0 and cj 2 G0 that exists in the original ontology. 6. After this point we have the set of probably unconnected trees. The trees are connected by the following algorithm: - if the root r of tree ti is descendant of some node s1 of tree tj , then make s parent of r. Introduce an edge from s to r. 1 With descendant here we mean that s and r are in the transitive closure of the sublClassOf relation. Whether r is a subclass of s can be solved by Description Logics[ 5 ] (DL) reasoner.

- otherwise we have trees t1; t2; : : : ; tn which are not subtrees of each others. For these trees we need to nd the least common subsumer2 a. This node will be the parent of trees t1; t2; : : : ; tn, and need to be added to the set G0. For each ti introduce an edge from node a to root of ti. From this tree an ontology can be created, where the hierarchical relation A is subclass of B is represented as subclass axiom A v B, and relations of the type A Relation B are represented as A v 9Relation:B. The annotation properties, e.g. label, comment, description, etc., can be copied from the original ontology. 4

Integrated Graph

We are interested in the integration of a set of di erent ontologies in one graph structure. We begin with the term that was chosen by the user during the ontology extraction procedure (root concept ). For each ontology we connect the children of the most general Thing with this root concept. Then we traverse each ontology and add the children recursively until we reach the leaves of the hierarchy. During this walk, for each concept C1, we interrogate the mapping le in order to check whether there is a mapping for this concept. If this is the case then we merge the information from every concept in this mapping and introduce a merged concept M . The concept M will represent C1 and all other concepts which are mapped to M . When another ontology is traversed and a concept C2 is found which is also mapped to M , we don't insert this concept into the graph. Instead, we introduce a new link from the parent of C2 to M .

The integrated graph structure can be visualized with our visualization approach [ 3 ] and further explored by the user. It can also be transformed to the new ontology.

Acknowledgements References

This work has been partially supported by the BioRange program of the Netherlands BioInformatics Centre (NBIC, BSIK grant). 2 Least common subsumer can be calculated by a DL reasoner.

1. Noy , N.F. , Musen , M.A. : Specifying ontology views by traversal . LNCS 3298 ( 2004 ) 713 { 725

2. Levenshtein , V.I. : Binary codes capable of correcting deletions, insertions, and reversals . ( 1966 )

3. Dmitrieva , J. , Verbeek , F.J.: Multi-view ontology visualization . In: The 11th International Protege Conference, June 23-26 , 2009 - Amsterdam

4. Pellet: Owl 2 reasoner for java . http://clarkparsia.com/pellet

5. Baader , F. , Calvanese , D. , McGuinness , D. , Nardi , D. , Patel-Schneider , P. : The Description Logic Handbook . Cambridge University Press, Cambridge ( 2002 )