
CROSI Mapping System (CMS) Results of the 2005 Ontology Alignment Contest

Yannis Kalfoglou

y.kalfoglou@ecs.soton.ac.uk 0

Bo Hu 1

0 Advanced Knowledge Technologies (AKT), School of Electronics and Computer Science, University of Southampton, UK
1 School of Electronics and Computer Science, University of Southampton, UK


In this results report we summarize our experiences from running the CROSI Mapping System (CMS) over three test cases for this year's OAEI contest: bibliography, Web directories and medical ontologies alignment case studies. CMS successfully parsed and aligned all input ontologies in all three case studies. We also elaborate on the insights gained and potential research directions towards building more robust alignment systems to cope with the increasing diversity of alignment requirements.

Although WordNet-based approaches equip themselves with the lexical synonymy of the names of classes, they do not have the right measure to capture the structural information that is conveyed in most taxonomies. Structural information is exploited in different ways. Heuristic rules are the most common way to take structures into account, e.g. identifying the similarity of two entities based on the status of their parents and siblings. The modular architecture depicted in figure 1 employs a multi-strategy system comprising four modules, namely Feature Generation, Feature Selection and Processing, Aggregator and Evaluator. In this system, different features of the input data are generated and selected to fire off different sorts of feature matchers. The resultant similarity values are compiled by multiple similarity aggregators running in parallel or in consecutive order. The overall similarity is then evaluated to initiate iterations that backtrack to different stages.

CMS is an instantiation of such a system. We include a screenshot of the Web-based interface of CMS in figure 2. The system is still under development and we only used the first two components, Feature Generation and Feature Selection and Processing, for aligning the ontologies in the three case studies of the OAEI contest. The alignment algorithms and techniques used are described in later sections, but first we elaborate, in the next section, on the purpose of CMS and highlight some of its key characteristics, such as the robust feature extraction module.

1.1 State, purpose, general statement

The process of ontology mapping (or alignment) can be summarised as follows: given two ontologies, a system measures the similarity of the source ontological entities against the target ones and produces a list of correspondences, i.e. $\mathit{mapping}: O_s, O_t \rightarrow C_s \times C_t \cup P_s \times P_t \cup I_s \times I_t$, where $O_i$ are the input ontologies with $i \in \{s, t\}$, subscript $s$ indicating the source and $t$ indicating the target, $C_i$ the set of classes, $P_i$ the set of properties and $I_i$ the set of instances. Hence, the first step when deploying CMS was to extract characteristics that can be used to identify similar entities from different ontologies. We summarize the characteristics we extracted in table 1. There are several points that need further explanation. First, in many cases, identifying corresponding instances is considered to be an easier task than identifying corresponding classes. This is because instances are expected to have more grounded variables. Corresponding instances provide a ground on which the number of candidate mapping classes can be narrowed down to a few (as we discovered in our past work with the IF-Map instance-based system [?]). Second, in the case of complement classes, let $c_s$ be a class from the source ontology and $c_t$ a class from the target ontology; if $\mathit{sim}(c_s, c_t) = a$ and $d = \neg c_t$, we can safely conclude that $\mathit{sim}(d, c_s) = 1 - a$, where $\mathit{sim}/2$ is the similarity function and $a$, a real number, gives the confidence value.

1.2 Specific techniques used

To fit the requirements of different applications, we developed and implemented a series of mapping techniques, which are regarded as independent components that make up CMS.

Name matchers

Ranging from purely syntactic approaches to more semantically enriched ones, name matchers are categorised as: string (tokenised) distance, thesaurus, and WordNet hierarchical distance.
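As a minimal sketch of the string-distance category, the following Python fragment computes a plain edit distance and normalises it to a similarity score. The function names and the normalisation by the longer string are our assumptions for illustration; they are not taken from the CMS code base.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t."""
    previous = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # delete cs
                               current[j - 1] + 1,       # insert ct
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def name_similarity(s: str, t: str) -> float:
    """Assumed normalisation: 1.0 for identical names (ignoring case),
    decreasing towards 0.0 as the edit distance grows."""
    s, t = s.lower(), t.lower()
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(s, t) / longest


print(name_similarity("Article", "Articles"))
```

A score of this kind can be compared directly against a similarity threshold, which is how the other sketches in this report treat matcher output.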

Levenshtein distance is the simplest implementation of string distance. More sophisticated ones include the Monge-Elkan distance, which optimises edit-distance functions with well-tuned editing costs, and the Jaro metric and its variants, which compute an accumulated similarity of s and t from the order and number of common characters between s and t, to name just a few. In our system the thesaurus comes into play in two forms: WordNet (http://wordnet.princeton.edu/) and a predefined corpus, implemented as WNNameMatcher and CorpusNameMatcher respectively. To facilitate the use of WordNet, we assume that the local names of classes are either nouns or noun phrases, while the local names of properties are phrases starting with verbs followed by either nouns or adjectives. Elements in the retrieved synsets are then compared against each other using either exact string matching or one of the string-distance based algorithms discussed above. WordNet arranges its entries in hierarchical structures. Hence, the similarity between names can be computed as follows: let $w_i$ and $w_j$ be the corresponding WordNet entries of $name_i$ and $name_j$, $w$ be the least common hypernym of $w_i$ and $w_j$, $r$ be the root of the underlying WordNet hierarchy, and $h_i$, $h_j$, $h$ be the distances between $w_i$ and $r$, $w_j$ and $r$, $w$ and $r$, respectively; the similarity between $w_i$ and $w_j$ is then approximated as $2 \times h / (h_i + h_j)$.

Table 1: Characteristics extracted by CMS.

Local features: class labels and URIs; equivalent classes; related property names; complement classes; property labels and URIs; property domain and range; inverse (transitive) property; functional property; instance labels and URIs; instantiated classes; comments.

Global features:
• super and sub classes: the subsumption relationship helps to identify the location of a class in the taxonomy and thus captures the structural semantics.
• sibling classes: sibling classes provide a hint of how the parent class is defined.
• super and sub properties: the properties' hierarchy is useful in matching both properties and classes.
• disjoint classes: disjoint cover should be treated as a special case.
• comments: comments are sometimes also given at the global level.
• version information: the record of modifications and authentication provides alternatives.
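The WordNet hierarchical distance described in this section can be illustrated with a small Python sketch. It reads the formula as $2h / (h_i + h_j)$ and uses a hand-made hypernym tree as a stand-in for WordNet; the tree, the function names and the treatment of the root are assumptions made for illustration only.

```python
# Toy hypernym tree (child -> parent). This is NOT real WordNet data,
# only a hypothetical stand-in with a single root.
PARENT = {
    "entity": None,
    "publication": "entity",
    "book": "publication",
    "periodical": "publication",
    "journal": "periodical",
    "person": "entity",
}


def path_to_root(word):
    """Return [word, parent, grandparent, ..., root]."""
    path = [word]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(word):
    """Distance (number of edges) between word and the root r."""
    return len(path_to_root(word)) - 1


def least_common_hypernym(wi, wj):
    """Deepest node that is an ancestor of (or equal to) both entries."""
    ancestors_i = set(path_to_root(wi))
    for node in path_to_root(wj):  # walks upwards, so the first hit is deepest
        if node in ancestors_i:
            return node
    raise ValueError("entries do not share a root")


def hierarchy_similarity(wi, wj):
    """Similarity 2*h / (h_i + h_j) as read from the text above."""
    hi, hj = depth(wi), depth(wj)
    if hi + hj == 0:
        return 1.0  # assumed convention when both entries are the root
    h = depth(least_common_hypernym(wi, wj))
    return 2.0 * h / (hi + hj)


print(hierarchy_similarity("book", "journal"))
```

In this toy tree "book" and "journal" meet at "publication" (depth 1), so the score is 2 × 1 / (2 + 3); entries that only share the root score 0.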

Semantic matchers

In CMS, the flavour of semantics is added in two different ways, namely through structure-aware matchers and intension-aware matchers.

Structure-awareness refers to the capability of traversing class hierarchies and accumulating similarities along the sub-class (sub-property) relationships. Let $c$ and $d$ be two classes from the source and target ontologies, and $c_i$ and $d_i$ their direct parents in the respective ontologies; the similarity between $c$ and $d$ is recursively defined as $\mathit{sim}(c, d) = \alpha\,\mathit{sim}_{local}(c, d) + \beta\,\mathit{sim}(c_i, d_i)$, where $\alpha$ and $\beta$ are arbitrary weights and $\mathit{sim}_{local}/2$ gives the local similarity with regard to $c$ and $d$, which can be computed using one or a combination of the techniques discussed above.

Intension-awareness takes into account the definitions of classes. A class $c$ is regarded as a tuple $\langle S, P \rangle$, where $S$ is a set of classes of which $c$ is a subclass and $P$ is a set of properties having $c$ as the domain and other classes or concrete data types as the range. Hence, finding the semantic similarity between $c = \langle S_c, P_c \rangle$ and $d = \langle S_d, P_d \rangle$ amounts to finding the similarity between $S_c$ and $S_d$ as well as between $P_c$ and $P_d$, i.e. $\mathit{sim}(c, d) = \alpha\,\mathit{sim}(S_c, S_d) + \beta\,\mathit{sim}_{property}(P_c, P_d)$, where $\alpha$ and $\beta$ are arbitrary weights and $\mathit{sim}_{property}/2$ computes the property similarity. More specifically, we differentiate the following situations, in which (as the conditions indicate) $L_p$, $\Delta_p$ and $\Phi_p$ stand for the name, domain and range of a property $p$:

• classes with matching property names, property domains and property ranges: $L_{p_c} = L_{p_d}$ and $\mathit{sim}_{set}(\Delta_{p_c}, \Delta_{p_d}) \geq v$ and $\mathit{sim}_{set}(\Phi_{p_c}, \Phi_{p_d}) \geq v$, where $\mathit{sim}_{set}/2$ computes the similarity of two sets of entities and $v$ is a predefined threshold;
• classes with matching property names and property domains but different property ranges: $L_{p_c} = L_{p_d}$ and $\mathit{sim}_{set}(\Delta_{p_c}, \Delta_{p_d}) \geq v$ and $\mathit{sim}_{set}(\Phi_{p_c}, \Phi_{p_d}) < v$; and
• classes with matching property names but different property domains as well as ranges: $L_{p_c} = L_{p_d}$ and $\mathit{sim}_{set}(\Delta_{p_c}, \Delta_{p_d}) < v$ and $\mathit{sim}_{set}(\Phi_{p_c}, \Phi_{p_d}) < v$.

The first situation contributes the most to the similarity of $c$ and $d$. We regard classes with matching names and exactly matching properties, i.e. properties with the same name, domain and range, as semantically equivalent classes.
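The structure-aware recursion defined in this section can be sketched in Python as follows. The sketch assumes a single direct parent per class, stops the recursion with the local similarity once either class has no parent, and uses illustrative weights of 0.7 and 0.3; none of these choices is stated in the text, and the toy ontologies and `exact_name` matcher are hypothetical.

```python
def structure_aware_sim(c, d, parent_s, parent_t, sim_local,
                        alpha=0.7, beta=0.3):
    """sim(c, d) = alpha * sim_local(c, d) + beta * sim(parent(c), parent(d)).

    parent_s / parent_t map each class to its direct parent (None at the top).
    Assumed base case: with no parent on either side, return sim_local alone.
    """
    local = sim_local(c, d)
    pc, pd = parent_s.get(c), parent_t.get(d)
    if pc is None or pd is None:
        return local
    return alpha * local + beta * structure_aware_sim(
        pc, pd, parent_s, parent_t, sim_local, alpha, beta)


def exact_name(a, b):
    """Hypothetical local matcher: 1.0 for identical names, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


# Toy source and target hierarchies (child -> direct parent).
source = {"Article": "Publication", "Publication": None}
target = {"Paper": "Publication", "Publication": None}

print(structure_aware_sim("Article", "Paper", source, target, exact_name))
```

Here "Article" and "Paper" have no lexical overlap, yet they inherit a non-zero score from their identically named parents, which is the effect the recursive definition is meant to capture.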

In many cases, matching between $\Delta_{P_c}$ and $\Delta_{P_d}$ ($\Phi_{P_c}$ and $\Phi_{P_d}$, respectively) can only be concluded after traversing several levels upwards or downwards in the class hierarchy. Although not as strong as exact matching of property domains and ranges, matching classes of $\Delta_{P_c}$ ($\Phi_{P_c}$) to remote ancestors or descendants of classes of $\Delta_{P_d}$ ($\Phi_{P_d}$) provides a hint on how close the different properties are, and thus how similar the two concepts $c$ and $d$ are. Such an idea is implemented in our system as a ClassDefPlusMatcher method.
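The three property-matching situations distinguished in this section can be made concrete with a short Python sketch. The text does not define the set similarity, so Jaccard overlap is swapped in here as an assumption; the `Prop` type, the labels returned and the threshold value are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prop:
    name: str
    domain: frozenset   # classes in the property's domain
    range_: frozenset   # classes or datatypes in the property's range


def jaccard(a, b):
    """Assumed set similarity: |a & b| / |a | b| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(pc, pd, v=0.5):
    """Label a pair of properties with one of the situations in the text."""
    if pc.name != pd.name:
        return "names differ"
    domain_ok = jaccard(pc.domain, pd.domain) >= v
    range_ok = jaccard(pc.range_, pd.range_) >= v
    if domain_ok and range_ok:
        return "name, domain and range match"
    if domain_ok:
        return "name and domain match"
    if not range_ok:
        return "name only matches"
    return "name and range match"  # combination not listed in the text


author_s = Prop("author", frozenset({"Publication"}), frozenset({"Person"}))
author_t = Prop("author", frozenset({"Publication"}), frozenset({"Organisation"}))
print(classify(author_s, author_t))
```

A matcher built on this classification could then weight the first situation most heavily, in line with the remark that it contributes the most to the similarity of two classes.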

1.3 Adaptations made for the contest

We did not make any major adaptations to CMS in order to align the OAEI contest ontologies. We only made minor, routine programmatic adjustments, for example running CMS from the command line in batch mode to parse and align the hundreds of ontologies in the Web directories case, or including specific Java heap size adjustment flags in order to run the system over the vast FMA ontology. Other than that, the system ran as normal.

2. RESULTS

CMS benefits from the plug-and-play nature of modular matchers. In this contest, four different matchers were used, namely ClassDef for examining the domain and range of properties associated with classes, CanoName for accumulating similarities among class hierarchies, WNDisSim for computing the distance between two class names based on WordNet structures, and HierarchyDisSim for distributing similarity among class hierarchies. The four major matchers were invoked both in parallel and sequentially. When invoked in parallel, their results were aggregated as a weighted average. When invoked in sequence, CanoName and WNDisSim gave a list of corresponding classes whose similarities were then refined by ClassDef and HierarchyDisSim. CMS ran each test case with different configurations (combination and sequencing) of the aforementioned four mapping modules, and precision and recall values were calculated for each run. In this report, we include the configurations with the highest precision and recall values.
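The two invocation modes described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the CMS implementation: the matcher signature, the way the second stage refines a candidate's score (a plain average with the refiners' scores) and the toy matchers `exact` and `constant_half` are all hypothetical.

```python
def weighted_average(scores, weights):
    """Weighted mean of matcher scores."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


def run_parallel(pair, matchers, weights):
    """Parallel mode: every matcher scores the pair, results are averaged."""
    return weighted_average([m(*pair) for m in matchers], weights)


def run_sequential(pairs, proposers, refiners, threshold):
    """Sequential mode: proposers select candidates, refiners adjust scores.

    Assumed refinement: unweighted mean of the best proposer score and all
    refiner scores for the same pair.
    """
    refined = {}
    for pair in pairs:
        proposed = max(m(*pair) for m in proposers)
        if proposed < threshold:
            continue  # not a candidate, never reaches the refiners
        scores = [proposed] + [r(*pair) for r in refiners]
        refined[pair] = weighted_average(scores, [1.0] * len(scores))
    return refined


def exact(a, b):
    """Toy matcher: 1.0 for identical names, else 0.0."""
    return 1.0 if a == b else 0.0


def constant_half(a, b):
    """Toy matcher that always reports 0.5."""
    return 0.5


print(run_parallel(("Book", "Book"), [exact, constant_half], [3, 1]))
print(run_sequential([("Book", "Book"), ("Book", "Person")],
                     [exact], [constant_half], threshold=0.8))
```

Keeping both modes behind the same matcher signature is one way to let different configurations be tried per test case without changing the matchers themselves.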

2.1 Case 1: benchmark/BibTex ontologies

For all the ontologies in this case we used a threshold of 0.8.

Ontology 202: CMS fails to produce any mapping candidates with a high similarity score in test case 202 due to the naming convention. We consider class names as the foundation on which other techniques can be applied (although not the sole and dominant clue for finding mapping candidates). Similarly, cases 248 to 266 also fall into this category: no candidates with a high similarity value were found.

Ontology 205: CMS does not achieve a high recall rate for benchmark test case 205 due to the restrictions of WordNet. In case 205, class names are replaced by randomly selected synonyms. CMS relies heavily on external resources, e.g. WordNet, to provide lexical alternatives for class and property names, and thus fails to respond well to synonyms that are not recognised by WordNet. A customised corpus might alleviate the problem and improve the performance, at the cost of significant effort and domain expertise.

Ontology 301: In test case 301, smaller similarity scores were assigned to mapping candidates. This is due to the fact that, although classes have similar names, they are defined with different properties which have different names, domains and/or ranges. It is our contention that classes restricted with different properties should either not be considered as equivalent classes or have their similarity value reduced to reflect such differences.

2.2 Case 2: Web directories ontologies

We do not have any specific comments for Case 2. All 2265 ontology pairs were parsed successfully by CMS and fetched for alignment. However, 29 pairs did not produce any alignment results due to circular definitions in the original source.owl and target.owl files. So, a total of 2236 pairs of source.owl/target.owl were aligned. The system parsed them from the command line in batch mode, and the results were produced after 2 hours and 53 minutes. Each cycle involved reading and parsing the source and target ontologies, finding alignments (if any), and writing the results to a file in the common alignment format. This was repeated 2265 times.
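As a rough back-of-the-envelope check on these figures (assuming, for illustration, that the run time is spread evenly over all cycles):

```latex
\[
2265 - 29 = 2236 \ \text{aligned pairs}, \qquad
\frac{2\,\text{h}\ 53\,\text{min}}{2265\ \text{cycles}}
  = \frac{10380\ \text{s}}{2265} \approx 4.6\ \text{s per cycle}.
\]
```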

2.3 Case 3: Medical ontologies

This case was the most interesting. The sheer size of the input ontologies (especially that of FMA), the modelling style of OWL, the conventions used, and the complexity of the paradigm made it an interesting adventure from the research point of view. We report in more detail about our experiences in section 3.3.

3. GENERAL COMMENTS

Performance tuning and hardware settings: As we were facing some really large ontologies (i.e., the 72k-class FMA ontology), we had to make certain optimizations to the code and to the computer settings in order to obtain alignment results in acceptable time. We ran the tests on a stand-alone PC running the Microsoft Windows XP operating system, service pack II, 2003 version. The PC had 1GB of memory installed (DDR400 SDRAM), an 80GB Serial ATA hard disk, and a Pentium 4, 3.0GHz processor. We used the Java VM (version 1.5.0_04) and we had to adjust the Java heap size. For example, the standard Java heap size is 64MB. This was not enough for the Web directory and medical ontologies cases. In fact, for the medical ontologies case, the sheer size of the input ontologies (especially that of FMA) forced us to use a 768MB heap size. Settings lower than this threshold caused the system to run out of memory.

Parsing and extracting experiences: FMA.owl is a 31MB .owl file comprising 72545 declarations of OWL classes and 100 relations (object and data type properties). These numbers were obtained using the Jena 2.2 API and probably deviate slightly from those of other parsers. Parsing and extracting features from the FMA ontology took 9 minutes and 17 seconds with the Java heap size adjusted to 512MB. However, in order to run CMS and find alignments with OpenGALEN we had to use a 768MB heap size setting. While parsing, the Jena API complained about the syntax idioms used. For example, we had a lot of warnings from Jena's RDF syntax handler of the form "bad URI in qname XXX: no scheme found". We elaborate on the reasons behind these parsing warnings in section 3.3.

OpenGALEN.owl is a 4MB .owl file comprising 24 declarations of OWL classes and 30 relations (as previously, object and data type properties; these numbers were obtained from the Jena 2.2 API). Parsing and extracting features from OpenGALEN took just a few seconds. There was no need to adjust the Java heap size.

3.1 Comments on the results

Different combinations of CMS plug-in matchers perform significantly differently due to the nature of the benchmark test cases. Table 3.1 lists the choice of matchers with regard to each test case, while Table 3.2 shows the performance values of different matchers (see footnote 4) with regard to the alignment of ontology 303 in case 1, in terms of precision and recall.

3.2 Discussions on the way to improve the proposed system

CMS is expected to be improved in the following aspects: a more sophisticated aggregation mechanism, a unified alignment representation formalism, and parameterised algorithms for class hierarchy distance. Firstly, as discussed in previous sections, results from multiple matchers are aggregated as a weighted average, with arbitrary weights to start with. Thus far, the weights are fine-tuned manually, relying on knowledge of the domain of discourse and the underlying algorithms of CMS. A more sophisticated approach would employ machine learning techniques to work out the most appropriate weights for different matchers aiming to solve different sorts of mappings. Furthermore, results from different matchers can be sorted locally first, which would reduce the accumulation of results from different matchers to rank aggregation [2]. Secondly, the heterogeneous nature of different matchers (some external matchers produce pairwise equivalences with numeric values stating the similarity score, while others output high level relationships, e.g. same entity as, more specific than, more general than and disjoint with, expressed in high level languages such as OWL and RDF) suggests that output from different matchers has to be lifted to the same syntactic and semantic level. A unified representation formalism equipped with both numeric and abstract expressivity can facilitate the aggregation of heterogeneous matchers. Thirdly, CMS takes into account the exact position of classes in the class hierarchy. We would like to develop algorithms that penalise mapping candidates that are found to be quite far apart from each other, and then propagate their similarity values upwards and downwards in the hierarchy to their descendants and/or ancestors. There could also be pre-defined parameters that adjust the similarity values of descendants and/or ancestors as we go up or down the hierarchy. We expect that this could reduce the number of false positive results.

Footnote 4: Results are obtained with equal weights for matchers.

Table header: Test Case # | A | A, B | A, C, D

3.3 Comments on the test cases

We do not have any specific comments on the test cases for BibTex and Web directories alignments. However, we found the last test case, that of medical ontologies alignment, interesting, and we summarize our experiences below.

FMA.owl was a different case altogether. The ontology describes the domain of human anatomy and it aims to provide "a reference ontology in biomedical informatics for correlating different views of anatomy, aligning existing and emerging ontologies in bioinformatics" [6]. However, there are two notable facts: one regarding the syntactic and modelling idioms of FMA, and one regarding existing results from previous efforts to align FMA and GALEN. As far as the former is concerned, the OWL version we had to work with was the result of a translation from Protege. Previous work has shown that this result is not always a faithful representation of the original FMA Protege model. For instance, it has been reported that FMA DL constructs are often ill-defined and that they lead to inconsistencies when a reasoner parses the ontology [5]. Consistency checking for FMA is an acknowledged problem though, even by its authors: "[...] feedback from these investigators revealed an aggregate of a few hundred errors, many of which related to spelling and only a few to cycles in the class subsumption and partonomy hierarchies." [6].

Leaving aside this fact of life (as it is natural for an ontology that big and so close to human practice to be inconsistent), we point to a couple of syntactic idioms that we found interesting when parsing the ontology with our Jena-based CMS system. Firstly, there is the rather unusual use of unique frame IDs for class names (<owl:Class rdf:ID> constructs), with the textual description of a class given in an rdfs:label construct. We also noticed some unusual uses of references to frame IDs. For instance, the declaration of "arterial supply" as an object property, <owl:ObjectProperty rdf:ID="arterial_supply" rdfs:label="arterial supply">, is used in other parts of the ontology where it refers to an rdf:resource which points to a different resource: <arterial_supply rdf:resource="#frame_14586"/>. Tracing that frame ID leads us to the definition of a "Tissue" class, and not of "arterial supply": <owl:Class rdf:ID="frame_14586" rdfs:label="Tissue">. The definition of an instance (with frame ID 14586) of an object property ("arterial supply") that is a class ("Tissue") could lead to modelling misunderstandings and confusion (although, syntactically speaking, it is allowed in some versions of OWL).

Going back to our argument about the notable facts, we found that previous efforts to align FMA to GALEN reported rather controversial results. For example, in [7], the authors employed two different alignment methods to map FMA to GALEN. Despite the subtle differences between OpenGALEN and GALEN, the similarity of their work to the 3rd case study of the OAEI contest is high, but some of their findings are questionable from the semantics point of view: for example, it was reported that "Pancreas" in FMA matches "Pancreas" in OpenGALEN with a 1.0 similarity value, which "indicates a perfect match" [7]. When we looked carefully at the definitions of "Pancreas" in both ontologies, we saw that "Pancreas" is defined as a class in FMA (<owl:Class rdf:ID="frame_12280" rdfs:label="Pancreas">), whereas in GALEN (OpenGALEN) it is defined as an instance of the class "Body Cavity Anatomy": <owl:Class rdf:ID="Body_Cavity_Anatomy"> <rdfs:subClassOf rdf:resource="#OpenGALEN_Anatomy_Metaclass"/> <Body_Cavity_Anatomy rdf:ID="Pancreas">. Even if OWL semantics allow an individual to be mapped to a class (when dealing with OWL Full), such an alignment is misleading, especially when we consider the high level of abstraction of the "Pancreas" class in OpenGALEN. It seems that the "lexical phase" parsing used in [7] was the main contributor to this high similarity value, as relatively little structural information was taken into account. As a final comment on the case, we also point the reader to observations made by the FMA authors when trying to validate mapping results and differences in terminologies between these two ontologies: "[...] the reasons for the differences have not yet been explored, but at least some of them may be the different contexts of modelling. GALEN represents anatomy in the context of surgical procedures, whereas FMA has a strictly structural orientation." [6].

3.4 Comments on the measures

The proposed measures of precision and recall have been studied and practiced in the NLP community for years and they are a de facto standard metric for commercial applications, like search engines. However, we believe that their adaptation for measuring the performance of an ontology mapping system is somewhat questionable. We cannot elaborate fully on our reservations regarding the use of such a metric in this short paper, but we highlight the main points of our objections: (a) precision is regarded as hard to implement and reveals the usefulness of a retrieved document (or hit in a hitlist) for a search engine. We cannot judge the usefulness of a found alignment by comparing it with the reference alignment; (b) neither precision nor recall takes into account the possible applications of the alignments found. In all the past EON (and this year's OAEI) contests, a set of pre-defined alignments was used as a standard against which all found alignments were compared. This does not say anything about the usefulness of the found alignments, or even whether they are complete, as the pre-defined ones can be erroneous. Further to these comments, we would also like to add that the assignment of numerical values in the range 0.0 to 1.0 does not reveal their semantic relevance, but is purely a brute-force algorithmic way of comparing performance. We also observed a variety of interpretations of the precision and recall metrics by the ontology alignment community.
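For reference, the measures discussed above are conventionally computed against a reference alignment as in the following Python sketch. The representation of a correspondence as a (source, target) pair and the example alignments are assumptions made for illustration; they are not contest data.

```python
def precision_recall(found, reference):
    """Precision = |found & reference| / |found|,
    recall = |found & reference| / |reference|."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical alignments: each correspondence is a (source, target) pair.
found = {("Book", "Book"), ("Article", "Paper"), ("Author", "Editor")}
reference = {("Book", "Book"), ("Article", "Paper"),
             ("Journal", "Periodical"), ("Person", "Person")}

print(precision_recall(found, reference))
```

The sketch makes the objection above visible: both numbers depend only on set overlap with the reference alignment, so an error in the reference, or a found alignment that is useful but absent from it, is scored the same as a genuine mistake.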

3.5 Proposed new measures

Devising new measures for assessing the found alignments between two ontologies in a universally agreed manner is a difficult task. We do not see a quick solution to this problem, but as ontology engineers we can apply knowledge engineering technologies that encompass as much semantic information as possible; for example, we were surprised that the semantically rich definitions of OWL for declaring class or property equality (and inequality), and the universal construct for declaring similarity, are hardly used by the community. We would also like to see ways of introducing "application-driven" alignment metrics, where an example application (e.g., a Semantic Web service information lookup engine) needs to access two different ontologies and the alignments found need to be used in the application in a specific way. With an application-driven alignment metric, we can experiment with the notion of usefulness of an alignment in a real world scenario, rather than doing meaningless number crunching with regard to found and pre-defined alignments. After all, alignment needs to be done in the first place because there is a real world need for it.

4. CONCLUSION

The 2005 OAEI ontology alignment contest was the first one that introduced sizeable ontologies and posed some interesting and challenging problems with respect to performance, scaling and domain exploration. We found it a rewarding experience and we look forward to continuing the fruitful exploration of this key field in the emergent Semantic Web.

6. RAW RESULTS

All of our results are included in tabular format in table 6.3. These results are the best obtained across the CMS combinations with different matchers. We report on those in section 3.1. So, for example, alignments for case #103 were produced using CMS Matcher A, whereas alignments for case 225 were produced using CMS Matchers A+B+C. A list of all these combinations can be found in table 3.2.

6.1 Link to the system and parameters file

Access to the Web-based interface of the CMS system is provided via www.aktors.org/crosi/cms. We note that the system is not available in the community for free distribution yet, due to the legalities of the IPR for the CROSI project.

Test case names (rows of the results matrix): Reference alignment; Irrelevant ontology; Language generalization; Language restriction; No names; No names, no comments; No comments; Naming conventions; Synonyms; Translation; No specialisation; Flattened hierarchy; Expanded hierarchy; No instance; No restrictions; No properties; Flattened classes; Expanded classes; Real: BibTeX/MIT; Real: BibTeX/UMBC; Real: Karlsruhe; Real: INRIA.

Acknowledgements

This work is supported under the Capturing, Representing, and Operationalising Semantic Integration (CROSI) project which is sponsored by Hewlett Packard Laboratories at Bristol, UK. The ¯rst author is also supported by the Advanced Knowledge Technologies (AKT) Interdisciplinary Research Collaboration (IRC) project which is sponsored by the UK EPSRC under Grant number GR/N15764/01.

6.2 Link to the set of provided alignments (in align format)

The results of all three cases (BibTex, Web directories, Medical) are available for download from the CROSI web site at www.aktors.org/crosi/eon05contest/results.

6.3 Matrix of results

[1] W.W. Cohen, Ravikumar, and S.E. Fienberg. A comparison of string distance metrics for name-matching tasks. In IJCAI 2003 IIWeb Workshop, pages 73-78, 2003.

[2] Fagin, Kumar, and Sivakumar. Efficient similarity search and classification via rank aggregation. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 301-312. ACM Press, 2003.

[3] Fellbaum. WordNet: An Electronic Lexical Database. The MIT Press, 1998.

[4] Ehrig and Staab. QOM - Quick Ontology Mapping. In Proceedings of the 3rd International Semantic Web Conference (ISWC'04), LNCS 3298, Hiroshima, Japan, pages 683-697, 2004.

[5] Golbreich, Zhang, and Bodenreider. Migrating the FMA from Protege to OWL. Technical report, July 2005. In notes of the 8th International Protege Conference, Madrid, Spain.

[6] Rosse and JL. Mejino. A Reference Ontology for Bioinformatics: The Foundational Model of Anatomy. Journal of Biomedical Informatics, 36:478-500, 2003.

[7] Zhang, P. Mork, and Bodenreider. Lessons learned from aligning two representations of anatomy. In Proceedings of the KR 2004 Workshop on Formal Biomedical Knowledge Representation, Whistler, BC, Canada, pages 102-108, 2004.