Effective method for large scale ontology matching

Gayo Diallo, Univ. Bordeaux, ISPED-LESIM, 146 rue Léo Saignat, F-33000 Bordeaux, Gayo.Diallo@isped.u-bordeaux2.fr
Mouhamadou Ba, Univ. Bordeaux, ISPED-LESIM, 146 rue Léo Saignat, F-33000 Bordeaux, Mouhamadou.Ba@isped.u-bordeaux2.fr

ABSTRACT
Nowadays, we are facing a proliferation of heterogeneous biomedical data sources accessible through various knowledge-based applications. These data are annotated with increasingly large and widely disseminated knowledge organization systems, ranging from simple terminologies and structured vocabularies to very formal ontologies. To solve the interoperability issue that arises from the heterogeneity of these ontologies, an alignment task is usually performed. However, while significant effort has been devoted to tools that automatically align ontologies containing hundreds of entities, little attention has been paid to matching large ontologies, as is commonly the case in the life sciences domain. We present in this paper ServOMap, a fast and efficient high-precision system able to match ontologies containing hundreds of thousands of entities. The system participated in the 2012 edition of the Ontology Alignment Evaluation Initiative campaign and achieved very good performance, ranking among the top three systems for the Large Biomedical Ontologies track.

Categories and Subject Descriptors
I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods – representation languages; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing – indexing methods, thesauruses; J.3 [Life and Medical Sciences]: Medical information systems.

General Terms
Algorithms, Performance, Design.

Keywords
Life Sciences, Ontology Matching, Ontology Repository, Semantic Interoperability

1. INTRODUCTION
With the wide adoption of Semantic Web technologies, the increasing availability of knowledge-based applications in the life sciences domain raises the issue of finding possible correspondences between the underlying knowledge organization systems (KOS). Indeed, various terminologies, structured vocabularies and ontologies are used for annotating data, and the Linked Open Data initiative is increasing this activity. One of the key roles played by these KOS is to support data exchange based not only on a common syntax but also on shared semantics. This makes them a central component of the Semantic Web and of the emerging e-science and e-health infrastructures.

These KOS, which are independently developed at the discretion of the various projects, are heterogeneous in nature. Moreover, they are becoming more complex, larger and multilingual. For instance, the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT), a multiaxial, hierarchical classification system used by physicians and other health care providers for encoding clinical health information, contains more than 300,000 concepts, which are regularly evolving. Each concept is sometimes designated by several synonymous terms. Another example is the International Classification of Diseases (ICD), the World Health Organization (WHO) standard diagnostic tool for epidemiology, health management and clinical purposes, which is used to monitor the incidence and prevalence of diseases and other health problems. The current ICD-10 version contains more than 12,000 concepts designated with terms in 43 different languages, including English, Spanish and French.

In many cases, mappings between these different KOS need to be established in order to make the systems that use them interoperable. For instance, the EU-ADR project (1) developed a computerized system that exploits data from eight European healthcare databases and electronic health records for the early detection of adverse drug reactions (ADR). As these databases use different medical terminologies (ICD9, ICD10, Read Code, ICPC) to encode their data, mappings are needed to translate the queries posed to the global system into queries understandable by the different data sources. Manually mapping all the mentioned resources is not feasible in a reasonable time. More generally, data integration and semantic browsing of information (2) are areas where ontology matching is usually performed.

There is, therefore, a crucial need for tools that compute correspondences between entities of different KOS quickly and automatically, and that scale to large ontologies and mapping sets. There is also a need for tools that support multi-ontology applications.

Regarding the first issue, a significant effort has been made in the ontology alignment/matching domain (3), and the Ontology Alignment Evaluation Initiative campaign has played an important role (4). In this context, it was noticed during the 2011.5 edition of this campaign that only a few systems, including GOMMA (5) and LogMap (6), were able to match the whole Foundational Model of Anatomy (FMA) and the National Cancer Institute (NCI) Thesaurus with a good F-measure in a reasonable time.

Regarding the second issue, several initiatives have provided systems that facilitate access to multiple and various knowledge artifacts within the Semantic Web infrastructure (e.g. Swoogle (7), Watson (8), the Ontology Lookup Service (OLS) (9) and the BioPortal initiative (10)). However, they follow a centralized approach, and embedding them in an application is not easy as they were not designed for that purpose.
The work described in this paper falls within the above-mentioned research area and presents ServOMap, a large scale ontology matching system able to deal with large ontologies associated with multilingual terminologies. ServOMap handles ontologies described in the W3C standard languages RDF(S)1 and OWL2. It relies on the ServO Ontology Repository (OR) system (11) (12), which manages multiple KOS and provides indexing and retrieval features. Thanks to the ServO OR, ServOMap uses Information Retrieval (IR) techniques to compute the similarity between entities. Contrary to most existing large scale matching systems, it does not rely on any background knowledge.

In the rest of this paper, an ontology repository is an index, maintained either in memory or on the file system, that stores a "representation" of several KOS; these representations are later used to perform meta-operations such as searching for similar entities. This notion of ontology repository differs from the one embodied by systems such as OWLIM (13), and more generally by Ontology-Based Database systems (14) and RDF repositories such as Sesame (15). It is closer to the work described in (16).

The rest of the paper is structured as follows. In section 2 we briefly outline the ServO OR on which ServOMap relies and present its main features. In section 3 we detail the ServOMap ontology matching approach and describe the different steps of the similarity computation. In section 4 we present the evaluation performed on the Large BioMedical dataset provided by the 2012 edition of the OAEI campaign. We conclude in section 5 and give some perspectives for future work.

2. Background on the ServO Ontology Repository
ServO is a system that provides a decentralized ontology repository for managing heterogeneous knowledge resources (11). Its design is guided by the analogy that can be drawn between retrieving semantic resources within an ontology and traditional information retrieval (IR) over a corpus of documents. ServO provides an OR and functionalities that can be embedded within a knowledge-based application for accessing the managed ontologies.

It provides functionalities meeting the following set of requirements:
• building and maintaining decentralized repositories and making them communicate;
• dynamically indexing a set of ontologies in a single repository that can later be updated as needed;
• overcoming the differences between the languages used for describing ontologies.

The adopted approach is therefore based on adapting tested and validated IR methods, and the following choices have been made (Figure 1). First, a common meta-model is defined for representing any ontology regardless of its language or format. This meta-model is instantiated by processing the input ontology with the JENA framework (17). Then, an Ontology Processing and Loading Module is designed and implemented. Finally, an Ontology Indexing Module (OIM) and an Ontology Retrieving Module (ORM) are designed.

Figure 1: The ServO Kernel and Business Components (meta-operations)

The OIM and the ORM use the high-performance, scalable information retrieval library Apache Lucene3. These components are detailed in (11).

1 http://www.w3.org/TR/rdf-schema/
2 http://www.w3.org/TR/owl-features/
3 http://lucene.apache.org

The model for the OR defines the two main functionalities of the repository: indexing resources and retrieving them according to some criteria. An indexing and retrieval model specifies how documents and queries must be represented and which retrieval function is used. Moreover, it determines the notion of relevance, which can be binary (as in the Boolean model) or continuous (a ranked list of results).

ServO allows the repository to be queried by combining Boolean terms (i.e. the labels of the entities) with both datatype and object properties. This makes it possible to compare concepts from different ontologies in a structured way.

Following the functionality offered by the Lucene API, we adopted an approach that combines the Boolean model and the Vector Space Model (VSM) of IR to compute the relevance between the queries and the entities of the ontologies stored in the repository.

In the VSM, each document or query is represented by a vector in a space where each dimension is associated with an indexing term. The similarity between the query q and the concept c is computed as (11):

$$ \mathrm{score}(q,c) = \mathrm{coord}(q,c) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \text{ in } c) \cdot \mathrm{icf}(t)^{2} \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,c) \right) $$

Where:
• tf(t in c) correlates to the term's frequency, defined as the number of times term t appears in the currently scored concept c: tf(t in c) = √frequency.
• icf(t) stands for Inverse Concept Frequency. This value correlates to the inverse of ConceptFreq (the number of concepts in which the term t appears).
• coord(q,c) is a score factor based on how many of the query terms are found in the specified concept.
• queryNorm(q) is a normalizing factor that attempts to make scores from different queries (or even different indexes) comparable.
• t.getBoost() is a search-time boost of term t in the query q, as specified in the query text.
• norm(t,c) encapsulates a few (indexing-time) boost and length factors, such as the concept boost and the field boost.

Finally, the functionalities offered by the ServO OR are:
• mapping users' query terms to concepts from previously indexed ontologies (Term2Concept);
• ontology matching and semantic similarity computation between entities of different ontologies (ServOMap);
• ontology searching, in order to provide a KOS or a set of KOS suitable for a particular task (ServOSearch);
• change detection between different versions of the same KOS (ServOChangeDetect).

In the following section, we detail ServOMap, the ontology matching process built on the ServO OR.

3. Large scale ontology matching with ServOMap
In this section, we detail the overall process that ServOMap follows for computing the similarity between entities of two given ontologies and, more generally, of two given knowledge organization systems. The approach is depicted in Figure 2. Its 5 steps are described below.

Figure 2: Overview of the ServOMap approach

3.1 Computing Ontology Metrics
The first step after parsing and loading the input ontologies is to compute a set of metrics that are later used as parameters for the system. For each input ontology, these metrics include the average number of sub-concepts per concept, the different languages used to denote entity labels or annotations, the most frequent terms within the ontology, the longest set of synonym labels used to describe a concept, etc. Some metrics are necessary for optimizing the use of the Lucene backend.

3.2 Lexical and Contextual Indexing
As already pointed out, ServOMap relies on IR techniques for ontology matching. Therefore, an ontology is seen as a corpus of documents to process, where each entity (concepts, and properties including both object and datatype properties) is a document. ServOMap constructs an inverted index (an ontology repository) from the input ontologies. For each ontology, ServOMap uses the Ontology Processing Module of ServO to retrieve all entities (concepts and properties). Then, according to the parameters computed during the previous step (Computing Ontology Metrics), a description of each entity is generated dynamically. The process is dynamic because each entity is described according to the features it holds: some concepts may have synonyms in several languages or comments, while others only have English terms; some concepts may have declared properties (either object or datatype properties), etc. During this dynamic description process, the labels retrieved for a concept are passed through a set of filters: stop word removal, normalization (upper case to lower case), punctuation removal, completion of labels with the permutations of their terms, and so on. It is also possible to indicate whether ServOMap stems labels or not. Moreover, the words of a term can be concatenated, as shown in Table 1.

TABLE I. EXAMPLE OF AVAILABLE FIELDS WITHIN THE INDEX AND THEIR TERM COUNTS FOR THE FOUNDATIONAL MODEL OF ANATOMY ONTOLOGY
Field name | Term counts | Example
dDomain | 15 | spatialassocirelat
dRange | 5 | string
directLabelCEn | 152,088 | accessorilobarvein veinaccessorilobar veinlobaraccessori
directNameC | 78,884 | accessorilobarvein
directNameP | 52 | percentag
uri | 79,042 | http://bioontology.org/#Accessory_lobar_vein

Table 1 gives an example of the fields available within the index and their term counts for the Foundational Model of Anatomy ontology (FMA). Term counts are provided by the Lucene backend. FMA contains 79,042 entities, among which 78,884 are concepts. For instance, the value of the dDomain field (the domain of a property) is spatialassocirelat, which stands for the term "spatial association relation". The concept with id #Accessory_lobar_vein has as directLabelCEn the set {accessorilobarvein, veinaccessorilobar, veinlobaraccessori}, i.e. "Accessory lobar vein" and permutations of its words. All spaces are removed within labels.
In ServOMap, we assume that when two concepts are similar, their surrounding concepts are likely to be similar as well. Thus, the description of a concept is completed by contextual descriptions, following three strategies. In the SubConcept strategy, a concept is completed with information about all its sub-concepts. In the SupConcept strategy, each concept is completed with the description of its super-concepts. In the SibConcept strategy, the description of a concept is completed with the descriptions of all its siblings.

A flag indicates whether both input ontologies have to be indexed or only the smallest one. This flag is exploited later, during the similarity computation phase.

3.3 Compute lexical based similarity
After the indexing phase, ServOMap computes the lexical-based similarity. This step relies on the Ontology Retrieving Module of the ServO Ontology Repository and uses the similarity function described in section 2.

Depending on the flag indicating which ontologies are indexed, the Ontology Processing Module is called to retrieve the concepts used to search the built index. If both input ontologies are indexed, the first ontology, O1, is used to search the index I2 of the second ontology and, conversely, O2 is used to search the index I1 of the first ontology. If the flag indicates that only one ontology is indexed, ServOMap performs a one-way search only.

As in the lexical and contextual indexing phase, a description is generated dynamically for each entity used to search the index. A Boolean query is constructed with all the available fields of the entity (labels, comments, properties, etc.). Note that the same string processing is applied to all the components of the entity, so that queries have the same level of description as the indexed entities.

Again, ServOMap relies on the ServO OR. Each Boolean query, represented as a vector of terms, is searched over the index, and a ranked list of entities is retrieved. ServOMap keeps as a candidate mapping the couple made of the entity being searched and the retrieved entity with the highest (vector space) similarity. Several retrieved entities may have the same similarity score; in that case, to keep the most relevant one, the local names of the entities are compared using the Levenshtein distance.

At the end of this process, a first set of mappings between the two ontologies is available.

3.4 Compute context-based similarity
The mappings computed in the previous step are usually considered high-precision mappings. Indeed, since an almost strict equality is used between the entities being compared, and only their direct descriptions are used, the mappings are likely to be correct. However, this high accuracy comes with a relatively low recall. As the objective is to return as many correct mappings as possible, the set obtained previously needs to be completed. To do so, a context-based similarity is computed.

The idea relies on the assumption that when two entities are similar, there is a good chance that the concepts surrounding them are also similar. By surrounding concepts, we mean super-concepts, sub-concepts and sibling concepts. Thus, in the context-based similarity, the description of a concept is built according to the strategies outlined previously (i.e. SubConcept, SupConcept, SibConcept). The contextual strategy is applied only to concepts, not to properties, and it is restricted to the concepts that have not yet been mapped to any other concept. This again relies on the assumption that a mapping found by the previous (lexical) strategy is likely to be correct.

The same process as before is followed for dynamically generating the descriptions of the concepts, and the resulting query is sent to the index to retrieve the possible mappings. The process is repeated for the SubConcept, SupConcept and SibConcept strategies. At the end, we obtain three sets of mappings, one per strategy. These three sets are then combined and duplicate mappings are removed.

As our approach is mainly lexical, we noticed during our experiments that this strategy generates a lot of noise. We therefore defined a refinement strategy to select the best mappings among those obtained by the context-based matching. This strategy is briefly described in the following section.

3.5 Refinement strategy for context-based mappings
During the refinement of the context-based mappings, we only keep the couples that do not contradict the mappings already found by the lexical-based strategy. Again, this relies on the assumption that the lexical-based similarity is highly accurate. To filter the results provided by the SubConcept, SibConcept and SupConcept strategies, we use the refinement algorithm shown in Figure 3.

Algo Refinement_SubSupSib
input: vector ContextM, LexicalM
output: vector CleanContextM
Begin
  For each couple (C1, C2) in ContextM
    If C1 OR C2 exists in LexicalM Then
      1. If C1 is LexMappedWith (Sup(C2) OR Sub(C2)) OR C2 is LexMappedWith (Sup(C1) OR Sub(C1)) Then removeCouple(C1, C2)
      2. If C1 is LexMappedWith Sib(C2) Then removeCouple(C1, C2)
      3. If C2 is LexMappedWith Sib(C1) Then removeCouple(C1, C2)
      4. If Sub(C1) isMappedWith (Sib(C2) OR Sup(C2)) Then removeCouple(C1, C2)
      5. If Sup(C1) isMappedWith (Sib(C2) OR Sub(C2)) Then removeCouple(C1, C2)
      Do 4.) and 5.) for C2
    EndIf
  EndFor
  return CleanContextM;
End
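The Refinement_SubSupSib pseudocode can be turned into a small runnable Python sketch. It reads the pseudocode literally, under assumptions of ours that the text does not settle: mappings are (O1 concept, O2 concept) pairs; "C1 OR C2 exists in LexicalM" means C1 is the source or C2 the target of some lexical mapping; LexMappedWith (rules 1 to 3) checks lexical mappings only, while isMappedWith (rules 4 and 5) also checks the context-based ones; and `Hierarchy` is a hypothetical helper, not part of the ServO API.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (concept of O1, concept of O2)


class Hierarchy:
    """Minimal is-a hierarchy of one ontology (hypothetical helper)."""

    def __init__(self, parents: Dict[str, Set[str]]):
        self.parents = {c: set(ps) for c, ps in parents.items()}
        self.children: Dict[str, Set[str]] = {}
        for child, ps in self.parents.items():
            for p in ps:
                self.children.setdefault(p, set()).add(child)

    def sup(self, c: str) -> Set[str]:
        return set(self.parents.get(c, ()))

    def sub(self, c: str) -> Set[str]:
        return set(self.children.get(c, ()))

    def sib(self, c: str) -> Set[str]:
        sibs: Set[str] = set()
        for p in self.parents.get(c, ()):
            sibs |= self.children.get(p, set())
        sibs.discard(c)
        return sibs


def linked(left: Set[str], right: Set[str], mappings: Iterable[Pair]) -> bool:
    """True if some mapping (a, b) has a in `left` and b in `right`."""
    return any(a in left and b in right for a, b in mappings)


def refine(context_m: Set[Pair], lexical_m: Set[Pair],
           h1: Hierarchy, h2: Hierarchy) -> Set[Pair]:
    """Drop context-based couples that contradict the lexical mappings."""
    all_m = lexical_m | context_m  # assumption: scope of "isMappedWith"
    lex_o1 = {a for a, _ in lexical_m}
    lex_o2 = {b for _, b in lexical_m}
    clean = set(context_m)
    for c1, c2 in context_m:
        if c1 not in lex_o1 and c2 not in lex_o2:
            continue  # neither entry is lexically mapped: keep the couple
        contradicts = (
            # Rule 1: C1 lexically mapped to Sup/Sub(C2), or C2 to Sup/Sub(C1)
            linked({c1}, h2.sup(c2) | h2.sub(c2), lexical_m)
            or linked(h1.sup(c1) | h1.sub(c1), {c2}, lexical_m)
            # Rules 2 and 3: one entry lexically mapped to a sibling of the other
            or linked({c1}, h2.sib(c2), lexical_m)
            or linked(h1.sib(c1), {c2}, lexical_m)
            # Rules 4 and 5 for C1
            or linked(h1.sub(c1), h2.sib(c2) | h2.sup(c2), all_m)
            or linked(h1.sup(c1), h2.sib(c2) | h2.sub(c2), all_m)
            # "Do 4.) and 5.) for C2"
            or linked(h1.sib(c1) | h1.sup(c1), h2.sub(c2), all_m)
            or linked(h1.sib(c1) | h1.sub(c1), h2.sup(c2), all_m)
        )
        if contradicts:
            clean.discard((c1, c2))
    return clean


h1 = Hierarchy({"B": {"A"}, "C": {"A"}})  # A has children B and C
h2 = Hierarchy({"Y": {"X"}, "Z": {"X"}})  # X has children Y and Z
print(refine({("C", "Z"), ("B", "Z")}, {("A", "X"), ("B", "Y")}, h1, h2))
# {('C', 'Z')}
```

In this toy example, the candidate (B, Z) is dropped by rule 2 because B is already lexically mapped to Y, a sibling of Z, while (C, Z) is kept since neither C nor Z takes part in a lexical mapping.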
Figure 3: Refinement strategy. If (C1, C2) is obtained from the lexical mapping, all the context-based mappings that contradict (C1, C2) are removed.

In Figure 3, ContextM is the set of mappings provided by the context-based strategy and LexicalM is the set of mappings computed by the lexical-based strategy. The idea is to avoid keeping a couple obtained from the context-based similarity when one of its entries has already been mapped to another concept during the lexical process. This strategy considers the worst case: it removes many unwanted mappings while the context-based step still increases recall. However, some noise remains, so the precision is lower than that obtained with the lexical-based mappings alone.

3.6 Processing Disjoint Concepts
Some knowledge organization systems are described in formal languages that allow complex axioms and constraints to be expressed. In particular, disjoint concepts can be declared in certain KOS. As our approach is mainly based on the lexical description of the features of entities, two concepts may be lexically similar while being semantically declared as disjoint. To avoid such situations, we take these cases into account during both the indexing and the retrieval phases.

Let C1 and C2 be two disjoint OWL concepts belonging to ontology O1, and C3 and C4 two other disjoint concepts belonging to ontology O2 (Figure 4). To compute the similarity between C1 and C3, we proceed as follows:
• If O2 is the indexed ontology, C3 has a Disjoint_Concept field containing all the generated description terms of C4. ServOMap proceeds symmetrically if O1 is indexed.
• During the similarity computation phase, when the score between C1 and C3 is computed, the query takes into account the constraint that no term from the Disjoint_Concept field of C1 (i.e. from C2) appears in the generated description of C3. Similarly, no term from the Disjoint_Concept field of C3 (i.e. from C4) may appear in the generated description of C1.
Thus, we ensure that the set of mappings is coherent with respect to disjointness.

Figure 4: Strategy for processing disjoint concepts

In the following section, we present the evaluation of ServOMap on the datasets of the OAEI 2012 LargeBio track.

4. Evaluation
In this section, we report the performance achieved by our system on the Large Biomedical track of the OAEI 2012 campaign. We first describe the OAEI and the dataset used in our evaluation.

4.1 The Ontology Alignment Evaluation Initiative
The Ontology Alignment Evaluation Initiative, known as the OAEI campaign, is an international campaign for the systematic evaluation of ontology matching systems. OAEI defines a matching system as a software program capable of finding correspondences (called alignments) between the vocabularies of a given set of input ontologies (3). The campaign started in 2004 and is mainly motivated by the need to establish a consensus for evaluating the ever-increasing number of methods available for schema matching or ontology integration. It is usually associated with the Ontology Matching (OM) Workshop of the International Semantic Web Conference (ISWC). For the 2012 edition4 of the campaign, there were 23 participating systems for six entity matching problems, and three others for the instance matching problem. This edition aimed at automated evaluation to a large extent, with new test sets made available, as is the case with the Large Biomedical ontologies track (LargeBio) described in the next section.

The SEALS platform (18) is used for the automated evaluation of all the systems. The SEALS project is dedicated to the evaluation of Semantic Web technologies. It created a platform5 for easing this evaluation, organizing evaluation campaigns, and building a community of tool providers and tool users around this evaluation activity. The participating systems are wrapped according to the SEALS specification before being uploaded to the platform. The overall process of the OAEI 2012 campaign using this platform is described on the campaign web site6.

4.2 The OAEI 2012 LargeBio dataset
The LargeBio track is one of the most challenging tasks in terms of scalability and complexity. The ontologies in this dataset are semantically rich and contain tens of thousands of classes. The track consists of finding alignments between the Foundational Model of Anatomy (FMA), which contains 78,989 concepts, SNOMED-CT, which contains 306,591 concepts, and the National Cancer Institute Thesaurus (NCI), which contains 66,724 concepts.

The FMA is a domain ontology that represents a coherent body of explicit declarative knowledge about human anatomy. It is integrated in the distributed framework of the Anatomy Information System developed and maintained by the Structural Informatics Group at the University of Washington. It is concerned with the representation of classes or types and relationships necessary for the symbolic representation of the phenotypic structure of the human body, in a form that is understandable to humans and is also navigable, parseable and interpretable by machine-based systems.

4 http://oaei.ontologymatching.org/2012/
5 http://www.seals-project.eu/
6 http://oaei.ontologymatching.org/2012/seals-eval.html
SNOMED CT is a clinical healthcare terminology that provides a core general terminology for the electronic health record (EHR). It currently contains more than 311,000 active concepts with unique meanings and formal logic-based definitions organized into hierarchies. It is owned, maintained and distributed by the International Health Terminology Standards Development Organisation (IHTSDO).

The NCI Thesaurus covers vocabulary for clinical care, translational and basic research, and public information and administrative activities. It provides reference terminology for many systems of the National Cancer Institute of the US National Institutes of Health, as well as for other systems.

The LargeBio track consisted of three matching problems: FMA-NCI, FMA-SNOMED and SNOMED-NCI. Each matching problem is divided into three tasks involving different fragments of the considered ontologies, i.e. a small fragment, a big fragment and the whole ontologies, which leads to 9 sub-tasks. The 2009AA version of the Unified Medical Language System (UMLS) Metathesaurus is used as the basis for the track's reference alignments (19).

4.3 The configurations used for ServOMap
As ServOMap is highly flexible, it participated in the campaign with two configurations, which differ by the parameters used to tune the matching process. These parameters are shown in Table 2. The first version of the system, which we refer to as ServOMap-lt, uses the same processing technique for the terms of the entities being matched regardless of their language (English, French, etc.). In addition, it only takes concepts into account, contrary to the second version, which we refer to as ServOMap. Also, ServOMap-lt indexes only one of the input ontologies, the second one being used to search the index. Finally, ServOMap-lt stems the labels and produces 1:n mappings, while ServOMap only produces 1:1 mappings and does not use stemming. The two versions are freely available for download online7.

TABLE II. PARAMETERS USED TO TUNE THE TWO VERSIONS OF THE SYSTEM
Parameter | ServOMap-lt | ServOMap
Terms processing | The same for all languages | According to the language of the labels
Entities taken into account | Only concepts | All entities
Ontologies indexed | One | Both
Searching strategy | One way | Two ways
Stemming | Yes | No
Arity of the mappings | 1:n | 1:1

4.4 Results
The evaluation was performed on a server with 16 CPUs, with 15 GB of RAM allocated. 15 out of the 23 participating systems/configurations were able to cope with at least one of the tasks of the LargeBio track.

The performance of the two versions of ServOMap is shown in Tables 3 and 4. We averaged the results obtained over the sub-tasks (small, big and whole), so that each matching problem (FMA-NCI, FMA-SNOMED, SNOMED-NCI) is presented in one row; the complete results of the evaluation are available on the OAEI 2012 LargeBio web page8. The last row gives the average over the entire LargeBio track, and the last column gives the total computation times.

TABLE III. SERVOMAP-LT PERFORMANCE ON THE LARGEBIO DATASET
Task | Precision | Recall | F1-measure | Time (s)
FMA-NCI | 0.931 | 0.8 | 0.86 | 366
FMA-SNOMED | 0.956 | 0.60 | 0.802 | 790
SNOMED-NCI | 0.875 | 0.593 | 0.706 | 1,248
AVERAGE | 0.890 | 0.699 | 0.780 | 2,405

TABLE IV. SERVOMAP PERFORMANCE ON THE LARGEBIO DATASET
Task | Precision | Recall | F1-measure | Time (s)
FMA-NCI | 0.945 | 0.747 | 0.834 | 327
FMA-SNOMED | 0.953 | 0.656 | 0.777 | 893
SNOMED-NCI | 0.901 | 0.554 | 0.687 | 1,089
AVERAGE | 0.903 | 0.657 | 0.758 | 2,310

7 http://code.google.com/p/servo/
8 http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2012/results2012.html

The best precision is obtained for the FMA-SNOMED matching problem, with 95.6% and 95.3% for ServOMap-lt and ServOMap respectively. The best recall is obtained for the FMA-NCI matching problem: ServOMap-lt obtained 80% while ServOMap obtained 74.7%.
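The precision, recall and F1-measure columns of Tables 3 and 4 follow the usual definitions, which the Python sketch below spells out. It is only an illustration: alignments are modeled as sets of entity pairs, and any additional conventions of the official OAEI/SEALS evaluation are not reproduced. Because the AVERAGE rows average the per-sub-task values, the F1 recomputed from the averaged precision and recall may differ slightly from the averaged F1.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (entity of O1, entity of O2)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(found: Set[Pair], reference: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a computed alignment w.r.t. a reference."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall, f_measure(precision, recall)


# FMA-NCI row of Table IV: P = 0.945, R = 0.747
print(round(f_measure(0.945, 0.747), 3))  # 0.834
```

For example, a system returning three mappings, two of which belong to a four-mapping reference alignment, obtains P = 2/3, R = 1/2 and F1 = 4/7.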
On average, ServOMap-lt provides the best recall (69.9%) while ServOMap achieves the best precision (90.3%). These results show that ServOMap-lt benefited from 1:n mappings by providing more of the correspondences found in the reference alignment; however, this decreased its precision. Another explanation for its lower precision is the use of stemming, which groups different labels sharing the same stem under the same index entry. In contrast, thanks to its 1:1 mapping strategy, ServOMap provided the most precise correspondences, but with a lower recall.

From the computation time point of view, the SNOMED-NCI task took the longest, with 1,248 seconds (20.8 min) and 1,089 seconds (18.15 min) for ServOMap-lt and ServOMap respectively. In contrast, the FMA-NCI matching problem was the fastest: ServOMap-lt performed the task in 366 seconds (6.1 min) while ServOMap finished in 327 seconds (5.45 min). These results are in line with the size of the ontologies to match, SNOMED-NCI being the largest task in terms of involved entities.

Now let us compare our system to the other participating systems that completed the LargeBio track. Table 5 summarizes the official OAEI results of the top-8 systems. According to these figures, ServOMap-lt provided the best results in terms of F-measure and precision for the FMA-SNOMED task, while ServOMap generated the most precise mappings when all the tasks are averaged, with 90.3%. ServOMap-lt finished second overall in terms of F-measure with 78%, closely behind the YAM++ system (78.2%) (20). Regarding computation times, ServOMap completed the entire 9 tasks in 2,310 seconds (38.5 min), in second position behind the LogMapLt system (711 seconds) (14), while YAM++ completed them in 18 hours. Note that the GOMMA, YAM++ and LogMap systems use different kinds of background knowledge: LogMap uses normalisations and spelling variants from the UMLS Lexicon, YAM++ uses the general-purpose background knowledge provided by WordNet, and GOMMA reuses mappings from FMA-UMLS and NCI-UMLS.

Please note that the last column of Table 5 (Incoherence) reports the number of unsatisfiabilities found when reasoning with the HermiT reasoner over the input ontologies together with the computed mappings. The logical assessment of computed mappings is not yet implemented in ServOMap. LogMap was the system that provided the cleanest mappings.

TABLE V. SUMMARY RESULTS OF THE LARGEBIO TRACK TOP-8 SYSTEMS

5. Conclusion and Perspectives
We have presented in this paper the main components of the ServO Ontology Repository and detailed its ServOMap component for large scale ontology matching. We have reported the performance obtained by this component on the LargeBio track of the 2012 edition of the OAEI campaign. The two versions of ServOMap achieved very good results both in terms of F-measure and computation time, finishing among the top-3 systems and providing the mappings with the best precision. We note, however, that so far our approach relies heavily on the richness of the descriptions of the input ontologies, which is usually the case in the life sciences domain. Its efficiency is reduced for KOS whose mappings must rely more on the structure.

There is room for improvement in this work. First, we plan to improve the algorithm used for filtering the mappings provided by the context-based matching, in order to increase recall without reducing precision. Second, ServOMap does not use any external resource in the similarity computation process; we intend to use the UMLS to better discard wrong mappings for the ontologies present in this resource. Moreover, the current version does not handle the mapping of two ontologies described in two different languages, for instance an ontology with English terms and an ontology with German terms; a further improvement is therefore to implement cross-lingual ontology matching. Finally, we plan to introduce a logical assessment of the computed mappings (21) and to implement a user-friendly interface.

6. Acknowledgment
We thank the organizers of the OAEI evaluation campaigns for providing us with the test data and the SEALS infrastructure, and the LargeBio track organizers for their valuable feedback.

7. References
1. Avillach P, Mougin F, Joubert M, Thiessard F, Pariente A, Dufour J-C, et al. A semantic approach for the homogeneous identification of events in eight patient databases: a contribution to the European eu-ADR project. Stud Health Technol Inform. 2009;150:190–4.
2. Diallo G, Khelif K, Corby O, Kostkova P, Madle G. Semantic Browsing of a Domain Specific Resources: The Corese-NeLI Framework. Web Intelligence/IAT Workshops. 2008. p. 50–4.
3. Shvaiko P, Euzenat J. Ten Challenges for Ontology Matching. In: Meersman R, Tari Z, editors. On the Move to Meaningful Internet Systems: OTM 2008 [Internet]. Springer Berlin / Heidelberg; 2008. p. 1164–82. Available from: http://dx.doi.org/10.1007/978-3-540-88873-4_18
4. Euzenat J, Meilicke C, Stuckenschmidt H, Shvaiko P, Santos CT dos. Ontology Alignment Evaluation Initiative: Six Years of Experience. J. Data Semantics. 2011;15:158–92.
5. Kirsten T, Gross A, Hartung M, Rahm E. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution. Journal of Biomedical Semantics. 2011;2(1):6.
6. Ruiz EJ, Grau BC, Zhou Y, Horrocks I. Large-scale Interactive Ontology Matching: Algorithms and Implementation. Proceedings of the 20th European Conference on Artificial Intelligence (ECAI). IOS Press; 2012. p. 444–9.
7. Finin T, Peng Y, Scott R, Joel C, Joshi SA, Reddivari P, et al. Swoogle: A search and metadata engine for the semantic web. In Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management. ACM Press; 2004. p. 652–9.
8. d'Aquin M, Motta E, Sabou M, Angeletou S, Gridinoc L, Lopez V, et al. Toward a New Generation of Semantic Web Applications. IEEE Intelligent Systems. 2008;23:20–8.
9. Côté RG, Jones P, Apweiler R, Hermjakob H. The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics. 2006;7:97.
10. Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Griffith N, et al. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research. 2009 May 29;37(Web Server):W170–W173.
11. Diallo G. Efficient Building of Local Repository of Distributed Ontologies. IEEE; 2011 [cited 2012 Oct 6]. p. 159–66. Available from: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6120644
12. Diallo G. Towards decentralized and cooperative repositories of distributed ontologies. Proceedings of SWAT4LS 2011. 2011. p. 8–9.
13. Kiryakov A, Damova M. The Semantic Web: Semantic Repositories. Semantic Web Handbook. Springer Verlag, Heidelberg Germany. 2011.
14. Fankam C, Jean S, Pierra G, Bellatreche L, Ait-Ameur Y. Towards Connecting Database Applications to Ontologies. IEEE Computer Society, Conference Publishing Service; 2009. p. 131–7.
15. Schenk S, Petrak J. Sesame RDF Repository Extensions for Remote Querying. Znalosti2008 [Internet]. 2008. Available from: http://znalosti2008.fiit.stuba.sk/download/articles/znalosti2008-Schenk.pdf
16. Ghoula N, Falquet G. Towards an ontology based large repository for managing heterogeneous knowledge resources. E-LKR'12. 2012.
17. Carroll JJ, Dickinson I, Dollin C, Reynolds D, Seaborne A, Wilkinson K.
[…] 421–5. Available from: http://dblp.uni-trier.de/db/conf/ekaw/ekaw2012.html#NgoB12
21. Meilicke C, Stuckenschmidt H, Sváb-Zamazal O. A Reasoning-Based Support Tool for Ontology Mapping Evaluation. ESWC. 2009. p. 878–82.
Jena: implementing the semantic web recommendations. Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters [Internet]. New York, NY, USA: ACM; 2004. p. 74–83. Available from: http://doi.acm.org/10.1145/1013367.1013381 18. Esteban-Gutiérrez M, Garcıa-Castro R, Gómez-Pérez A. Executing Evaluations over Semantic. Technologies using the SEALS Platform. IWEST 2010. 2010. 19. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research. 2004;32(Database-Issue):267–70. 20. Ngo D, Bellahsene Z. YAM++ : A Multi-strategy Based Approach for Ontology Matching Task. In: ten Teije A, Völker J, Handschuh S, Stuckenschmidt H, d’ Aquin M, Nikolov A, et al., editors. EKAW [Internet]. Springer; 2012. p.