FCA-Map Results for OAEI 2016

Mengyi Zhao¹ and Songmao Zhang²

¹,² Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, P. R. China
¹ myzhao@amss.ac.cn, ² smzhang@math.ac.cn

Abstract. FCA-Map is an automatic ontology matching system based on Formal Concept Analysis (FCA), a well-developed mathematical model for analyzing individuals and structuring concepts. More precisely, we construct three types of formal contexts and extract mappings from the lattices derived. Firstly, the token-based formal context describes how class names, labels and synonyms share lexical tokens, leading to lexical mappings (anchors) across ontologies. Secondly, the relation-based formal context describes how classes are in taxonomic or disjoint relationships with the anchors, leading to positive and negative structural evidence for validating the lexical matching. Lastly, after incoherence repair, the positive relation-based context can be used to discover additional structural mappings. In this paper, we briefly introduce FCA-Map and its results on three tracks (i.e., Anatomy, Large Biomedical Ontologies, Disease and Phenotype) of OAEI 2016.

1 Presentation of the system

Among the first batch of ontology matching (OM) algorithms and tools proposed in the early 2000s, FCA-Merge [4] distinguished itself by using the Formal Concept Analysis (FCA) formalism to derive mappings from classes sharing textual documents as their individuals. Proposed by Wille [5], FCA is a well-developed mathematical model for analyzing individuals and structuring concepts. FCA starts with a formal context consisting of a set of objects, a set of attributes, and a binary relation between them. A concept lattice, or Galois lattice, can be computed from a formal context, where each node represents a formal concept composed of a subset of objects (the extent) together with their common attributes (the intent). The extent and the intent of a formal concept uniquely determine each other in the lattice.
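As a minimal sketch of these definitions (not FCA-Map's implementation), the following Python snippet enumerates the formal concepts of a toy context by brute force. The anatomy-flavoured objects and attributes, as well as the helper names, are hypothetical and chosen only for illustration; the enumeration is exponential in the number of attributes and is meant to show the extent/intent correspondence, not to scale.

```python
from itertools import combinations

# Toy formal context (hypothetical data): each object maps to its attributes.
context = {
    "heart": {"organ", "muscular"},
    "biceps": {"muscular"},
    "liver": {"organ"},
}

def common_attributes(objects, context, all_attributes):
    """Intent: the attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return shared

def objects_having(attributes, context):
    """Extent: the objects that have every attribute in `attributes`."""
    return {g for g, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(all_attributes) + 1):
        for subset in combinations(sorted(all_attributes), size):
            extent = objects_having(set(subset), context)
            intent = common_attributes(extent, context, all_attributes)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

For this toy context the enumeration yields four formal concepts, among them the pair with extent {heart} and intent {organ, muscular}; no two concepts share the same extent or the same intent.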
Further, a concept hierarchy can be derived in which one formal concept is a sub-concept of another if its objects are contained in those of the latter. FCA can be naturally applied to ontology construction [3], and is also widely used in data analysis, information retrieval, and knowledge discovery.

Following the steps of FCA-Merge, several OM systems continued to use FCA as well as its alternative formalisms, exploiting different entities as the sets of objects and attributes for constructing formal contexts [1, 2, 6]. The type of formal context determines the information used for ontology matching, and we observed that some intrinsic and essential knowledge in ontologies has not yet been exploited, including both textual information within classes (e.g., class names, labels, and synonyms) and relationships among classes (e.g., ISA, sibling, and disjointedness relations). In order to empower FCA with as much ontological information as possible, we proposed FCA-Map, which generates three types of formal contexts and extracts mappings from the lattices derived. The next sub-sections provide more details about FCA-Map and then discuss our OAEI results.

1.1 State, purpose, general statement

Given two ontologies, FCA-Map builds formal contexts and uses the derived concept lattices to cluster the commonalities among ontology classes, at the lexical level and the structural level, respectively. Concretely, FCA-Map proceeds step by step as follows.

1. Acquiring anchors lexically. The token-based formal context is constructed, and from its derived concept lattice, a group of lexical anchors A across ontologies can be extracted.
2. Validating anchors structurally. Based on A, the relation-based formal context is constructed, and from its derived concept lattice, positive and negative structural evidence of anchors can be extracted. Moreover, an enhanced alignment A0 without incoherences among anchors is obtained.
3. Discovering additional matches.
Based on A0, the positive relation-based formal context is constructed, and from its derived concept lattice, additional matches across ontologies can be identified.

1.2 Specific techniques used

The process of our system consists of the following successive steps.

Step 1: Constructing the token-based formal context to acquire lexical anchors. The token-based formal context Klex := (Glex, Mlex, Ilex) is described as follows. Names of ontology classes as well as their labels and synonyms, when available, are exploited after normalization that includes inflection, tokenization, stop word elimination, and punctuation elimination. In Klex, Glex is the set of strings each corresponding to a name, label, or synonym of classes in the two ontologies, Mlex is the set of tokens in these strings, and the binary relation (g, m) ∈ Ilex holds when string g contains token m, or a synonym or lexical variation of m. Among the derived formal concepts, we restrict our attention to those whose simplified extent or class-origin extent contains exactly two strings or classes across ontologies, and extract two types of lexical anchors, namely Type I anchors for exact matches and Type II anchors for partial matches.

Step 2: Constructing the relation-based formal context to validate lexical anchors. Structural relationships of ontologies are exploited to validate the matches obtained at the lexical level. Zhang and Bodenreider [7] proposed using positive and negative structural evidence among anchors for the purpose of validation. In this step, we build the relation-based formal context to obtain both positive and negative structural evidence for lexical anchors. The relation-based formal context Krel := (Grel, Mrel, Irel) is described as follows. Classes in the two source ontologies are taken as the object set Grel, and lexical anchors prefixed with different relational labels are taken as the attribute set Mrel.
For example, the relationships ISA, SIBLING-WITH, PART-OF, and DISJOINT-WITH are labeled by “(ISA)”, “(SIB)”, “(PAT)”, and “(I-D)” (or “(D-I)”), respectively. The binary relation (g, m) ∈ Irel holds if g has the corresponding relationship (as in the prefix of m) with the class from the same source ontology as g in the anchor of m. Formal concepts whose extents include both classes of some anchors indicate structural evidence. Such anchors are positive evidence for anchors with label “(ISA)”, “(SIB)” or “(PAT)” in the intent, and vice versa. On the other hand, they are negative evidence for anchors with label “(I-D)” or “(D-I)” in the intent, and vice versa. In this way, the positive and negative structural evidence sets of each anchor a can be obtained, denoted by P(a) and N(a), respectively. We then use all the positive evidence sets P and negative evidence sets N to eliminate incorrect lexical anchors and retain the correct ones.

Step 3: Constructing the positive relation-based formal context to discover additional matches. After incoherence repair and screening, the anchors retained are those supported both lexically and structurally. Based on the enhanced alignment, FCA-Map goes further and builds the positive relation-based formal context, aiming to identify new, structural mappings. The positive relation-based formal context K0rel is constructed similarly to Krel, i.e., using classes in the two source ontologies as the object set and anchors prefixed with relationship labels as the attribute set, except that the disjointedness relationship is no longer necessary. Among the derived formal concepts, we restrict our attention to those with exactly two classes across ontologies in the simplified extent.

1.3 Link to the system and parameters file

The SEALS-wrapped version of FCA-Map for OAEI 2016 is available at https://drive.google.com/open?id=0B810qAwN1CIoM0NMV3ZJMzVsTlk.
1.4 Link to the set of provided alignments

The results obtained by FCA-Map during OAEI 2016 are available at https://drive.google.com/open?id=0B810qAwN1CIodGdPUjVWY0M3U0U.

2 Results

In this section, we present the results achieved by FCA-Map in OAEI 2016. Our system mainly focuses on the Anatomy, Large Biomedical Ontologies, and Disease and Phenotype tracks.

2.1 Anatomy Track

The Anatomy track consists of finding an alignment between the Adult Mouse Anatomy and a part of the NCI Thesaurus describing the human anatomy. The results are shown in Table 1. The evaluation was run on a server with 3.46 GHz (6 cores) and 8 GB RAM allocated. FCA-Map ranked fifth in the Anatomy track.

Matcher      Precision  Recall  F-Measure  Runtime (s)
AML          0.95       0.936   0.943      47
CroMatcher   0.949      0.902   0.925      573
XMAP         0.929      0.865   0.896      45
LogMapBio    0.888      0.896   0.892      758
FCA-Map      0.932      0.837   0.882      117

Table 1: Results for Anatomy track

2.2 Large BioMed Track

The Large BioMed track consists of finding alignments between the Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI). The results obtained by FCA-Map for the small fragments of the FMA, NCI and SNOMED CT ontologies are summarized in Table 2. The evaluation of the first two tasks was run on an Ubuntu laptop with an Intel Core i7-4600U CPU @ 2.10GHz x 4 and 15 GB RAM allocated, with a 2-hour timeout. The last task was run on a PC with an Intel i7-4790 CPU @ 3.60GHz and 8 GB RAM allocated. FCA-Map ranked second in the first two tasks.

Task                 Precision  Recall  F-Measure  Runtime (s)
FMA-NCI (small)      0.954      0.917   0.935      236
FMA-SNOMED (small)   0.936      0.803   0.865      1,865
SNOMED-NCI (small)   0.914      0.666   0.771      13,542

Table 2: Results of FCA-Map for the Large BioMed Track

2.3 Disease and Phenotype Track

The Pistoia Alliance Ontologies Mapping project team organises this track based on a real use case where it is required to find alignments between disease and phenotype ontologies.
Specifically, the selected ontologies are the Human Phenotype Ontology (HPO), the Mammalian Phenotype Ontology (MP), the Human Disease Ontology (DOID), and the Orphanet and Rare Diseases Ontology (ORDO). The evaluation was run on an Ubuntu laptop with an Intel Core i7-4600U CPU @ 2.10GHz x 4 and 15 GB RAM allocated.

Matcher   Task        Silver 2                                      Silver 3
                      Precision  Recall  F-Measure  Sum F-Measure   Precision  Recall  F-Measure  Sum F-Measure
LogMap    HP-MP       0.9354     0.9125  0.9238     1.8372          0.7732     0.9729  0.8617     1.7828
          DOID-ORDO   0.9520     0.8779  0.9134                     0.9052     0.9375  0.9211
FCA-Map   HP-MP       0.9836     0.7543  0.8539     1.8162          0.9421     0.9244  0.9332     1.8706
          DOID-ORDO   0.9662     0.9586  0.9624                     0.8880     0.9926  0.9374
AML       HP-MP       0.9305     0.7998  0.8602     1.7684          0.8536     0.9446  0.8968     1.7714
          DOID-ORDO   0.8532     0.9708  0.9082                     0.7784     0.9981  0.8747
PhenoMF   HP-MP       0.7568     0.9164  0.8290     1.7149          0.6292     0.9452  0.7555     1.6905
          DOID-ORDO   0.9498     0.8301  0.8859                     0.9472     0.9233  0.9351

Table 3: Results against silver standard with vote 2 and 3

Table 3 shows the results against the silver standard, which is built automatically by voting over the outputs of the participating systems. LogMap is the system closest to the mappings voted by at least 2 systems, and FCA-Map produces results very close to the silver standard with vote 3.

3 General comments

This is the first time the FCA-Map system has participated in the OAEI campaign. It is competitive with other systems in some tracks such as Anatomy, Large Biomedical Ontologies, and Disease and Phenotype. Three types of formal contexts are constructed one by one, and their derived concept lattices are used to cluster the commonalities among classes at the lexical and structural levels, respectively. The tokens shared by two classes in these lexical mappings are unique to their names. The lexical matching method of FCA-Map is suitable for domain ontologies having class names, labels, or synonyms from domain-specific vocabulary.
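The lexical matching summarised above (Step 1 of Section 1.2) can be illustrated with a heavily simplified sketch. It assumes a hypothetical stop-word list and a naive tokenizer, ignores inflection and synonyms, and does not build the full concept lattice: it only groups strings by their token set, so that strings from different ontologies with identical tokens surface as exact-match anchor candidates. It is not FCA-Map's implementation.

```python
import re
from collections import defaultdict

# Hypothetical, illustrative stop-word list (not the one used by FCA-Map).
STOP_WORDS = {"of", "the", "a", "an"}

def tokens(label):
    """Normalize a label: lowercase, split on non-alphanumerics, drop stop words."""
    words = re.split(r"[^a-z0-9]+", label.lower())
    return frozenset(w for w in words if w and w not in STOP_WORDS)

def exact_anchor_candidates(labels_1, labels_2):
    """Pair labels from two ontologies whose normalized token sets coincide."""
    # Each token set (the attribute set of a string in the token-based
    # context) maps to the labels from ontology 1 and ontology 2 carrying it.
    by_intent = defaultdict(lambda: (set(), set()))
    for label in labels_1:
        by_intent[tokens(label)][0].add(label)
    for label in labels_2:
        by_intent[tokens(label)][1].add(label)
    return sorted(
        (a, b)
        for left, right in by_intent.values()
        for a in left
        for b in right
    )

if __name__ == "__main__":
    print(exact_anchor_candidates(["Wall of heart", "Liver"],
                                  ["heart wall", "Kidney"]))
```

With the toy labels in the usage example, only "Wall of heart" and "heart wall" share the same token set, so they form the single candidate pair; partial matches would require inspecting the lattice rather than this grouping shortcut.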
4 Conclusions

In this paper, we have presented FCA-Map and its results on three tracks (i.e., Anatomy, Large Biomedical Ontologies, Disease and Phenotype) of OAEI 2016. The evaluation results show that FCA-Map performs competitively on these tracks. Future work will introduce more elements of ontologies into FCA-Map, including properties, individuals, and logical constructors and axioms. Optimization techniques for handling large-scale FCA contexts will also be worth exploring.

Acknowledgements. This work has been supported by the National Key Research and Development Program of China under grant 2016YFB1000902, the Natural Science Foundation of China under No. 61232015, the Knowledge Innovation Program of the Chinese Academy of Sciences (CAS), Key Lab of Management, Decision and Information Systems of CAS, and Institute of Computing Technology of CAS.

References

1. de Souza, K.X.S., Davis, J.: Aligning ontologies and evaluating concept similarities. In: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", Springer (2004) 1012–1029
2. Guan-yu, L., Shu-peng, L., et al.: Formal concept analysis based ontology merging method. In: Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on. Volume 8., IEEE (2010) 279–282
3. Obitko, M., Snášel, V., Smid, J.: Ontology design with formal concept analysis. CLA 128(3) (2004) 1377–1390
4. Stumme, G., Maedche, A.: FCA-Merge: Bottom-up merging of ontologies. In: IJCAI. Volume 1. (2001) 225–230
5. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Ordered sets. Springer (1982) 445–470
6. Xu, X., Wu, Y., Chen, J.: Fuzzy FCA based ontology mapping. In: 2010 First International Conference on Networking and Distributed Computing, IEEE (2010) 181–185
7. Zhang, S., Bodenreider, O.: Experience in aligning anatomical ontologies. International journal on Semantic Web and information systems 3(2) (2007) 1