Fine-TOM Matcher Results for OAEI 2021

Leon Knorr 1 [0000-0003-4117-2629] and Jan Portisch 1,2 [0000-0001-5420-0663]

1 SAP SE, Walldorf, Germany
{leon.knorr, jan.portisch}@sap.com
2 Data and Web Science Group, University of Mannheim, Germany
jan@informatik.uni-mannheim.de

Abstract. In this paper, the Fine-Tuned Transformers for Ontology Matching (Fine-TOM) matching system is presented along with the results it achieved during its first participation in the Ontology Alignment Evaluation Initiative (OAEI) campaign (2021). The system uses the publicly available albert-base-v2 model, which has been fine-tuned with a training dataset that includes 20% of each reference alignment from the Anatomy, Conference, and Knowledge Graph tracks, as well as a wide variety of generated false examples. The model is then used by a separate matching pipeline which calculates a confidence score for each correspondence. The submitted Docker container includes only the matching pipeline together with an already fine-tuned model.

Keywords: Ontology Matching · Ontology Alignment · Language Models · Transformers · Fine-Tuning.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Presentation of the System

1.1 State, purpose, general statement

Fine-Tuned Transformers for Ontology Matching (Fine-TOM) is a transformer-based matching system. It consists of two separate pipelines: a pipeline for generating training data and training the model, and a matching pipeline which performs the actual matching task. Both can be executed individually or in sequence. Each pipeline uses predefined components from the Matching Evaluation Toolkit (MELT) [6], a framework for ontology matching and evaluation. In particular, the new transformer extension of MELT [7] is used.

For the submission, only the matching pipeline was packaged in a Docker container using the MELT Web Interface (https://dwslab.github.io/melt/matcher-packaging/web#web-interface-http-matching-interface), together with a fine-tuned albert-base-v2 model. This model was fine-tuned beforehand with a training set that included 20% of the reference alignments of the Anatomy, Conference, and Knowledge Graph tracks, as well as generated negative examples. This year's submission marks the first participation of the Fine-TOM system in the OAEI.

1.2 Specific Techniques Used

Transformer-based language models. One possible approach to solving natural language processing (NLP) problems is the use of transformers. The initial transformer was introduced by Google in 2017 and uses a so-called self-attention architecture [13], which is more parallelizable and requires significantly less time to train than previous recurrent architectures. Today, the NLP community has largely adopted transformers, and they have become the de facto standard for most NLP tasks such as text translation and classification [13,4]. As a result, many different transformer models are available today, e.g. bert-base-cased [4] and gpt-2 [11], all of which use variations of the initial self-attention architecture.
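To illustrate how such a publicly available model can be obtained and applied to a candidate correspondence, the following minimal sketch loads albert-base-v2 from Hugging Face and scores a pair of entity labels with a sequence-classification head. The example labels and the interpretation of the second logit as a match score are illustrative assumptions, not Fine-TOM's actual code; in Fine-TOM, a fine-tuned checkpoint is loaded from disk instead of the freshly initialized head shown here.

```python
# Minimal sketch (not the Fine-TOM/MELT implementation): loading a pre-trained
# transformer from Hugging Face and scoring a candidate label pair. Without
# fine-tuning, the classification head is randomly initialized, so the score
# is not yet meaningful -- this is exactly why fine-tuning is required.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "albert-base-v2"  # model family used by Fine-TOM (after fine-tuning)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Encode the textual representations of a candidate correspondence as a sentence pair.
inputs = tokenizer("mitral valve", "valve of heart", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
match_confidence = torch.softmax(logits, dim=-1)[0, 1].item()  # probability of the "match" class
print(match_confidence)
```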
Fine-Tuning. In order to achieve good results, a transformer needs to be initially trained on a large amount of training data. This process is also called pre-training. As pre-training requires a vast amount of data as well as processing power, most models are pre-trained on a specific task, such as next-sentence prediction, and then uploaded to Hugging Face (https://huggingface.co) [14], where they are available for download and can be tested in web demos. This initial training process has a great impact on how the selected model will perform later on.

As most transformers are trained for conventional tasks like text summarization, next-sentence prediction, or review classification [14,13], they are not suitable out of the box for other tasks, in this case ontology matching. Therefore, transformers can be re-trained, or fine-tuned, to perform other or similar tasks. This process is usually computationally cheaper than pre-training. However, high-quality training data consisting of positive as well as negative examples is needed. Because such training data is currently not available, Fine-TOM includes a training pipeline which generates training data based on a fraction of already known reference alignments.

During the development of Fine-TOM, different BERT models were fine-tuned and evaluated on the Anatomy [1], Conference [2], and Knowledge Graph [8,5] tracks. Based on the data gathered, the best performing configuration was determined; it uses the albert-base-v2 model and is explained in more detail in the following.

1.3 Fine-TOM architecture

The Fine-TOM matching system consists of two individual pipelines, as shown in Figure 1:

– a training pipeline, which handles the fine-tuning process of a transformer and saves the resulting model to disk, and
– a matching pipeline, which performs the actual matching task and is based on the architecture presented in the TOM paper [9].

This architecture can also be used to run transformers in a zero-shot fashion by executing only the matching pipeline with a pre-trained model.

Fig. 1. High-level view of the Fine-TOM matching process.

Training Pipeline. The training pipeline, shown in Figure 2, consists of several predefined components of the MELT framework [7]. First, a recall matcher creates an alignment between the two ontologies O1 and O2, which acts as the basis for generating training data. This alignment usually does not feature high precision but good recall, so many of the included correspondences are not correct matches; this makes it a good starting point for generating training data, in particular negative examples. After that, a mechanism for generating negatives creates the actual training dataset by sampling a configurable fraction f from an already known reference alignment. Internal experiments showed that sampling 20-40% of a reference alignment offers the best work-to-performance ratio; the model included in Fine-TOM has therefore been trained with a sampling rate of 20%. The sampled correspondences constitute the positive examples of the training set. In order to add negative examples, the mechanism takes the alignment generated by the recall matcher as input. Under the assumption that the perfect solution is of a one-to-one arity, and since for some entities the correct match is known from the sampled reference alignment, negative examples can be picked from the alignment of the recall matcher, resulting in a training set that includes positive as well as negative examples.
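The following sketch outlines this training-data generation step under the stated one-to-one assumption. The function and variable names are hypothetical (the actual implementation uses MELT components), and alignments are represented simply as collections of (source, target) pairs of textual representations.

```python
# Hypothetical sketch of the negative-example generation, not the MELT implementation.
import random

def generate_training_examples(recall_alignment, reference_alignment, fraction=0.2, seed=42):
    """Sample `fraction` of the reference alignment as positives and derive
    negatives from the high-recall alignment under the one-to-one assumption."""
    random.seed(seed)
    reference = list(reference_alignment)
    positives = random.sample(reference, int(fraction * len(reference)))
    positive_set = set(positives)
    sampled_sources = {s for s, _ in positives}
    sampled_targets = {t for _, t in positives}

    examples = [(s, t, 1) for s, t in positives]  # label 1 = correct match
    for s, t in recall_alignment:
        # If the correct partner of s or t is already known from the sampled
        # reference, any other correspondence involving s or t must be wrong.
        if (s, t) not in positive_set and (s in sampled_sources or t in sampled_targets):
            examples.append((s, t, 0))  # label 0 = negative example
    return examples
```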
This training set is then passed on to the transformer fine-tuning component of the MELT framework [7], which fine-tunes the selected model and saves it to disk.

Fig. 2. High-level view of the Fine-TOM training process.

Matching Pipeline. The matching pipeline, shown in Figure 3, also consists of several predefined components of the MELT framework. As in the training pipeline, a recall matcher is used as a starting point, thereby marking the theoretically highest recall that can be achieved with this matching system. The resulting alignment is then processed by a confidence splitter, which removes from the alignment returned by the recall matcher all correspondences that are simple string matches with a confidence of 1.0, together with all other correspondences involving their entities. These correspondences are saved temporarily in a separate alignment so that they will not be reclassified by the transformer model. The cleaned-up alignment is then passed on to a transformer filter, which loads the previously fine-tuned transformer model from disk and adds another confidence score to each correspondence in the alignment. In order to make use of this newly added confidence and to eliminate correspondences that the transformer classified as bad matches with a low confidence score, a confidence filter is used. It "cuts off" the alignment at a configurable threshold; Fine-TOM uses the same threshold of 0.8 as proposed in the TOM paper [9]. After all matches with a lower confidence score have been removed from the processed alignment, the previously removed correspondences with a confidence score of 1.0 are added back to the alignment. Since most OAEI datasets are typically of one-to-one arity, an efficient implementation of the Hungarian method, known as Maximum Weight Bipartite Matching (MWBM) [3], is used to create the final alignment and thus the final result. All matching components are explained in more detail below.

Fig. 3. High-level view of the Fine-TOM matching process.

Recall Matcher. The recall matcher uses a variety of string comparisons to generate an alignment with a high recall at the expense of a rather low precision. It includes a simple string matching mechanism which compares the textual representations of two entities character by character; if a match is found, the correspondence is added to the result alignment with a confidence of 1.0. In addition, it counts how often each word of one textual representation appears in the other one; if this similarity surpasses a configurable threshold, the correspondence is also added to the result alignment, but only with a low confidence of 0.1.

Confidence Splitter. As described earlier, the confidence splitter takes an alignment as input and removes every correspondence with a confidence score of 1.0, as well as every other correspondence involving the entities of a removed correspondence. This is done in order to prevent a reclassification of these rather "safe" matches by another component in the pipeline. The confidence splitter is also able to add the alignment that was saved during the splitting process back to an alignment passed to it as input.
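A compact sketch of the splitter's behaviour is given below. The dictionary-based representation of correspondences is an assumption made for illustration; the actual component operates on MELT alignment objects.

```python
# Illustrative sketch of the confidence splitter, assuming correspondences are
# dicts with 'source', 'target' and 'confidence' keys (not the MELT data model).

def split_by_confidence(alignment, threshold=1.0):
    """Set aside trivial string matches (confidence == threshold) together with
    all other correspondences involving their entities."""
    trivial = [c for c in alignment if c["confidence"] >= threshold]
    locked_sources = {c["source"] for c in trivial}
    locked_targets = {c["target"] for c in trivial}
    remaining = [c for c in alignment
                 if c["confidence"] < threshold
                 and c["source"] not in locked_sources
                 and c["target"] not in locked_targets]
    return trivial, remaining

def merge_back(filtered_alignment, trivial):
    """After the transformer filter and confidence filter have run, re-add the
    matches that were set aside so they appear in the final alignment."""
    return filtered_alignment + trivial
```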
Transformer Filter. The transformer filter iterates over the alignment passed to it as input and processes each correspondence individually by calling a separate Python server running locally in the background. This is necessary because the transformer models themselves are implemented in Python, whereas the matching components and the pipeline are implemented in Java. Each pair of textual representations received by the Python server is processed by the selected model, which can either be loaded from disk or sourced from the Hugging Face library. The transformer model then provides a confidence score, which is sent back to the transformer filter class and added to the corresponding correspondence in the alignment, thereby classifying each correspondence.

Confidence Filter. The confidence filter excludes every correspondence with a confidence score lower than a configurable threshold. This is needed because the transformer filter itself does not remove any correspondences from the alignment; it only reclassifies them. Therefore, in order to exclude matches that have been marked as bad matches by a low confidence score, the confidence filter is needed.

Max Weight Bipartite Extractor. The alignment generated by the matching components can include multiple correspondences for a single ontology element. However, the assumption was made earlier that the solution to the posed ontology matching problem is of a one-to-one arity. Therefore, the alignment provided as input to the max weight bipartite extractor needs to be converted into an alignment with a one-to-one arity. To do so, an efficient implementation of the Hungarian method, known as Maximum Weight Bipartite Matching (MWBM) [3], is used.
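As an illustration of this one-to-one extraction step, the sketch below applies SciPy's Hungarian-method implementation to the same assumed dictionary-based correspondence representation as in the earlier sketches. It is a stand-in for the MWBM extractor provided by MELT, not its actual code.

```python
# Illustrative one-to-one extraction via maximum-weight bipartite matching,
# using SciPy's Hungarian-method implementation (a stand-in for MELT's extractor).
import numpy as np
from scipy.optimize import linear_sum_assignment

def extract_one_to_one(alignment):
    """Keep at most one correspondence per source and per target entity,
    maximizing the sum of confidence scores."""
    sources = sorted({c["source"] for c in alignment})
    targets = sorted({c["target"] for c in alignment})
    s_index = {s: i for i, s in enumerate(sources)}
    t_index = {t: j for j, t in enumerate(targets)}

    # Build the bipartite weight matrix from the correspondence confidences.
    weights = np.zeros((len(sources), len(targets)))
    for c in alignment:
        weights[s_index[c["source"]], t_index[c["target"]]] = c["confidence"]

    rows, cols = linear_sum_assignment(weights, maximize=True)
    return [{"source": sources[i], "target": targets[j], "confidence": weights[i, j]}
            for i, j in zip(rows, cols) if weights[i, j] > 0.0]
```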
2 Results

This section discusses the results of Fine-TOM during the OAEI 2021 campaign. Only the Anatomy [1], Conference [2], and Knowledge Graph [8,5] tracks are included, since the matching system was only designed and trained for these tracks.

2.1 Anatomy

Table 1. Results on the Anatomy track according to the OAEI 2021 campaign

              Precision  Recall  F-Measure
StringEquiv   0.997      0.622   0.766
TOM           0.933      0.808   0.866
Fine-TOM      0.916      0.794   0.851

The results of Fine-TOM on the Anatomy track (official result page: http://oaei.ontologymatching.org/2021/results/anatomy/index.html) are depicted in Table 1. As shown, Fine-TOM was able to outperform the OAEI StringEquiv matcher in terms of recall and F-measure, although its precision was lower. This shows that the Fine-TOM matching system is able to find matches that cannot be found by checking for string equivalence. However, compared to the TOM matching system, which is strongly related to Fine-TOM as both share a similar matching pipeline architecture, Fine-TOM achieved slightly lower scores (about 1-2 percentage points) for all reported measures. That is a rather interesting result, as the transformers used in the TOM paper are neither re-trained with domain-specific data nor pre-trained on an ontology matching task. Nevertheless, TOM has one advantage: it uses the Sentence-BERT transformer model paraphrase-TinyBERT-L6-v2 [12], whereas Fine-TOM uses a fine-tuned version of the albert-base-v2 model. Sentence-BERT models are pre-trained and designed to find semantic textual similarities between input sequences [12]. The albert-base-v2 model, on the other hand, is a variation of the BERT model and was trained for masked language modelling [10], which is a very different task from ontology matching. It is therefore remarkable that Fine-TOM was able to achieve results so close to those of TOM. This demonstrates the impact that the fine-tuning process has on the performance of a matching system that includes a transformer model. Since MELT did not support Sentence-BERT transformers at the time of Fine-TOM's development, such models could not be evaluated in time for Fine-TOM's OAEI 2021 submission.

2.2 Conference

Table 2. Results on the Conference track according to the OAEI 2021 campaign

              Precision  Recall  F-Measure
StringEquiv   0.76       0.41    0.53
TOM           0.69       0.48    0.57
Fine-TOM      0.64       0.53    0.58

As shown in Table 2 (official result page: http://oaei.ontologymatching.org/2021/results/conference/), Fine-TOM achieved a higher recall and F-measure on the Conference track than both the OAEI StringEquiv matcher and TOM. It was therefore able to find more correct correspondences than either system.

2.3 Knowledge Graph

On the Knowledge Graph track, Fine-TOM was able to achieve slightly better results than the OAEI baseline, as shown in Table 3.

Table 3. Results on the Knowledge Graph track according to the OAEI 2021 campaign

               Precision  Recall  F-Measure
BaselineLabel  0.95       0.71    0.81
Fine-TOM       0.92       0.75    0.83

3 General Comments

We thank the OAEI organizers for their support and commitment.

4 Conclusion

In this paper, the Fine-TOM matching system has been presented. First, a new pipeline architecture consisting of a dedicated training pipeline and a matching pipeline was introduced. The training pipeline generates a training set based on reference alignments and a high-recall matcher; this training set is then used to re-train a selected model. The model is subsequently injected into the matching pipeline, which performs the actual matching process using different filters. The results showed that transformers can improve the overall performance of matching systems in terms of recall and F-measure. Moreover, the similar results of TOM and Fine-TOM indicate that fine-tuning has a great impact on the performance of transformer models, since the model used by Fine-TOM had not been pre-trained for ontology matching or for finding semantic similarities between input sequences. The presented approach therefore promises further performance gains in the future, e.g. by using a different model such as a Sentence-BERT model, or by improving or exchanging individual pipeline components such as the high-recall matcher. This year's submission marks the first participation of the Fine-TOM matching system in an OAEI campaign; the reported results are promising and motivate further research in the area of transformer-based ontology and instance matching.

References

1. Bodenreider, O., Hayamizu, T.F., Ringwald, M., de Coronado, S., Zhang, S.: Of mice and men: Aligning mouse and human anatomies. In: AMIA 2005, American Medical Informatics Association Annual Symposium, Washington, DC, USA, October 22-26, 2005. AMIA (2005)
2. Cheatham, M., Hitzler, P.: Conference v2.0: An uncertain version of the OAEI conference benchmark. In: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014, Proceedings, Part II. Lecture Notes in Computer Science, vol. 8797, pp. 33-48. Springer (2014)
3. Cruz, I.F., Antonelli, F.P., Stroe, C.: Efficient selection of mappings and automatic quality-driven combination of matching methods. In: Proceedings of the 4th International Workshop on Ontology Matching (OM-2009) collocated with the 8th International Semantic Web Conference (ISWC-2009), Chantilly, USA, October 25, 2009. CEUR Workshop Proceedings, vol. 551. CEUR-WS.org (2009)
4. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018)
5. Hertling, S., Paulheim, H.: The knowledge graph track at OAEI - gold standards, baselines, and the golden hammer bias. In: The Semantic Web - 17th International Conference, ESWC 2020, Heraklion, Crete, Greece, May 31-June 4, 2020, Proceedings. Lecture Notes in Computer Science, vol. 12123, pp. 343-359. Springer (2020)
6. Hertling, S., Portisch, J., Paulheim, H.: MELT - matching evaluation toolkit. In: Semantic Systems. The Power of AI and Knowledge Graphs - 15th International Conference, SEMANTiCS 2019, Karlsruhe, Germany, September 9-12, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11702, pp. 231-245. Springer (2019)
7. Hertling, S., Portisch, J., Paulheim, H.: Matching with transformers in MELT. CoRR abs/2109.07401 (2021)
8. Hofmann, A., Perchani, S., Portisch, J., Hertling, S., Paulheim, H.: DBkWik: Towards knowledge graph creation from thousands of wikis. In: Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks co-located with the 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, October 23-25, 2017. CEUR Workshop Proceedings, vol. 1963. CEUR-WS.org (2017)
9. Kossack, D., Borg, N., Knorr, L., Portisch, J.: TOM matcher results for OAEI 2021. In: OM@ISWC 2021 (2021), to appear
10. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: A lite BERT for self-supervised learning of language representations. CoRR abs/1909.11942 (2019)
11. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2019)
12. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (2019)
13. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. CoRR abs/1706.03762 (2017)
14. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Brew, J.: HuggingFace's Transformers: State-of-the-art natural language processing. CoRR abs/1910.03771 (2019)