Neural Class Expression Synthesis

N’Dah Jean Kouagou*, Stefan Heindorf, Caglar Demir and Axel-Cyrille Ngonga Ngomo
Department of Computer Science, Paderborn University, Warburger Str. 100, 33098 Paderborn, Germany
* Corresponding author: ndah.jean.kouagou@upb.de (N. J. Kouagou); heindorf@upb.de (S. Heindorf); caglar.demir@upb.de (C. Demir); axel.ngonga@upb.de (A.-C. Ngonga Ngomo)

NeSy 2023, 17th International Workshop on Neural-Symbolic Learning and Reasoning, Certosa di Pontignano, Siena, Italy
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Many applications require explainable node classification in knowledge graphs. To this end, a popular “white-box” approach is class expression learning: given sets of positive and negative nodes, class expressions in description logics are learned that separate the positive from the negative nodes. Most existing approaches are search-based: they generate many candidate class expressions and select the best one. However, they often take a long time to find suitable class expressions. In this paper, we cast class expression learning as a translation problem and propose a new family of approaches which we dub neural class expression synthesizers. Sets of training examples are “translated” into class expressions in a fashion akin to machine translation. Consequently, our synthesizers are not subject to the runtime limitations of search-based approaches. We study three instances of this novel family of approaches based on LSTMs, GRUs, and set transformers (STs), respectively. An evaluation on four benchmark datasets suggests that our approach can synthesize high-quality class expressions with respect to the input examples in approximately one second on average. Moreover, a comparison to state-of-the-art approaches suggests that we achieve better F-measures on large datasets. For reproducibility, we provide our implementation as well as pretrained models in our public GitHub repository at https://github.com/dice-group/NeuralClassExpressionSynthesis

Keywords: Neural network, Concept learning, Class expression learning, Learning from examples, NCES

Class expression learning (CEL) approaches learn a class expression that describes the individuals provided as positive examples; for instance, given fathers as positive and non-fathers as negative examples, a CEL approach might learn the class expression Male ⊓ ∃hasChild.⊤. CEL is applied in a wide range of domains, including ontology engineering, bio-medicine, and Industry 4.0. Several methods have been proposed to address this task; the state of the art consists of approaches based on refinement operators [1, 2] and on evolutionary algorithms [3]. However, the majority of these approaches suffer from scalability issues because they explore a potentially infinite space of class expressions for each learning problem.

We propose a new family of self-supervised neuro-symbolic approaches for CEL, dubbed neural class expression synthesis (NCES). NCES [4] instances view CEL as a machine translation problem: they translate from the language of example embeddings to that of description logics (or any other logic, for that matter).

Overall, neural class expression synthesizers work as follows: First, a given knowledge base is converted into a set of triples (s, p, o), which are then embedded into a continuous vector space such as R^d. Next, learning problems are generated automatically from the input knowledge base using a refinement operator and an instance checker. Finally, the synthesizers are trained to translate the embeddings of positive/negative examples into the corresponding class expressions of the training data. If necessary, the second and third steps can be iterated until a stopping criterion (e.g., convergence) is fulfilled.
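To make the third step concrete, the following is a minimal PyTorch sketch of one possible synthesizer: a permutation-invariant encoder pools the embeddings of the positive and of the negative examples into one summary vector, and a recurrent decoder emits the token sequence of a class expression. All identifiers here are hypothetical; the actual LSTM, GRU, and set transformer architectures of NCES are specified in [4].

```python
# Minimal sketch of a neural class expression synthesizer
# (hypothetical architecture; see [4] for the actual NCES models).
import torch
import torch.nn as nn

class SynthesizerSketch(nn.Module):
    def __init__(self, emb_dim: int, vocab_size: int,
                 hidden: int = 256, max_len: int = 32):
        super().__init__()
        # Separate projections let the model distinguish positive from
        # negative examples before the two sets are pooled.
        self.pos_proj = nn.Linear(emb_dim, hidden)
        self.neg_proj = nn.Linear(emb_dim, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)
        self.max_len = max_len

    def forward(self, pos_emb, neg_emb):
        # pos_emb: (batch, n_pos, emb_dim); neg_emb: (batch, n_neg, emb_dim).
        # Mean pooling is permutation-invariant, so the prediction does not
        # depend on the order in which the examples are given.
        summary = (self.pos_proj(pos_emb).mean(dim=1)
                   + self.neg_proj(neg_emb).mean(dim=1))
        h = summary.unsqueeze(0)  # initial hidden state of the decoder
        step = torch.zeros(pos_emb.size(0), 1, h.size(-1),
                           device=pos_emb.device)  # start-token surrogate
        logits = []
        for _ in range(self.max_len):  # greedy, fixed-length decoding
            out, h = self.decoder(step, h)
            logits.append(self.out(out))
            step = out  # feed the decoder output back as the next input
        # Returns (batch, max_len, vocab_size): one distribution over class
        # expression tokens (atomic concepts, roles, ⊓, ⊔, ∃, ∀, ¬, ...)
        # per decoding step.
        return torch.cat(logits, dim=1)
```

Such a model can be trained end to end with a token-level cross-entropy loss against the class expressions generated in the second step; at inference time, the argmax tokens are parsed back into a description logic expression.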
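In the evaluation reported below, the quality of a computed solution is measured by its F-measure over the given examples. A minimal sketch of this standard metric, assuming `covered` holds the individuals retrieved for the class expression by an instance checker:

```python
def f1_score(positives: set, negatives: set, covered: set) -> float:
    """F-measure of a class expression whose retrieved instances are
    `covered`: covered positives are true positives, covered negatives
    are false positives, ruled-out positives are false negatives."""
    tp = len(positives & covered)
    fp = len(negatives & covered)
    fn = len(positives - covered)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Example: both positives covered, one negative wrongly covered.
assert f1_score({"a", "b"}, {"c"}, {"a", "b", "c"}) == 0.8
```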
We compared NCES to state-of-the-art algorithms for CEL, namely CELOE [1] and EvoLearner [3], on the popular benchmarks Carcinogenesis, Mutagenesis, Semantic Bible, and Vicodi. We measured the performance of each approach in terms of runtime and in terms of F-measure, i.e., based on the positive/negative examples covered/ruled out by the computed solution. Table 1 reports the results of our experiments on a total of 380 learning problems. These results suggest that, after training, NCES instances are over 300 times faster on average than search-based approaches. Moreover, they perform particularly well on the largest datasets, Carcinogenesis and Vicodi, with up to 5.5% absolute improvement in F-measure.

Table 1
Evaluation results per approach and dataset. The star (*) marks a statistically significant difference (Wilcoxon test) between NCES and the best search-based approach. NCES uses the ensemble of GRU, LSTM, and ST.

F1 (%), higher is better
Approach      Carcinogen.    Mutagenesis    Sem. Bible     Vicodi
CELOE         37.92±44.25    82.95±33.48    93.18±17.52*   35.66±42.06
EvoLearner    91.48±14.30    93.27±12.95    91.88±10.14    92.74±10.28
NCES          97.06±13.06*   91.39±22.91    87.11±24.05    95.51±12.14*

Runtime (sec.), lower is better
Approach      Carcinogen.    Mutagenesis    Sem. Bible     Vicodi
CELOE         239.58±132.59  92.46±125.69   135.30±139.95  289.95±103.63
EvoLearner    54.73±25.86    48.00±31.38    17.16±9.20     213.78±81.03
NCES          0.27±0.00*     0.31±0.00*     0.15±0.00*     0.15±0.00*

Acknowledgments
This work is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant No 860801 and from the European Union’s Horizon Europe research and innovation programme under grant No 101070305. This work has also been supported by the Ministry of Culture and Science of North Rhine-Westphalia (MKW NRW) within the project SAIL under grant No NW21-059D and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): TRR 318/1 2021 – 438445824.

References
[1] J. Lehmann, S. Auer, L. Bühmann, S. Tramp, Class expression learning for ontology engineering, J. Web Semant. 9 (2011) 71–81.
[2] N. J. Kouagou, S. Heindorf, C. Demir, A.-C. Ngonga Ngomo, Learning concept lengths accelerates concept learning in ALC, in: ESWC, volume 13261 of LNCS, Springer, 2022, pp. 236–252.
[3] S. Heindorf, L. Blübaum, N. Düsterhus, T. Werner, V. N. Golani, C. Demir, A.-C. Ngonga Ngomo, EvoLearner: Learning description logics with evolutionary algorithms, in: WWW, ACM, 2022, pp. 818–828.
[4] N. J. Kouagou, S. Heindorf, C. Demir, A.-C. Ngonga Ngomo, Neural class expression synthesis, in: ESWC, volume 13870 of LNCS, Springer, 2023, pp. 209–226.