
Proceedings of the 3rd International Workshop on Semantic Technologies and Deep Learning Models for Scientific, Technical and Legal Data co-located with the Extended Semantic Web Conference 2025

Rima Dessi (3)

Joy Jeenu (2), joy.jeenu@fiz-karlsruhe.de

Danilo Dessi (1)

Francesco Osborne (4), francesco.osborne@open.ac.uk

Hidir Aras (2), hidir.aras@fiz-karlsruhe.de

1 College of Computing and Informatics, University of Sharjah, Sharjah, UAE
2 FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
3 Higher Colleges of Technology, Sharjah, UAE
4 Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom

The rapid expansion of publicly available scientific, technical, and legal documents, such as patents, reports, and research articles, has made large-scale processing and analysis increasingly vital. Every day, researchers, patent analysts, inventors, and other professionals (e.g., information scientists, legal experts) contribute to this growing body of data through publications, technical reports, and patent filings. However, analyzing and navigating these documents presents significant challenges due to their considerable length, the use of specialized vocabulary, and their coverage of diverse scientific domains. These documents are often semi-structured, combining unstructured text with structured elements such as tables, mathematical expressions, diagrams, and domain-specific content like chemical compounds or biological sequences. This combination of formats and domain complexity significantly increases the difficulty of processing them effectively.

To benefit from the scientific and technical knowledge present in such documents, for example in decision-making or professional search and analytics, there is an urgent need to analyze, enrich, and link this data by employing state-of-the-art Semantic Web technologies and AI methods. However, because the documents are heterogeneous and written using domain-specific terminology, applying existing semantic technologies is not straightforward. To address these challenges, Semantic Web Technologies, Natural Language Processing (NLP) techniques, and Deep Neural Networks (DNN) must be leveraged to provide efficient and effective solutions for creating easily accessible and machine-understandable knowledge of science and industry.

To this end, the goal of the SemTech4STLD workshop (https://semtech4stld.github.io/) was to provide a forum for academics and industry professionals to come together and discuss topics such as the application of Semantic Web Technologies and Deep Learning Models to scientific, technical, and legal data. A further primary objective of the workshop was to promote collaboration and the exchange of ideas among the participants. The workshop started with a keynote entitled “Evaluation Challenges in Using Generative AI for Science and Technical Contents” by Prof. Dr. Paul Groth.

Overall, the workshop’s success was reflected in the high number of participants. During the workshop, many participants joined the discussions, asked questions, and exchanged ideas about the application of Semantic Web Technologies and Machine Learning models to scientific, technical, and legal data. We believe this workshop has helped participants build new networks and has encouraged future projects. We definitely plan to organize the 4th edition of this workshop.

Keynote Talk

Prof. Dr. Paul Groth: Keynote on Evaluation Challenges in Using Generative AI for Science & Technical Content.

Keynote Abstract:

Foundation Models show impressive results in a wide range of tasks on scientific and legal content from information extraction to question answering and even literature synthesis. However, standard evaluation approaches (e.g., comparing to ground truth) often do not seem to work. Qualitatively, the results look great, but quantitative scores do not align with these observations. In this talk, I discuss the challenges we have faced in our lab in evaluation. I then outline potential routes forward.

Papers presented at SemTech4STLD

Evaluating LLMs for Named Entity Recognition in Scientific Domain with Fine-Tuning and Few-Shot Learning.
Davide Buscaldi, Danilo Dessi, Francesco Osborne, Davide Piras and Diego Reforgiato Recupero

Taming Hallucinations: A Semantic Matching Evaluation Framework for LLM-Generated Ontologies.
Nadeen Fathallah, Steffen Staab and Alsayed Algergawy

Benchmarking Large Language Models for Sustainable Development Goals Classification: Evaluating In-Context Learning and Fine-Tuning Strategies.
Andrea Cadeddu, Alessandro Chessa, Vincenzo De Leo, Gianni Fenu, Enrico Motta, Francesco Osborne, Diego Reforgiato Recupero, Angelo Salatino and Luca Secchi

Enabling Natural Language Access to BIM Models with AI and Knowledge Graphs.
Andrea Ibba, Rubén Alonso and Diego Reforgiato Recupero

June 2025
Rima Dessi, Joy Jeenu, Danilo Dessi, Francesco Osborne, and Hidir Aras

Workshop website: https://semtech4stld.github.io/