<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Proceedings of the 2nd International Workshop on Semantic Technologies and Deep Learning Models for Scientific, Technical and Legal Data co-located with the Extended Semantic Web Conference 2024</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rima Dessi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Dessi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Osborne</string-name>
          <email>francesco.osborne@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hidir Aras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>- Rima Dessi, FIZ Karlsruhe, Germany. - Danilo Dessi, GESIS Leibniz Institute for the Social Sciences</institution>
          ,
          <addr-line>Cologne, Germany. - Francesco Osborne</addr-line>
          ,
          <institution>The Open University</institution>
          ,
          <addr-line>Milton Keynes</addr-line>
          ,
          <country country="UK">United Kingdom.</country>
          <addr-line>- Hidir Aras, FIZ Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>- Rubén Alonso, Télécom Paris, R2M Solution Srl</institution>
          ,
          <addr-line>Italy. - Ahmad Alrifai, FIZ Karlsruhe, Germany. - Davide Buscaldi</addr-line>
          ,
          <institution>Sorbonne Paris North University, France. - Pablo Calleja, Polytechnic University of Madrid</institution>
          ,
          <addr-line>Spain. - Mathieu D A</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>FIZ Karlsruhe - Leibniz Institute for Information Infrastructure</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>GESIS Leibniz Institute for the Social Sciences</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Knowledge Media Institute, The Open University</institution>
          ,
          <addr-line>Milton Keynes</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Committee</p>
    </sec>
    <sec id="sec-2">
      <title>Preface</title>
      <p>The rapid growth of scientific, technical, and legal data available online, such as patents, reports,
and articles, has made the large-scale analysis and processing of such documents a crucial task. Today,
scientists, patent experts, inventors, and other information professionals (e.g., information scientists
and lawyers) contribute to this data every day by publishing articles and writing technical reports or
patent applications. It is challenging to process, analyze, and explore these documents due
to their length, their use of domain-specific vocabulary, and the complexity introduced by targeting
various scientific fields and domains. These semi-structured documents combine unstructured
textual parts with structured parts such as tables, mathematical formulas, and diagrams, as well as
domain-specific information such as chemical names and bio-sequences. This kind of information adds
further complexity to processing such documents.</p>
      <p>In order to benefit from the scientific-technical knowledge present in such documents, e.g., for
decision-making or for professional search and analytics, there is an urgent need for analyzing,
enriching, and linking such data by employing state-of-the-art Semantic Web technologies and AI
methods. However, because these documents are heterogeneous and written using domain-specific terminology,
applying existing semantic technologies is not straightforward.</p>
      <p>To address the challenges mentioned above, Semantic Web Technologies, Natural Language
Processing (NLP) techniques, and Deep Neural Networks (DNN) must be leveraged in order to provide
efficient and effective solutions for creating easily accessible and machine-understandable knowledge
of science and industry.</p>
      <p>To this end, the goal of the workshop<sup>4</sup> was to provide a forum for people from
academia and industry to come together and discuss topics such as the application of Semantic
Web Technologies and Deep Learning Models to scientific, technical, and legal data. Further, the
primary objective of the workshop was to promote collaboration and the exchange of ideas among
the participants. The workshop started with a keynote entitled “Understanding Scientific and Societal
Adoption of Scientific Knowledge and Resources Through NLP and Knowledge Graphs” by Prof.
Dr. Stefan Dietze. An invited talk was also given on “Semantic Web and Machine Learning Systems
for Intelligent Systems in Complex Domains” by Prof. Dr. Marta Sabou. These talks led to very
insightful discussions within the community.</p>
      <p>Overall, the workshop’s success is demonstrated by the high number of participants and
submissions. Further, during the workshop, many participants joined the discussions, asked questions,
and exchanged ideas about the application of Semantic Web Technologies and Machine Learning
models to Scientific, Technical, and Legal Data. We believe this workshop helped participants build
new networks and encouraged future projects related to these topics. We definitely plan to
organize the 3rd edition of this workshop.</p>
      <sec id="sec-2-2">
        <title>Keynote on Understanding Scientific and Societal Adoption of Scientific Knowledge and Resources Through NLP and Knowledge Graphs</title>
        <sec id="sec-2-2-1">
          <title>Keynote Abstract:</title>
          <p>Scientific discourse is scattered across unstructured scholarly publications and increasingly takes
place online, e.g. in news or social media. Understanding the state-of-the-art in specific research fields,
involved data, software, or methods, and their impact on both science and society requires substantial
efforts and has become increasingly challenging. At the same time, societal debates about topics such
as COVID or climate change have demonstrated the impact of science discourse on public opinion,
policies, and society as a whole. This talk will provide an overview of a range of works that use deep
learning-based NLP, such as PLMs and LLMs, to construct and use knowledge graphs about scientific
discourse. These include, on the one hand, approaches that extract metadata about scholarly entities,
such as code, data, tasks, or machine learning models from scientific publications to enable
machine-interpretable research information and understand dependencies between scholarly artefacts. On
the other hand, we introduce NLP methods and knowledge graphs that enable an understanding
of societal discourse about science, e.g. on Twitter/X, and facilitate interdisciplinary research into
(mis-)representation and -information of scientific claims and findings in societal debates.</p>
          <p>4 https://semtech4stld.github.io/</p>
          <p>May 2024</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Contents</title>
      <sec id="sec-3-1">
        <title>Keynote Talk</title>
        <p>Keynote by Prof. Dr. Stefan Dietze</p>
      </sec>
      <sec id="sec-3-2">
        <title>Invited Talk</title>
        <p>Invited Talk by Prof. Dr. Marta Sabou</p>
      </sec>
      <sec id="sec-3-4">
        <title>Invited Talk on Semantic Web and Machine Learning Systems for Intelligent Systems in Complex Domains</title>
        <p>Invited Talk Abstract: Creating intelligent applications that valorize complex domain data, such as in
the scientific, technical, and legal domains, often calls for solutions that combine learning and symbolic
artificial intelligence (AI) methods. In line with such developments, in the first part of this talk, we
describe a new sub-area of AI that combines Machine Learning components
with techniques developed by the Semantic Web community: Semantic Web Machine Learning
(SWeML). We report on the results of a systematic mapping study during which we analysed nearly
500 papers published in the past decade in this area, where we focused on evaluating architectural
and application-specific features of such systems. In the second part of the talk, we describe the
development and evaluation of a concrete SWeML system that aims to extract key elements from
official Austrian permits, including the Issuing Authority, the Operator of the facility in question,
the Reference Number, and the Issuing Date. We hope that our lessons learned both about this area
as a whole (through the survey of SWeML systems) and the concrete system we built will provide
inspiration for researchers and practitioners working with such complex data as in the legal domain
and beyond.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Paper Session I</title>
      </sec>
      <sec id="sec-3-6">
        <title>GerPS-NER: A Dataset for Named Entity Recognition to Support Public Service Process Creation in Germany</title>
        <p>Leila Feddoul, Sarah T. Bachinger, Clara Lachenmaier, Sebastian Apel, Pirmin Karg, Norman
Klewer, Denys Forshayt, Robin Erd and Marianne Mauch</p>
      </sec>
      <sec id="sec-3-7">
        <title>Automating Citation Placement with Natural Language Processing and Transformers</title>
        <p>Davide Buscaldi, Danilo Dessì, Enrico Motta, Marco Murgia, Francesco Osborne and Diego
Reforgiato Recupero</p>
      </sec>
      <sec id="sec-3-8">
        <title>Combining Knowledge Graphs and Large Language Models to Ease Knowledge Access in Software Architecture Research</title>
        <p>Angelika Kaplan, Jan Keim, Marco Schneider, Anne Koziolek and Ralf Reussner</p>
      </sec>
      <sec id="sec-3-9">
        <title>Paper Session II</title>
      </sec>
      <sec id="sec-3-10">
        <title>Extracting license information from web resources with a Large Language Model</title>
        <p>Enrico Daga, Jason Carvalho and Alba Catalina Morales Tirado</p>
      </sec>
      <sec id="sec-3-12">
        <title>ChatGPT vs. Google Gemini: Assessing AI Frontiers for Patent Prior Art Search Using European Search Reports</title>
        <p>Renukswamy Chikkamath, Ankit Sharma, Christoph Hewel and Markus Endres</p>
      </sec>
      <sec id="sec-3-13">
        <title>Bridging the Innovation Gap: Leveraging Patent Information for Scientists by Constructing a Patent-centric Knowledge Graph</title>
        <p>Hidir Aras, Rima Dessi, Farag Saad and Lei Zhang</p>
      </sec>
      <sec id="sec-3-15">
        <title>Investigating Environmental, Social, and Governance (ESG) Discussions in News: A Knowledge Graph Analysis Empowered by AI</title>
        <p>Simone Angioni, Sergio Consoli, Danilo Dessì, Francesco Osborne, Diego Reforgiato Recupero and
Angelo Salatino</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>