<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Responsible AI with LLMs: Why We Need Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sven Hertling</string-name>
          <email>sven.hertling@uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>,
          <addr-line>Mannheim</addr-line>,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>FIZ Karlsruhe - Leibniz Institute for Information Infrastructure</institution>,
          <addr-line>Eggenstein-Leopoldshafen</addr-line>,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This keynote provides an overview of cases where knowledge graphs are relevant in Responsible AI. It furthermore highlights what still needs to be solved in the knowledge graph community to increase the adoption of KGs in industry. The talk is structured around four key topics: reliability, privacy, fairness, and explainability.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The talk is structured according to four main topics that are relevant for Responsible AI:
• Reliability &amp; Robustness
• Security &amp; Privacy
• Fairness &amp; Anti-Bias
• Explainability &amp; Transparency
For each topic, it is shown where and how large language models (LLMs) and knowledge graphs (KGs) have advantages and disadvantages.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Reliability &amp; Robustness</title>
      <p>When Microsoft released Tay (a Twitter chatbot) on 23 March 2016, it quickly turned out that the
system published some offensive and hurtful tweets. A system thus also needs to deal with unexpected
input, especially when it is able to produce natural-language sentences, as it is challenging to
restrict the output to texts that are not hurtful.</p>
      <p>However, reliability from a developer’s perspective also means that generative models (e.g., large
language models) produce text from which certain information can be extracted, such as a person’s birthdate.
Even if the expected output is produced for the majority of persons, this does not mean that the birthdate can be
automatically extracted for all persons: the large language model (LLM) may produce more or fewer
tokens, such that the parsing algorithm no longer works. One way out is to restrict the output to, e.g., JSON
format, but then the question remains how good the extracted information actually is.</p>
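      <p>The brittleness described above can be sketched as follows: a parser that expects one fixed phrasing breaks as soon as the model emits a slightly different token sequence, while a JSON-restricted output at least parses reliably. The example strings below are hand-written stand-ins for LLM outputs; no real model is called.</p>
      <preformat>
```python
import json
import re

# Two hypothetical LLM answers to the same birthdate question; the second
# uses slightly different wording, which breaks a naive pattern-based parser.
free_text_a = "Albert Einstein was born on 1879-03-14."
free_text_b = "The birthdate of Albert Einstein is March 14, 1879."

def parse_birthdate(text):
    """Naive extraction that only works for one specific phrasing."""
    m = re.search(r"born on (\d{4}-\d{2}-\d{2})", text)
    return m.group(1) if m else None

# Restricting the model to JSON makes parsing reliable, but the question
# remains how good the extracted value actually is.
json_output = '{"person": "Albert Einstein", "birthdate": "1879-03-14"}'
record = json.loads(json_output)
```
      </preformat>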
      <p>For SPARQL queries, developers already know which datatype to expect for a given query,
and thus the results can easily be used within a larger pipeline, e.g., to visualize the mayors of
Mannheim on a timeline.</p>
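      <p>In contrast, a SPARQL SELECT result comes back with explicit datatypes. A minimal sketch of consuming such a result (the bindings below are a hand-written stand-in for an endpoint response with hypothetical mayor names, not live query output):</p>
      <preformat>
```python
from datetime import date

# Hand-written stand-in for the JSON bindings a SPARQL endpoint would
# return for a query about mayors and their terms of office.
bindings = [
    {"mayor": {"type": "literal", "value": "Mayor A"},
     "start": {"type": "literal",
               "datatype": "http://www.w3.org/2001/XMLSchema#date",
               "value": "1949-01-01"}},
    {"mayor": {"type": "literal", "value": "Mayor B"},
     "start": {"type": "literal",
               "datatype": "http://www.w3.org/2001/XMLSchema#date",
               "value": "1972-01-01"}},
]

def parse_dates(bindings):
    """The xsd:date datatype is known in advance, so parsing cannot fail silently."""
    return [date.fromisoformat(b["start"]["value"]) for b in bindings]

timeline = parse_dates(bindings)  # ready for a timeline visualization
```
      </preformat>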
      <p>Going from reliability to reproducibility: a lot of scientific work still relies only on proprietary LLMs
and does not include any open-source model as a reference. Such work is not reproducible at all, because
the models can be discontinued, after which no access is possible anymore. Even if open-source models are
used, one must be very careful to ensure a deterministic system: 1) the next-token strategy needs to be
deterministic, which means setting the temperature to zero; 2) fix the model by providing not only the repository
name but also the version tag. Each model (e.g., sentence-transformers/all-MiniLM-L6-v2,
https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) is a
git repository, and the maintainer can upload a new model, which is then used for new evaluations.
Researchers should therefore also provide (at least in the source code) the specific git
commit hash from the files and versions tab (like c9745ed) such that the same model version is used every time.</p>
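      <p>Both points can be sketched as follows. Greedy decoding at temperature zero amounts to a plain argmax over the logits; the version pin is shown as a comment, assuming the model loader exposes the Hugging Face hub convention of a <italic>revision</italic> argument that accepts a tag or commit hash.</p>
      <preformat>
```python
# 1) Deterministic next-token strategy: temperature zero amounts to
#    always taking the argmax over the model's logits (greedy decoding).
def greedy_next_token(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# 2) Pinning the exact model version (sketch, assuming a hub-style loader):
#
#    model = SentenceTransformer(
#        "sentence-transformers/all-MiniLM-L6-v2",
#        revision="c9745ed",  # commit hash from the "Files and versions" tab
#    )
```
      </preformat>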
    </sec>
    <sec id="sec-3">
      <title>3. Security &amp; Privacy</title>
      <p>In the model card of Claude Opus 4 (https://anthropic.com/model-card), multiple sections show how LLMs behave when they can interact with
the real world via function calls. Section 4.1.3, for example, shows excessive compliance with
harmful system-prompt instructions: the user asks for weapons-grade nuclear material, and the
LLM actually searches relevant marketplaces via tools. Similarly, in Section 4.1.7, the developers
show that it is still relatively easy to jailbreak the system when the user becomes combative. The LLM
then generates the text “Actually, you’re right that I should be more helpful.” and continues with the
request.</p>
      <p>In Section 4.1.9, high-agency behavior can be observed by placing the LLM in scenarios that involve
egregious wrongdoing by its users. If the LLM has the possibility to call functions like sending
emails, it will actually inform other people of this wrongdoing. On the other hand, these function calls
also make it possible to implement systematic checks on their parameters, e.g., to verify
the number of recipients of an email and restrict it to sending emails to at most 10 people.</p>
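      <p>Such a parameter check can be sketched as a small guard that runs before the tool call is executed; the function name and the "to" field are illustrative assumptions, not part of any specific tool API.</p>
      <preformat>
```python
def check_send_email(args, max_recipients=10):
    """Reject a hypothetical send_email tool call whose parameters violate policy.

    The 'to' field name is an illustrative assumption.
    """
    recipients = args.get("to", [])
    if not isinstance(recipients, list):
        recipients = [recipients]
    if len(recipients) > max_recipients:
        raise ValueError(
            "send_email blocked: %d recipients exceeds the limit of %d"
            % (len(recipients), max_recipients)
        )
    return args  # parameters are within policy; the call may proceed
```
      </preformat>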
      <p>
        End of 2024, Anthropic introduced the Model Context Protocol (MCP) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to standardize the functions
that can be called. Currently, there are implementations for filesystems, databases (PostgreSQL and
SQLite), and Google Drive. In the future, it would be good to also have MCP servers for triple stores
and SPARQL endpoints.
      </p>
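      <p>As a sketch of what such a (currently hypothetical) MCP server for SPARQL endpoints could advertise, the following tool definition follows the general shape of an MCP tool listing; the tool name and its parameters are assumptions for illustration.</p>
      <preformat>
```python
# Hypothetical MCP-style tool definition for a SPARQL endpoint.
# The shape (name, description, inputSchema) follows the general MCP tool
# listing convention; no such official server exists yet.
sparql_tool = {
    "name": "sparql_query",
    "description": "Run a read-only SPARQL SELECT query against a configured endpoint.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "SPARQL SELECT query"},
            "endpoint": {"type": "string", "description": "SPARQL endpoint URL"},
        },
        "required": ["query"],
    },
}
```
      </preformat>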
      <p>
        In our recent work [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we also use the function-calling capabilities of LLMs for information
extraction, especially for cases where instances, relations, and classes are missing. Generative AI
can be helpful in such scenarios to create useful labels and descriptions for new entities. Thus, LLMs can
and should be used more widely as interfaces for humans to interact with knowledge graphs, either via
information extraction (text to KG) or via query answering over KGs (natural-language question to SPARQL).
Allemang and Sequeda [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] showed that KGs can actually be very helpful for question answering, because
an ontology-based query checker can correct SPARQL queries that would otherwise not work.
This option of improving the query does not easily exist for relational databases.
      </p>
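      <p>A minimal sketch of such an ontology-based check, loosely inspired by the idea cited above: properties used in a generated query are compared against the ontology, and unknown ones are flagged for correction before execution. The tiny ontology and the prefixed property names are illustrative assumptions, not the cited authors' method.</p>
      <preformat>
```python
# Toy ontology: the set of properties the schema actually defines.
ONTOLOGY_PROPERTIES = {"onto:birthDate", "onto:mayorOf"}

def unknown_properties(query_properties):
    """Return properties used in a query that the ontology does not define."""
    return [p for p in query_properties if p not in ONTOLOGY_PROPERTIES]

# A generated query used onto:birthdate (wrong casing); the checker flags it
# so the query can be corrected to onto:birthDate before execution.
flagged = unknown_properties(["onto:birthdate", "onto:mayorOf"])
```
      </preformat>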
      <p>Regarding privacy, knowledge graphs can be hosted locally so that only a few people have
access to them. The more problematic point is that a lot of work is still needed to implement real
access control for KGs. This might be on the triple level, but sometimes users are only allowed to see
aggregated results and not specific details. All these kinds of access rights still need to be researched
and implemented. Similarly, many triple stores lack the possibility to check audit logs in case
wrong information is added to the KG.</p>
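      <p>Triple-level access control, one of the open points just mentioned, can be sketched as follows; the toy permission model (allowed predicates per role) is an illustrative assumption.</p>
      <preformat>
```python
# Toy KG and a per-role permission model (which predicates a role may see).
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:salary", "50000"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
]

PERMISSIONS = {
    "analyst": {"ex:worksFor"},          # may not see salaries
    "hr": {"ex:worksFor", "ex:salary"},  # full access
}

def visible_triples(role):
    """Filter the KG down to the triples the given role is allowed to see."""
    allowed = PERMISSIONS.get(role, set())
    return [t for t in TRIPLES if t[1] in allowed]
```
      </preformat>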
    </sec>
    <sec id="sec-4">
      <title>4. Fairness &amp; Anti-Bias</title>
      <p>Multimodal LLMs are used to generate images based on textual input. If asked to generate an
image of an analog watch showing 3:35, all image generators will show a watch that shows 10:10
(https://www.zdfheute.de/wirtschaft/kuenstliche-intelligenz-ki-bildgeneratoren-midjourney-dalle-100.html),
because nearly all images of analog watches on the web show this time (the hands then leave the
brand name readable and give a nicer visual representation). This is clearly a bias in the
training data that also affects LLMs.</p>
      <p>
        For KGs, it is at least possible to analyze these biases as we did in our studies [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
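      <p>A toy sketch of the kind of bias analysis a KG enables, in the spirit of the studies cited above: counting the value distribution of a property over the graph. The triples and the gender property are illustrative assumptions, not data from those studies.</p>
      <preformat>
```python
from collections import Counter

# Toy KG with an attribute whose distribution we want to inspect.
TRIPLES = [
    ("ex:p1", "ex:gender", "male"),
    ("ex:p2", "ex:gender", "male"),
    ("ex:p3", "ex:gender", "female"),
    ("ex:p4", "ex:occupation", "ex:mayor"),
]

def value_distribution(triples, prop):
    """Count how often each value occurs for a given property."""
    return Counter(o for s, p, o in triples if p == prop)

dist = value_distribution(TRIPLES, "ex:gender")
```
      </preformat>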
      <sec id="sec-4-1">
        <title>5. Explainability &amp; Transparency</title>
        <p>
          One advantage of knowledge graphs is that provenance information can be attached to each triple, which
increases the trust in this information. Wikidata, for example, makes it possible to provide a
reference URL for each statement. But even though the PROV ontology [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] exists for such cases, apart from Wikidata not many
KGs provide provenance information.
        </p>
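        <p>A minimal sketch of attaching a source to individual triples, in the spirit of PROV-O's <italic>prov:wasDerivedFrom</italic>; the plain-dict representation and the example IRIs are illustrative assumptions, not an RDF serialization.</p>
        <preformat>
```python
# Each statement carries its triple plus a provenance record, analogous to
# a Wikidata reference URL or a prov:wasDerivedFrom link.
statements = [
    {
        "triple": ("ex:Mannheim", "ex:mayor", "ex:PersonX"),
        "prov:wasDerivedFrom": "https://example.org/source-page",
    },
]

def sources_for(statements, triple):
    """Return all provenance sources recorded for a given triple."""
    return [s["prov:wasDerivedFrom"] for s in statements if s["triple"] == triple]
```
        </preformat>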
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>In this keynote, it is shown that knowledge graphs can be helpful in each of the areas of Responsible AI.
Still, with the rise of LLMs, researchers should carefully check whether the results can really be trusted, because
many datasets used for evaluation are publicly available on the web. LLMs, on the other hand,
are also trained on web data and might remember the test cases by heart without generalizing well (see
also Section 4 about the bias in the training data and the observation that multimodal LLMs cannot easily generalize).
Thus, more and more datasets need to be protected so that they do not end up in the training corpora of
LLMs.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Model context protocol (mcp): Landscape, security threats, and future research directions</article-title>
          ,
          <source>arXiv preprint arXiv:2503.23278</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sack</surname>
          </string-name>
          ,
          <article-title>Towards large language models interacting with knowledge graphs via function calling</article-title>
          , in:
          <source>Workshop on Language Models for Knowledge Base Construction (LM-KBC 2024) @ ISWC</source>
          , volume
          <volume>3853</volume>
          , CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Allemang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <article-title>Increasing the llm accuracy for question answering: Ontologies to the rescue!</article-title>
          ,
          <source>arXiv preprint arXiv:2405.11706</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Heist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ringler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs on the web - an overview</article-title>
          ,
          <source>Knowledge Graphs for eXplainable Artificial Intelligence: Foundations, Applications and Challenges</source>
          (
          <year>2020</year>
          )
          <fpage>3</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ringler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>One knowledge graph to rule them all? Analyzing the differences between DBpedia, YAGO, Wikidata &amp; co.</article-title>
          ,
          in:
          <source>KI 2017: Advances in Artificial Intelligence, 40th Annual German Conference on AI, Dortmund, Germany, September 25-29, 2017, Proceedings 40</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>366</fpage>
          -
          <lpage>372</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lebo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>PROV-O: The PROV Ontology</article-title>
          ,
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2013</year>
          . https://www.w3.org/TR/2013/REC-prov-o-20130430/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>