<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhancing Data Precision with Large Language Models: Analyzing Failures and Innovating Database Curation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Georg Gottlob</string-name>
          <email>georg.gottlob@unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georg Gottlob is a Professor of Computer Science at the University of Calabria. Until recently</institution>
          ,
          <addr-line>he</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oxford, a Fellow of St John's College</institution>
          ,
          <addr-line>Oxford, and an Adjunct Professor at TU Wien. His interests</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Calabria</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>was a Royal Society Research Professor at the Computer Science Department of the University of</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The advent of Large Language Models (LLMs) such as ChatGPT represents a significant milestone in the AI revolution. This talk commences with an exploration of text-based generative AI tools, highlighting exemplary performances in producing elegantly crafted texts. However, LLMs often fail, particularly when tasked with generating precise data absent from established databases like Wikipedia. This phenomenon is critically examined through a “psychoanalysis” of LLMs that identifies fundamental causes for such failures and hallucinations. In response to these challenges, the second part of the talk introduces the Chat2Data method and system, an innovative framework designed to harness the capabilities of LLMs for the automatic generation, enrichment, and verification of databases and data sets. Chat2Data automatically generates sophisticated workflows that incorporate problem decomposition, strategic LLM querying, and meticulous analysis of responses. To refine reliability and accuracy, the system integrates supplementary technologies such as Retrieval-Augmented Generation (RAG), rule-based knowledge processors, and data-graph analysis. This comprehensive approach not only mitigates the pitfalls identified but also significantly advances the utility of LLMs in complex data environments.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR Workshop Proceedings (ceur-ws.org)</p>
      <p>Georg Gottlob is a Professor of Computer Science at the University of Calabria. Until recently, he
was a Royal Society Research Professor at the Computer Science Department of the University of
Oxford, a Fellow of St John's College, Oxford, and an Adjunct Professor at TU Wien. His interests [...]
Austrian National Science Fund and the Ada Lovelace Medal (UK). He is a Fellow of the Royal
Society, and a member of the Austrian Academy of Sciences, the German National Academy of
Sciences, and the Academia Europaea. He was a founder of Lixto, a web data extraction firm
acquired in 2013 by McKinsey &amp; Company. In 2015 he co-founded Wrapidity, a spin-out of Oxford
University based on fully automated web data extraction technology developed in the context
of an ERC Advanced Grant. Wrapidity was acquired by Meltwater, an internationally operating
media intelligence company. Gottlob then co-founded the Oxford spin-out DeepReason.AI,
which provided knowledge graph and rule-based reasoning software to customers in various
industries. DeepReason.AI was also acquired by Meltwater.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>