=Paper=
{{Paper
|id=Vol-3741/keynote01
|storemode=property
|title=Enhancing Data Precision with Large Language Models: Analyzing Failures and Innovating Database Curation
|pdfUrl=https://ceur-ws.org/Vol-3741/keynote01.pdf
|volume=Vol-3741
|authors=Georg Gottlob
|dblpUrl=https://dblp.org/rec/conf/sebd/Gottlob24
}}
==Enhancing Data Precision with Large Language Models: Analyzing Failures and Innovating Database Curation==
Georg Gottlob (University of Calabria, Italy)

On 25th June 2024, Georg Gottlob delivered a keynote talk at the 32nd Symposium on Advanced Database Systems (SEBD 2024) in Villasimius (Sardinia, Italy). The following is the abstract of his talk and a short biography.

===Abstract of the Keynote===

The advent of Large Language Models (LLMs) such as ChatGPT represents a significant milestone in the AI revolution. This talk commences with an exploration of text-based generative AI tools, highlighting exemplary performances in producing elegantly crafted texts. However, LLMs often fail, particularly when tasked with generating precise data absent from established databases like Wikipedia. This phenomenon is critically examined through a “psychoanalysis” of LLMs that identifies fundamental causes for such failures and hallucinations. In response to these challenges, the second part of the talk introduces the Chat2Data method and system, an innovative framework designed to harness the capabilities of LLMs for the automatic generation, enrichment, and verification of databases and data sets. Chat2Data automatically generates sophisticated workflows that incorporate problem decomposition, strategic LLM querying, and meticulous analysis of responses. To refine reliability and accuracy, the system integrates supplementary technologies such as Retrieval-Augmented Generation (RAG), rule-based knowledge processors, and data-graph analysis. This comprehensive approach not only mitigates the pitfalls identified but also significantly advances the utility of LLMs in complex data environments.

===Short Biography===

Georg Gottlob is a Professor of Computer Science at the University of Calabria. Until recently, he was a Royal Society Research Professor at the Computer Science Department of the University of Oxford, a Fellow of St John’s College, Oxford, and an Adjunct Professor at TU Wien.
His interests include knowledge representation, database theory, query processing, web data extraction, and (hyper)graph decomposition techniques. Gottlob has received the Wittgenstein Award from the Austrian National Science Fund and the Ada Lovelace Medal (UK). He is a Fellow of the Royal Society, and a member of the Austrian Academy of Sciences, the German National Academy of Sciences, and the Academia Europaea. He was a founder of Lixto, a web data extraction firm acquired in 2013 by McKinsey & Company. In 2015 he co-founded Wrapidity, a spin-out of Oxford University based on fully automated web data extraction technology developed in the context of an ERC Advanced Grant. Wrapidity was acquired by Meltwater, an internationally operating media intelligence company. Gottlob then co-founded the Oxford spin-out DeepReason.AI, which provided knowledge graph and rule-based reasoning software to customers in various industries. DeepReason.AI was also acquired by Meltwater.