<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Retrieval, Debate, and Verification for Robust Table‐to‐Knowledge‐Graph Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Koby Bar</string-name>
          <email>barkoby@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomer Sagi</string-name>
          <email>tsagi@cs.aau.dk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Aalborg University</institution>
          ,
          <addr-line>Aalborg</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information Systems, University of Haifa</institution>
          ,
          <addr-line>Haifa</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Tabular data is one of the most common data sources on the internet and is widely used in various data analytics tasks. Identifying semantic concepts within tables is often a critical component of these pipelines, yet it remains a challenging task to automate. To address this problem, we present RAGDify, a large language model (LLM)-based system designed for the Cell Entity Annotation (CEA) task. Our system employs a three-step pipeline inspired by Retrieval-Augmented Generation (RAG) and advanced reasoning techniques: (1) retrieving context-aware candidate entities, (2) engaging in a debate-like evaluation to compare top candidates, and (3) applying chain-of-verification-inspired prompting to validate the final entity match. We propose RAGDify as a solution for the SemTab'25 challenge, targeting the key challenges inherent in automating the CEA task.</p>
      </abstract>
      <kwd-group>
        <kwd>Cell Entity Annotation</kwd>
        <kwd>Table to Knowledge Graph Matching</kwd>
        <kwd>Large Language Models Reasoning</kwd>
        <kwd>Entity Matching</kwd>
        <kwd>Retrieval Augmented Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, we have witnessed a significant increase in the availability and dissemination of data,
particularly tabular data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Many processes have become data-driven, requiring large volumes of
structured data to train algorithms or support decision-making. Tabular formats such as CSV have
become the de facto standard for representing and exchanging data due to their compact,
human-readable structure.
      </p>
      <p>
        Automatically processing tabular data is a fundamental step in a wide range of applications, including
schema matching, entity linking, question answering, and knowledge graph construction. A common
approach to facilitate this processing is to link table elements, such as cells or columns, to entities
and concepts within a Knowledge Graph (KG) or ontology. This linkage creates a semantic layer that
enables higher-level reasoning and integration across heterogeneous datasets—a process commonly
referred to as Semantic Table Interpretation (STI) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Despite its widespread use and simplicity, the tabular format poses several challenges. Tables often
lack explicit contextual information, exhibit semantic ambiguities, and are susceptible to inconsistencies
and noise in their data. Consequently, automating the semantic interpretation of tables remains an
open and complex research problem [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        One promising direction to address these challenges is the incorporation of Large Language Models
(LLMs) into the Semantic Table Annotation pipeline [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. LLMs, trained on vast amounts of textual data,
have demonstrated impressive abilities in solving tasks beyond their specific training objectives, even
in settings with limited annotated data (few-shot learning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) or no task-specific data at all (zero-shot
learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
      </p>
      <p>
        However, directly applying LLMs to STI tasks introduces several key obstacles. First, LLMs are not
explicitly trained on structured KG data, and thus struggle with complex entity disambiguation tasks
where numerous KG entities share identical or highly similar surface forms. Second, the inherent lack
of context within tabular structures exacerbates the difficulty of semantic interpretation. Third, LLMs
are prone to hallucinations, producing confident but factually incorrect outputs, which can severely
undermine the accuracy of entity annotations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
      <p>
        To mitigate these challenges, a Retrieval-Augmented Generation (RAG) architecture has emerged as
a promising solution [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. RAG architectures enhance LLMs by grounding their responses in external
knowledge, limiting the candidate space to a set of relevant entities retrieved from a KG, and guiding
the LLM through the entity matching process. This retrieval-augmented approach not only constrains
the model’s generation space but also improves its factual reliability.
      </p>
      <p>
        Moreover, integrating advanced reasoning techniques has shown potential in boosting LLM
factual accuracy and decision-making capabilities. Recent advancements include multi-agent debate
frameworks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], chain-of-thought prompting [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], self-consistency mechanisms [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and
chain-of-verification [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] strategies that iteratively validate generated outputs against factual sources. These
reasoning techniques have been effective in guiding LLMs through complex multi-step tasks, fostering
both robustness and interpretability.
      </p>
      <p>In this paper, we present our approach to the Cell Entity Annotation (CEA) task for the SemTab’25
Challenge—RAGDify. Our system leverages LLMs within a RAG-based architecture, enriched with
a reasoning mechanism inspired by multi-agent debate, self-consistency, and chain-of-verification
techniques.</p>
      <p>The remainder of this paper is organized as follows: Section 2 provides an overview of the task and
foundational approaches; Section 3 details our proposed methodology; Section 4 presents our results;
Section 5 reviews related work in the field; finally, Section 6 highlights key challenges, summarizes
our contributions and limitations, and outlines directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Task</title>
      <sec id="sec-2-1">
        <title>2.1. Overview of the Challenge</title>
        <p>
          The SemTab challenge started in 2019 with the goal of promoting research in STI and providing a venue for
benchmarking different solutions [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Over the years, a wide range of solutions has been suggested for
SemTab tasks. With the emergence of large language models (LLMs), special interest arose, leading
to a dedicated STI vs. LLMs track in 2024; in 2025, all participants are expected to use LLM-based
methods, either via fine-tuning or RAG.
        </p>
        <p>We focus on the CEA task, which involves linking each table cell to its corresponding entity in
a knowledge base (e.g., a KG or ontology). Figure 1 illustrates this process. Systems are evaluated
using standard precision, recall, and F1-score metrics. In addition, the challenge requires that solutions
address several key challenges, such as disambiguation, homonymy, alias resolution, NIL detection,
noise robustness, and collective inference, to reflect the complexities of real-world table data.</p>
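To make the evaluation concrete, the micro-averaged scoring can be sketched as follows (a minimal illustration; the challenge's official scorer has its own implementation, and the cell-key format used here is an assumption):

```python
def cea_scores(predictions, gold):
    """Micro-averaged precision/recall/F1 for cell-entity annotation.

    Both arguments map a cell key (table_id, row, col) to a Wikidata URI.
    Cells the system leaves unannotated are simply absent from `predictions`.
    """
    correct = sum(1 for cell, uri in predictions.items() if gold.get(cell) == uri)
    precision = correct / len(predictions) if predictions else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("t1", 0, 0): "Q225706", ("t1", 1, 0): "Q1261"}
pred = {("t1", 0, 0): "Q225706"}
p, r, f1 = cea_scores(pred, gold)  # p = 1.0, r = 0.5, f1 = 2/3
```

The asymmetry between precision and recall is what makes NIL detection matter: annotating fewer cells can raise precision while lowering recall.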
      </sec>
      <sec id="sec-2-2">
        <title>2.2. MammoTab Track and Dataset</title>
        <p>
          We competed in the MammoTab track, which leverages the most recent version of the MammoTab
dataset [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. MammoTab comprises 870 heterogeneous tables sourced from Wikipedia, collectively
containing 84,907 verified cell–entity annotations. This benchmark simulates the challenges of real-world
table interpretation: tables lack explicit schema definitions, and cell contexts exhibit varying degrees of
noise, ambiguity, and sparsity. For the MammoTab track, Wikidata (v. 20240720) serves as the target KG.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>The RAGDify system formulates table-to-knowledge-base matching as a four-stage retrieval-generation
pipeline (Figure 2), wherein each stage leverages a large language model (LLM) to balance high recall
with precise disambiguation.</p>
      <p>[Figure 1 example table: California 33 | Independent | Bill Bloomfield; Colorado 5 | Independent | Dave Anderson]</p>
      <p>In the data cleansing stage, raw CSV tables are ingested and lightly cleaned via an LLM prompt that
corrects typographical or formatting errors, unifies casing, and removes noise, e.g., stray punctuation
or outlier tokens, while preserving the original row and column structure. This preprocessing yields a
consistent set of cell values for downstream retrieval.</p>
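A minimal sketch of how such a cleansing call might look; the prompt wording and the `chat` callable are illustrative assumptions, not the authors' exact prompt or client:

```python
# Hypothetical cleansing step: prompt text and helper names are illustrative only.
CLEANSE_PROMPT = (
    "You are given one row of a CSV table. Fix typos and stray punctuation, "
    "unify casing, and remove noise tokens, but keep the number and order of "
    "columns unchanged. Return the cleaned row as comma-separated values.\n"
    "Row: {row}"
)

def cleanse_row(row, chat):
    """`chat` is any callable that sends a prompt to an LLM and returns text."""
    reply = chat(CLEANSE_PROMPT.format(row=",".join(row)))
    cleaned = [cell.strip() for cell in reply.split(",")]
    # Guard: if the model changed the column count, keep the original row.
    return cleaned if len(cleaned) == len(row) else row

# Example with a stubbed LLM in place of a real API call:
fake_llm = lambda prompt: "California 33,Independent,Bill Bloomfield"
cleaned = cleanse_row(["calformia 33 !!", "independent", "Bill Bloomfield"], fake_llm)
```

The structure guard reflects the requirement stated above that cleansing must preserve the original row and column layout.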
      <p>The candidate generation stage employs three sequential retrieval strategies over a full-text inverted
index of Wikidata entity labels and descriptions, implemented in Elasticsearch. First, we perform an
index lookup using the cleaned cell string. Second, regardless of whether the first lookup succeeded,
we invoke a few-shot LLM prompt to reformulate the query by incorporating column and table
metadata, capturing aliases, synonyms, and contextual nuances; these results are added to the candidate
list. Finally, if neither of the first two strategies returns any results, we execute a fuzzy search requiring
at least 75% string similarity. To bound LLM context and reduce latency, tables with more than ten rows
are truncated to the ten rows nearest the target cell.</p>
      <p>[Figure 3 example for the cell value “California 33”. Candidate generation, direct lookup: California
State Route 33 (Q662907), California’s 33rd congressional district (Q225706), Riverside County (Q108111),
and 2024 California Proposition 33 (Q130614755); LLM-reformulated query “California 33 district”:
California’s 33rd State Assembly district (Q5020024), California’s 33rd congressional district (Q225706),
California’s 33rd State Senate district (Q5020025), and San Joaquin County Sheriff’s Department (Q7414388).
Debate and select, URI: http://www.wikidata.org/entity/Q225706, with three arguments: (1) the candidate
label “California’s 33rd congressional district” exactly matches the cell value “California 33,” indicating
the same federal electoral district; (2) Column 2 (“Independent”) and Column 3 (“Bill Bloomfield”) refer
to the party and representative of a congressional district, consistent with this URI; (3) the population
(146,660) and rank (“2nd”) in Columns 4–5 align with demographic metrics typically reported for
congressional districts, reinforcing the match. Verification: yes. Winning candidate URI:
http://www.wikidata.org/entity/Q225706.]</p>
      <p>In the candidate ranking stage, we employ a debate-style prompting strategy in which the LLM
receives a set of candidates and is asked to nominate the most plausible entities, accompanied by
three concise, evidence-based arguments referencing the cell value and its surrounding context. This
argumentative framing, rather than relying solely on similarity scores, encourages the model to surface
the strongest semantic match. To balance cost and accuracy, the debate can be run over the entire
candidate set or restricted to a top-k subset, and it may be iterated for multiple rounds, with each
iteration refining the argumentation and narrowing the pool until a final winner emerges.</p>
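One debate round can be sketched as a prompt-and-parse step; the prompt approximates the template described above but is not the authors' exact wording, and `chat` is an illustrative stand-in for the LLM client:

```python
# Illustrative debate-and-select round; prompt wording is an approximation.
DEBATE_PROMPT = """Given a target cell in a CSV table ({row}, {col}, "{value}")
and candidate Wikidata entities:
{candidates}
Select the best match and provide at least 3 concise, evidence-based arguments
referencing the cell value and its surrounding context.
Output format:
URI: <candidate URI>
Arguments: <arguments>"""

def debate_select(cell, candidates, chat):
    """One debate round; `chat` is any callable that queries an LLM."""
    row, col, value = cell
    listing = "\n".join(f"- {c['label']} ({c['uri']})" for c in candidates)
    reply = chat(DEBATE_PROMPT.format(row=row, col=col, value=value,
                                      candidates=listing))
    # Parse the winner out of the structured reply; fall back to NIL.
    for line in reply.splitlines():
        if line.startswith("URI:"):
            return line.split("URI:", 1)[1].strip()
    return "NIL"
```

Iterating for multiple rounds amounts to calling this function repeatedly on a shrinking candidate list until one URI remains.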
      <p>Finally, the validation stage tasks the LLM with targeted questions probing the chosen entity’s
consistency with the original cell value, compatibility with column and table context, and distinction
from alternative candidates, including an explicit NIL option. Based on these responses, the system
either confirms the selection or revises it and may iterate the validation prompts for several rounds
until a stable, final annotation is produced. Figure 4 shows the key LLM prompt templates used in the
pipeline’s candidate generation, candidate ranking, and validation stages. Figure 3 illustrates the main
stages applied to the CSV example from Figure 1.</p>
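The confirm-or-revise loop of the validation stage can be sketched as follows (the prompt wording and reply parsing are illustrative assumptions built around the template described above):

```python
# Sketch of the verification loop; not the authors' exact prompt.
VERIFY_PROMPT = """Re-evaluate the selected candidate {uri} for the cell "{value}"
using the table, the candidate list, and the debate arguments.
Check fit with the cell value, column values, and table context.
Revise if a better candidate exists; use NIL if no candidate fits.
Output format:
Verification: <yes/no>
Winning candidate URI: <candidate URI or NIL>"""

def verify(uri, value, chat, max_rounds=1):
    """Repeat verification until confirmed or `max_rounds` is exhausted."""
    for _ in range(max_rounds):
        reply = chat(VERIFY_PROMPT.format(uri=uri, value=value))
        fields = {line.split(":", 1)[0].strip(): line.split(":", 1)[1].strip()
                  for line in reply.splitlines() if ":" in line}
        winner = fields.get("Winning candidate URI", uri)
        if fields.get("Verification", "no").lower() == "yes":
            return winner          # confirmed (possibly revised) annotation
        uri = winner               # revised choice; re-check in the next round
    return uri
```

With `max_rounds=1`, this matches the single verification step used in the cost-efficient configuration described below.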
      <p>This pipeline tackles all core CEA challenges in one sweep: by normalizing and denoising cells up
front, it achieves noise robustness; its three-stage retrieval (exact, contextual reformulation, fuzzy)
and debate-style ranking enforce both disambiguation and homonymy resolution using column- and
table-level cues; contextual query rewriting surfaces aliases and nicknames; finally, because both
reformulation and validation draw on neighboring cells, our approach produces coherent, table-wide
annotations.</p>
      <p>Implementation Details. The proposed pipeline is LLM-agnostic and can be adapted with minimal
modifications to a variety of models. Given the relatively large size of the test set (see Section 2.2), we
prioritized runtime and cost efficiency. To this end, during ranking we generate supporting arguments
only for the top-ranked candidate, avoiding per-candidate debates, and perform a single verification
step. For our experiments, we chose OpenAI's GPT-4.1 nano due to its favorable cost-to-performance
ratio.</p>
      <p>All experiments were conducted on an Ubuntu 20.04.6 Linux server equipped with two Intel Xeon
Gold 6326 CPUs (16 cores per socket, 2 sockets, 64 threads total, 2.90 GHz) and 256 GB of RAM. The
entire pipeline, including the LLM client, retrieval modules, and validation logic, was containerized
using Docker and orchestrated via Docker Compose. Elasticsearch was deployed in a dedicated Docker
container, and GPT API calls were parallelized with a 4-thread pool to maximize throughput while
respecting rate limits. End-to-end processing of the SemTab’25 test set required approximately 26 hours
and incurred US$26.60 in API costs.</p>
      <p>[Figure 4 prompt templates. Candidate generation: “Given a CSV table and a target cell
({row_id}, {col_id}, {value}), generate a search query for Wikidata. Few-shot examples: {examples}.
Consider abbreviations, synonyms, and variations. Output only the search text.” Debate and select:
“Given a target cell in a CSV table ({row_id}, {col_id}, {value}) and a list of candidate entities
({candidates}) from Wikidata, select the best match for the cell and provide at least 3 strong arguments
supporting your choice; consider the table context. Output format: URI: &lt;candidate URI&gt;;
Arguments: &lt;arguments&gt;.” Verification: “Re-evaluate the selected candidate using the table,
candidate list, and arguments; check fit with the cell value, column values, and table context; revise if
a better candidate exists, otherwise confirm the choice; use NIL if no candidate fits. Output format:
Verification: &lt;yes/no&gt;; Winning candidate URI: &lt;candidate URI or NIL&gt;.”]</p>
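The bounded parallelization of per-cell LLM calls can be sketched with a standard thread pool; `annotate_cell` is a placeholder for the full retrieve-debate-verify pipeline, not a function from the actual codebase:

```python
from concurrent.futures import ThreadPoolExecutor

def annotate_all(cells, annotate_cell, workers=4):
    """Run the per-cell pipeline with a bounded thread pool (4 workers,
    matching the setup above), preserving the input order of results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate_cell, cells))

# Trivial stand-in for the real pipeline:
results = annotate_all(["California 33", "Colorado 5"], lambda c: (c, "NIL"))
```

A small fixed pool keeps concurrent API requests under the provider's rate limit while still overlapping network latency across cells.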
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
    </sec>
    <sec id="sec-5">
      <title>5. Related Work</title>
      <p>
        Tabular data linking to a KG has been studied for over two decades. Early STI pipelines typically
comprised three sequential stages: pre-processing (cleaning and denoising), candidate generation via
keyword- or schema-based lookup, and iterative disambiguation to resolve noise and ambiguity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Initial LLM-based approaches to CEA focused on learning joint embeddings of table cells and KG
entities. TURL [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] extends TinyBERT with a structure-aware visibility matrix, jointly optimizing
masked language modeling and masked entity retrieval objectives. TableLlama [16], a decoder-only
variant of Llama 2-7B, employs LongLoRA to accommodate extended contexts (up to 8,192 tokens) and
is instruction-tuned on a large TableInstruct corpus for CEA and other table tasks. TAPAS [17] adapts
BERT with table-specific position embeddings and aggregation heads to perform both cell classification
and entity linking, achieving strong in-domain performance without external KG queries.
      </p>
      <p>Replacing or unifying traditional STI stages, prompting over large generative models has become
popular. TSOTSA [18] leverages GPT-based prompts for candidate retrieval and ranking; Kepler-aSI [19]
integrates SPARQL query outputs into LLM inference to refine entity selection; CitySTI [20] applies
end-to-end prompting across all STI phases; and Adwan [21] demonstrates a retrieval-augmented
generation (RAG) pipeline enhanced with chain-of-thought and self-consistency prompting for robust
table metadata linking.</p>
      <p>While joint-representation models like TURL, TAPAS, and TableLlama excel at in-domain embedding
efficiency, they often require substantial labeled data or fine-tuning. Prompting-based methods simplify
deployment and achieve strong zero-shot performance but can incur higher API costs and latency. Our
RAGDify system builds on these paradigms by combining LLM-driven query reformulation, debate-style
ranking, and explicit validation to deliver a versatile, cost-effective solution across all CEA challenges.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This paper has presented RAGDify, a retrieval-augmented generation pipeline for cell entity annotation.
Our approach combines four key components: (1) lightweight LLM-driven table cleansing to correct
typos and normalize values; (2) multi-stage candidate retrieval via exact match, contextual
(LLM-rewritten) queries, and fuzzy lookup to maximize recall; (3) debate-style ranking that prompts the
LLM to select a single top candidate with supporting arguments; and (4) explicit validation that probes
cell-, column-, and table-level consistency and allows for NIL assignments. By design, RAGDify is
LLM-agnostic, adapts with minimal changes to different model families, and maintains cost efficiency
through a single debate round and one verification step.</p>
      <p>Future Work. Several directions merit further exploration. First, a controlled study of debate and
verification depths, varying the number of argumentation rounds and follow-up checks, could identify
the optimal balance between annotation accuracy and computational cost. Second, integrating a learned
semantic retrieval layer (e.g., dense embeddings) promises to boost candidate recall beyond syntactic
lookup without a significant runtime penalty. Third, access to a high-quality gold annotation set for the
SemTab’25 benchmark or another comparable dataset would enable rigorous evaluation and targeted
fine-tuning. Such a dataset would also allow a detailed analysis of how each component (retrieval,
debate, verification) contributes to overall performance, potentially closing the gap with fully supervised
methods.</p>
      <p>Limitations. Despite its advantages, the proposed pipeline has several limitations. First, it relies on
syntactic search over an Elasticsearch index, which may limit recall; integrating robust semantic
search could substantially improve retrieval performance. Second, the method is entirely prompt-based
and primarily zero-shot in order to maintain dataset agnosticism; this design choice can be suboptimal
compared to task-specific fine-tuning, which typically yields higher annotation accuracy. Third, to
control cost, the debate and verification mechanisms are limited to a single round each; extending them
to multiple rounds could further enhance matching quality.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This work was supported by the Data Science Research Center at the University of Haifa through the
Israel PBC grant Advancing Data Science to Serve Humanity and Protect the Global Environment (grant
no. 100009443).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this manuscript, the authors used OpenAI’s o4-mini to correct grammatical
errors and spelling mistakes. As described in Section 3, GPT-4.1 nano served as the primary LLM for
our method. Microsoft Copilot (powered by GPT-4o) was employed as a coding assistant during model
development. All outputs generated by these tools were critically reviewed and edited by the authors,
who take full responsibility for the content of this publication.</p>
      <p>[16] T. Zhang, X. Yue, Y. Li, H. Sun, TableLlama: Towards open large generalist models for tables,
in: Proceedings of the 2024 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, NAACL 2024, volume 1, Association
for Computational Linguistics (ACL), 2024, pp. 6024–6044. doi:10.18653/v1/2024.naacl-long.335.</p>
      <p>[17] J. Herzig, P. K. Nowak, T. Müller, F. Piccinno, J. M. Eisenschlos, TaPas: Weakly supervised
table parsing via pre-training, in: Proceedings of the Annual Meeting of the Association for
Computational Linguistics, Association for Computational Linguistics (ACL), 2020, pp. 4320–4333.
URL: https://aclanthology.org/2020.acl-main.398/. doi:10.18653/V1/2020.ACL-MAIN.398.</p>
      <p>[18] J. P. Bikim, C. Atezong, A. Jiomekong, A. Oelen, G. Rabby, J. D'Souza, S. Auer, Leveraging
GPT models for semantic table annotation, in: SemTab’24: Semantic Web Challenge on Tabular
Data to Knowledge Graph Matching 2024, co-located with the 23rd International Semantic Web
Conference (ISWC), 2024.</p>
      <p>[19] W. Baazouzi, M. Kachroudi, S. Faiz, Kepler-aSI: Semantic annotation for tabular data, in: SemTab’24:
Semantic Web Challenge on Tabular Data to Knowledge Graph Matching 2024, co-located with
the 23rd International Semantic Web Conference (ISWC), 2024.</p>
      <p>[20] D. Li, T. Yue, E. Jimenez-Ruiz, CitySTI 2024 system: Tabular data to KG matching using LLMs,
in: SemTab’24: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching 2024,
co-located with the 23rd International Semantic Web Conference (ISWC), 2024.</p>
      <p>[21] N. Vandemoortele, B. Steenwinckel, S. V. Hoecke, F. Ongenae, Scalable table-to-knowledge graph
matching from metadata using LLMs, in: SemTab’24: Semantic Web Challenge on Tabular Data to
Knowledge Graph Matching 2024, co-located with the 23rd International Semantic Web Conference
(ISWC), 2024.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] O. Benjelloun, S. Chen, N. Noy, Google Dataset Search by the numbers, Lecture Notes in Computer Science 12507 LNCS (2020) 667–682. URL: https://link.springer.com/chapter/10.1007/978-3-030-62466-8_41. doi:10.1007/978-3-030-62466-8_41.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] M. Cremaschi, B. Spahiu, M. Palmonari, E. Jimenez-Ruiz, Survey on semantic interpretation of tabular data: Challenges and directions, arXiv preprint (2024). URL: https://arxiv.org/pdf/2411.11891.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] J. Liu, Y. Chabot, R. Troncy, V. P. Huynh, T. Labbé, P. Monnin, From tabular data to knowledge graphs: A survey of semantic table interpretation tasks and methods, Journal of Web Semantics 76 (2023) 100761. URL: https://dl.acm.org/doi/10.1016/j.websem.2022.100761. doi:10.1016/J.WEBSEM.2022.100761.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] O. Hassanzadeh, N. Abdelmageed, M. Cremaschi, V. Cutrona, F. D'adda, V. Efthymiou, B. Kruit, E. Lobo, N. Mihindukulasooriya, N. H. Pham, Results of SemTab 2024, 2024. URL: https://ceur-ws.org/Vol-3889/paper0.pdf.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, Advances in Neural Information Processing Systems 33 (2020) 1877–1901. URL: https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI Blog</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frieske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey of hallucination in natural language generation</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          ). URL: https://dl.acm.org/doi/pdf/10.1145/3571730. doi:10.1145/3571730.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          . URL: https://proceedings.neurips.cc/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Tenenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mordatch</surname>
          </string-name>
          ,
          <article-title>Improving factuality and reasoning in language models through multiagent debate</article-title>
          ,
          <source>CoRR</source>
          abs/2305.14325 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ichter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Self-consistency improves chain of thought reasoning in language models</article-title>
          ,
          <source>11th International Conference on Learning Representations, ICLR 2023</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dhuliawala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Komeili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raileanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Celikyilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <article-title>Chain-of-verification reduces hallucination in large language models</article-title>
          ,
          <source>Findings of the Association for Computational Linguistics: ACL 2024</source>
          (
          <year>2024</year>
          )
          <fpage>3563</fpage>
          -
          <lpage>3578</lpage>
          . URL: https://aclanthology.org/2024.findings-acl.212/. doi:10.18653/v1/2024.findings-acl.212.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Marzocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>MammoTab: a giant and comprehensive dataset for semantic table interpretation</article-title>
          , in:
          <source>Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2022), co-located with the 21st International Semantic Web Conference (ISWC)</source>
          , CEUR-WS.org,
          <year>2022</year>
          . URL: http://ceur-ws.org.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>SemTab Challenge Organizers</surname>
          </string-name>
          ,
          SemTab 2025 leaderboard, https://sem-tab-challenge.github.io/2025/#leaderboard,
          <year>2025</year>
          . Accessed: October 21, 2025.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Turl: Table understanding through representation learning</article-title>
          ,
          <source>SIGMOD Record</source>
          <volume>51</volume>
          (
          <year>2022</year>
          )
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          . doi:10.1145/3542700.3542709.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>