<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging GPT Models For Semantic Table Annotation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jean Petit Bikim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carick Atezong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Azanzi Jiomekong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allard Oelen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gollam Rabby</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jennifer D'Souza</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sören Auer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Yaounde I</institution>
          ,
          <addr-line>Yaounde</addr-line>
          ,
          <country country="CM">Cameroon</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>L3S Research Center, Leibniz University Hannover</institution>
          ,
          <addr-line>Hanover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TIB - Leibniz Information Centre for Science and Technology</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper outlines our contribution to the Accuracy Track and the Semantic Table Interpretation (STI) &amp; Large Language Models (LLMs) track of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab). Our approach involves using LLMs to address the various tasks presented in the challenge. Specifically, we employed zero-shot and few-shot prompting techniques for most of the tasks, which facilitated the LLMs' ability to interpret and annotate tabular data with minimal prior training. For the Column Property Annotation (CPA) task, we took a different approach by applying a set of predefined rules, tailored to the structure of each dataset. Our method achieved notable results, with an F1-score exceeding 0.92, demonstrating the effectiveness of LLMs in tackling the SemTab challenge. These results suggest that LLMs hold significant capabilities as a robust solution for semantic table annotation and knowledge graph matching, highlighting their potential to advance the field of semantic web technologies.</p>
      </abstract>
      <kwd-group>
<kwd>Tabular Data</kwd>
        <kwd>Semantic Table Annotation</kwd>
        <kwd>Semantic Table Interpretation</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>SemTab</kwd>
        <kwd>Prompt Engineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Applying GPT-3 for Semantic Table Annotation</title>
      <p>This section details the methodology we employed during the SemTab’24 challenge to address the
various tasks set by the organizers. The challenge involved multiple stages, each with distinct objectives
requiring customized strategies. In Section 2.1, we present a comprehensive overview of the SemTab’24
challenge, outlining its goals, structure, and key requirements. Following that, Section 2.2 delves into the
specific approach we implemented to tackle the challenge’s diverse tasks, including data processing,
LLM selection, and performance optimization. Each component of our approach was carefully designed
to align with the challenge’s demands while maximizing accuracy and efficiency. Overall, our strategy
reflects a combination of innovative techniques and established methods, ensuring robust results across
all tasks.</p>
      <sec id="sec-2-1">
        <title>2.1. Overview of the Challenge</title>
<p>The SemTab challenge [6], as described by the organizers, focuses on benchmarking datasets and
systems for semantic table annotation. The primary goal of this challenge is to assess and improve the
capabilities of automated systems in interpreting and annotating structured data, such as tables, by
linking them to relevant knowledge graphs (KGs). The SemTab challenge serves as an important platform for evaluating
advancements in semantic technologies and encouraging the development of novel approaches to
table annotation. Participants are required to apply their techniques across diverse tasks and datasets,
reflecting real-world scenarios. By setting standardized evaluation metrics and promoting reproducible
results, the SemTab challenge plays a crucial role in advancing the field of semantic data annotation.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. SemTab Challenge Tracks</title>
          <p>This year, the SemTab challenge introduced five distinct tracks, each designed to focus on specific aspects
of table annotation: the STI &amp; LLMs track, the accuracy track, the dataset track, the metadata-to-KG
track, and the IsGold? track. The STI &amp; LLMs track, alongside the accuracy track, includes a series
of critical tasks that highlight key table annotation processes, as illustrated in Fig. 1. The main tasks
within these tracks are as follows:
• Column Entity Annotation (CEA): This task involves linking the elements in a table’s cells to
their corresponding entities in a KG. For example, in Fig. 1, the entity "Kelso Township" in Table
(a) is matched to the QID "Q6386554" in Wikidata.
• Column Type Annotation (CTA): This task requires identifying the most specific semantic
type to be assigned to a column in the table. For instance, in Table (a) of Fig. 1, the Wikidata
entity type for "Kelso Township" and "Ohio Township" has the QID "Q17201685" (township of
Indiana).
• Column Property Annotation (CPA): The objective here is to determine the property within
the KG that links two columns in a table. For example, in Table (a) of Fig. 1, the Wikidata property
that connects columns col0 and col1 is P2044 (elevation above sea level).
• Table Topic Detection (TD): This task focuses on assigning an overarching semantic type to an
entire table by identifying its primary subject within the KG. For instance, the Wikidata entity
that describes the topic of Table (b) in Fig. 1 has the QID Q16823610 (Blue Christmas).
• Row Annotation (RA): In this task, participants must link entire rows in the table to the
corresponding entities in the KG. For example, the first row of Table (c) in Fig. 1 has the Wikidata
QID "Q26689963".</p>
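        <p>To make these expected outputs concrete, the Fig. 1 examples quoted above can be gathered into a small sketch; the dictionary layout is our own illustration, not the challenge's submission format:</p>

```python
# Illustrative outputs for the five SemTab tasks, using the Fig. 1
# examples from the text. The layout is a sketch, not the official
# submission format of the challenge.
annotations = {
    # CEA: a table cell is linked to a Wikidata entity
    "CEA": {("table_a", "row 1", "col0"): "Q6386554"},   # "Kelso Township"
    # CTA: a column is assigned its most specific semantic type
    "CTA": {("table_a", "col0"): "Q17201685"},           # township of Indiana
    # CPA: a pair of columns is linked by a property
    "CPA": {("table_a", "col0", "col1"): "P2044"},       # elevation above sea level
    # TD: a whole table is assigned a topic entity
    "TD": {"table_b": "Q16823610"},                      # Blue Christmas
    # RA: a whole row is linked to an entity
    "RA": {("table_c", "row 1"): "Q26689963"},
}
```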
          <p>These tasks, while diverse, collectively assess the robustness and flexibility of participating systems in
accurately interpreting and annotating tabular data. Each track is designed to target different challenges
faced in real-world applications, ensuring that systems are tested comprehensively across a wide range
of scenarios.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. SemTab Datasets</title>
          <p>Our focus on semantic table annotation led us to benchmark various datasets from the SemTab challenges
published since 2019², allowing us to establish a system that adapts to different datasets.</p>
          <p>Table 1 provides a detailed overview of the datasets we employed for the CEA task. The datasets
vary in size, complexity, and domain coverage, offering a comprehensive range of challenges for
CEA systems. The datasets tfood [7] (entity, horizontal) and WikidataTableR1 from the 2023 edition,
along with Semantic_annotation³ (a dataset automatically constructed from 15,000 entities on Wikidata
retrieved through API queries and their descriptions as context), were primarily used before the
challenge. They served as the foundation for our various experiments and also enriched our training
data during the actual challenge phase. Additionally, the training data contained in tbiomed [8], tbiodiv
[9] and SuperSemtab24 [10] were used to further enhance our models.</p>
          <p>For the CTA, CPA, RA, and TD tasks, we used the datasets proposed by the challenge organizers
for the 2024 edition. These datasets cover a diverse range of domains and tasks, which allows for a
more comprehensive evaluation of different semantic table annotation techniques. Table 2 summarizes
the statistics of these datasets, indicating the number of valid and test data for each task. Each dataset
provides both validation and test sets to ensure rigorous evaluation and to facilitate fine-tuning during
the development process.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Fine-tuning GPT-3 for LLM and Accuracy Track</title>
        <p>In this experiment, we focused on leveraging the capabilities of the GPT-3 model, which contains
175 billion parameters, for addressing various semantic table annotation tasks. Fine-tuning LLMs like
GPT-3 can be approached in two main ways: probing and prompt engineering. Probing involves deeper
adjustments of the LLM's weights for task-specific learning, while prompt engineering optimizes the
input format to guide the model's responses. For our experiments, we primarily relied on prompt
engineering techniques.
² https://orkg.org/comparison/R642266
³ https://huggingface.co/datasets/yvelos/semantic_annotation</p>
        <p>[Table 1: overview of the CEA datasets, covering WikidataTableR1, tfood Entity, tfood Horizontal, Semantic_annotation, tbiodiv entity and horizontal, tbiomed entity and horizontal, and SuperSemtab24, with the year and statistics of each.]</p>
        <p>Specifically, few-shot prompting was employed to address the CEA task within the accuracy track, as
well as the task in the LLM track. Few-shot prompting allows the model to learn patterns from a small
set of examples provided during inference. On the other hand, we adopted zero-shot prompting for the
CTA, RA, and TD tasks. Zero-shot prompting does not require any training examples; instead, it relies
solely on the LLMs pre-trained knowledge to interpret the prompts. To facilitate these approaches,
the datasets were structured such that the different SemTab tasks could be effectively interpreted and
solved by GPT-3.</p>
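        <p>The two prompting regimes can be sketched as follows; the prompt wording is our own illustration under assumed templates, not the exact prompts used in the challenge:</p>

```python
def zero_shot_prompt(instruction, context, item):
    # Zero-shot: only the instruction and the item to annotate;
    # the model relies entirely on its pre-trained knowledge.
    return f"{instruction}\nTable context: {context}\nInput: {item}\nAnswer:"

def few_shot_prompt(instruction, examples, context, item):
    # Few-shot: a handful of solved (input, answer) pairs precede the
    # query, letting the model pick up the pattern at inference time.
    shots = "\n".join(f"Input: {i}\nAnswer: {a}" for i, a in examples)
    return f"{instruction}\n{shots}\nTable context: {context}\nInput: {item}\nAnswer:"

# CEA framed as few-shot, reusing the Fig. 1 example from the text
prompt = few_shot_prompt(
    "Link the table cell to its Wikidata entity.",
    [("Kelso Township", "Q6386554")],
    "US townships and their elevation",
    "Ohio Township",
)
```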
        <p>For the CPA task, instead of using GPT-3, we used a symbolic rule-based method. The CPA task often
requires precise identification of relationships between table columns, which can be more effectively
handled by deterministic rules. This hybrid strategy allowed us to exploit the strengths of both LLMs
and symbolic methods.</p>
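        <p>A minimal sketch of such a rule-based CPA component is shown below; the rules and properties are hypothetical stand-ins, since the actual rule set was tailored to each dataset and is not reproduced here:</p>

```python
import re

# Hypothetical, ordered rules: each pairs a predicate on the object
# column's values with a Wikidata property. A real rule set would be
# tuned per dataset.
CPA_RULES = [
    (lambda vals: all(re.fullmatch(r"\d{4}", v) for v in vals), "P571"),             # year-like: inception
    (lambda vals: all(re.fullmatch(r"\d+(\.\d+)?\s*m?", v) for v in vals), "P2044"), # numeric/metres: elevation
]

def annotate_cpa(object_column, default=None):
    """Return the property linking the subject column to object_column."""
    for matches, prop in CPA_RULES:
        if matches(object_column):
            return prop
    return default
```

The rule ordering is itself a design choice: a four-digit elevation value would hit the year rule first, which illustrates the kind of dataset-specific tailoring the text refers to.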
        <p>The architecture used in this experiment is illustrated in Fig. 2. It involves several key modules, each
serving a specific function in the overall system:
• Pre-processing Module: This module takes as input a set of tables and applies various
cleaning operations such as removing blank spaces, stripping HTML tags, and eliminating special
characters. An example of how a cell is processed through this module is shown in Fig. 3.
• table2vect Module: The table2vect module, as described by Algorithm 1, processes the cleaned
dataset and generates task-specific vectors for CEA, CTA, CPA, RA, and TD tasks. These vectors
are structured based on the requirements of each annotation task. Fig. 4 shows an example of the
table2vect process.
• Table Dataset Module: This module accepts a vector as input, along with a target file if provided,
and then maps the vector elements to their corresponding targets. The output is a new table that
represents our dataset.
• Prompt Generation Modules (ceaPrompt, ctaPrompt, raPrompt, tdPrompt): These
modules transform the rows of Table dataset into a set of questions and answers tailored for each
task. For example, in the CEA task, a table cell and its context are framed as a question, while
the corresponding entity serves as the answer. Examples of these question-answer pairs are
embedded in Fig. 6.
• Fine-Tuning Base GPT model: The generated questions and answers are used to fine-tune
GPT-3 or GPT-4, ensuring that the model can accurately perform the semantic annotation tasks
across different datasets.</p>
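        <p>The cleaning operations of the pre-processing module can be sketched as below; the exact rules are not published, so the regular expressions are our assumptions:</p>

```python
import re

def clean_cell(raw):
    """Cleaning steps of the pre-processing module: strip HTML tags,
    eliminate special characters, and remove redundant blank spaces."""
    # \x3c and \x3e are hex escapes for the angle brackets of HTML tags
    text = re.sub(r"\x3c[^\x3e]+\x3e", " ", raw)
    text = re.sub(r"[^\w\s.,'-]", " ", text)   # eliminate special characters
    text = re.sub(r"\s+", " ", text)           # collapse blank spaces
    return text.strip()
```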
        <p>This modular architecture allows for a flexible and scalable approach to semantic table annotation,
enabling the system to adapt to different tasks by simply modifying the input prompts and vectors.
While GPT-3 handles most of the annotation tasks, the use of a rule-based approach for CPA underscores
the importance of integrating symbolic reasoning in cases where relationship extraction is critical.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Annotating the Test Data Using the Fine-tuned Model</title>
        <p>After fine-tuning the GPT-3 model for semantic table annotation tasks, the resulting model was employed
to annotate the test data. The annotation workflow closely follows the first three steps of the
fine-tuning process, as outlined in Fig. 2. This process is structured to handle the different annotation tasks
efficiently by leveraging the pre-processing pipeline and vector generation approach discussed earlier.</p>
        <p>The annotation process begins with inputting the set of tables to be annotated. These tables go
through a pre-processing phase, which involves removing irrelevant characters, normalizing formats,
and cleaning the data to ensure consistency. Following this, the table2vect algorithm is applied to
convert the tables into a set of task-specific vectors. These vectors capture the essential elements needed
for annotation, such as table cells and their context. However, for all tasks, the vectors include a
URI cell that is initially left blank. This placeholder will be populated with the correct URI during the
inference stage, using the fine-tuned GPT-3.</p>
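        <p>As a sketch, with illustrative field names rather than the system's actual schema, such a vector and its inference-time completion might look like:</p>

```python
# A CEA-style vector: context fields plus a URI slot left blank,
# to be filled by the fine-tuned model at inference time.
cea_vector = {
    "table": "table_a",
    "row": 1,
    "column": "col0",
    "cell": "Kelso Township",
    "context": ["col1: 339"],  # neighbouring cells give the model context
    "uri": "",                 # placeholder populated during inference
}

def fill_uri(vector, predicted_qid):
    # The inference step writes the model's prediction into the blank slot.
    completed = dict(vector)
    completed["uri"] = predicted_qid
    return completed
```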
        <p>The fine-tuned LLM, when performing inference, processes the task-specific prompts generated
from these vectors and fills in the blank spaces with the corresponding URIs or semantic labels. For
example, in the CEA task, the model identifies the most relevant entity from a knowledge graph, while
in the CTA task, it assigns the appropriate semantic type. The transformation from vectors to answers
is handled seamlessly by GPT-3, which was trained on similar tasks during fine-tuning.</p>
        <p>It is important to note that while GPT-3 was primarily used for tasks such as CEA, CTA, RA, and TD,
the CPA task required a different approach. The CPA task involves determining the property that links
two columns in a table, a challenge that often benefits from deterministic logic rather than generative
language models. Therefore, a rule-based method was applied to solve this task, as illustrated in Fig. 7.
This rule-based approach relies on predefined relationships and patterns in the data, making it highly
effective for capturing the structured nature of properties in knowledge graphs.</p>
        <p>By integrating both the generative power of GPT-3 for complex annotation tasks and symbolic
methods for rule-based tasks, this hybrid architecture ensures a robust and adaptable annotation
pipeline. The resulting annotated datasets maintain high accuracy across all tracks, leveraging the
strengths of both AI-driven models and traditional symbolic techniques.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>This section presents the evaluation results for the SemTab’24 challenge, focusing on both the STI &amp;
LLMs track (see Section 3.1) and the accuracy track (see Section 3.2). The outcomes are discussed in
detail, highlighting the strengths and limitations observed during the testing phase.</p>
      <sec id="sec-3-1">
        <title>3.1. LLMs Track</title>
        <p>In the LLMs track, we fine-tuned GPT-3 as outlined in Section 2. GPT-3 was also evaluated on the
test data by the challenge organizers. Table 3 presents the results, focusing on the CEA task.</p>
        <p>The results in Table 3 demonstrate the LLM's ability to perform entity annotation tasks with high
accuracy. The fine-tuned LLM achieved an F1-score of 0.899 for the CEA task, which aligns closely
with its precision score, indicating a balanced performance. The success in this track can be attributed
to effective few-shot prompting and careful data pre-processing, which allowed the LLM to grasp the
complex semantic relationships present in the tables.</p>
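        <p>For reference, the F1-score used throughout is the harmonic mean of precision P and recall R:</p>

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
```

        <p>Since the harmonic mean equals its arguments only when they coincide, an F1-score of 0.899 that closely matches the precision implies a recall of roughly the same magnitude, consistent with the balanced performance noted above.</p>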
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Accuracy Track</title>
        <p>For the accuracy track, the results cover a broader range of tasks, including CEA, CTA, CPA, RA, and
TD, across multiple datasets. The results are summarized in Table 4.</p>
        <p>During the challenge, the fine-tuned model was primarily evaluated on the following datasets and
tasks:
• CEA: WikidataTableR1, tbiodiv Entity, tbiodiv Horizontal, tbiomed Entity, tbiomed Horizontal
• CTA: WikidataTableR1, tbiodiv Horizontal, tbiomed Horizontal</p>
        <p>The results indicate that the model performed well on the CEA task, particularly for the tbiodiv
Entity and tbiomed Entity datasets, achieving an F1-score above 0.92. The tbiodiv Horizontal dataset,
with its unique table structure, saw a slightly lower performance, with an F1-score of 0.74. This
decline is likely due to the complexity introduced by the horizontal orientation of the data, which poses
challenges in capturing relationships between entities.</p>
        <p>For the CTA task, the model delivered strong results with an F1-score greater than 0.7 for the
WikidataTableR1 and tbiomed Horizontal datasets, while scoring 0.648 for tbiodiv Horizontal. The TD
task showed a range of F1-scores, from 0.78 for tbiodiv Horizontal to 0.621 for tbiomed Horizontal,
reflecting the varying difficulty levels of semantic topic detection across datasets.</p>
        <p>The RA task produced a high F1-score of 0.719 for tbiodiv Horizontal but a lower F1-score of
0.411 for tbiomed Horizontal. The disparity in performance for these tasks can be attributed to the
limited availability of high-quality training data, which likely hindered the model’s ability to generalize
effectively.</p>
        <p>Lastly, the CPA task suffered from incomplete test data runs, particularly for the WikidataTableR1
dataset, where only 80% of the test data was covered. The incomplete data coverage explains the lower
F1-score, as the model had less data to work with, leading to reduced precision and recall.</p>
        <p>Overall, while the results show promising performance in several areas, they also highlight the
challenges posed by diverse table structures, limited training data, and incomplete test coverage.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper presented an exploration of utilizing GPT-3 for addressing the SemTab challenge, which
involves a series of complex tasks related to entity annotation and classification. To approach this,
we employed the base GPT-3 model and refined its capabilities through both few-shot and zero-shot
prompting techniques. The model demonstrated promising performance when applied to the complete
dataset, achieving commendable results across various tasks. Specifically, for the CEA task, we observed
an impressive F1-score exceeding 0.92 when the model was tested on the tbiodiv Entity and tbiomed
Entity datasets. This indicates a high level of accuracy and reliability in the model’s ability to correctly
annotate entities within these datasets. However, for other tasks such as CTA and TD, the F1-score
ranged between 0.6 and 0.8. This variability in performance can be attributed to the limited size of
the training data, which constrained the model’s ability to fully generalize and optimize its predictions
across these tasks. Moving forward, future work will focus on completing the remaining annotations
that were not finalized before the deadline of this study. Once these annotations are completed, the
results will be submitted to the SemTab challenge organizers for formal evaluation. This subsequent
evaluation will provide further insights into the model’s performance and its applicability to similar
challenges in the field.
</p>
      <p>[5] S. Schulhoff et al., The prompt report: A systematic survey of prompting techniques, arXiv (2024). URL: https://arxiv.org/abs/2406.06608.
[6] O. Hassanzadeh, Semantic tabular data annotation to knowledge graph matching, in: SemTab challenge, 2024. URL: https://sem-tab-challenge.github.io/2024/.
[7] N. Abdelmageed, tfood: Semantic table annotations benchmark for food domain, Zenodo (2023). URL: https://zenodo.org/records/10048187.
[8] N. Abdelmageed, tbiomed: Semantic table annotations benchmark for biomedical domain, Zenodo (2024). URL: https://zenodo.org/records/10996334.
[9] N. Abdelmageed, tbiodiv: Semantic table annotations benchmark for biodiversity domain, Zenodo (2024). URL: https://zenodo.org/records/10996688.
[10] M. Cremaschi, SemTab 24: Semantic table annotations benchmark for LLM-based approaches, Zenodo (2024). URL: https://zenodo.org/records/11031987.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Azanzi Jiomekong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hippolyte</given-names>
            <surname>Tapamo</surname>
          </string-name>
          ,
          <article-title>An ontology for tuberculosis surveillance system</article-title>
          ,
          <source>SpringerLink</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y. L. Ruizhe</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>f-kgqa: A fuzzy question answering system for knowledge graphs</article-title>
          ,
          <source>ScienceDirect</source>
          (
          <year>2024</year>
          ). URL: https://www.sciencedirect.com/science/article/abs/pii/S016501142400263X.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jiomekong</surname>
          </string-name>
          ,
          <article-title>Towards an approach based on knowledge graph refinement for tabular data to knowledge graph matching</article-title>
          ,
          <source>CEUR-WS</source>
          Vol-
          <volume>3320</volume>
          (
          <year>2022</year>
          ). URL: https://ceur-ws.org/Vol-3320/paper12.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Knowledge graph refinement: A survey of approaches and evaluation methods</article-title>
          ,
          <source>Semant. Web</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>489</fpage>
          -
          <lpage>508</lpage>
          . URL: http://dx.doi.org/10.3233/SW-160218.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>