<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From Strings to Semantics: A Graph-based Reranking Approach for Annotating Tables using Domain Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nan Liu</string-name>
          <email>nan.liu@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed-Anis Koubaa</string-name>
          <email>mohamed.koubaa@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Suess</string-name>
          <email>wolfgang.suess@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veit Hagenmeyer</string-name>
          <email>veit.hagenmeyer@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>As one of the most widely used data storage and exchange formats, tabular data can be challenging to integrate, interpret, and reuse when it lacks accurate semantic annotations, particularly when data come from heterogeneous sources. However, the annotation process is often time-consuming and requires a deep understanding of the internal structure of the target ontology. Therefore, developing efficient and accurate semi-automatic or fully automatic annotation tools is very important. Most existing approaches rely on textual similarity to match column headers to ontology terms and fail to effectively leverage the rich relational semantics represented within the ontology. To address this issue, we propose a reranking approach that combines semantic similarity with ontology structure. Specifically, we first generate a set of candidate ontology terms based on semantic similarity. For each source table header and its candidate ontology terms, we construct subgraphs and train a lightweight Graph Neural Network (GNN) model on these graphs to learn structure-aware representations. These representations are then used to improve the ranking of candidate ontology terms. To validate our approach, we perform experiments on the OAEI dataset. The results demonstrate that our approach improves Hit@1 by 4% compared to a baseline model that relies only on lexical similarity. This result shows that learning on local subgraphs is a promising direction for ontology alignment and schema matching.</p>
      </abstract>
      <kwd-group>
        <kwd>Graph Neural Networks</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Reranking</kwd>
        <kwd>Semantic Annotation</kwd>
        <kwd>Ontology Matching</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Interoperability and knowledge integration between heterogeneous data sources have always been key
challenges in the semantic web domain. A large amount of tabular data is often generated and stored in
separate databases across different infrastructures. The semantics of such data are often ambiguous and
non-standardized, which impedes the implementation of the FAIR principles [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, annotating
tabular data is not a simple task. It is time-consuming, error-prone, and requires a deep understanding
of the target ontology. The task of mapping table headers to ontology terms can be treated as a data
matching problem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Previous research has proposed various approaches, such as [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ]. Recently,
Large Language Models (LLMs) and Pre-trained Language Models (PLMs) like Sentence-Bidirectional
Encoder Representation Transformer (SBERT) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have been widely used for data matching tasks. These
models can capture contextual meaning and have shown promising results [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. However, most
of them rely only on lexical or contextual similarity and are therefore incapable of reasoning about
complex relationships defined in OWL axioms, such as hierarchies, subclass relations, and property
dependencies. This limitation becomes more significant in domain-specific tasks. In addition, some
LLM-based methods [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] have demonstrated strong performance in zero-shot annotation tasks, but
their decision-making processes are difficult to explain due to their black-box nature [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Inspired by recent research in the application of Graph Neural Networks (GNNs) to knowledge graph
completion and reranking tasks [11, 12, 13], we propose a lightweight reranking approach that integrates
ontology structure into the matching process. We construct a subgraph for each source table header
and its candidate ontology terms, and train a GNN model on these graphs. By passing and aggregating
messages among nodes, the model evaluates both the semantic and structural similarity and generates
the final ranking of candidate terms. The proposed approach significantly reduces computational cost
and improves annotation accuracy. The main contributions of this paper are as follows:
• We propose an approach for dynamically constructing context graphs for semantic annotation and
reranking tasks, which improves matching accuracy and enhances computational efficiency;
• We evaluate our approach on several real-world datasets; it achieves significant
performance gains over the baseline model and generates higher-quality semantic
annotations.</p>
      <p>The remainder of this paper is structured as follows: Section 2 reviews related work on
graph-based reranking techniques. Section 3 introduces the proposed methodology. Section 4 describes the
experimental setup and results. Section 5 concludes the paper with future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Previous work [12] shows that graph-learning methods can be effectively used for reranking and
Retrieval-Augmented Generation (RAG) tasks. In graph-based reranking approaches, candidate documents
are modeled as nodes, and candidate-candidate edges are constructed from semantic similarity and
external knowledge. Message passing or aggregation is then used for structured reasoning
within the candidate set to generate more reliable candidates. The training methods of graph-based
reranking models can be categorized into three types [14]: point-wise [15, 16], pair-wise [17, 18],
and list-wise [19]. Motivated by these works, we adapt this idea to Column Type Annotation (CTA) tasks.
We construct subgraphs for each table header, where nodes of the subgraph are the top-K candidate
ontology terms, and edges are derived from semantic similarity between candidates and structural
relations in the target ontology (such as subClassOf, part_Of, has_quality). A graph-based reranking
model then scores the nodes on this subgraph to get the final ranking.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In this section, we provide a brief description of our approach and its implementation details. As shown
in Figure 1, the proposed approach can be divided into two stages. In the first stage, we use an SBERT bi-encoder
to retrieve the top-K candidate ontology terms based on semantic similarity. In the second stage, we
construct a local subgraph for each table header and its candidate ontology terms. These subgraphs are
then used as input to a GNN, which learns structure-aware representations to rerank the candidate
ontology terms. In the following, we first define the problem formally and then describe in detail how
the subgraphs are constructed.</p>
      <sec id="sec-3-1">
        <title>3.1. Problem Formulation</title>
        <p>Our task can be defined as follows: Given a target ontology O that contains a set of terms T = {t_1, t_2, . . . , t_n} and an input table header h, we first apply an SBERT bi-encoder e(·) to obtain embeddings e_h = e(h) and e_i = e(t_i), compute cosine similarities sim(h, t_i) = ⟨e_h, e_i⟩ / (‖e_h‖ ‖e_i‖), and return a top-K candidate list C_cand = [(c_1, s_1), . . . , (c_K, s_K)] sorted by similarity score s_i. Then, for each table header h, we construct a header-specific candidate subgraph G_h = (V, E, W): the node set V = {h, c_1, . . . , c_K} contains the header h and its candidates, and the edge weights w(c_i, c_j) capture pairwise relatedness, for example semantic similarity or ontology relations. Each node has features x_i = [e(c_i); s_i]. We then define a reranking function f_rerank based on a pre-trained GNN model. This function takes the table header h and the candidate list C = {c_1, c_2, . . . , c_K} as input, and outputs the final ranking R = {r_1, r_2, . . . , r_K}.</p>
        <p>[Figure 1: Overview of the proposed two-stage approach. Stage 1 retrieves top-K candidate ontology terms with SBERT. Stage 2 uses a GNN to rerank the candidate ontology terms.]</p>
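        <p>As a minimal sketch of the first stage, assuming the header and term embeddings have already been produced by an SBERT bi-encoder (all names below are illustrative, not the paper's implementation), the top-K retrieval by cosine similarity can be written as:</p>

```python
import numpy as np

def top_k_candidates(header_emb, term_embs, term_labels, k):
    """Stage-1 retrieval sketch: rank ontology terms by cosine similarity
    to the table header and keep the k best (label, score) pairs."""
    h = header_emb / np.linalg.norm(header_emb)
    t = term_embs / np.linalg.norm(term_embs, axis=1, keepdims=True)
    sims = t @ h                    # cosine similarity of each term to the header
    order = np.argsort(-sims)[:k]   # indices of the k most similar terms
    return [(term_labels[i], float(sims[i])) for i in order]

# Toy embeddings standing in for SBERT outputs.
terms = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
labels = ["disease", "cell", "syndrome"]
ranked = top_k_candidates(np.array([1.0, 0.0]), terms, labels, k=2)
print(ranked)  # highest-similarity terms first
```

        <p>In the paper's pipeline the embeddings would come from a pre-trained SBERT model; only the embedding source changes, not the retrieval logic.</p>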
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Graph Construction</title>
        <p>To enable the reranking process, we construct a subgraph for each source header h and its candidate ontology terms C = {c_1, c_2, . . . , c_K}, as shown in Algorithm 1. We represent h and the candidates C as nodes in a graph. To connect the nodes, we add edges between h and each candidate c_i. The edge weight is the semantic similarity score calculated in the first-stage retrieval. In order to include structural information of the target ontology O, we search for relations between the candidate terms in O (such as subClassOf, part_Of, has_quality). If such a relation r exists, i.e., the triple (c_i, r, c_j) holds, we add an edge between c_i and c_j. Each edge carries two features: a similarity score and a binary value indicating whether it is a structural edge of the ontology. In addition, we add self-loop edges to all nodes with a fixed weight of 1. These self-loops help nodes preserve their own features during message propagation.</p>
        <p>Algorithm 1: Graph Construction</p>
        <p>Input: header text h, ontology O, candidate list C_cand. Output: graph G = (V, E) with node features and edge weights.</p>
        <p>Step 1 (graph nodes): V ← {h, c_1, . . . , c_K}, where h is the source table header and {c_1, . . . , c_K} are the candidate ontology terms.</p>
        <p>Step 2 (add edges): (1) add source-to-candidate edges (h, c_i) with edge feature [s_i, 0]; (2) add candidate-to-candidate edges (c_i, c_j) with edge feature [s_ij, is_ontology]; (3) add self-loops (v, v) for each node v with edge feature [1, 0]. Return G = (V, E) with node features and edge weights.</p>
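        <p>The construction in Algorithm 1 can be sketched in plain Python; the container types and the ontology_edges lookup below are illustrative assumptions, not the paper's implementation:</p>

```python
def build_candidate_graph(header, candidates, sims, pair_sims, ontology_edges):
    """Sketch of Algorithm 1: nodes are the header plus its top-K candidates;
    every edge carries the feature [similarity, is_ontology_edge]."""
    nodes = [header] + candidates
    edges = {}
    # Source-to-candidate edges with feature [s_i, 0].
    for c, s in zip(candidates, sims):
        edges[(header, c)] = [s, 0]
    # Candidate-to-candidate edges; flag pairs backed by an ontology relation.
    for (a, b), s in pair_sims.items():
        is_onto = 1 if ((a, b) in ontology_edges or (b, a) in ontology_edges) else 0
        edges[(a, b)] = [s, is_onto]
    # Self-loops with fixed weight 1 preserve node features during message passing.
    for v in nodes:
        edges[(v, v)] = [1.0, 0]
    return nodes, edges

g_nodes, g_edges = build_candidate_graph(
    "tumor_type",
    ["neoplasm", "carcinoma"],
    [0.91, 0.84],
    {("neoplasm", "carcinoma"): 0.77},
    {("carcinoma", "neoplasm")},  # e.g. a subClassOf relation in the ontology
)
```

        <p>The resulting node list and edge-feature dictionary would then be converted into tensors for the GNN stage.</p>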
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model Training</title>
        <p>To learn structural representations for reranking candidate terms, we train a lightweight Graph Attention Network (GAT) based on GATv2 [20]. The GAT model consists of two GATv2 convolutional layers, followed by a linear classifier. In the training process, we use the RankNet loss [21]:</p>
        <p>L = log(1 + exp(−(s_p − s_n)))    (1)</p>
        <p>For each graph, we sample all positive s_p and negative s_n candidate pairs and compute the average pairwise ranking loss. The goal is to rank the correct term as high as possible in the final reranking list.</p>
        <p>[Compared systems: Baseline (SBERT only); Rerank with MMR; Rerank with CE; Rerank with GCN; Rerank with GAT]</p>
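        <p>The RankNet objective in Equation (1) reduces to a few lines. This NumPy sketch averages the loss over all positive/negative score pairs (variable names are illustrative):</p>

```python
import numpy as np

def ranknet_loss(pos_scores, neg_scores):
    """Average pairwise RankNet loss L = log(1 + exp(-(s_p - s_n)))
    over every (positive, negative) candidate-score pair."""
    losses = [np.log1p(np.exp(-(sp - sn)))
              for sp in pos_scores
              for sn in neg_scores]
    return float(np.mean(losses))

# The loss shrinks as the correct candidate is scored above the negatives.
print(ranknet_loss([2.0], [0.0, -1.0]))  # small: positives already ranked higher
print(ranknet_loss([0.0], [2.0, 1.0]))   # large: ranking is inverted
```

        <p>In training, s_p and s_n would be the GAT's output scores for correct and incorrect candidates, implemented with autograd-capable tensors rather than NumPy so gradients can flow back through the model.</p>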
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Experiment Setup</title>
        <p>We conduct experiments on the Bio-ML track of the OAEI (Ontology Alignment Evaluation Initiative)
benchmark in 2024, which focuses on ontology alignment tasks in the biomedical domain. The dataset
used in our experiments consists of three parts:
• Source Header: Each class label in the source ontology is treated as a source header to be
annotated. We use the NCIT ontology as the source ontology in this experiment.
• Target Ontology: The complete target ontology is used for candidate retrieval. We select DOID
as the target ontology.
• Ground Truth Dataset: The official reference alignment file with a unique correct match in the
target ontology for each source header.</p>
        <p>We use two standard ranking metrics to evaluate the performance of the model: Hit@K to evaluate
top-K accuracy and Mean Reciprocal Rank (MRR) to evaluate the overall quality of the reranked results [22].
We evaluate all methods on the same candidate set generated by the first-stage SBERT bi-encoder. The
systems compared are as follows: SBERT-only, which uses the first-stage similarity score as the final score; a
non-graph reranker based on Maximal Marginal Relevance (MMR) that post-processes the SBERT list to
balance relevance and diversity; a lightweight Cross-Encoder (CE) that concatenates the table header
with each candidate term, feeds the pair into a single transformer, and rescores relevance to generate the
final score; and two graph-based reranking models, a Graph Convolutional Network (GCN) and a
GAT, that operate on the candidate subgraph.</p>
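        <p>For reference, the two evaluation metrics can be computed as follows (a minimal sketch; ranked_lists holds each header's reranked candidates and gold the unique correct term per header):</p>

```python
def hit_at_k(ranked_lists, gold, k):
    """Fraction of headers whose correct term appears in the top-k."""
    hits = sum(1 for preds, g in zip(ranked_lists, gold) if g in preds[:k])
    return hits / len(gold)

def mrr(ranked_lists, gold):
    """Mean Reciprocal Rank: average of 1/rank of the correct term (0 if absent)."""
    total = 0.0
    for preds, g in zip(ranked_lists, gold):
        if g in preds:
            total += 1.0 / (preds.index(g) + 1)
    return total / len(gold)

ranked = [["carcinoma", "neoplasm"], ["neoplasm", "carcinoma"]]
gold = ["carcinoma", "carcinoma"]
print(hit_at_k(ranked, gold, k=1))  # → 0.5
print(mrr(ranked, gold))            # → 0.75
```
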
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results and Analysis</title>
        <p>Preliminary experimental results are shown in Table 1. The proposed GAT model achieves the best
overall performance. It reaches a Hit@1 of 0.824, an accuracy improvement of 4% over the
SBERT-only baseline model (Hit@1 of 0.782) and well above the GCN model (Hit@1 of 0.629). In addition, the GAT model
achieves the highest MRR score of 0.863. The results demonstrate the effectiveness of incorporating
ontology structure into the reranking process and highlight its significant potential for enhancing
schema matching tasks.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Works</title>
      <p>In this paper, we propose a graph-based reranking approach that improves the performance of
semantic annotation tasks. By constructing a local subgraph for each table header and its candidate
ontology terms, our method effectively integrates lexical semantic similarities with structural knowledge.
Experiments on the OAEI Bio-ML track dataset show that our approach achieves a Hit@1 of 0.824,
a 4% improvement compared to the baseline model. These results provide a new perspective on
building efficient annotation solutions with reduced computational cost.</p>
      <p>Bio-ML track: https://krr-oxford.github.io/OAEI-Bio-ML/; dataset: https://zenodo.org/records/13119437</p>
      <p>For future work, we plan to enrich the representation of the constructed graphs by adding additional
node and edge features beyond simple relations. Furthermore, we aim to extend the model to support
multiple ontologies, enabling it to better support annotation tasks in multi-domain scenarios.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The authors would like to thank the German Federal Government, the German State Governments,
and the Joint Science Conference (GWK) for their funding and support as part of the NFDI4Energy
consortium. The work was funded by the German Research Foundation (DFG) – 501865131 within the
German National Research Data Infrastructure (NFDI, www.nfdi.de).</p>
      <p>This work is supported by the Helmholtz Association Initiative and Networking Fund on the
HAICORE@KIT partition and the Helmholtz Metadata Collaboration (HMC).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4o and Grammarly for grammar and
spelling checks. After using these tools/services, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
      <p>[11] J. Dong, B. Fatemi, B. Perozzi, L. F. Yang, A. Tsitsulin, Don’t forget to connect! Improving RAG with graph-based reranking, arXiv preprint arXiv:2405.18414 (2024).</p>
      <p>[12] M. S. Zaoad, N. Zawad, P. Ranade, R. Krogman, L. Khan, J. Holt, Graph-based re-ranking: Emerging techniques, limitations, and opportunities, arXiv preprint arXiv:2503.14802 (2025).</p>
      <p>[13] H. Zhu, D. Xu, Y. Huang, Z. Jin, W. Ding, J. Tong, G. Chong, Graph structure enhanced pre-training language model for knowledge graph completion, IEEE Transactions on Emerging Topics in Computational Intelligence 8 (2024) 2697–2708.</p>
      <p>[14] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, H. Li, Learning to rank: From pairwise approach to listwise approach, volume 227, 2007, pp. 129–136. doi:10.1145/1273496.1273513.</p>
      <p>[15] K. Reed, H. Tayyar Madabushi, Faster BERT-based re-ranking through candidate passage extraction, in: The Twenty-Ninth Text REtrieval Conference (TREC 2020), 2020, pp. 1–5.</p>
      <p>[16] A. G. D. Francesco, C. Giannetti, N. Tonellotto, F. Silvestri, Graph neural re-ranking via corpus graph, 2024. URL: https://arxiv.org/abs/2406.11720. arXiv:2406.11720.</p>
      <p>[17] J. Luo, X. Chen, B. He, L. Sun, PRP-Graph: Pairwise ranking prompting to LLMs with graph aggregation for effective text re-ranking, 2024, pp. 5766–5776. doi:10.18653/v1/2024.acl-long.313.</p>
      <p>[18] L. Gienapp, M. Fröbe, M. Hagen, M. Potthast, Sparse pairwise re-ranking with pre-trained transformers, in: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR ’22, ACM, 2022, pp. 72–80. doi:10.1145/3539813.3545140.</p>
      <p>[19] M. Rathee, S. MacAvaney, A. Anand, Guiding retrieval using LLM-based listwise rankers, 2025. URL: https://arxiv.org/abs/2501.09186. arXiv:2501.09186.</p>
      <p>[20] S. Brody, U. Alon, E. Yahav, How attentive are graph attention networks?, arXiv preprint arXiv:2105.14491 (2021).</p>
      <p>[21] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender, Learning to rank using gradient descent, in: Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 89–96.</p>
      <p>[22] Y.-M. Tamm, R. Damdinov, A. Vasilev, Quality metrics in recommender systems: Do we calculate metrics consistently?, in: Proceedings of the 15th ACM Conference on Recommender Systems, 2021, pp. 708–713.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          , G. Appleton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The fair guiding principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific data 3</source>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Unicorn: A unified multi-tasking model for supporting matching tasks in data integration</article-title>
          ,
          <source>Proceedings of the ACM on Management of Data</source>
          <volume>1</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <article-title>A survey of approaches to automatic schema matching</article-title>
          ,
          <source>the VLDB Journal</source>
          <volume>10</volume>
          (
          <year>2001</year>
          )
          <fpage>334</fpage>
          -
          <lpage>350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bellahsene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Duchateau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Velegrakis</surname>
          </string-name>
          ,
          <article-title>On evaluating schema matching and mapping</article-title>
          ,
          <source>in: Schema matching and mapping</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Aumueller</surname>
          </string-name>
          , H.
          <string-name>
            <surname>-H. Do</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Massmann</surname>
          </string-name>
          , E. Rahm,
          <article-title>Schema and ontology matching with coma++</article-title>
          ,
          <source>in: Proceedings of the 2005 ACM SIGMOD international conference on Management of data</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>906</fpage>
          -
          <lpage>908</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          , arXiv preprint arXiv:
          <year>1908</year>
          .
          <volume>10084</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suhara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Ç. Demiralp,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          , W.-C. Tan,
          <article-title>Annotating columns with pretrained language models</article-title>
          ,
          <source>in: Proceedings of the 2022 International Conference on Management of Data</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1493</fpage>
          -
          <lpage>1503</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beigi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          , L. Cheng, H. Liu,
          <article-title>Large language models for data annotation and synthesis: A survey</article-title>
          , in: Y.
          <string-name>
            <surname>Al-Onaizan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>Y.-N.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>930</fpage>
          -
          <lpage>957</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Parciak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vandevoort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Neven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Peeters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vansummeren</surname>
          </string-name>
          ,
          <article-title>Llm-matcher: A name-based schema matching tool using large language models</article-title>
          ,
          <source>in: Companion of the 2025 International Conference on Management of Data</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>203</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Freire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Feuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Koutras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Peña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Silva</surname>
          </string-name>
          , E. Wu,
          <article-title>Large language models for data discovery and integration: Challenges and opportunities</article-title>
          .,
          <source>IEEE Data Eng. Bull</source>
          .
          <volume>49</volume>
          (
          <year>2025</year>
          )
          <fpage>3</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>