<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unsupervised Scoring</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuuki Tachioka</string-name>
          <email>tachioka.yuki@core.d-itlab.co.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yasunori Terao</string-name>
          <email>terao.yasunori@core.d-itlab.co.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <kwd-group>
          <kwd>Cell Entity Annotation</kwd>
          <kwd>Entity Linking</kwd>
          <kwd>Wikidata</kwd>
          <kwd>Large Language Models</kwd>
          <kwd>Transpose Strategy</kwd>
          <kwd>Unsupervised</kwd>
        </kwd-group>
        <aff id="aff1">
          <label>1</label>
          <institution>Denso IT Laboratory</institution>
          ,
          <addr-line>13F Shintora Yasuda Bldg., 4-3-1 Shimbashi, Minato-ku, Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents a robust approach to Cell Entity Annotation (CEA) in the ISWC 2025 SemTab Challenge MammoTab task, where tables must be linked to Wikidata entities without gold labels. We propose a multi-stage Wikidata entity identifier candidate generation pipeline combined with an iterative process alternating between Column Type Annotation (CTA) and CEA. Candidate sets are refined through both the original and transposed table orientations, with final candidates taken as the union of both to leverage complementary contextual cues. We also introduce unsupervised evaluation metrics, consistency and entropy, which enable performance estimation and iteration control without labeled data. Experiments on 84,907 entities and 3,576 columns show that our method improves label coverage and semantic coherence, and the best-selection strategy achieved the highest scores. The results demonstrate that combining multi-orientation candidate generation with iterative refinement and unsupervised evaluation provides a practical and effective solution for large-scale, label-free entity linking.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
      <p>In recent years, the rapid development of Large Language Models (LLMs) has enabled the understanding
of not only text but also multiple modalities such as images and tables. Tabular data, in particular,
exhibit a high degree of structural clarity while often lacking explicit column names or contextual
information, making it challenging to accurately capture their intended meaning. Understanding the
semantics of table data plays an important role in data integration, knowledge discovery, information
retrieval, and downstream decision-making tasks [1, 2, 3].</p>
      <p>
        In this work, we propose a multi-stage QID candidate generation strategy combined with an iterative
process alternating between Column Type Annotation (CTA) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and CEA, with the aim of achieving
high accuracy and reduced noise [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In particular, NIL detection requires minimizing noise during the
candidate generation phase [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Our approach applies CTA and CEA in a loop to mutually reinforce the
CTA and the selection of QID candidates. In addition, recent studies have explored LLM-based table
generation and interpretation methods [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ], demonstrating their strong potential to capture relational
semantics in complex tabular structures [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; therefore, we also explore the automatic estimation of
column names using ChatGPT. Furthermore, to maximize the benefits of collective inference, we utilize
both the original table and its transposed form, thereby capturing semantic relationships along both
rows and columns.
      </p>
      <p>Moreover, since no development set is provided for this task, unsupervised methods for performance
estimation are necessary. We propose calculating both consistency scores and entropy scores for
predicted labels and selecting outputs based on the principle of minimization of entropy [11].</p>
    </sec>
    <sec id="sec-3">
      <title>2. ISWC 2025 SemTab Challenge: MammoTab Task</title>
      <p>Tabular data, particularly in the CSV format, is widely used in data analysis pipelines. However, the
lack of an explicit semantic structure often hinders effective analysis. Tables available on the Web also
serve as valuable information sources and, by enriching them with semantic annotations, they can be
leveraged for applications such as search, question answering, and knowledge base construction.</p>
      <p>The SemTab Challenge provides a benchmark for fair evaluation and comparison of systems that
match knowledge graphs (KGs) and tables. In the MammoTab task, the focus is on CEA based on
Wikidata (version 20240720), with participating systems required to address the following challenges:
• Disambiguation
• Homonymy resolution
• Alias resolution
• NIL detection
• Noise robustness
• Collective inference</p>
      <p>Moreover, the task restricts approaches to those based on LLMs, either through fine-tuning or
Retrieval-Augmented Generation.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Proposed Method</title>
      <sec id="sec-4-1">
        <title>3.1. Overview</title>
        <p>Figure 1 illustrates the overall architecture of the proposed method. The process begins with a six-stage
candidate generation pipeline (3.2) that extracts potential QIDs from table cells. In the CTA stage, the
column-level semantics are estimated. The estimated column types serve as contextual cues for the CEA
stage, which links individual cells to their most appropriate QIDs. The CTA and CEA modules operate
in an iterative loop (3.4), allowing each to reinforce the other. CTA refines the context of columns, while
CEA updates the predictions of QIDs of cell entities based on improved column semantics. This mutual
refinement progressively enhances the overall semantic coherence of the annotations.</p>
        <p>Furthermore, the proposed framework applies not only to the original table, but also to its transposed
version (3.5). This dual-orientation processing allows the model to capture complementary contextual
cues along both rows and columns. To feed a table to an LLM, it must first be serialized as text. For
example, TableLlama [12], the baseline model in this paper, takes tables in row-major order, so entities
that belong to the same column end up far apart in the input prompt; TableLlama may therefore struggle
with long vertical columns. Processing the transposed table mitigates this issue by placing semantically
related cells closer together in textual space.
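        <p>To make the row-major versus transposed contrast concrete, the following minimal sketch (our own illustration, not TableLlama's actual prompt format) serializes a toy table in both orientations; after transposition, cells from the same original column become adjacent in the text:</p>
        <preformat>
```python
def serialize_row_major(header, rows):
    """Flatten a table row by row, the way row-major LLM inputs do."""
    lines = [" | ".join(header)]
    for row in rows:
        lines.append(" | ".join(row))
    return "\n".join(lines)

def transpose(header, rows):
    """Return (new_header, new_rows) with rows and columns swapped."""
    table = [header] + rows
    cols = list(map(list, zip(*table)))
    return cols[0], cols[1:]

header = ["City", "Country"]
rows = [["Tokyo", "Japan"], ["Paris", "France"], ["Rome", "Italy"]]

original = serialize_row_major(header, rows)
t_header, t_rows = transpose(header, rows)
transposed = serialize_row_major(t_header, t_rows)
# In 'original', "Tokyo" and "Paris" are separated by a full row;
# in 'transposed', all City cells sit on one line.
```
        </preformat>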
        <p>After performing the same candidate generation, iterative refinement of CTA-CEA and unsupervised
scoring (3.6) in both orientations, the results are compared and the best performing annotations are
selected. This orientation-based ensemble enhances robustness and consistency across different table
structures, effectively improving performance in various data layouts.
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Multi-stage Entity Candidate Generation</title>
        <p>For each cell, QID candidates are generated from the latest Wikidata dump using a six-stage process
designed to minimize the inclusion of unrelated entities into the candidate set:
1. Year and Digit Detection:
• If the values in the column are numeric and within the range 1770–2030, the column is
classified as a year column.
• If the values form a sequential series, the column is classified as an ID column.</p>
        <p>• Appropriate QIDs are assigned to each type accordingly.
2. Exact Match: If the cell value exactly matches a Wikidata label string, the corresponding QID is
added to the candidate set.
3. Set Match: The cell value is tokenized into words, and if the resulting set exactly matches a Wikidata
label word set, the corresponding QID is added.
4. Set Match (Description Removal): Parenthetical text, comma-separated suffixes, and other
supplementary descriptions are removed from the cell value; if the resulting word set matches a
Wikidata label set, the QID is added.
5. Partial Set Match (label in cell value): Word sets partially matching a label's word set are
considered, but restricted to sets of size at most min(2 × |L|, |L| + 5), where L is the word set of the
original label. For example:
• If |L| = 2, allow up to 4 words.</p>
        <p>• If |L| = 6, allow up to 11 words.
6. Partial Set Match (cell value in label): Similar to step 5, but checking whether the cell value's
word set is contained in the label's word set, with the same size restrictions.</p>
        <p>This staged approach ensures high coverage while reducing the likelihood of introducing noisy QID
candidates.</p>
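        <p>Stages 2 to 6 can be sketched as follows. This is a minimal illustration over a toy label-to-QID dictionary standing in for the Wikidata dump index; the stage 1 year/ID heuristics are omitted, and the helper names are our own:</p>
        <preformat>
```python
import re

def word_set(text):
    """Lowercased word set of a string."""
    return frozenset(re.findall(r"\w+", text.lower()))

def strip_descriptions(cell):
    """Stage 4 preprocessing: drop parenthetical text and comma suffixes."""
    cell = re.sub(r"\([^)]*\)", "", cell)
    return cell.split(",")[0].strip()

def generate_candidates(cell, label_to_qid):
    """Apply stages 2-6 to one cell against a label -> QID lookup."""
    out = set()
    cell_words = word_set(cell)
    for label, qid in label_to_qid.items():
        label_words = word_set(label)
        cap = min(2 * len(label_words), len(label_words) + 5)
        if label == cell:                                        # stage 2: exact match
            out.add(qid)
        elif label_words == cell_words:                          # stage 3: set match
            out.add(qid)
        elif label_words == word_set(strip_descriptions(cell)):  # stage 4: description removal
            out.add(qid)
        elif label_words.issubset(cell_words) and cap >= len(cell_words):   # stage 5
            out.add(qid)
        elif cell_words.issubset(label_words) and cap >= len(label_words):  # stage 6
            out.add(qid)
    return out
```
        </preformat>
        <p>For instance, the cell "Tokyo (capital of Japan)" matches the label "Tokyo" at stage 4 once the parenthetical description is removed.</p>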
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Initial Column Type Annotation (CTA)</title>
        <sec id="sec-4-3-1">
          <title>3.3.1. Majority Voting</title>
          <p>Column labels are estimated from the generated candidate sets as follows:
• For each column, retrieve the instanceOf labels of its multi-stage QID candidates from Wikidata.
• Flatten the label set, and if the most frequent label exceeds a majority threshold, assign it as the
column’s representative label.</p>
          <p>• Aggregate the labels of all columns to form a pseudo-title for the table.</p>
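          <p>A minimal sketch of the majority-voting step (the 0.5 threshold is our assumption; the paper does not state the exact value):</p>
          <preformat>
```python
from collections import Counter

def majority_vote_type(column_candidate_labels, threshold=0.5):
    """Flatten the instanceOf labels of a column's QID candidates and
    return the most frequent label if it clears the majority threshold,
    else None."""
    flat = [label for cell in column_candidate_labels for label in cell]
    if not flat:
        return None
    label, count = Counter(flat).most_common(1)[0]
    if count / len(flat) > threshold:
        return label
    return None
```
          </preformat>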
        </sec>
        <sec id="sec-4-3-2">
          <title>3.3.2. ChatGPT-based Column Name Generation</title>
          <p>In addition, we prompt ChatGPT with the contents of each column to predict plausible column names,
which are then combined to form another pseudo-title for the table.</p>
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Iterative CEA and CTA</title>
        <p>Using the initial CTA results, we perform CEA with the state-of-the-art model, TableLlama [12]. Based
on the obtained QIDs, the CTA is updated to re-estimate column semantics. CEA is then performed
again using the updated CTA. By alternating CTA and CEA in this way, we aim for mutual performance
improvements.</p>
      </sec>
      <sec id="sec-4-5">
        <title>3.5. Transpose Table Strategy</title>
        <p>Some tables contain long vertical columns, which models based on row-major input such as TableLlama
may find challenging to interpret. By transposing the table before input, cell values that are semantically
related can be positioned closer together in text space, reinforcing contextual signals:
• This often improves the consistency of the type within columns.</p>
        <p>• For cases where row-wise relationships are important, the original orientation is preferable.
We perform CTA on both the original and transposed versions of the table, improving QID accuracy
through complementary perspectives.</p>
      </sec>
      <sec id="sec-4-6">
        <title>3.6. Unsupervised Score Calculation</title>
        <p>Since no development set with labels is provided, the performance of the model must be estimated
without supervision, following the principle of minimization of entropy [11].</p>
        <p>Consistency Score: for each column, it computes the agreement ratio of the instanceOf label mode:</p>
        <p>mode ratio = count(mode(ℒ)) / |ℒ|</p>
        <p>where ℒ is the set of predicted labels.</p>
        <p>Entropy Score: for each column, it computes the entropy of the instanceOf label distribution, where
a lower entropy indicates more consistent predictions:</p>
        <p>entropy = − Σ_{ℓ ∈ ℒ} (count(ℓ)/N) log₂ (count(ℓ)/N)</p>
        <p>where N is the number of cells in the column.</p>
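        <p>Both scores can be sketched directly from the definitions above (a minimal illustration; function names are our own):</p>
        <preformat>
```python
import math
from collections import Counter

def consistency_score(labels):
    """Mode ratio: count(mode(L)) / |L| over a column's predicted labels."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def entropy_score(labels):
    """Base-2 Shannon entropy of the label distribution; lower means
    more consistent predictions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```
        </preformat>
        <p>For example, a column predicted as ["city", "city", "city", "river"] has a consistency of 0.75, while a perfectly uniform column has entropy 0.</p>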
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiment</title>
      <sec id="sec-5-0">
        <title>4.1. Setup</title>
        <p>Experiments were conducted on the MammoTab task of the ISWC 2025 SemTab Challenge. The primary
objectives were to improve the accuracy of the CEA for tables and to evaluate the proposed unsupervised
scoring methods on unlabeled data. The experimental dataset contained a total of 84,907 unique entities
and 3,576 target columns to be annotated.</p>
        <p>For comparison, we prepared the following experimental variations:
• Initial CTA based on candidate sets
• Initial CTA based on ChatGPT-generated column names
• Comparison and integration of results from the original and transposed table representations
• Iterative CEA and CTA with 1 to 4 alternating steps</p>
      </sec>
      <sec id="sec-5-1">
        <title>4.2. LLM setup</title>
        <p>For LLM-based column name estimation, we used GPT-4.1 accessed via the OpenAI API. The model
was prompted with the entire table (converted to Markdown format) and instructed to infer meaningful
names for all columns labeled “Unknown.X”. In a single inference, the CTA for all columns was executed
simultaneously. The complete prompt used for the generation of ChatGPT-based column names is
shown in Listing 1. This prompt guides the model to infer meaningful names for columns labeled as
“Unknown.X” and return the results in a structured JSON format.</p>
        <p>You are given a table with column names, some of which are labeled as "Unknown.X"
(e.g., Unknown.1, Unknown.2, etc.). Your task is to infer the most likely meaning
of each "Unknown.X" column based on the data it contains. For each "Unknown.X"
column, provide:
1. A list of multiple possible column name candidates.
2. The single best column name you believe fits the data.</p>
        <p>For columns that are not labeled "Unknown.X", you may suggest a better name if it
is clearly more descriptive or accurate, but this is optional. If you do suggest
a new name, also provide candidates and the single best choice, similar to the
Unknown columns.</p>
        <p>The output must be in valid JSON format, using the following structure:
{
"columns": {
"&lt;original_column_name&gt;": {
"candidates": ["candidate_1", "candidate_2", "..."],
"best_guess": "best_candidate"
},
...
}</p>
        <p>}
Do not include any explanations outside of the JSON.</p>
        <p>Here is the table to analyze:
{{ table }}</p>
        <p>Listing 1: Prompt used for ChatGPT-based CTA generation.</p>
        <p>A total of 756 API calls were made, corresponding to the number of Markdown-converted tables. We
used the Batch API, with a pricing rate of $1.00 per 1 M input tokens and $4.00 per 1 M output tokens.
Given approximately 0.5M input tokens (2.1 M words) and 0.4M output tokens (1.4 M words), the total
cost was approximately $2.1. The relatively low computational cost demonstrates the practicality of
using LLM-based column name estimation at scale.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.3. Multi-stage Entity Candidate Generation</title>
        <p>The results in Table 1 show how the proposed multi-stage QID candidate generation pipeline
progressively improves coverage while controlling noise. Stage 1, which performs exact matching and
high-confidence heuristics, immediately recovers 70.1% of the entities (24,596 matches), leaving 10,482
unmatched. Stages 2 and 3 provide modest gains (0.6% and 2.7% additional coverage, respectively),
indicating that set-based matching and description-removed matching capture only a small fraction
of the remaining entities. A substantial improvement occurs at Stage 4, where controlled partial set
matching (label in cell value) produces a jump of 8.9 percentage points, reducing the unmatched set
to 5,690 entities. This suggests that carefully expanding match criteria is highly effective in bridging
coverage gaps. Stages 5 and 6 add further recall through reversed partial matching strategies, reaching
a final coverage of 92.3%.</p>
        <p>Overall, the stage-wise progression reflects a deliberate trade-off: early stages prioritize precision
with strict matching, while later stages boost recall by relaxing constraints in a controlled manner.
The large gain at Stage 4 underscores its central role in balancing coverage expansion against noise
suppression.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.4. Initial CTA Performance</title>
      </sec>
      <sec id="sec-5-4">
        <title>4.5. Iterative CTA and CEA Performance</title>
        <sec id="sec-5-4-1">
          <title>4.5.1. Labeling Coverage (Table 3, Table 4)</title>
          <p>Tables 3 and 4 show the number of columns for which labels could not be assigned at each step for
different initialization strategies. Significant improvements are observed in the early iterations (Step 1
and Step 2), after which the performance gains diminish, indicating that the iterative process quickly
approaches a stable state. The transpose-based approach (Step 2(⊤) in Table 3) is particularly effective
in reducing the number of unlabeled cells, especially in challenging cases where the most frequent
label is a Wikimedia disambiguation page. When starting with ChatGPT-based column name estimation
(Table 4), the initial steps also produce improvements; however, later iterations show limited additional
gains, and candidate set-based initialization ultimately achieves more stable and consistent performance
across iterations.</p>
          <p>Given that the evaluation target comprised a total of 3,576 columns, we also calculated the labeling
rate for each step, defined as the percentage of columns with assigned labels out of the total. As shown
in Table 3, the candidate set–based initialization labeled approximately 55.4% of columns at Step 1,
while the ChatGPT-based initialization (Table 4) achieved a slightly higher rate of 58.9% at the same
step. Interestingly, the labeling rate temporarily decreases at Step 2 in both settings. This drop can be
explained by the fact that the first CEA pass fixes entity linking decisions, which in turn reduces the
diversity of available candidates for subsequent CTA, leaving some columns without any dominant
type label. In practice, this means that low-confidence or noisy candidates are eliminated, which may
reduce coverage, but can also improve overall precision in later iterations. Thus, the Step 2 decrease
should not necessarily be interpreted as a performance degradation, but rather as a selective filtering
effect that prioritizes high-confidence assignments. In particular, the transpose-based approach in
Step 2(⊤) for candidate set–based initialization reached the highest coverage of 59.1%, indicating that
transposition can recover labels for some columns that remain unlabeled in the original orientation. In
our implementation, the set of candidates for each column is constructed as the union of candidates
obtained from the original and transposed table representations, thereby leveraging complementary
contextual cues from both orientations.</p>
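          <p>The union of per-column candidate sets described above can be sketched as follows (column keys are hypothetical indices used only for illustration):</p>
          <preformat>
```python
def merge_orientations(original_cands, transposed_cands):
    """Union the per-column QID candidate sets obtained from the
    original and transposed table representations."""
    merged = {}
    for col in set(original_cands) | set(transposed_cands):
        merged[col] = original_cands.get(col, set()) | transposed_cands.get(col, set())
    return merged

# A column unlabeled in one orientation can still receive candidates
# contributed by the other orientation.
merged = merge_orientations({"col0": {"Q1"}}, {"col0": {"Q2"}, "col1": {"Q3"}})
```
          </preformat>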
        </sec>
        <sec id="sec-5-4-2">
          <title>4.5.2. Unsupervised Score (Tables 5–10)</title>
          <p>We evaluated the proposed unsupervised scores to quantitatively assess prediction stability and
uncertainty without explicit gold labels. For the original table orientation (Table 5), there was a substantial
improvement from Step 1 to Step 2 for both metrics, after which the scores plateaued, indicating that
the iterative process quickly reaches a stable state. When using the transposed table (Table 6), both the
consistency and entropy scores were generally higher than those of the original table, suggesting that
the transposed format facilitates more coherent type predictions across columns.</p>
          <p>In the case of ChatGPT-based initialization (Table 7), Step 1 achieved better scores than the
initialization based on the candidate set, showing the benefit of leveraging the LLM-generated column names for
the initial step. However, this advantage diminished in Step 2, where the scores decreased noticeably,
implying that the initial labels derived from LLM may introduce inconsistencies during subsequent
iterations. For the transposed setting (Table 8), Step 1 achieved slightly higher consistency and lower
entropy compared to the original ChatGPT start (Table 7), indicating modest gains from improved
contextual proximity. However, Step 2 did not yield further improvements, suggesting a limited benefit
from iterative refinement in this configuration.</p>
          <p>Finally, selecting the best score from the original or transposed results at each step (Table 9), we
were able to combine the strengths of both orientations, achieving the highest performance in all steps
(mean consistency = 0.563, mean entropy = 1.585). This indicates that the two orientations provide
complementary information that can be exploited to improve unsupervised performance estimation.
When applying best selection (Table 10), both consistency and entropy scores improved over
single-orientation ChatGPT start results. This confirms that, as with candidate set–based initialization,
combining complementary orientations improves robustness. However, even in the best selection
setting, the initialization of ChatGPT did not exceed the highest scores obtained from the best selection
based on the candidate set (Table 9), indicating that the LLM-derived column names, while helpful in
the early stages, require additional filtering to match the stability of candidate-based methods.</p>
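          <p>The best-selection step can be sketched as a per-column choice between orientations. This is our own illustration: we assume higher consistency wins, with lower entropy as the tie-breaker, which the paper does not spell out:</p>
          <preformat>
```python
def best_selection(original_scores, transposed_scores):
    """Per column, keep the orientation with the better unsupervised
    score. Inputs map column -> (consistency, entropy)."""
    choice = {}
    for col in original_scores:
        c_orig, e_orig = original_scores[col]
        c_trans, e_trans = transposed_scores[col]
        # Prefer higher consistency; break ties by lower entropy.
        if (c_trans, -e_trans) > (c_orig, -e_orig):
            choice[col] = "transposed"
        else:
            choice[col] = "original"
    return choice
```
          </preformat>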
        </sec>
        <sec id="sec-5-4-3">
          <title>4.5.3. Visualization of Unsupervised Score Distributions</title>
          <p>Visualizing the distributions of unsupervised scores allows for an intuitive comparison between steps
and methods. For Step 1 with the original table orientation (Figure 2), the scores are widely distributed
without clear mode, indicating a high variability and low consistency in the predictions. This reflects
the insufficient contextual information available when only the initial CTA and CEA are combined.</p>
          <p>In Step 2 (Figure 3), the score distribution becomes more concentrated and the mode agreement rate
improves. This suggests that the iterative reasoning process effectively enhances semantic coherence
between columns, enabling more stable label estimation for many columns. When using the transposed
table in Step 2 (Figure 4), the score distribution becomes even sharper, with an increased number of
high-scoring columns. This trend implies that transposition emphasizes textual proximity between
semantically related cells, improving the contextual understanding of the model, especially for columns
with strong context dependency.</p>
          <p>Finally, when selecting the better prediction between the original and transposed results for each
column in Step 2 (Figure 5), the score distribution reaches its highest concentration, with almost no
columns of low scores remaining. This shows that choosing the better orientation for each column
serves as an effective and practical ensemble strategy, maximizing overall consistency and prediction
reliability.</p>
        </sec>
      </sec>
      <sec id="sec-5-5">
        <title>4.6. Discussion</title>
        <p>The experimental results clearly demonstrate that collective inference enhances column-wise
consistency and refines cell-level entity predictions. Through the iterative interaction between CTA and CEA,
even columns with insufficient initial information benefited from relationships with other columns,
leading to improved accuracy. This effect is particularly evident in the notable increase in the
consistency score from Step 1 to Step 2 (Figures 2 and 3), indicating that semantic coherence across the table
was strengthened in the early stages of iteration.</p>
        <p>The utility of the transpose strategy was also confirmed. By increasing textual proximity between
semantically related cells, the model’s contextual understanding was significantly enhanced, especially
when the model input is in row-major order. In Step 2 with the transposed table (Figure 4), the
score distribution reached its peak, with higher agreement rates and lower entropy compared to the
original orientation. This suggests that transposition acts as an effective form of context reinforcement,
particularly for columns requiring strong contextual cues.</p>
        <p>Moreover, selecting the better result between the original and transposed orientations for each column
proved to be a powerful ensemble-like strategy. This “best selection” approach (Figure 5) produced
the most concentrated score distribution, with a substantial reduction in low-scoring columns. The
outcome highlights that structural uncertainty in the interpretation of the table can be mitigated by
integrating complementary perspectives from different orientations, thus improving the robustness of
the model predictions.</p>
        <p>On the other hand, the use of ChatGPT-generated column names for initialization exhibited clear
limitations. Although this approach provided a temporary advantage in Step 1 over candidate set–based
initialization, it failed to adapt effectively in later iterations. The column names generated by ChatGPT
were sometimes ambiguous or overly specific, introducing noise during the iterative process. As
shown in Table 7, the initial benefit was not sustained and subsequent scores lagged behind those from
candidate set–based initialization.</p>
        <p>Finally, the proposed unsupervised evaluation metrics, consistency and entropy, proved to be reliable
indicators of model performance trends, even in the absence of gold labels. The marked improvement
from Step 1 to Step 2, followed by gradual stabilization, quantitatively supports the qualitative
enhancement of the label predictions. These metrics could be further applied for early stopping in iterative
processes or for detecting overfitting, offering practical benefits for unsupervised table annotation
pipelines.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>This study presented a method for CEA in the ISWC 2025 SemTab Challenge MammoTab task, combining
multi-stage QID candidate generation with an iterative process of CTA and CEA. The approach was
designed to operate without gold labels, using unsupervised evaluation metrics based on consistency
and entropy to monitor and guide performance.</p>
      <p>Experimental results demonstrated that the iterative CTA–CEA framework substantially improved
column-wise semantic coherence, particularly between Step 1 and Step 2, and that the transpose
strategy further enhanced contextual understanding for table-oriented LLMs thanks to better table
serialization. Selecting the better result between the original and transposed tables yielded the highest
overall performance, confirming the value of integrating complementary structural perspectives.</p>
      <p>Although ChatGPT-based initialization provided a temporary advantage in early iterations, it proved
less effective in later stages due to the introduction of noise. The unsupervised metrics reliably reflected
performance trends, suggesting their applicability for early stopping and overfitting detection in
unsupervised annotation settings.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT-4 and Writefull for grammar and
spelling checks, as well as for paraphrasing and rewording. After using these services, the authors
reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
      <p>GPT models for semantic table annotation, in: SemTab@ISWC, 2024, pp. 43–53. URL:
https://ceur-ws.org/Vol-3889/paper3.pdf.
[11] Y. Grandvalet, Y. Bengio, Semi-supervised learning by entropy minimization, in: Proceedings of
the 18th International Conference on Neural Information Processing Systems, NIPS'04, MIT Press,
Cambridge, MA, USA, 2004, pp. 529–536.
[12] T. Zhang, X. Yue, Y. Li, H. Sun, TableLlama: Towards open large generalist models for tables, in:
K. Duh, H. Gomez, S. Bethard (Eds.), Proceedings of the 2024 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1:
Long Papers), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 6024–6044.
URL: https://aclanthology.org/2024.naacl-long.335/. doi:10.18653/v1/2024.naacl-long.335.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Table meets LLM: Can large language models understand structured table data? A benchmark and empirical study</article-title>
          ,
          <source>in: The 17th ACM International Conference on Web Search and Data Mining (WSDM '24)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>X.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Sengamedu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          ,
          <article-title>Large language models (LLMs) on tabular data: Prediction, generation, and understanding - a survey</article-title>
          ,
          <source>Transactions on Machine Learning Research</source>
          (
          <year>2024</year>
          ). URL: https://openreview.net/forum?id=IZnrCGF9WI.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ritter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Tabular data understanding with LLMs: A survey of recent advances and challenges</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2508.00217. arXiv:2508.00217.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <article-title>SemTab 2019: Resources to benchmark tabular data to knowledge graph matching systems</article-title>
          , in:
          <string-name><given-names>A.</given-names> <surname>Harth</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Kirrane</surname></string-name>
          ,
          <string-name><given-names>A.-C.</given-names> <surname>Ngonga Ngomo</surname></string-name>
          ,
          <string-name><given-names>H.</given-names> <surname>Paulheim</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Rula</surname></string-name>
          ,
          <string-name><given-names>A. L.</given-names> <surname>Gentile</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Haase</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Cochez</surname></string-name>
          (Eds.),
          <source>The Semantic Web</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>514</fpage>
          -
          <lpage>530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Marzocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>MammoTab: A giant and comprehensive dataset for semantic table interpretation</article-title>
          , in: SemTab@ISWC,
          <year>2022</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>33</lpage>
          . URL: https://ceur-ws.org/Vol-3320/paper3.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Bhagavatula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Noraset</surname>
          </string-name>
          ,
          <string-name><given-names>D.</given-names> <surname>Downey</surname></string-name>
          ,
          <article-title>TabEL: Entity linking in web tables</article-title>
          , in:
          <string-name><given-names>M.</given-names> <surname>Arenas</surname></string-name>
          ,
          <string-name><given-names>O.</given-names> <surname>Corcho</surname></string-name>
          ,
          <string-name><given-names>E.</given-names> <surname>Simperl</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Strohmaier</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>d'Aquin</surname></string-name>
          ,
          <string-name><given-names>K.</given-names> <surname>Srinivas</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Groth</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Dumontier</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>Heflin</surname></string-name>
          ,
          <string-name><given-names>K.</given-names> <surname>Thirunarayan</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Staab</surname></string-name>
          (Eds.),
          <source>The Semantic Web - ISWC 2015</source>
          , Springer International Publishing, Cham,
          <year>2015</year>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>441</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kertkeidkachorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ichise</surname>
          </string-name>
          ,
          <string-name><given-names>H.</given-names> <surname>Takeda</surname></string-name>
          ,
          <article-title>MTab4D: Semantic annotation of tabular data with DBpedia</article-title>
          ,
          <source>Semantic Web</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>2613</fpage>
          -
          <lpage>2637</lpage>
          . URL: https://journals.sagepub.com/doi/abs/10.3233/SW-223098. doi:10.3233/SW-223098.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ruas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>NILINKER: Attention-based approach to NIL entity linking</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>132</volume>
          (
          <year>2022</year>
          )
          <fpage>104137</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1532046422001526. doi:10.1016/j.jbi.2022.104137.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zgraggen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Satyanarayan</surname>
          </string-name>
          ,
          <string-name><given-names>T.</given-names> <surname>Kraska</surname></string-name>
          ,
          <string-name><given-names>Ç.</given-names> <surname>Demiralp</surname></string-name>
          ,
          <string-name><given-names>C.</given-names> <surname>Hidalgo</surname></string-name>
          ,
          <article-title>Sherlock: A deep learning approach to semantic data type detection</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD '19</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>1500</fpage>
          -
          <lpage>1508</lpage>
          . URL: https://doi.org/10.1145/3292500.3330993. doi:10.1145/3292500.3330993.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Bikim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A. A.</given-names>
            <surname>Ymele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jiomekong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oelen</surname>
          </string-name>
          ,
          <string-name><given-names>G.</given-names> <surname>Rabby</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>D'Souza</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Auer</surname></string-name>
          , Leveraging
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>