<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Generative Semantic Table Interpretation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viet-Phi Huynh</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yoan Chabot</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphaël Troncy</string-name>
        </contrib>
        <aff>Orange, France</aff>
        <aff>EURECOM, France</aff>
      </contrib-group>
      <abstract>
        <p>Semantic Table Interpretation (STI), or Semantic Table Annotation, is the process of understanding the semantics of tabular data with reference information identified in knowledge graphs (KG). In this paper, we first present insights gained from the design and implementation of DAGOBAH SL, a top-performing STI system in state-of-the-art benchmarks, and we discuss the unsolved challenges that need to be addressed to make STI more effective in practice. Pre-trained generative Large Language Models (LLMs) have demonstrated remarkable versatility in tackling a broad spectrum of natural language understanding tasks, and we believe they have the potential to improve STI systems. We describe several appealing research ideas that could lay the foundation for the future development of Generative Semantic Table Interpretation.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Table Interpretation</kwd>
        <kwd>DAGOBAH</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Large Language Model</kwd>
        <kwd>Generative Information Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        scenarios, or the choice of algorithmic backbone [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. DAGOBAH SL [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], the winning system of SemTab over the last two years, has been shown to perform efficiently on well-formed relational tables (like the one in Figure 1) thanks to a rich set of match-based heuristics that evaluate the relevance between the table context and the entity graph. However, it still struggles with enterprise tables, heterogeneous tables found in the wild Web, and tables with a low encyclopedic coverage (e.g. GitTables [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
      <p>
        This paper has two objectives. First, we present the insights gained when designing and implementing DAGOBAH SL, and we discuss the remaining scientific barriers that need to be tackled to obtain a reliable and generic annotation system. Second, in light of advancements in pre-trained generative Large Language Models (LLMs) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] with emergent abilities and flexibility in solving a wide range of natural language understanding tasks, we envision several future steps that we plan to explore, leveraging LLMs to fuel our table annotation system. Specifically, we rely on fine-tuning or few-shot learning techniques to adapt the model to table structure, to inject and update knowledge within LLMs, and to use this knowledge to generate the annotations auto-regressively in textual form.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. DAGOBAH SL</title>
      <p>
        DAGOBAH SL [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] (SL standing for Semantic Lookup) is a framework for interpreting relational tables automatically via a two-stage pipeline: i) preprocessing cleans the table and extracts metadata such as orientation, header, key column and column primitive typing (e.g. units, URL, email); importantly, this step can automatically detect the {cells, columns, column pairs} targets that require annotation; ii) annotation follows a retrieve-then-rerank strategy, where the retrieve phase searches for relevant KG entity candidates for a target cell mention via a keyword-based entity lookup, and the rerank phase then selects the most relevant entity for the cell (CEA task). The cell annotations are subsequently leveraged to predict the column types (CTA) and column-pair relations (CPA). The strength of DAGOBAH SL lies in two points:
      </p>
      <p>A powerful keyword-based entity lookup service, built on Elasticsearch, that indexes every label/alias of every entity in the alias table into an inverted index. Through alias table enrichment, it covers diverse surface forms of cell mentions such as acronyms, synonyms or typos, which yields a high recall within a few retrieved candidates. It incorporates three ranking factors: two similarity scores between the mention and the entity labels/aliases, calculated using edit distance and BM25, and a PageRank score quantifying the popularity of an entity. DAGOBAH SL currently supports various snapshots of the DBpedia and Wikidata knowledge graphs.</p>
      <p>An iterative cell entity/column type/column-pair relation disambiguation that leverages the mutual interaction between table elements to optimize the re-ranking of candidate entities/column types/relations. For example, the types of a column can guide the ranking of entity candidates in the cells of that column, and vice versa. The compatibility between table context and entity graph is evaluated thoroughly using a comprehensive set of matching rules, including: i) semantic context plays a more important role than literal context; ii) in the table, a neighboring column that is highly connected to the target column should have a higher contextual weight; iii) an entity is represented by a multi-hop graph centered around it, allowing the exploitation of richer context; iv) the semantic correlation between a column header and a cell entity's description is exploited via a BERT-based cross-encoder.</p>
      <p>
        Furthermore, DAGOBAH SL is packaged in a RESTful API [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and a user-friendly Web UI [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to facilitate the usage of the STI framework.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Lessons Learned and Challenges</title>
      <sec id="sec-3-1">
        <title>Tabular data is highly heterogeneous.</title>
        <p>
          Relational tables, such as the one shown in Figure 1, are not the only type of table. Other types exhibit a different topology of semantic connections between cells, such as entity tables<sup>1</sup> or matrix tables<sup>2</sup>. In terms of layout, tables can have multiple headers, split cells, merged cells, or cells containing multiple values (e.g. a list). A more detailed table taxonomy is provided in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Arguably, an annotation algorithm designed for a specific table type or layout may be neither suitable nor efficient for other types or layouts. DAGOBAH SL is built upon intuitions and assumptions derived from relational tables, where i) each row describes a specific entity, with the columns providing its attributes; ii) a semantic cell (i.e. a cell containing a mention that can be disambiguated) is fully represented by a single entity; iii) tables have either no header or a single header. However, the first intuition does not hold for matrix tables, and the latter two assumptions prevent DAGOBAH SL from handling split cells and multivalued cells.</p>
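        <p>Section 2 describes the DAGOBAH SL lookup only through its three ranking factors (edit distance, BM25 and PageRank), not through the way they are combined. The following is a minimal sketch under explicit assumptions: a plain Python dictionary stands in for the Elasticsearch inverted index, BM25 scores are normalised by the best candidate, and the factors are merged by a weighted sum whose weights, like the toy popularity values and entity names, are hypothetical. It also makes assumption ii) above concrete, since every cell is sent to the lookup as exactly one mention.</p>
        <preformat>
```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy KB: labels/aliases per entity plus a precomputed popularity
# (PageRank-like) score. A real system would query an Elasticsearch index instead.
ENTITIES: Dict[str, dict] = {
    "Paris (city)": {"labels": ["Paris", "City of Light"], "pagerank": 0.9},
    "Paris Hilton": {"labels": ["Paris Hilton"], "pagerank": 0.5},
    "Paris, Texas": {"labels": ["Paris, Texas", "Paris TX"], "pagerank": 0.2},
}


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(mention: str, labels: List[str]) -> float:
    """Best normalised edit-distance similarity over an entity's labels/aliases."""
    m = mention.lower()
    return max(1 - levenshtein(m, label.lower()) / max(len(m), len(label), 1)
               for label in labels)


def bm25_scores(mention: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """BM25 score of the mention tokens against each entity's concatenated labels."""
    tokenized = [d.lower().replace(",", " ").split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    df = Counter(tok for t in tokenized for tok in set(t))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for q in mention.lower().split():
            if q not in tf:
                continue
            idf = math.log(1 + (len(tokenized) - df[q] + 0.5) / (df[q] + 0.5))
            score += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


def rank_candidates(mention: str, entities: Dict[str, dict],
                    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> List[Tuple[str, float]]:
    """Rank entities by a weighted sum of edit similarity, normalised BM25 and popularity."""
    names = list(entities)
    bm25 = bm25_scores(mention, [" ".join(entities[n]["labels"]) for n in names])
    top = max(bm25) or 1.0  # avoid dividing by zero when no token matches
    ranked = []
    for name, raw in zip(names, bm25):
        score = (weights[0] * edit_similarity(mention, entities[name]["labels"])
                 + weights[1] * raw / top
                 + weights[2] * entities[name]["pagerank"])
        ranked.append((name, round(score, 3)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


print(rank_candidates("Paris", ENTITIES))
```
        </preformat>
        <p>With these toy values, "Paris" ranks the city first, and the typo "Pariss" still retrieves it through the edit-distance term even though BM25 finds no matching token. A multivalued cell such as "Paris; Lyon", however, is scored as a single mention: its best score drops sharply and, in this toy setting, the city is no longer ranked first.</p>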
      </sec>
      <sec id="sec-3-2">
        <title>Knowledge Base (KB) Indexing and Exploitation.</title>
        <p>
          As a closed Information Extraction application [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], STI systems rely on a knowledge base (e.g. a knowledge graph, an ontology or a catalog of entity definitions) to constrain and guide the annotation process. A typical KB consists of millions of entities, which requires a proper indexing strategy for efficient retrieval and exploitation. Using a KB in information extraction tasks involves two aspects: i) the entity attributes to be indexed and exploited: an entity can be characterized by its labels/aliases, its description, or contextual information within a graph, and these attributes are leveraged partly or fully to retrieve and disambiguate relevant entity candidates; ii) the skewness of entity/relation distributions: entities and relations vary significantly in popularity and expressiveness, which may undermine the consistency of model performance. Heuristic-based annotation systems like DAGOBAH SL, which rely on inverted indexes combined with flexible matching mechanisms (e.g. exact or fuzzy matching), can work effectively and efficiently on tables that exhibit a high degree of literal similarity with entity attributes. Representation learning-based models, on the other hand, learn the underlying semantics of tables and entities through embeddings, making them more robust to noise and to ambiguous or incomplete context. However, their performance depends strongly on the quality of the training data and can differ greatly between frequently occurring entities/relations and rare ones.
        </p>
      </sec>
      <sec id="sec-3-2-1">
        <title>Error Propagation from Detection to Annotation.</title>
        <p>Like many other STI systems [15, 16, 17, 18], DAGOBAH SL performs target detection (via preprocessing) and target annotation independently, and hence suffers from error accumulation: errors made in the first stage propagate to the second. Cases in which the system fails to distinguish cells containing literal mentions from cells containing semantic mentions lead to missing or incorrect annotations. Moreover, most target detection techniques are either heuristic-based (e.g. cells with a string data type are considered CEA targets) or locally contextualized (e.g. only the column itself is used to determine whether its cells are linkable to KG entities). The effectiveness of these techniques across various table scenarios remains uncertain.</p>
      </sec>
      <sec id="sec-3-2-2">
        <title>Candidate Generation is challenging.</title>
        <p>Entity candidate generation is critical for effective STI systems that rely on the retrieve-then-rerank paradigm. Its goal is to cope with the huge number of entities and to narrow down the search space. DAGOBAH SL employs a dictionary lookup that computes the literal similarity between the table mention (possibly with context) and entity attributes (labels, aliases or descriptions). While this approach is appealing for handling various surface forms of mentions (e.g. acronyms, synonyms, typos), it lacks semantic understanding, which can amplify the ambiguity within candidate sets and make the subsequent candidate re-ranking phase more challenging. With the rise of pre-trained language models, the embedding approach via contrastive learning based on dual encoders is capable of capturing both table and entity semantics and, hence, can offer superior disambiguation capability. However, to the best of our knowledge, no work has investigated the potential of dual encoders specifically on structured tabular data. Moreover, the construction of negative sets for contrastive learning is a non-trivial task that impacts the quality of the learned embeddings. In addition, like the target detection module, candidate generation risks propagating errors to the annotation step: if the gold entity is not part of the retrieved candidates, the corresponding table element will never be correctly annotated.</p>
      </sec>
      <fn-group>
        <fn id="fn1">
          <label>1</label>
          <p>See Fig. 1b in webtables for an example of entity table.</p>
        </fn>
        <fn id="fn2">
          <label>2</label>
          <p>See Fig. 1c in webtables for an example of matrix table.</p>
        </fn>
      </fn-group>
    </sec>
    <sec id="sec-gensti">
      <title>4. Towards Generative Semantic Table Interpretation</title>
      <p>Pre-trained generative LLMs (or foundational language models) have revolutionized numerous natural language understanding tasks, including information extraction (IE) tasks such as named entity recognition, entity linking and relation extraction [19]. Current state-of-the-art IE models leverage the flexibility of decoder-only or encoder-decoder LLM architectures for structured prediction, allowing different IE tasks to be handled jointly in an end-to-end and unified manner [20, 21, 22, 23, 24]. While this approach was originally applied to unstructured textual data, we argue that it is also beneficial for structured data. In line with [25, 26], we believe that LLMs will be increasingly adopted to tackle this task. This paper introduces our vision towards generative closed Information Extraction tailored for tabular data, namely Generative Semantic Table Interpretation (GenSTI, Figure 2). As a stepping stone, we aim to evaluate the potential of LLMs in tackling the challenges discussed in Section 3. The desiderata are as follows:</p>
      <sec id="sec-3-3">
        <title>Ability to handle simultaneously various table types and layouts.</title>
        <p>By framing the STI tasks within a unified seq-2-seq framework [27], generative LLMs can be prompted with different table types/layouts and can jointly solve the CEA, CTA and CPA tasks. This framework follows the multi-task learning paradigm that has been successful in NLP. In particular, the model is fine-tuned on a mix of table sets to perform multiple STI tasks at once. Accordingly, it can facilitate knowledge transfer between tasks and table structures, leading to more robust and generalizable representations. Inspired by generative information extractors for text, we investigate two sequence modeling strategies: (i) serialize the table input and the annotation outputs as plain text (text-2-text) and solve the tasks with natural language LLMs (NL-LLMs) (Figure 2, left) [20, 21, 23]. This flattening method, however, ignores the structural information embedded in the table and in the output, owing to the discrepancy between the datasets used to fine-tune LLMs for STI tasks and the natural language corpora on which LLMs are pre-trained. To alleviate this issue, we could follow [24, 28] and (ii) cast STI tasks as code generation (code-2-code) and solve them with Code-LLMs (Figure 2, right). Converting structured data into code is easier and provides a more informative representation than transforming it into free-form text; hence, this technique narrows the gap between pre-training and fine-tuning in Code-LLMs. Interestingly, by programming a table as a two-dimensional list, a common data type in code that is expected to occur frequently during pre-training, the model could better capture the table's topology (i.e. facilitate the identification of the <italic>i</italic>-th row, the <italic>j</italic>-th column or the cell at coordinates [<italic>i</italic>, <italic>j</italic>]) compared to NL-LLMs. [24, 28] have demonstrated the appealing few-shot performance of Code-LLMs (i.e. Codex [29]) in structured prediction tasks that involve no code at all, such as information extraction or argument graph generation. We argue that using Code-LLMs to tackle table-related downstream tasks is a promising future research direction.</p>
      </sec>
      <sec id="sec-gensti-e2e">
        <title>End-to-End Semantic Table Annotation.</title>
        <p>Instead of performing target detection and target annotation as separate stages, end-to-end STI takes into account the mutual dependence and cooperation between the two, which could lead to significant performance improvements. This concept has successfully inspired state-of-the-art models in related applications such as entity linking [22, 30] and information extraction [20, 23]. In the context of GenSTI, two aspects of the arbitrary generation of generative LLMs need to be controlled to ensure a reliable end-to-end GenSTI: (i) Output structure consistency: the model must only generate valid tags and adhere to the predefined schema. For example, in Figure 2, for the CEA task, the NL-LLM has to follow the template [CEA] | cell_mention (entity_id) | ..., where cell_mention is copied from the table and is linked to the corresponding entity_id in the KB; (ii) Semantic consistency: entity_id must be a valid entity existing in the KB, and [CPA] must generate a relation_id rather than an entity_id. A solution to both challenges is to endow LLMs with a decoding scheme constrained by prefix trees that forces the model to generate only legal tokens at each decoding step [22, 23]. Interestingly, even without such constrained decoding, we observe that [24] still reports good few-shot performance for IE tasks, suggesting that, to some extent, LLMs, especially Code-LLMs, are capable of capturing the internal representations of the task and of generating relevant outputs without guidelines. [31] made a similar observation when training an LLM to play the game Othello by feeding it naive transcripts of the interleaved moves of the two players, without adding any knowledge of the game rules. The model effectively learned meaningful latent representations, enabling it to uncover the game and make legal moves on the board.</p>
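        <p>The two input serializations discussed under the first desideratum, together with the CEA output template mentioned above, can be sketched as follows. The [ROW] separator, the helper names and the Wikidata-style identifiers are illustrative assumptions and may differ from the exact format shown in Figure 2.</p>
        <preformat>
```python
from typing import List, Tuple

Table = List[List[str]]


def table_to_text(table: Table) -> str:
    """text-2-text input: flatten the table row by row into a single string."""
    return " [ROW] ".join(" | ".join(row) for row in table)


def table_to_code(table: Table) -> str:
    """code-2-code input: render the same table as a Python 2-D list literal."""
    rows = ",\n".join("    " + repr(row) for row in table)
    return "table = [\n" + rows + ",\n]"


def cea_to_text(links: List[Tuple[str, str]]) -> str:
    """Target string following the template [CEA] | cell_mention (entity_id) | ..."""
    return "[CEA] | " + " | ".join(f"{mention} ({entity_id})" for mention, entity_id in links)


def parse_cea(text: str) -> List[Tuple[str, str]]:
    """Recover (mention, entity_id) pairs; segments that break the template are dropped."""
    if not text.startswith("[CEA]"):
        return []
    pairs = []
    for segment in text[len("[CEA]"):].split("|"):
        mention, _, rest = segment.strip().rpartition(" (")
        if mention and rest.endswith(")"):
            pairs.append((mention, rest[:-1]))
    return pairs


table = [["City", "Country"], ["Paris", "France"], ["Berlin", "Germany"]]
print(table_to_text(table))
print(table_to_code(table))
print(cea_to_text([("Paris", "Q90"), ("Berlin", "Q64")]))
```
        </preformat>
        <p>In the code rendering, the cell at row <italic>i</italic> and column <italic>j</italic> is simply table[i][j]. Parsing the generated string back is one cheap check of output structure consistency (i), since a segment that breaks the template is dropped; it does not check semantic consistency (ii), because any identifier-shaped string is accepted. A cell mention that itself contains "|" would also break this naive format.</p>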
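        <p>The prefix-tree constrained decoding cited above [22, 23] can be illustrated with a toy greedy decoder. This is a sketch under assumptions: identifiers are split into characters rather than the model's sub-word tokens, and toy_score is a hypothetical stand-in for the LLM's next-token scores.</p>
        <preformat>
```python
from typing import Callable, Dict, List, Sequence

EOS = "[EOS]"  # end-of-identifier marker


class PrefixTree:
    """Trie over token sequences; every root-to-EOS path is a legal output."""

    def __init__(self, sequences: Sequence[Sequence[str]]) -> None:
        self.root: Dict[str, dict] = {}
        for seq in sequences:
            node = self.root
            for token in list(seq) + [EOS]:
                node = node.setdefault(token, {})

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Tokens that keep the prefix on a valid path (empty if the prefix is illegal)."""
        node = self.root
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return sorted(node)


def constrained_greedy_decode(trie: PrefixTree,
                              score_fn: Callable[[List[str], str], float],
                              max_len: int = 16) -> List[str]:
    """Greedy decoding where the argmax is taken over legal tokens only."""
    prefix: List[str] = []
    for _ in range(max_len):
        allowed = trie.allowed_next(prefix)
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score_fn(prefix, tok))
        if best == EOS:
            break
        prefix.append(best)
    return prefix


# Entity identifiers for [CEA] slots; a separate trie restricts [CPA] slots to relations.
entity_trie = PrefixTree([list("Q90"), list("Q64"), list("Q142")])
relation_trie = PrefixTree([list("P17")])

# Hypothetical model preferences that ignore the prefix: unconstrained greedy
# decoding would keep emitting "Q", which never forms a valid identifier.
TOY_SCORES = {"Q": 5.0, "9": 4.0, "1": 3.0, "0": 2.0, "6": 1.0, EOS: 0.0}


def toy_score(prefix: List[str], token: str) -> float:
    return TOY_SCORES.get(token, float("-inf"))


print("".join(constrained_greedy_decode(entity_trie, toy_score)))
```
        </preformat>
        <p>Because the argmax is taken only over the children of the current trie node, every completed output is, by construction, an identifier present in the trie, and using separate tries for entity and relation slots prevents a [CPA] slot from producing an entity identifier.</p>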
      </sec>
      <sec id="sec-3-4">
        <title>Efficient KB indexing and exploitation.</title>
        <p>Recently, the Differentiable Search Index (DSI) [32, 33] has emerged as a novel generative retrieval approach, departing from the common retrieve-then-rerank paradigm. One of our objectives is to investigate whether DSI can serve as a viable solution for efficient KB indexing and exploitation within the GenSTI system. Specifically, entities and their graphs (e.g. [34]) are directly encoded (or stored) into the LLM's parameters (a.k.a. indexing). The model then leverages the injected knowledge to predict the entity identifier autoregressively through softmax calculations over the vocabulary's tokens. This indexing mechanism relaxes the candidate generation phase, which saves a non-negligible computation cost, and it eliminates both the need for external storage of entity embeddings and the need for meaningful negative samples during learning, as required by dual encoder-based candidate generation. While DSI works effectively on small KBs, its behavior when scaling to large KBs (e.g. Wikidata, with ∼100 million entities) remains an open research challenge [35], and three key questions need to be clarified: (i) in the indexing phase, how many entities can the model memorize [36]; (ii) can the model efficiently learn entities and propagate this knowledge to support generation [37]; and (iii) how robust is it to entity/relation skewness. Generative Information Extraction [23] has been shown to be more robust than strong baselines on long-tail entities, but its performance on them is still far from satisfactory. [38] proposes fine-tuning the extractor on a more balanced dataset, which remarkably improves the macro performance.</p>
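        <p>Point (iii) and the macro improvement reported for [38] hinge on the difference between micro-averaged scores, where every annotated cell counts once, and macro-averaged scores, where every entity counts once. The following minimal sketch, with made-up predictions and hypothetical entity names, shows how a system that is always right on a frequent entity and always wrong on rare ones can look good under the first and poor under the second.</p>
        <preformat>
```python
from collections import defaultdict
from typing import Dict, List, Tuple


def micro_macro_accuracy(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """pairs holds one (gold_entity, predicted_entity) tuple per annotated cell."""
    per_entity: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # gold -> [correct, total]
    for gold, predicted in pairs:
        per_entity[gold][0] += int(gold == predicted)
        per_entity[gold][1] += 1
    micro = sum(correct for correct, _ in per_entity.values()) / len(pairs)
    macro = sum(correct / total for correct, total in per_entity.values()) / len(per_entity)
    return micro, macro


# Eight cells mention a popular entity; two cells mention two different rare ones.
predictions = [("popular_city", "popular_city")] * 8 + [
    ("rare_village_a", "popular_city"),
    ("rare_village_b", "popular_city"),
]
print(micro_macro_accuracy(predictions))
```
        </preformat>
        <p>Here the micro score is 0.8 while the macro score is about 0.33, because each rare entity weighs as much as the popular one in the macro average.</p>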
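        <p>For contrast with DSI, the dual-encoder candidate generation that it would replace is typically trained with a contrastive objective over a gold entity and a set of negatives. The sketch below uses an InfoNCE-style loss over cosine similarities; the loss form, the temperature value and the two-dimensional toy embeddings are assumptions made for illustration. It shows why the negative set matters, as noted in Section 3: a hard negative close to the gold entity produces a much larger loss, and hence a stronger training signal, than easy negatives do.</p>
        <preformat>
```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(query: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """Negative log softmax probability of the gold entity among {gold} plus negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    peak = max(logits)  # log-sum-exp trick for numerical stability
    log_partition = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_partition - logits[0]


mention = [1.0, 0.0]  # hypothetical embedding of a cell mention in its table context
gold = [1.0, 0.0]     # hypothetical embedding of the correct KG entity
easy = [[0.0, 1.0], [-1.0, 0.0]]
hard = [[0.9, 0.1], [0.0, 1.0]]
print(info_nce(mention, gold, easy), info_nce(mention, gold, hard))
```
        </preformat>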
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>This paper first reflected on the work carried out when designing DAGOBAH SL over the last few years and on the lessons we learned in implementing a top-performing STI system. This work is also the first step in our journey towards Generative Semantic Table Interpretation (GenSTI). We plan to conduct extensive experiments to uncover the challenges and gain a deeper understanding of the contributions of LLMs to STI, as discussed in Sections 3 and 4.
      </p>
      <p>[14] … Arevalo, Information extraction meets the semantic web: a survey, Semantic Web 11 (2020) 255–335.</p>
      <p>[15] M. Cremaschi, F. De Paoli, A. Rula, B. Spahiu, A fully automated approach to a complete semantic table interpretation, Future Generation Computer Systems 112 (2020) 478–500.</p>
      <p>[16] P. Nguyen, I. Yamada, N. Kertkeidkachorn, R. Ichise, H. Takeda, Demonstration of MTab: Tabular Data Annotation with Knowledge Graphs, in: ISWC (Posters/Demos/Industry Track), 2021.</p>
      <p>[17] R. Shigapov, P. Zumstein, J. Kamlah, L. Oberländer, J. Mechnich, I. Schumm, bbw: Matching csv to wikidata via meta-lookup, in: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab), 2020.</p>
      <p>[18] N. Abdelmageed, S. Schindler, JenTab: Matching Tabular Data to Knowledge Graphs, in: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab), 2020, pp. 40–49.</p>
      <p>[19] H. Ye, N. Zhang, H. Chen, H. Chen, Generative knowledge graph construction: A review, in: International Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 2022.</p>
      <p>[20] Y. Lu, Q. Liu, D. Dai, X. Xiao, H. Lin, X. Han, L. Sun, H. Wu, Unified Structure Generation for Universal Information Extraction, in: 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022, pp. 5755–5772.</p>
      <p>[21] G. Paolini, B. Athiwaratkun, J. Krone, J. Ma, A. Achille, R. Anubhai, C. N. dos Santos, B. Xiang, S. Soatto, Structured Prediction as Translation between Augmented Natural Languages, in: 9th International Conference on Learning Representations (ICLR), 2021.</p>
      <p>[22] N. De Cao, G. Izacard, S. Riedel, F. Petroni, Autoregressive entity retrieval, in: 9th International Conference on Learning Representations (ICLR), 2021.</p>
      <p>[23] M. Josifoski, N. De Cao, M. Peyrard, F. Petroni, R. West, GenIE: Generative information extraction, in: Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Association for Computational Linguistics, Seattle, United States, 2022.</p>
      <p>[24] P. Li, T. Sun, Q. Tang, H. Yan, Y. Wu, X. Huang, X. Qiu, CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors, in: 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.</p>
      <p>[25] N. Tang, J. Fan, F. Li, J. Tu, X. Du, G. Li, S. Madden, M. Ouzzani, RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation, The VLDB Endowment (2021).</p>
      <p>[26] W.-C. Tan, Unstructured and structured data: Can we have the best of both worlds with large language models?, arXiv preprint arXiv:2304.13010, 2023.</p>
      <p>[27] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research 21 (2020) 5485–5551.</p>
      <p>[28] A. Madaan, S. Zhou, U. Alon, Y. Yang, G. Neubig, Language Models of Code are Few-Shot Commonsense Learners, in: International Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, UAE, 2022.</p>
      <p>[29] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, et al., Evaluating large language models trained on code, arXiv preprint arXiv:2107.03374 (2021).</p>
      <p>[30] N. Kolitsas, O.-E. Ganea, T. Hofmann, End-to-End Neural Entity Linking, in: 22nd Conference on Computational Natural Language Learning (CoNLL), 2018, pp. 519–529.</p>
      <p>[31] K. Li, A. K. Hopkins, D. Bau, F. Viégas, H. Pfister, M. Wattenberg, Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task, in: 11th International Conference on Learning Representations (ICLR), 2023.</p>
      <p>[32] Y. Tay, V. Tran, M. Dehghani, J. Ni, D. Bahri, H. Mehta, Z. Qin, K. Hui, Z. Zhao, J. Gupta, et al., Transformer memory as a differentiable search index, Advances in Neural Information Processing Systems 35 (2022) 21831–21843.</p>
      <p>[33] M. Bevilacqua, G. Ottaviano, P. Lewis, S. Yih, S. Riedel, F. Petroni, Autoregressive search engines: Generating substrings as document identifiers, Advances in Neural Information Processing Systems 35 (2022) 31668–31683.</p>
      <p>[34] F. Moiseev, Z. Dong, E. Alfonseca, M. Jaggi, SKILL: Structured Knowledge Infusion for Large Language Models, in: Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022, pp. 1581–1588.</p>
      <p>[35] R. Pradeep, K. Hui, J. Gupta, A. D. Lelkes, H. Zhuang, J. Lin, D. Metzler, V. Q. Tran, How Does Generative Retrieval Scale to Millions of Passages?, in: Generative Information Retrieval @ SIGIR, 2023.</p>
      <p>[36] N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramer, C. Zhang, Quantifying Memorization Across Neural Language Models, in: 11th International Conference on Learning Representations (ICLR), 2023.</p>
      <p>[37] Y. Onoe, M. J. Zhang, S. Padmanabhan, G. Durrett, E. Choi, Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge, in: 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.</p>
      <p>[38] M. Josifoski, M. Sakota, M. Peyrard, R. West, Exploiting asymmetry for synthetic training data generation: SynthIE and the case of information extraction, arXiv preprint arXiv:2303.04132, 2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.-P.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Labbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monnin</surname>
          </string-name>
          ,
          <article-title>From tabular data to knowledge graphs: A survey of semantic table interpretation tasks and methods</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>76</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Deuzé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.-P.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Labbé</surname>
          </string-name>
          , J. Liu,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>A Framework for Automatically Interpreting Tabular Data at Orange</article-title>
          , in: ISWC (Posters/Demos/Industry Track),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>Tough tables: Carefully evaluating entity linking for tabular data</article-title>
          ,
          <source>in: 9th International Semantic Web Conference (ISWC)</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>328</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          , K. Srinivas,
          <article-title>SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems</article-title>
          ,
          <source>in: European Semantic Web Conference (ESWC)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <article-title>Results of SemTab 2020</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abdelmageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          , et al.,
          <article-title>Results of SemTab 2021</article-title>
          ,
          <source>in: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Abdelmageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <article-title>Results of SemTab 2022</article-title>
          ,
          <source>in: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gathani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gale</surname>
          </string-name>
          ,
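          <string-name>
            <given-names>I.</given-names>
            <surname>Dillig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Demiralp</surname>
          </string-name>
          ,
          <article-title>Making Table Understanding Work in Practice</article-title>
          ,
          <source>arXiv preprint arXiv:2109.05173</source>
          ,
          <year>2021</year>
          .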
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.-P.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Labbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>From Heuristics to Language Models: A Journey Through the Universe of Semantic Table Interpretation with DAGOBAH</article-title>
          ,
          <source>in: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.-P.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Deuzé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Labbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>DAGOBAH: Table and Graph Contexts for Efficient Semantic Annotation of Tabular Data</article-title>
          ,
          <source>in: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Demiralp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <article-title>GitTables: A large-scale corpus of relational tables</article-title>
          ,
          <source>Proceedings of the ACM on Management of Data</source>
          <volume>1</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgeaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yogatama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <article-title>Emergent Abilities of Large Language Models</article-title>
          ,
          <source>Transactions on Machine Learning Research</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sarthou-Camy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jourdain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Deuzé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.-P.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Labbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>DAGOBAH UI: a new hope for semantic table interpretation</article-title>
          ,
          <source>in: European Semantic Web Conference (ESWC), Satellite Events</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Martinez-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , I. Lopez-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>