<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Framework for the Generation of Training Examples from Tabular Data</article-title>
        <subtitle>(Discussion Paper)</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jean-Flavien Bussotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Papotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donatello Santoro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enzo Veltri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EURECOM</institution>
          ,
          <addr-line>Biot</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi della Basilicata (UNIBAS)</institution>
          ,
          <addr-line>Potenza</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>32</volume>
      <fpage>23</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Tabular data is becoming increasingly important in Tabular Natural Language Inference (TNLI), where the goal is to assess whether a table supports or refutes a given hypothesis expressed in NL text. A major issue in TNLI is the lack of training data. Existing approaches are based on manual annotation of new training data or on simple augmentation techniques that lack data variety and complexity. We present a system, Tenet, that automatically generates new training examples for TNLI applications on different domains. Our framework exploits SQL queries to introduce new data variety through evidence-queries, which identify new cell values over the data exploiting different data patterns, and complexity through semantic-queries, which describe the different ways such data can be identified with SQL queries. Descriptions from the semantic-queries are used to verbalize the new cell values from the evidence-queries using a Pre-trained Language Model (PLM). The verbalized sentence and the cell values can be used as a new training example in the target TNLI application. We show how Tenet generates human-like examples that are comparable with manually-written examples.</p>
      </abstract>
      <kwd-group>
        <kwd>Tabular Natural Language Inference (TNLI)</kwd>
        <kwd>Natural Language Processing (NLP) for Databases</kwd>
        <kwd>Text Generation</kwd>
        <kwd>Query Generation</kwd>
        <kwd>Data Augmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A large class of natural language inference (NLI) problems aims at classifying a given hypothesis,
such as a textual statement, as true/false/unknown given some evidence. Recently, a new class
of applications has emerged that focuses on inference with structured data as evidence,
i.e., tabular natural language inference (TNLI). Example applications are table understanding
and computational fact checking, where systems label text claims according to input structured
data [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7">1, 2, 3, 4, 5, 6, 7</xref>
        ].
      </p>
      <p>
        Most of the solutions in TNLI are supervised, where manually defined datasets for TNLI have
been proposed, such as Feverous [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], TabFact [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and Infotabs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, these datasets:
1) cover only some generic topics from Wikipedia tables. For example, if there is a need for
      </p>
      <p>[Figure 1: Overview of Tenet. An example table with tuples t1 (Barack, Dem, Michelle),
t2 (Donald, Rep, Melania), t3 (Nancy, Dem, Paul) over attributes Name, Party, Spouse yields
training data such as claim c: “Donald is married to Michelle” with label l: Refutes and
evidence cells E: t2.Name “Donald”, t2.Spouse “Melania”; and claim c: “Barack and Nancy are
in the same party” with label l: Supports and evidence cells E: t1.Name “Barack”, t3.Name
“Nancy”, t1.Party “Dem”, t3.Party “Dem”. The generated examples train the TNLI application,
which is validated on test data.]</p>
      <p>
        fact-checking claims for emerging domains such as Covid-19, a new annotated corpus must
be crafted by manually writing examples using the tabular reports published by governments;
2) they are not comparable in scale and variety to those available for textual NLI [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. For
example, about 80% of the examples in Totto [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have sentences describing the data with text
that does not contain mathematical expressions, such as max, min, and count, or comparisons
across values; 3) they contain bias and errors that may lead to incorrect learning in the target
models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        The problem of the lack of labeled examples has been treated in the literature for NLI, but it
has not been tackled yet for TNLI. If some examples are given in a warm start setting, existing
NLI augmentation methods can be used in the TNLI setting: the text part of the example can
be rewritten with augmentation w.r.t. the (fixed) data [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. While these methods increase the
number of examples, they do not generate a new corpus that raises the variety and complexity
of the examples w.r.t. the structured data, and thus they have only a minor impact on the accuracy of
the TNLI tasks. Moreover, in a cold start setting, where training data is unavailable, there is no
proposal yet on creating annotated examples for TNLI starting only from the tables.
      </p>
      <p>
        User-provided tables can be exploited to generate ad-hoc training data for the application at
hand. Our system, Tenet1 (TExtual traiNing Examples from daTa) generates large annotated
corpora of training examples that are complex and rich in terms of data patterns, linguistic
diversity, and reasoning complexity [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Figure 1 shows an overview of our architecture. The
system generates training data for the target TNLI application, given only a table as input. Once
generated, the examples are used to train the inference model validated on test data.
      </p>
      <p>Tenet is built around three modules that cover the three main elements of a complete and
annotated TNLI example.</p>
      <p>
        Data Evidence. A key intuition in our approach is that tabular data already contains rich
information for new examples. Content changes across datasets, and every relation has its own
active domain. Moreover, data relationships across entities and their properties are arranged
differently across datasets. To identify data evidence to create a variety of examples, we propose
alternative approaches to select sets of cells from the given table, including a query generation
algorithm for the semi-supervised case. A query returns a set of evidence, such as Donald and
Michelle in the first example in Figure 1, each partially describing an example.
1Code and datasets available at https://github.com/dbunibas/tenet
Textual Hypothesis. Once the data is identified, we obtain the textual statement (or hypothesis)
for the annotated example. Given a set of cells, we generate queries that identify such data
evidence over the input table. Every query characterizes the data with different conditions (e.g.,
selections with constants) or constructs (e.g., aggregates). From the query and the evidence,
we create a text with a prompting method that exploits the human-like generation abilities
of large pre-trained language models (PLMs), such as GPT-3 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Our prompting leads to a
variety of factual hypotheses, such as Barack and Nancy are in the same party in the second
example in Figure 1, while maximizing the coverage of the provided evidence and minimizing
hallucination.
      </p>
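      <p>As an illustration of the evidence selection, the same-party pattern behind the second example in Figure 1 can be expressed as a SQL query; a minimal sketch with SQLite (table name and values taken from Figure 1, the query itself is ours):</p>
      <preformat>
```python
import sqlite3

# Build the example table of Figure 1 in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Party TEXT, Spouse TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Barack", "Dem", "Michelle"),
     ("Donald", "Rep", "Melania"),
     ("Nancy", "Dem", "Paul")],
)

# An evidence query: pairs of distinct people in the same party.
# Each result row is one set of evidence cells for a new example.
evidence = conn.execute(
    """SELECT p1.Name, p2.Name, p1.Party
       FROM people p1, people p2
       WHERE p1.Party = p2.Party AND p1.Name > p2.Name"""
).fetchall()

print(evidence)  # [('Nancy', 'Barack', 'Dem')]
```
      </preformat>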
      <p>Inference Label. Finally, we need the corresponding label for every example. While Supports
examples are obtained naturally, as the hypothesis reflects the evidence from the table, for
Refutes examples we introduce generic methods built around the idea of injecting errors in the
data evidence. Once the data is modified, the process for text generation is applied to the “dirty”
data to obtain hypotheses that are refuted w.r.t. the original “clean” data.</p>
      <p>In the next section we describe the main components; then we present some experimental
results obtained using Tenet.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Overview of the Solution</title>
      <p>Problem Formulation. Let t be a tuple in the instance I for a relational schema S and A an
attribute in S. We refer with cell value to the value of tuple t in attribute A and with table to
the instance I for simplicity2. A textual hypothesis is a sentence in natural language.</p>
      <p>
        A Tabular Natural Language Inference (TNLI) application takes as input a pair (table T, textual
hypothesis h) and outputs whether h is supported or refuted by T. Data evidence is a non-empty subset
of cell values from T that varies from a small fraction in some settings [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to the entire relation
in others [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]3. Solutions for the TNLI task rely on supervised models trained with annotated
examples - our goal is to reduce the effort in creating such training data.
      </p>
      <p>
        We consider solving the example generation problem for a TNLI application  where we are
given the label space  for , a corpus of tables , and (optionally) a set of training examples
 for . Every example is composed by a quadruple (ℎ, , , ) with textual hypothesis ℎ, label
 ∈ , set of data evidence cells  contained in one relational table  in the corpus . We assume
access to a text-to-text pre-trained language model (PLM)  . We do not assume access to the
TNLI application  at hand. In this work, we focus on  with Supports and Refutes labels only,
as those are the most popular in TNLI corpora, e.g., 97% of the examples [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>In the warm start version of the problem, training examples for 𝒜 are available and used by
Tenet. In the cold start version of the problem, we drop the assumption on the availability of
the examples X. In this case, we aim at creating new training examples X for 𝒜 just by using
the tables in C.
2Some TNLI corpora contain both relational and entity tables, i.e., relational tables transposed with a single row.
Tenet supports both, but we focus the presentation on relational ones for clarity.
3Our proposal is independent of the size of the data evidence and its retrieval.</p>
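      <p>A complete example can be sketched as a simple container; the following is an illustrative layout (the field names and the dict-based evidence encoding are ours, not Tenet's API):</p>
      <preformat>
```python
from dataclasses import dataclass

# Illustrative container for one training example; the field names are
# ours and mirror the quadruple (hypothesis, label, evidence, table).
@dataclass
class TNLIExample:
    hypothesis: str   # textual hypothesis
    label: str        # one of "Supports" / "Refutes"
    evidence: dict    # cell values: (tuple id, attribute) -> value
    table: str        # identifier of the source table in the corpus

ex = TNLIExample(
    hypothesis="Donald is married to Michelle",
    label="Refutes",
    evidence={("t2", "Name"): "Donald", ("t2", "Spouse"): "Melania"},
    table="people",
)
print(ex.label)  # Refutes
```
      </preformat>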
      <p>Process and Challenges. Tenet is designed around three main steps, as depicted in Figure 2.
Given a relational table T ∈ C, it first gathers the evidence (set of cells) E to produce a Supports
example. Second, to enable the generation of a Refutes example, it injects errors in table T
to create its noisy version and derive data evidence E′. Third, a textual claim (hypothesis)
h is generated for every data evidence. The quadruple (data evidence E, textual claim h,
label Supports/Refutes, table T) is a complete example for the training data X for the target TNLI
application 𝒜. However, the three steps come with their own challenges.</p>
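      <p>The three steps above can be sketched as a skeleton (function names, placeholder logic, and the dict-based table layout are ours, not Tenet's implementation):</p>
      <preformat>
```python
import random

# Skeleton of the three-step process with placeholder implementations.
def gather_evidence(table):
    # placeholder evidence selection: take the first two cells
    return dict(list(table.items())[:2])

def inject_errors(table, seed=0):
    # placeholder error injection: shuffle the values across cells
    keys, vals = list(table), list(table.values())
    random.Random(seed).shuffle(vals)
    return dict(zip(keys, vals))

def generate_examples(table, verbalize):
    supports_ev = gather_evidence(table)   # step 1: evidence for Supports
    noisy = inject_errors(table)           # step 2: noisy copy of the table
    refutes_ev = gather_evidence(noisy)    #         evidence for Refutes
    return [                               # step 3: verbalize each evidence
        (supports_ev, verbalize(supports_ev), "Supports", table),
        (refutes_ev, verbalize(refutes_ev), "Refutes", table),
    ]

table = {("t1", "Name"): "Mike", ("t1", "Age"): 47,
         ("t2", "Name"): "Anne", ("t2", "Age"): 22}
examples = generate_examples(table, verbalize=str)
print([ex[2] for ex in examples])  # ['Supports', 'Refutes']
```
      </preformat>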
      <p>Data Evidence. Training examples X must capture the variety of relationships in a table, such
as those relating cell values in the same tuple or attribute. A hypothesis is defined over a group
of cell values, such as the data evidence E1 highlighted in bold in Table 1 for tuples t1 and t2, i.e.,
the names of two people with different age values. The hypothesis “Mike is older than Anne” captures
the relationship across these four cell values. Data evidence with two cell values, e.g., Name
for tuple t1 and Age for tuple t2, can lead to a hypothesis, e.g., “There is a person called Mike
and a person 22 years old”, but such a sentence does not capture relationships across tuples or
attributes. In general, for effective training, the data evidence covered by the examples should
cover the variety of patterns that can be identified in a relation.</p>
      <p>One approach for the data evidence generation is to pick different sets of cell values at
random. While this simple approach is effective and enables an unsupervised solution, there
are meaningful patterns, such as E1, that may be covered only rarely by accident. We call this
approach cold-start. One approach to improve this task and obtain meaningful patterns with
fewer generated examples is to infer data patterns from human-provided examples X, when
those are available. For example, from X we identify a query q (named evidence query or simply
e-query) that returns the cell values in its data evidence as one result row. We then execute
the e-query over the relation. The e-query leads to more sets of cells (one per result row) that
enable the generation of examples following the same data pattern, for example involving t3
and t4. We call this approach warm-start.</p>
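      <p>The cold-start selection can be sketched as random cell sampling (a minimal sketch; Tenet's actual sampling strategy may differ):</p>
      <preformat>
```python
import random

# Cold-start evidence selection: pick a random set of cells from the table.
def random_evidence(table, n_cells, seed=None):
    """table: dict mapping (row id, attribute) -> value."""
    rng = random.Random(seed)
    cells = rng.sample(sorted(table), k=n_cells)
    return {cell: table[cell] for cell in cells}

table = {("t1", "Name"): "Mike", ("t1", "Age"): 47,
         ("t2", "Name"): "Anne", ("t2", "Age"): 22}
ev = random_evidence(table, n_cells=2, seed=0)
print(len(ev))  # 2
```
      </preformat>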
      <p>Warm Start. While the cold-start is easy to implement, the generation of the e-query in the
warm start is not trivial. Given the set of cell values E and the table T as input, we want to identify
a query q that outputs E among its results. Executing such a query over the original table
T, we obtain more data evidence E1, . . . , En that follow the original data pattern in E.</p>
      <p>Consider again the example in Table 1 with the cell values in bold in the first two rows (t1 and
t2) as seed data evidence E. Given such input, we want an algorithm producing a query that
returns all pairs of distinct names with their different ages, such as
q:</p>
      <p>SELECT c1.Name, c2.Name AS Name2, c1.Age, c2.Age AS Age2
FROM people c1, people c2
WHERE c1.Age &gt; c2.Age AND c1.Name &lt;&gt; c2.Name</p>
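      <p>Executing q over a small instance of the people table (only the two example rows; the full Table 1 is not shown here) illustrates how each result row yields a new data evidence; a minimal sketch with SQLite, writing the inequality operator as != :</p>
      <preformat>
```python
import sqlite3

# A small instance of the people table from the running example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Mike", 47), ("Anne", 22)])

# The e-query q (using != as the inequality operator).
rows = conn.execute(
    """SELECT c1.Name, c2.Name AS Name2, c1.Age, c2.Age AS Age2
       FROM people c1, people c2
       WHERE c1.Age > c2.Age AND c1.Name != c2.Name"""
).fetchall()

# Each result row is one new data evidence following the seed pattern.
print(rows)  # [('Mike', 'Anne', 47, 22)]
```
      </preformat>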
      <p>
        The e-query generation is based on an evidence graph [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] where each node in the graph
corresponds to a cell in the evidence E and a (directed) edge across two nodes represents the
relationship between their values (equality, difference, comparison). Then, visiting such a graph,
we construct the e-query [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
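      <p>A minimal sketch of the evidence graph construction follows (the node/edge encoding is our assumption; see [14] for the actual definition):</p>
      <preformat>
```python
from itertools import combinations

# Sketch: nodes are evidence cells, edges record the relationship
# between the cell values (equality, difference, comparison).
def evidence_graph(evidence):
    """evidence: dict mapping (row id, attribute) -> value."""
    edges = []
    for (c1, v1), (c2, v2) in combinations(evidence.items(), 2):
        if v1 == v2:
            edges.append((c1, "=", c2))
        elif isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
            # orient comparison edges from the larger to the smaller value
            edges.append((c1, ">", c2) if v1 > v2 else (c2, ">", c1))
        else:
            edges.append((c1, "!=", c2))
    return edges

seed = {("t1", "Name"): "Mike", ("t1", "Age"): 47,
        ("t2", "Name"): "Anne", ("t2", "Age"): 22}
edges = evidence_graph(seed)
# the Age/Age edge yields the comparison predicate of the e-query q
print((("t1", "Age"), ">", ("t2", "Age")) in edges)  # True
```
      </preformat>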
      <p>
        Hypothesis. Given a table T and an evidence set E from T, the latter can be described with a
textual sentence. However, the way a set of cells is converted to a sentence has a huge impact on
the variety and the reasoning complexity of the training data. Indeed, given a set of cells from a
table, many alternatives exist for describing it in natural language. Consider again data evidence
E1 in the example. The values in bold can be correctly described with “Mike is older than Anne.”
or “There are two persons with age higher than 19.”. The more alternative sentences for a
given data evidence are created, the better the training set for the target model. Unfortunately,
most efforts for automatic data-to-text are focused on surface, or look-up, sentences [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], such
as “Mike is 47 years old and Anne 22.”. While these kinds of sentences are fundamental, we
aim to maximize the variety in the training data. For this goal, we generate various queries
that return the evidence E given T. We call such queries semantic queries or simply s-queries. Such
s-queries represent different ways of semantically describing the data. PLMs are trained over
huge amounts of textual data, which gives them proficiency in writing, and over source code, which
gives them the ability to write code [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] or to be instructed with functions. We then propose
prompting methods for PLMs to generate alternative sentences to describe the evidence set
according to the semantics of the queries.
      </p>
      <p>
        We identify several types of s-queries: 1) surface s-queries, i.e., queries that select cells by
using only constant values; 2) comparison s-queries, i.e., queries that compare two or more rows
on at least one attribute; 3) filter s-queries, i.e., queries that select cells according to a condition;
4) aggregate s-queries, i.e., queries that select cells that can be used with an aggregate function
(count, sum, avg, min, max); 5) filter-aggregate s-queries, i.e., queries that select cells for an
aggregation over a group of cells identified by a selection on some conditions. Such s-queries
are automatically detected by Tenet [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
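      <p>For the running people(Name, Age) table, one s-query per type could look as follows (the concrete queries are our illustrations; Tenet detects s-queries automatically [14]):</p>
      <preformat>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Mike", 47), ("Anne", 22)])

# One illustrative s-query per type over the same evidence cells.
s_queries = {
    "surface":          "SELECT Name, Age FROM people WHERE Name = 'Mike'",
    "comparison":       "SELECT c1.Name, c2.Name FROM people c1, people c2 "
                        "WHERE c1.Age > c2.Age",
    "filter":           "SELECT Name FROM people WHERE Age > 19",
    "aggregate":        "SELECT MAX(Age) FROM people",
    "filter-aggregate": "SELECT COUNT(*) FROM people WHERE Age > 19",
}
results = {kind: conn.execute(q).fetchall() for kind, q in s_queries.items()}
print(results["aggregate"])  # [(47,)]
```
      </preformat>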
      <p>
        For each s-query, we define a task that describes the text generation function that we want to
use. Such generation functions are defined by us with the prompts for the PLM. The task uses
the function from the s-query and the evidence. The text generation functions mapped to the
corresponding s-queries are reported in Table 2 with examples of the text they generate. Due to space
limits, the examples of the prompts used with ChatGPT are reported in the full paper [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
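      <p>A sketch of how a task and an evidence set could be turned into a prompt (the wording below is ours; the actual prompts used with ChatGPT are reported in the full paper [14]):</p>
      <preformat>
```python
# Sketch of a verbalization prompt; asking for every cell and nothing
# else reflects the coverage and anti-hallucination goals.
def build_prompt(task, evidence):
    cells = "; ".join(f"{row}.{attr} = {val!r}"
                      for (row, attr), val in evidence.items())
    return (f"Write one factual sentence that {task}, using every cell "
            f"below and nothing else:\n{cells}")

prompt = build_prompt(
    "compares the two rows by Age",
    {("t1", "Name"): "Mike", ("t1", "Age"): 47,
     ("t2", "Name"): "Anne", ("t2", "Age"): 22},
)
print("Mike" in prompt and "22" in prompt)  # True
```
      </preformat>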
      <p>Label. By construction, the generated data evidence is coherent with the semantics expressed
in the input table. An evidence set leads to an example with a Supports label w.r.t. the data
in the table. The methods above thus produce Supports examples. However, applications also need
examples with a Refutes label, i.e., textual claims not supported by the input table. We tackle
this problem with an error injection approach, perturbing the input table to break the original
relationships across cell values. This new version of the table is then used to identify again
an evidence set E′, which leads to a textual hypothesis that does not reflect the semantics of
the original (clean) table. We generate a Refutes example for every Supports one. Given some
evidence E from the original input table T, we inject noise in a copy T′, so that we derive a
new evidence E′ using the same e-query used for the Supports example. A hypothesis h′ is then
derived from E′ using the same approach proposed above. Hypothesis h′ is a Supports sentence
for T′, with evidence E′, but it is also a Refutes sentence w.r.t. the original (clean) table T and
evidence E. The new example is the quadruple with the label Refutes, table T, hypothesis h′, and evidence E.</p>
      <p>To inject errors, we first create a copy T′ of the table and manipulate it to inject noise. We
shuffle in T′ the values for 50% of the attributes involved in E. This step breaks the original
relationships across cell values at the tuple level. We then either introduce a new tuple in T′ or
remove from T′ one tuple at random. This step changes the cardinality of the tuples, which is
key for s-queries involving aggregates, and introduces out-of-domain values. The generation of
the new values depends on the attribute type. For categorical attributes, we use a PLM. For numerical
attributes, we generate lower/higher values than the min/max value of every active domain;
these new values break the original min/max/avg property for the updated attribute. Finally,
we remove from T′ any row that appears in T.</p>
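      <p>The injection steps can be sketched as follows (a simplified sketch: tables as lists of dicts, a single shuffled attribute, and no PLM-based value generation):</p>
      <preformat>
```python
import random

# Simplified sketch of the error-injection steps described above.
def inject_errors(table, evidence_attrs, seed=0):
    rng = random.Random(seed)
    noisy = [dict(row) for row in table]
    # shuffle the values of 50% of the attributes involved in the evidence
    for attr in evidence_attrs[: max(1, len(evidence_attrs) // 2)]:
        vals = [row[attr] for row in noisy]
        rng.shuffle(vals)
        for row, v in zip(noisy, vals):
            row[attr] = v
    # change the cardinality of the tuples: drop one tuple at random
    noisy.pop(rng.randrange(len(noisy)))
    # keep only rows that no longer appear in the original table
    return [row for row in noisy if row not in table]

table = [{"Name": "Mike", "Age": 47}, {"Name": "Anne", "Age": 22},
         {"Name": "Paul", "Age": 30}]
noisy = inject_errors(table, ["Age"])
print(all(row not in table for row in noisy))  # True
```
      </preformat>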
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Conclusions</title>
      <p>
        We organize our evaluation around two main questions. First, does Tenet automatically
generate training data of quality comparable to that manually created by human annotators?
Second, what are the costs of Tenet, in terms of execution time and budget for external APIs?
Training Datasets. In this paper we present results for a dataset from the TNLI literature: Feverous [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Results for other datasets are presented in the full paper [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Feverous comes with one subset
(split) of examples for training and one for test. Every annotated example consists of a table,
a textual hypothesis, data evidence (a subset of the table), and a Supports/Refutes label. All
examples are manually written by humans.
      </p>
      <p>As a baseline, we extend the original training dataset with an augmentation for text [17].
Given an example, we produce seven new versions of it by changing the textual hypothesis using
back translation, wordnet, word2vec, synonyms, random word swap, random word deletion,
random word insertion (Aug).</p>
      <p>We also produce training datasets with our techniques. Given a corpus of tables, we always
generate the Tenet Cold (TenetC) dataset. Since Feverous has annotations for data evidence,
we can also generate the dataset for Tenet Warm (TenetW). Hypotheses are created with
s-queries and negative examples are generated according to the presented technique. For each given
table, we produce three Supports and three Refutes hypotheses; therefore, all Tenet datasets
are balanced in terms of labels. For every table, Tenet creates one example with a surface
query and two with s-queries among the other four types (Comparison, Filter, Aggregate,
FilterAggregate).</p>
      <p>Inference Models for TNLI. Our goal is to show the quality of automatically generated
training data. We therefore do not propose new TNLI models and adopt the ones in the original
papers. For Feverous the inference predictor is a RoBERTa (large) encoder fine-tuned for
classification on multiple NLI datasets [18].</p>
      <p>
        Pre-trained Language Models. For the hypothesis generation and the error injection, we
assume that a pre-trained language model (PLM) is available. We tested several PLMs and use
ChatGPT as default. We report a comparison of T5, fine-tuned on ToTTo, and ChatGPT in the
full paper [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Metrics. We report accuracy for the TNLI task: how many Supports/Refutes classification
decisions are correct over the total number of tests. We also report execution times and cost
(for external APIs) in running the models.</p>
      <p>Quality of Training Examples. We start by comparing results for training data with examples
generated from the same sets of tables. The tables are taken from the Feverous dataset. As
state-of-the-art solutions, we directly use the manually written examples (Human), possibly
augmenting them (Human+Aug). For the Tenet methods, we take the corresponding tables of the
original training data and generate examples with TenetC and TenetW. For every experiment,
we increase the number of input tables, collect or generate the examples, and run the inference
model to compute the accuracy on the same test data.</p>
      <p>The TNLI accuracy results in Figure 3a for the Feverous test data show the impact of the
examples, which is a proxy for their quality. Up to 700 input tables, both Tenet-generated
datasets outperform the examples written by humans, by more than 20 absolute points in
cases with fewer than 150 tables. Even with only 200 tables available for the training step, both
Tenet example generation methods achieve an accuracy over 0.8 on the (manually crafted)
original test data. If we augment the Human examples with those generated by TenetW, we
observe an accuracy of 0.8 even with only 150 tables in the training corpus. Tenet benefits from
the fact that for every input table it extracts one data evidence and generates three Supports and
three Refutes examples, while the humans wrote one example per table.</p>
      <p>Figure 3b reports the results for the training done with a combination of Human and Tenet
examples for Feverous. We report the impact of different numbers of generated examples.
Increasing the size of the generated training data increases the accuracy on the test set. The
benefit of Tenet examples is higher with smaller numbers of human training examples.
Execution Time and Cost. We measure Tenet execution time to generate training data. We
create five samples of 200 tables from Feverous and execute the full pipeline with the Cold and
Warm approaches. On average, Cold takes 2.019 seconds while Warm takes 2.212 seconds. The most
expensive step in our approach (97% of the execution time) is due to text generation. This
heavily depends on the ChatGPT availability and takes on average from 1.5 to 2.2 seconds per
request.</p>
      <p>and learning coding, International Journal of Educational Technology in Higher Education
18 (2021). doi:10.1186/s41239-021-00246-1.
[17] J. Eisenschlos, S. Krichene, T. Müller, Understanding tables with intermediate pre-training,
in: EMNLP, 2020, pp. 281-296.
[18] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer,
V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint
arXiv:1907.11692 (2019).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Karagiannis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Trummer</surname>
          </string-name>
          ,
          <article-title>Scrutinizer: A mixed-initiative approach to large-scale, data-driven claim verification</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>2508</fpage>
          -
          <lpage>2521</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Computational fact checking through query perturbations</article-title>
          ,
          <source>ACM Trans. Database Syst</source>
          .
          <volume>42</volume>
          (
          <year>2017</year>
          ) 4:1-4:41.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P. A.</given-names>
            <surname>Corney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D. S.</given-names>
            <surname>Martino</surname>
          </string-name>
          ,
          <article-title>Automated fact-checking for assisting human fact-checkers, in: IJCAI, ijcai</article-title>
          .org,
          <year>2021</year>
          , pp.
          <fpage>4551</fpage>
          -
          <lpage>4558</lpage>
          . URL: https://doi.org/10.24963/ijcai.2021/619. doi:10.24963/ijcai.2021/619.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Herzig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Nowak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piccinno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisenschlos</surname>
          </string-name>
          ,
          <article-title>TaPas: Weakly supervised table parsing via pre-training</article-title>
          , in: ACL, Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>4320</fpage>
          -
          <lpage>4333</lpage>
          . URL: https://aclanthology.org/2020.acl-main.398. doi:10.18653/v1/2020.acl-main.398.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nokhiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          ,
          <article-title>INFOTABS: Inference on tables as semistructured data</article-title>
          , in: ACL,
          <string-name>
            <surname>ACL</surname>
          </string-name>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>2309</fpage>
          -
          <lpage>2324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Badaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <article-title>Pythia: Unsupervised generation of ambiguous textual claims from relational data</article-title>
          ,
          <year>2022</year>
          , pp.
          <fpage>2409</fpage>
          -
          <lpage>2412</lpage>
          . doi:10.1145/3514221.3520164.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Badaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <article-title>Data ambiguity profiling for the generation of training examples</article-title>
          ,
          <source>in: 39th IEEE International Conference on Data Engineering, ICDE 2023</source>
          , Anaheim, CA, USA, April 3-7,
          <year>2023</year>
          , IEEE, pp.
          <fpage>450</fpage>
          -
          <lpage>463</lpage>
          . URL: https://doi.org/10.1109/ICDE55515.2023.00041. doi:10.1109/ICDE55515.2023.00041.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Schlichtkrull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cocarascu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>FEVEROUS: Fact extraction and VERification over unstructured and structured information</article-title>
          ,
          <source>in: NeurIPS (Datasets and Benchmarks)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>TabFact: A large-scale dataset for table-based fact verification</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Know what you don't know: Unanswerable questions for SQuAD</article-title>
          , in: ACL, Association for Computational Linguistics, Melbourne, Australia,
          <year>2018</year>
          , pp.
          <fpage>784</fpage>
          -
          <lpage>789</lpage>
          . URL: https://aclanthology.org/P18-2124. doi:10.18653/v1/P18-2124.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Faruqui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dhingra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>ToTTo: A controlled table-to-text generation dataset</article-title>
          , in: EMNLP, Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>1173</fpage>
          -
          <lpage>1186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bhat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghosal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shrivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          ,
          <article-title>Is my model using the right evidence? Systematic probes for examining evidence-based tabular reasoning</article-title>
          ,
          <source>Trans. Assoc. Comput. Linguistics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>659</fpage>
          -
          <lpage>679</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Kaufhold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Reuter</surname>
          </string-name>
          ,
          <article-title>A survey on data augmentation for text classification</article-title>
          ,
          <source>ACM Computing Surveys</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Bussotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <article-title>Generation of training examples for tabular natural language inference</article-title>
          ,
          <source>Proc. ACM Manag. Data</source>
          <volume>1</volume>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.1145/3626730. doi:10.1145/3626730.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Trummer</surname>
          </string-name>
          ,
          <article-title>From BERT to GPT-3 codex: Harnessing the potential of very large language models for data management</article-title>
          ,
          <source>Proc. VLDB Endow.</source>
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>3770</fpage>
          -
          <lpage>3773</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sileno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          ,
          <article-title>Diogene-ct: tools and methodologies for teaching</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>