=Paper= {{Paper |id=Vol-3034/paper7 |storemode=property |title=Generating Table Vector Representations |pdfUrl=https://ceur-ws.org/Vol-3034/paper7.pdf |volume=Vol-3034 |authors=Aneta Koleva,Martin Ringsquandl,Mitchell Joblin,Volker Tresp }} ==Generating Table Vector Representations== https://ceur-ws.org/Vol-3034/paper7.pdf
Generating Table Vector Representations
Aneta Koleva (1,2), Martin Ringsquandl (1), Mitchell Joblin (1) and Volker Tresp (1,2)
(1) Siemens, Otto-Hahn-Ring 6, 81739 Munich, Germany
(2) Ludwig Maximilian University of Munich, Geschwister-Scholl-Platz 1, 80539 Munich, Germany


                                         Abstract
                                         High-quality Web tables are rich sources of information that can be used to populate Knowledge
                                         Graphs (KG). The focus of this paper is an evaluation of methods for table-to-class annotation, which is a
                                         sub-task of Table Interpretation (TI). We provide a formal definition for table classification as a machine
                                         learning task. We propose an experimental setup and we evaluate 5 fundamentally different approaches
                                         to find the best method for generating vector table representations. Our findings indicate that although
                                         transfer learning methods achieve high F1 score on the table classification task, dedicated table encoding
                                         models are a promising direction as they appear to capture richer semantics.

                                         Keywords
                                         table interpretation, table classification, representation learning.




1. Introduction
Tabular data is one of the most prevalent data representations. The WebTables effort by
Cafarella et al. [1] identified and extracted more than 200 million high-quality tables from
HTML pages. The availability of such a large corpus of structured data initiated several directions
of research related to the different applications of tabular data, such as table search [2], table
improvement [3], question answering [4], and semantic annotation of columns [5]. As a result of
the increasing adoption of KGs, which are often populated from tabular data, the task of aligning
tables with KGs, also referred to as table interpretation (TI), has become a highly relevant task.
In contrast to information extraction from unstructured documents, TI should leverage the
explicit relational structure. The unique table structure with rows and columns of cells and
other metadata can be exploited for discovery and disambiguation of the meaning captured in
the table. The task of TI entails three different sub-tasks. The first sub-task, which is the focus in
this paper, is the classification of tables according to classes in a given KG schema. The second
sub-task is related to linking rows from tables to existing entities in the KG. The annotation of
columns as entity attributes and the discovery of binary relations between columns is the third
sub-task of TI. While there have been several works focusing on the row-to-entity [6, 7, 8], and
column-to-attribute sub-tasks [5, 9], the task of linking a table to a class has been neglected.
However, in the case of entity tables, where one column (the core column) is associated to the
name of the entity and the remaining columns are attributes of this entity, discovering the class
of the table as a first step can greatly improve the solving of the other two sub-tasks. It is often
the case that the column names are missing or incorrect, therefore finding the name of the core

ISWC 2021 Workshop DL4KG
Email: firstname.lastname@siemens.com
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Aneta Koleva et al. CEUR Workshop Proceedings                                                 1–10


column does not imply finding the class of the table. Moreover, when two tables have the same
column names and similar content (e.g., one table of class Country and one of class City), it is
not trivial to disambiguate the entities and column types based only on the table content. Once
a table has been interpreted, its content can be used for extracting new triples for enriching the
KG, a task known as KG completion, or for extracting missing facts for the KG, which is the task
of slot-filling.
   Due to the inherent scarcity of labelled data for the first sub-task (class-annotated tables),
a table classification model must either be of low complexity (few parameters) or leverage
pre-trained models. Using pre-trained models in TI has been studied only to a very limited
extent. Hence, we explore two promising directions for making learning-based approaches
more efficient: (a) by using transfer learning, (b) by considering additional inductive biases that
are unique to tabular data representations.
   We propose an experimental setup with the intention of finding the best method for generat-
ing a representation which captures the information from the table but also the row and column
structure, so that it can be later used towards solving the remaining sub-tasks of TI: row-to-entity
linking, column type annotation and relation extraction. We are interested in understanding
how pre-trained language models, such as BERT [10], and their dedicated table-based
counterparts, for instance TaBERT [11], can be utilized for generating vector representations of
tables. Surprisingly, our experiments show that a transfer learning method with a rich vocabulary
of pre-trained word embeddings achieves an F1 score similar to that of more sophisticated
pre-trained language models (LM). Another interesting finding is that the inductive bias for
tabular structure in the LM pre-trained on tabular data does not improve on a text
pre-trained LM. However, the classification confusion matrix for this method indicates
that its misclassifications are justifiable and reasonable. Our main contributions are:
    • A formal definition of table classification as a machine learning task and a protocol for
      evaluating performance on this task.
    • A setup for table encoding using 5 fundamentally different approaches covering a spectrum
      of paradigms from general purpose document encoders to specialized pre-trained models
      designed for tabular data.
    • An extensive empirical evaluation of the different approaches.


2. Background
In this section, we review prior work related to solving the different sub-tasks of TI. We also
give a short overview of methods for generating vector representations of tables.

Table Interpretation The three sub-tasks of TI were first introduced in the paper by Ritze
et al. [12]. That paper also introduced the T2K Matcher, a method for iterative value-based
matching, which solves the TI tasks by matching values from the tables to values of retrieved
candidates from the KG. Limaye et al. [9] proposed a probabilistic graphical
method which attempts to jointly solve the two sub-tasks of finding entity-to-row and column-
to-attribute alignments. Deng et al. [13] exploited word embeddings for representing the
contents of tables and utilized them for the discovery of new entities. The SemTab challenge
[14] has also motivated new approaches [15, 16]. However, the task of table-to-class annotation
is not part of this challenge.

Table classification To the best of our knowledge, the T2K Matcher is the only existing
method for solving the table-to-class task. Namely, the class of the table is chosen by ranking
the sum of the similarity scores of the column-to-property correspondences aggregated per
class. Since this method requires querying of the KG for candidate retrieval and first solving the
column-to-property alignment in order to find the correct class of a table, we do not consider it
during our experiments. In contrast to the T2K Matcher, we consider a closed book scenario,
where the instances of the KG are not available, only the classes in the KG schema.

Representation Learning on Tables Based on powerful LM, dedicated deep learning models
have recently been proposed to exploit tabular data structures, e.g., in table-based question
answering [4, 17] and KG completion from tables [18]. One benefit from using pre-trained LM
is that they can handle synonyms well, e.g., the abbreviation of New York as NY, which are
frequently occurring in tables because of the innate limitation of the cells. The other benefit
is that, due to the exposure to large textual corpora during the pre-training phase, the LM
can store implicit information learned from the data whilst pre-training, in the form of model
parameters [19]. TaBERT [11] by Yin et al. is a model which was pre-trained to jointly
learn representations of a natural language question, called the utterance, and of tables. An
example of an utterance for the entity table shown in Figure 1 is the question: How much is the
population of New York? During encoding, instead of using the full table, TaBERT samples 1 or 3
rows, referred to as the content snapshot. First, each row from the snapshot, concatenated with the
utterance, is encoded by BERT [10]. Second, the encodings of the rows are stacked and a vertical
self-attention mechanism is used to generate vector representations for each of the columns.
Finally, a representation for the table is generated by pooling the column representations.
Similar work is the method TAPAS by Herzig et al. [20], which is also pre-trained on tables
and text segments. Deng et al. proposed TURL [17] as a framework for pre-training, also on
tabular data, which uses the same objectives as TaBERT for learning representations of the
content of the tables. Additionally, they proposed task-specific fine-tuning on the framework
for solving the row-to-entity and column-to-attribute annotation. Wang et al. [21] presented a
novel method which exploits information within one table but also aggregates the contextual
information shared across similar tables in order to generate a vector representation that can be
used for column-to-class annotation and relation prediction tasks.


3. Problem Description
We focus on the task of table-to-class annotation. The task has been introduced together with
the two other TI sub-tasks in [12], however without a formal definition. The goal of the table-
to-class annotation is to label a table with its corresponding class according to the given KG
schema. We now provide a definition of this task as a machine learning task.
   An entity table $T_i$ is an $N_i \times M_i$ matrix, where $N_i$ and $M_i$ are the number of rows
and columns of the table $T_i$. Each element $r^i_{n,m}$ of the matrix $T_i$ contains one or more
tokens, where each token is a sequence of characters. We denote by $r^i_{n,*}$ and $r^i_{*,m}$ the
$n$-th row and the $m$-th column of the matrix $T_i$, respectively. The header of the table is the
first row, $H_i = r^i_{0,*}$. The content of the table consists of the rows
$r^i_{1,*}, r^i_{2,*}, \dots, r^i_{N_i,*}$.
   Let $\mathcal{D} = \{(T_1, c_1), \dots, (T_l, c_l)\}$ be the set of $l$ labeled tables, where each
label $c_i \in \mathcal{C}$ is in the set of classes defined in the KG schema,
$\mathcal{C} = \{c_1, \dots, c_k\}$. A table encoder $E_\omega$ is a model with a parameter vector
$\omega$ which encodes each table to a vector, $E_\omega : \{T_i\} \to \mathbb{R}^d$ with
$E_\omega(T_i) = x_i$, and $\mathcal{X} = \{x_1, \dots, x_l\}$ is the set of feature vectors for the
tables in $\mathcal{D}$. The final task is to train a classification model
$f_\theta : \mathbb{R}^d \to \mathcal{C}$ so that each table vector is assigned to one of the class
labels. The problem is defined in the multi-class setting. Formally, our setting is
$f_\theta \circ E_\omega : \{T_i\} \to \mathcal{C}$, where only the parameters $\theta$ are trained on
the table classification task, i.e., no gradient updates are performed on $\omega$.
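
The composition of a frozen encoder with a trainable classifier can be illustrated with a minimal Python sketch. Everything in it is a stand-in chosen for illustration: `toy_encoder` is a hypothetical fixed encoder based on character counts, and a nearest-centroid rule plays the role of the classifier. Neither is a model evaluated in this paper. The sketch only shows that fitting touches the classifier parameters while the encoder is used as-is.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Table = List[List[str]]  # row 0 is the header, the remaining rows are content
Vector = List[float]


def toy_encoder(table: Table) -> Vector:
    """Hypothetical frozen encoder E_omega: maps a table to a vector in R^3."""
    cells = [cell for row in table for cell in row]
    n_digits = sum(ch.isdigit() for cell in cells for ch in cell)
    n_alpha = sum(ch.isalpha() for cell in cells for ch in cell)
    return [float(len(cells)), float(n_digits), float(n_alpha)]


def fit_centroids(features: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Train f_theta: theta is one centroid per class, computed from fixed features."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for x, c in zip(features, labels):
        grouped[c].append(x)
    return {c: [sum(col) / len(xs) for col in zip(*xs)] for c, xs in grouped.items()}


def predict(theta: Dict[str, Vector], x: Vector) -> str:
    """Apply f_theta: return the class whose centroid is closest to x."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(theta, key=lambda c: sq_dist(theta[c], x))


def classify(encoder: Callable[[Table], Vector], theta: Dict[str, Vector], table: Table) -> str:
    """The composed model f_theta(E_omega(T)); the encoder is never updated."""
    return predict(theta, encoder(table))


# Toy data, invented for illustration.
tables = [
    [["name", "population"], ["Germany", "83000000"], ["France", "67000000"]],
    [["title", "director"], ["Alien", "Ridley Scott"], ["Heat", "Michael Mann"]],
]
labels = ["Country", "Film"]
theta = fit_centroids([toy_encoder(t) for t in tables], labels)
```

Swapping `toy_encoder` for any other function of type `Table -> Vector` leaves the training code unchanged, which is the property the evaluation protocol relies on.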


4. Experiments
Figure 1 shows the experimental setup for evaluating different table encoders. Given an entity
table, a table encoder generates a high-dimensional vector representation of the table. We then
train a classifier on the table-to-class task and evaluate the performance achieved by each of
the table encoders. We experiment with different types of table encoders: a simple
document encoder, transfer learning methods with general-purpose pre-trained word
embeddings (Figure 1 (a)), and more complex methods, which include an LM pre-trained on large
textual corpora and a question-answering approach pre-trained on tabular
data (Figure 1 (b)). The code for the experiments is accessible online (see footnote 1).

4.1. Dataset
For evaluation we used the second version of the T2D gold standard dataset [12], T2Dv2. To
the best of our knowledge, the T2D sets are the only publicly available datasets which have
been annotated with table-to-class correspondences. The second version of the dataset (see
footnote 2) contains 237 such annotations. In our experiments, we consider those classes which
have at least two tables as representatives. The resulting dataset contains 223 tables, each
labeled with one of the 27 unique classes. The mean number of rows in the dataset is 119.2 and
the mean number of columns is 7.7.

4.2. Models compared
In the evaluation we used 5 different models as table encoders, varying from general purpose
document encoders to more sophisticated LM, pre-trained on tabular data.

TF-IDF or term frequency-inverse document frequency, is a term weighting scheme which
generates vector representation for a document based on the frequency of the words in the
document. It is the simplest method which we used as a table encoder.

    1 https://github.com/anetakoleva/tableClassification
    2 http://webdatacommons.org/webtables/goldstandardV2.html







Figure 1: Experimental setup for evaluation of table encoders.


Spacy provides word vectors pre-trained on text extracted from blogs, news and comments. We used
the vectorizer from the medium-sized English pipeline (see footnote 3), which has a vocabulary
of size 684830.

Word2Vec refers to pre-trained word vectors trained with FastText (see footnote 4) on a Wikipedia
text corpus. The model used for learning the vectors [22] is an extension of the original word2vec
model. It is skip-gram based and trained to learn representations for character n-grams. This
model has a vocabulary of size 2.5 million.

BERT is a widely used, Transformer-based LM [10]. During the pre-training phase, the model
has been exposed to a large corpus of unstructured text with the objective of predicting missing
words and prediction of next sentence. This enables the model to learn the correlation of the
words and to generate different vector representation for words depending on the context.

TaBERT is a table encoding method [11], pre-trained on Web tables with the objective to be
used in question-answering tasks on tables. Since the model expects an utterance, i.e., a natural
language question, as input together with a table, in our experiments we provided an empty
space “ ”. We conducted more experiments to evaluate the influence of the utterance on the
generated table representation and we discuss these results in Section 5.
   3 https://spacy.io/models/en#en_core_web_md
   4 https://fasttext.cc/docs/en/pretrained-vectors.html





4.3. Setup
To systematically evaluate the quality of the representations generated with the different table
encoders, we compare their performance on the classification task under different scenarios. It
is important to note that we did not train or fine-tune any of the methods for table encoding, i.e.,
we used them off-the-shelf. Since the tables can be large, in order to avoid scalability issues, we
resort to sampling of rows. Namely, we first shuffle the rows in the tables and then we sample
the first 𝑞 rows. The shuffling of the rows is done only once. For the experiments, we sampled
𝑞 ∈ {1, 3, 5, 7} rows from each of the tables and used these sampled tables as input to the table
encoders.
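
The sampling step can be sketched as follows. This is a minimal illustration, not the released code: the function names and the seed are hypothetical, and it assumes that the header row stays in place while only the content rows are shuffled.

```python
import random
from typing import List

Table = List[List[str]]  # row 0 is the header, the remaining rows are content


def shuffle_content_once(table: Table, seed: int = 0) -> Table:
    """Shuffle the content rows a single time, keeping the header in place."""
    header, content = table[0], list(table[1:])
    random.Random(seed).shuffle(content)
    return [header] + content


def sample_rows(shuffled: Table, q: int) -> Table:
    """Keep the header and the first q content rows of an already shuffled table."""
    return [shuffled[0]] + shuffled[1 : 1 + q]


# Toy table with ten content rows, invented for illustration.
table = [["city", "country"]] + [[f"city{i}", f"country{i}"] for i in range(10)]
shuffled = shuffle_content_once(table)
samples = {q: sample_rows(shuffled, q) for q in (1, 3, 5, 7)}
```

Because the shuffle happens only once, the sample for a smaller value of q is a prefix of the sample for a larger one, so the different settings are evaluated on nested row sets.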
   When using TF-IDF as the table encoder, the input is a set of sequences, where each sequence
corresponds to a table from the set of tables $\mathcal{D}$. More formally, a table sequence for
table $T_i$ is a sequence of rows $S_{T_i} = (r^i_{0,*}, r^i_{1,*}, \dots, r^i_{q,*})$, with
$q \in \{1, 3, 5, 7\}$, and the set of sequences is $I = \{S_{T_1}, \dots, S_{T_l}\}$. The TF-IDF
table encoder transforms the set of table sequences into the set of feature vectors,
$E_\omega^{\text{tf-idf}} : I \to \mathcal{X}$.
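
A minimal pure-Python sketch of this encoder is given below. The whitespace tokenization, the lowercasing and the smoothed inverse document frequency are assumptions made for illustration; the weighting variant used in the experiments is not specified here.

```python
import math
from collections import Counter
from typing import Dict, List

Table = List[List[str]]


def table_sequence(table: Table) -> List[str]:
    """Flatten the header and the sampled rows into one token sequence S_T."""
    return [tok.lower() for row in table for cell in row for tok in cell.split()]


def tfidf_encode(sequences: List[List[str]]) -> List[List[float]]:
    """Encode every sequence as a TF-IDF vector over the shared vocabulary."""
    vocab = sorted({tok for seq in sequences for tok in seq})
    n_docs = len(sequences)
    doc_freq = Counter(tok for seq in sequences for tok in set(seq))
    # Smoothed inverse document frequency (one common variant, an assumption here).
    idf: Dict[str, float] = {
        tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab
    }
    vectors = []
    for seq in sequences:
        counts = Counter(seq)
        # Term frequency is the relative count of the token in the sequence.
        vectors.append([(counts[tok] / len(seq)) * idf[tok] if seq else 0.0 for tok in vocab])
    return vectors


# Two toy tables, invented for illustration.
tables = [
    [["country", "capital"], ["France", "Paris"]],
    [["film", "year"], ["Alien", "1979"]],
]
X = tfidf_encode([table_sequence(t) for t in tables])
```

The dimensionality of the resulting vectors equals the vocabulary size of the whole table collection, so unlike the pre-trained encoders it depends on the dataset.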
   Word2Vec and Spacy generate the vector representation for table $T_i$ in 3 steps. First, the
sequence $S_{H_i}$, representing the header of the table $T_i$, is encoded as the mean over the word
vectors in the sequence $S_{H_i}$, denoted $x_i^H$. Second, the content of the table is transformed
into a table sequence $S_{T_i} = (r^i_{1,*}, \dots, r^i_{q,*})$ and encoded as the vector $x_i^B$,
which represents the mean over all the word vectors in $S_{T_i}$. Finally, the vector
representations for the header and for the table content are concatenated into one vector,
$x_{T_i} = x_i^H \,\|\, x_i^B$.
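
The three steps can be sketched with a toy embedding table. The 2-dimensional vectors below are invented for illustration, and skipping out-of-vocabulary tokens is an assumption; a real run would look the tokens up in the pre-trained Spacy or FastText vectors instead.

```python
from typing import Dict, List

Vector = List[float]
Table = List[List[str]]

# Toy 2-dimensional embeddings, invented for illustration only.
EMBEDDINGS: Dict[str, Vector] = {
    "city": [1.0, 0.0],
    "population": [0.0, 1.0],
    "munich": [2.0, 2.0],
    "berlin": [4.0, 2.0],
}
DIM = 2


def mean_vector(tokens: List[str]) -> Vector:
    """Average the vectors of known tokens; unknown tokens are skipped (assumption)."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(col) / len(known) for col in zip(*known)]


def tokens_of(rows: List[List[str]]) -> List[str]:
    """Lowercase and whitespace-tokenize all cells of the given rows."""
    return [tok.lower() for row in rows for cell in row for tok in cell.split()]


def encode_table(table: Table) -> Vector:
    """x_T = x^H || x^B: the header mean concatenated with the content mean."""
    x_header = mean_vector(tokens_of(table[:1]))
    x_body = mean_vector(tokens_of(table[1:]))
    return x_header + x_body


table = [["City", "Population"], ["Munich", "1488000"], ["Berlin", "3645000"]]
x_table = encode_table(table)
```

Keeping the header and the content in separate halves of the vector lets the classifier weight column names and cell values independently, at the cost of doubling the dimensionality.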
   Considering that there is a limit on the length of the sequence that BERT can encode in one
step, we used a different transformation for the last two methods. BERT encodes each table row
by row, i.e., a sequence $S_{r^i_{z,*}}$ is generated for each of the rows $r^i_{z,*}$ of table
$T_i$, where $0 \le z \le q$. BERT generates row-wise vectors, so for each sequence
$S_{r^i_{z,*}}$ the output is a vector $x_{r_{z,*}}$. The vector representation for table $T_i$ is
the vector $x_{T_i}$, which is the result of mean-pooling over the set of BERT output vectors
$\{x_{r_{0,*}}, \dots, x_{r_{q,*}}\}$ that correspond to the table rows. In the same manner, the
TaBERT model also first generates an encoding for each of the rows of table $T_i$, resulting in a
set of vectors. This model then applies vertical self-attention over the vertically stacked
vectors $\{x_{r_{0,*}}, \dots, x_{r_{q,*}}\}$. Because the vectors are vertically aligned, the output
of the model is a set of column vector representations $\{x_{r_{*,0}}, \dots, x_{r_{*,M_i}}\}$, one
for each of the $M_i$ columns in table $T_i$. Finally, we do mean-pooling over the column
representations to generate the table encoding $x_{T_i}$.
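
The row-wise encoding followed by mean-pooling can be sketched as below. To keep the example self-contained, `toy_row_encoder` is a hypothetical stand-in for a frozen pre-trained encoder such as BERT, and joining the cells with a space is an assumed linearization. TaBERT's vertical self-attention is not reproduced here.

```python
from typing import Callable, List

Vector = List[float]
Table = List[List[str]]


def row_to_sequence(row: List[str]) -> str:
    """Linearize one row into a single string (the separator is an assumption)."""
    return " ".join(row)


def toy_row_encoder(sequence: str) -> Vector:
    """Hypothetical stand-in for a frozen sentence encoder; returns a 2-d vector."""
    return [float(len(sequence)), float(sequence.count(" "))]


def encode_table_rowwise(table: Table, encode: Callable[[str], Vector]) -> Vector:
    """Encode rows 0..q separately, then mean-pool the row vectors into x_T."""
    row_vectors = [encode(row_to_sequence(row)) for row in table]
    return [sum(col) / len(row_vectors) for col in zip(*row_vectors)]


# Toy table (header plus q = 2 sampled rows), invented for illustration.
table = [["city", "country"], ["Munich", "Germany"], ["Lyon", "France"]]
x_table = encode_table_rowwise(table, toy_row_encoder)
```

Encoding each row on its own keeps every input sequence short, so the length limit of the encoder applies per row rather than per table.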
   We then use a multi-layer perceptron (MLP) with one hidden layer of size 500, the tanh
activation function and the Adam optimizer as the classifier 𝑓𝜃 from Figure 1. The hyperparameters
were chosen after an extensive search and are fixed for all of the experiments. Since the
available dataset is small, instead of splitting it once into a training set and a test set, we use
stratified K-fold cross-validation with 𝐾 = 20 splits. Considering that the dataset is imbalanced,
we report the macro-averaged F1 score. The reported scores are the average of the results on the
held-out test folds of the cross-validation. To explore the effect of the column names, we also
encoded the tables with their column names masked. Specifically, for all of the tables, we
substitute their column names with the token [UNK].
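
The pieces of this protocol that do not depend on a particular classifier can be sketched in plain Python. The helper names are hypothetical, the round-robin fold assignment is a simplified form of stratification, and the MLP itself is omitted; this is an illustration of the steps, not the released implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Table = List[List[str]]


def mask_column_names(table: Table) -> Table:
    """Replace every column name in the header row with the [UNK] token."""
    return [["[UNK]"] * len(table[0])] + [list(row) for row in table[1:]]


def stratified_folds(labels: Sequence[str], k: int) -> List[int]:
    """Assign a fold index to each sample, spreading every class round-robin over folds."""
    folds = [0] * len(labels)
    seen: Dict[str, int] = defaultdict(int)
    for i, label in enumerate(labels):
        folds[i] = seen[label] % k
        seen[label] += 1
    return folds


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 over all classes in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy predictions, invented for illustration.
y_true = ["Country", "Country", "Film", "Film"]
y_pred = ["Country", "Film", "Film", "Film"]
score = macro_f1(y_true, y_pred)
```

Macro averaging gives a class with two tables the same weight as a class with thirty, which is why it is the more informative summary on an imbalanced dataset. Note that with 𝐾 = 20 a class with only two tables cannot appear in every fold.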






Table 1
Macro-averaged F1 score.

                      Column names               Masked column names
              q=1    q=3    q=5    q=7      q=1    q=3    q=5    q=7
tf-idf        0.56   0.56   0.54   0.54     0.41   0.45   0.51   0.55
spacy         0.64   0.69   0.74   0.73     0.48   0.58   0.61   0.63
word2vec      0.69   0.76   0.76   0.78     0.61   0.77   0.76   0.80
bert          0.76   0.78   0.79   0.80     0.63   0.75   0.78   0.78
tabert        0.75   0.77   0.77   0.78     0.61   0.71   0.71   0.74


5. Results
Table 1 shows the macro averaged F1 score for the 5 table encoders on the table classification
task under two different settings: (1) given the input tables with the column names and (2)
given the input tables with their column names masked ([UNK] token). We report the achieved
F1 score for the different sizes of the input tables with the number of sampled rows 𝑞 varying
from 1 row to 7 rows. The simplest table encoder, TF-IDF, achieves the lowest F1 score, and
its score drops further when the column names of the tables are masked, except for 𝑞 = 7. For
the two models with pre-trained word vectors, we observe that the model with the richer vocabulary
has the higher score. Indeed, the F1 score of Word2Vec is comparable with the scores achieved by
BERT and TaBERT. In the first setting, when the column names of the tables are visible, there
is no significant difference between the scores achieved by BERT and the scores of TaBERT.
However, in the setting when the column names are masked, BERT consistently outperforms
TaBERT. Interestingly, Word2Vec is the only table encoder that was not hurt by the masking
of the column names for 𝑞 ≥ 3 (its score dropped only for 𝑞 = 1); on the contrary, for 𝑞 = 3 and
𝑞 = 7 it achieved a better score under the second setting than when the column names are visible.
   Figure 2 shows the row-normalized confusion matrix for the table classification task for
Word2Vec and TaBERT across the different classes. The horizontal axis shows the predicted
labels and the vertical axis shows the true labels. We observe the performance of the two
models under the same scenario: the input tables are with 𝑞 = 7 rows and the column names
are masked. The classes are ordered by the number of instances assigned to them: Country is
the class with the most instances, 33, while Airline has only 2 instances. From the confusion
matrix for TaBERT (Figure 2, right) it can be observed that more misclassifications occur for
the classes with a lower number of instances and that they are not unexpected. For instance,
misclassifying an instance of class Person as an instance of class Scientist is an acceptable
mistake. The same holds for the instances of classes Academic Journal and Newspaper, and Political
Party and Election. On the other hand, the misclassifications by Word2Vec of class Wrestler
and class Animal as instances of class Film are much more unexpected and critical. Likewise,
Word2Vec misclassifies the tables of class Scientist and of class Radio Station as instances
of the class Country, which indicates a weak semantic structure in the vector representations.
These results suggest that although Word2Vec achieves a higher F1 score, the TaBERT vector
representations capture semantics with smoother transitions between classes.
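
For readers who want to reproduce this kind of analysis, a row-normalized confusion matrix can be computed as in the following sketch. The labels and predictions are invented for illustration and are not the results reported in Figure 2.

```python
from typing import Dict, List, Sequence


def row_normalized_confusion(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]
) -> List[List[float]]:
    """Entry [i][j] is the fraction of tables of true class i predicted as class j."""
    index: Dict[str, int] = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class without any true instance yields an all-zero row.
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix


# Toy labels and predictions, invented for illustration.
classes = ["Country", "Person", "Scientist"]
y_true = ["Country", "Country", "Person", "Person", "Scientist"]
y_pred = ["Country", "Country", "Person", "Scientist", "Scientist"]
cm = row_normalized_confusion(y_true, y_pred, classes)
```

Row normalization makes classes with very different numbers of instances comparable, since each row sums to one regardless of how many tables the class contains.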







Figure 2: Classification confusion matrix for Word2Vec (left) and for TaBERT (right).


TaBERT Analysis To get a better understanding of the (under-) performance of TaBERT
we analyse the influence of the utterance and its interplay with column names. In addition to
the empty string “ ” used in previous experiments, we also used a randomly generated string
with 10 characters (unique per table), and one constant string, Thing, for all tables. Moreover,
we experimented with adding the correct class of the tables as utterance, as well as a wrong
class (for instance, all the tables of class Country are encoded with the class Plant as utterance).
Figure 3 shows the results of these experiments, where the input tables were with 𝑞 = 3 rows.
The horizontal axis shows the different options that we passed as utterance to the model and
the vertical axis shows the achieved F1 score. The masking of column names has a significant
influence on the generated table representation. The reason for this might lie in the way
a row is transformed into a string, i.e., each table entry is represented by concatenating its
column name with its value. Observing the results with the different utterances, we
see that the choice of utterance does not affect the performance of the model when the column
names are not masked. When the column names are masked, however, the influence of the
utterance is more significant. In both cases, when the utterance is the wrong class or the correct
class, the achieved score is much higher, which might be attributed to a class-wide shift in the
vector space because of the grouping that these utterances cause.
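
The five utterance options can be generated as in the sketch below. The function names and dictionary keys are hypothetical, and drawing the random string from lowercase ASCII letters is an assumption, since the alphabet used in the experiments is not stated.

```python
import random
import string
from typing import Dict


def random_utterance(rng: random.Random, length: int = 10) -> str:
    """A random string of the given length, drawn anew for every table."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def utterance_variants(true_class: str, wrong_class: str, rng: random.Random) -> Dict[str, str]:
    """Build the five utterance options for one table."""
    return {
        "empty": " ",
        "random": random_utterance(rng),
        "constant": "Thing",
        "correct_class": true_class,
        "wrong_class": wrong_class,
    }


rng = random.Random(0)
variants = utterance_variants("Country", "Plant", rng)
```

Each variant is passed to the encoder together with the same sampled table, so any difference in the resulting F1 score can be attributed to the utterance alone.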

6. Conclusion and Future work
In this paper we explored different types of table encoders for generating vector representations
for tabular data. Specifically, we focused on evaluating different methods for table encoding on
the sub-task for TI, table-to-class annotation. Despite the increasing interest in the problem
of TI, so far, only one approach towards this specific sub-task has been proposed. In this
direction, we provided a formal definition for the table-to-class annotation task as a machine
learning task. We conduct an empirical study with five different methods for generating vector







Figure 3: TaBERT performance with different utterances.


representations of a table and evaluate their performance on the table-to-class annotation task.
The results from our experiments show that transfer learning methods with large vocabularies
of pre-trained word embeddings perform on par with more complex and expensive models
such as LMs pre-trained on tables. An interesting finding is that the inductive bias for tabular
structure in TaBERT did not improve on the performance of the BERT model. A possible
explanation for this is the absence of a meaningful utterance, which the TaBERT model expects as
input. Nonetheless, the misclassifications made by this model are reasonable, suggesting that the
vector representations capture the semantics of the tables. Future work should target closing
the gap between existing general-purpose models and models specific to encoding tabular data.
To further our work, we plan to explore other existing methods for table encoding for solving
the table-to-class task, as well as for solving the entity-to-row and column-to-property tasks.


References
 [1] M. J. Cafarella, A. Y. Halevy, D. Z. Wang, E. Wu, Y. Zhang, Webtables: exploring the power
     of tables on the web, VLDB (2008).
 [2] P. Venetis, A. Y. Halevy, J. Madhavan, M. Pasca, W. Shen, F. Wu, G. Miao, C. Wu, Recovering
     semantics of tables on the web, VLDB (2011).
 [3] M. Zhang, K. Chakrabarti, Infogather+: semantic matching and annotation of numeric
     and time-varying attributes in web tables, in: SIGMOD, 2013.
 [4] H. Sun, H. Ma, X. He, W. Yih, Y. Su, X. Yan, Table cell search for question answering, in:
     WWW, 2016.
 [5] J. Chen, E. Jiménez-Ruiz, I. Horrocks, C. Sutton, Learning semantic annotations for tabular
     data, in: IJCAI, 2019.
 [6] V. Efthymiou, O. Hassanzadeh, M. Rodriguez-Muro, V. Christophides, Matching web tables
     with knowledge base entities: From entity lookups to entity embeddings, in: ISWC, 2017.
 [7] P. Nguyen, N. Kertkeidkachorn, R. Ichise, H. Takeda, Tabeano: Table to knowledge graph
     entity annotation, CoRR (2020). arXiv:2010.01829.
 [8] S. Zhang, E. Meij, K. Balog, R. Reinanda, Novel entity discovery from web tables, in:
     WWW, 2020.
 [9] G. Limaye, S. Sarawagi, S. Chakrabarti, Annotating and searching web tables using entities,
     types and relationships, VLDB (2010).
[10] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional
     transformers for language understanding, in: NAACL-HLT, 2019.
[11] P. Yin, G. Neubig, W. Yih, S. Riedel, Tabert: Pretraining for joint understanding of textual
     and tabular data, in: ACL, 2020.
[12] D. Ritze, O. Lehmberg, C. Bizer, Matching HTML tables to dbpedia, in: WIMS, 2015.
[13] L. Zhang, S. Zhang, K. Balog, Table2vec: Neural word and entity embeddings for table
     population and retrieval, in: SIGIR, 2019.
[14] E. Jiménez-Ruiz, O. Hassanzadeh, V. Efthymiou, J. Chen, K. Srinivas, Semtab 2019: Re-
     sources to benchmark tabular data to knowledge graph matching systems, in: ESWC,
     2020.
[15] S. Chen, A. Karaoglu, C. Negreanu, T. Ma, J. Yao, J. Williams, A. Gordon, C. Lin, Linkingpark:
     An integrated approach for semantic table interpretation, in: SemTab@ISWC, 2020.
[16] P. Nguyen, I. Yamada, N. Kertkeidkachorn, R. Ichise, H. Takeda, Mtab4wikidata at semtab
     2020: Tabular data annotation with wikidata, in: SemTab@ISWC, 2020.
[17] X. Deng, H. Sun, A. Lees, Y. Wu, C. Yu, TURL: table understanding through representation
     learning, VLDB (2020).
[18] B. Kruit, P. A. Boncz, J. Urbani, Extracting novel facts from tables for knowledge graph
     completion, in: ISWC, 2019.
[19] A. Roberts, C. Raffel, N. Shazeer, How much knowledge can you pack into the parameters
     of a language model?, in: EMNLP, 2020.
[20] J. Herzig, P. K. Nowak, T. Müller, F. Piccinno, J. M. Eisenschlos, Tapas: Weakly supervised
     table parsing via pre-training, in: ACL, 2020.
[21] D. Wang, P. Shiralkar, C. Lockard, B. Huang, X. L. Dong, M. Jiang, TCN: table convolutional
     network for web table interpretation, in: WWW, 2021.
[22] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword
     information, Transactions of the Association for Computational Linguistics (2017).



