=Paper= {{Paper |id=Vol-3784/short2 |storemode=property |title=THoRR: Complex Table Retrieval and Refinement for RAG |pdfUrl=https://ceur-ws.org/Vol-3784/short2.pdf |volume=Vol-3784 |authors=Kihun Kim,Mintae Kim,Hokyung Lee,Seongik Park,Youngsub Han,Byoung-Ki Jeon |dblpUrl=https://dblp.org/rec/conf/ir-rag/KimKLPHJ24 }} ==THoRR: Complex Table Retrieval and Refinement for RAG== https://ceur-ws.org/Vol-3784/short2.pdf
                         THoRR: Complex Table Retrieval and Refinement for RAG
                         Kihun Kim∗ , Mintae Kim, Hokyung Lee, Seongik Park, Youngsub Han and Byoung-Ki Jeon
                         LG UPLUS, 71, Magokjungang 8-ro, Gangseo-gu, Seoul, Republic of Korea


                                          Abstract
Recent advancements in the contextual understanding and generation capabilities of Large Language Models (LLMs) have sparked increasing interest in the application of Retrieval-Augmented Generation (RAG) to specific domains and industry documents. Retrieving and understanding the tables within these documents is crucial for generating correct answers in RAG systems. This study focuses on documents containing large and complex tables, such as statistical and industry reports, which present two major challenges: 1) processing large tables and 2) understanding complex tables. Previous studies faced difficulties because they considered all elements of tabular data, such as cells, headers, and titles. In contrast, we designed the Table Header for Retrieval and Refinement (THoRR) method to address these issues. THoRR performs two tasks: table retrieval and table refinement. In the retrieval phase, we propose a table header representation approach that uses headers and titles without considering cells. In the refinement phase, the model selects relevant table headers from the retrieved tables and processes them into refined tables containing only the information necessary to answer the question. By reorganizing information, this approach aids in understanding complex tables without chunking. Our models outperform existing approaches such as DTR and DPR-table. Moreover, we experimentally demonstrate that our refinement model can reduce hallucinations. To the best of our knowledge, our table refinement approach for RAG systems is the first of its kind in the field.

                                          Keywords
                                          table retrieval, complex table, retrieval-augmented generation (RAG), table refinement, table representation



1. Introduction

Recent advancements in the contextual understanding and generative capabilities of Large Language Models (LLMs) have heightened interest in Retrieval-Augmented Generation (RAG)[1, 2] for specific domains such as open-domain or industry-specific documents. Documents in the industry and finance domains often contain large and complex tables, the understanding of which is critical for a RAG system to produce accurate responses. However, this task presents several challenges. Our research seeks solutions to two primary challenges.

The first challenge involves processing large and complex tables. Previous studies, such as DTR[3] and DPR-table[4], were designed with relatively simple open-domain tables in mind, such as those found in the nq-table[5] dataset, and thus de-emphasized the processing of large tables. Similar to the processing of text documents, previous methods divided tables into fixed-length segments (chunking), or even cut off parts that exceeded a maximum input length. The chunking method complicates retrieval by not only increasing the number of retrieval targets but also making it difficult to compare values across segmented tables. Moreover, discarding the overflowing sections risks losing table information, diminishing the probability of obtaining a sufficient table representation. These problems can ultimately affect table retrieval performance.

The second challenge is the difficulty of understanding tables due to their complex structure. Complex tables typically feature hierarchical headers and numerous values, making it hard for the generator to consider such a vast amount of information. Insufficient table comprehension can lead to incorrect answers (hallucinations). Figure 1 demonstrates an example where GPT-3.5-turbo[6] is used to perform TableQA on a hierarchical table: the original table leads to an incorrect response, whereas the refined table, as processed by our proposed model, yields the correct answer.

Figure 1: Example of TableQA with GPT-3.5-turbo, comparing the result of the original complex table (top) and the refined table (bottom).

In this paper, we propose Table Header for Retrieval and Refinement (THoRR) to solve these problems. The method is grounded in the heuristic assumption that, when finding and understanding tables, headers are more critical than values. THoRR consists of two models, a retriever and a refinement model, applied sequentially, each of which differs from previous work. THoRR:Retriever uses a table header representation: it performs table retrieval using only the headers, without considering the cells of the table. THoRR:Refinement performs relevant table header detection on the retrieved tables, selecting the headers that are relevant to the question and refining the tables into simple tables that contain only the necessary information, reducing the amount of information the generator needs to consider.

We compare THoRR with DTR[3] and DPR-table[4] and show that it achieves better retrieval performance in fine-tuning and zero-shot experiments on the HiTab[7] and AIT-QA[8]

IR-RAG'24: Information Retrieval's Role in RAG Systems (IR-RAG), July 18, 2024, Washington D.C.
∗ Corresponding author.
kimkihun@lguplus.co.kr (K. Kim); iammt@lguplus.co.kr (M. Kim); hogay88@lguplus.co.kr (H. Lee); spark32@lguplus.co.kr (S. Park); yshan042@lguplus.co.kr (Y. Han); bkjeon@lguplus.co.kr (B. Jeon)
ORCID: 0009-0005-9453-7443 (K. Kim)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).



          Figure 2: Architecture of the RAG system with LLM for table data, featuring our proposed THoRR method.



datasets, while reducing the information (number of cells) that must be input to the generator. Furthermore, our proposed methodology enables an efficient reduction in the number of tokens required for table inputs to the generator.


2. Method

In this section, we present the Table Header for Retrieval and Refinement (THoRR) method, designed to retrieve and refine tables within the RAG system [2]. THoRR is divided into two phases, retrieval and refinement, as shown in Figure 2. The two phases are trained separately and serve distinct purposes. The retrieval phase is used for embedding and indexing tables; when a question is input, it retrieves the pre-indexed tables. In the refinement phase, the retrieved tables are processed to extract the necessary information, refining them into smaller tables.

The goal of this method is to obtain the Top_K refined tables T_r relevant to a given question Q from the M target tables T. We denote the components of T as title, header_row, and header_col, representing the title, row headers, and column headers, respectively. The comparative experiments between THoRR and the existing table retrieval baselines are explained in Section 3.1.

2.1. Table Retriever

Given M target tables T, our THoRR:retrieval model aims to retrieve the Top_K candidate tables containing information relevant to the question Q. In this paper, we follow the structure of DPR[9] for comparison with DPR-table[4]. We use two different encoders, a table header encoder (Enc_T) and a question encoder (Enc_Q), both using the base model of [10]. Enc_T maps the M target tables to table header representations t and builds an index over t that is used for retrieval. The input x_t to Enc_T is defined in Equation 1. Given a question Q, we obtain a question representation q using Enc_Q and then select the Top_K candidate tables closest to q from the indexed t. The similarity between t and q is defined using the dot product, as in [9] (Equation 2).

    x_t = {[CLS] title [SEP] header_col [SEP] header_row [SEP]}    (1)

    Sim(q, t) = Enc_Q(Q)^T · Enc_T(x_t)    (2)

In this process, the aspect of our retriever that differs from previous research is the table header representation t. Our method, which uses the table's headers and title without considering every cell, is relatively free from the input-length limitations of the encoder. The chunking method and our comparative experiments are explained in Section 3.2.

The training objective is to minimize the distance between a question q_i and its positive table t_i^+ while maximizing the distance between the question and its n negative tables t_{i,j}^- in a given training dataset D = {(q_i, t_i^+, t_{i,1}^-, t_{i,2}^-, ..., t_{i,n}^-)}_{i=1}^{M}. The loss function is optimized as the Negative Log-Likelihood (NLL):

    L_retriever(q_i, t_i^+, t_{i,1}^-, ..., t_{i,n}^-)
        = -log [ exp(sim(q_i, t_i^+)) / ( exp(sim(q_i, t_i^+)) + Σ_{j=1}^{n} exp(sim(q_i, t_{i,j}^-)) ) ]    (3)

2.2. Table Refinement Model

This paper introduces a new task called Table Refinement, defined as simplifying a table while preserving specific information. Accordingly, our THoRR:refinement model aims to obtain refined tables, denoted t_r, for the Top_K candidate tables from the retrieval phase. The input x_r is defined by Equation 4, where header ∈ [header_row, header_col]. As shown in Equation 5, x_r is input to the refinement encoder Enc_R to obtain hidden states. A linear layer then takes these hidden states and outputs the relevant-header scores, denoted h. Using h, we obtain the Top_C relevant column header indices (I_col) and the Top_R relevant row header indices (I_row), as specified in Equation 6. Subsequently, we refine the candidate tables using the selected row and column indices to obtain t_r.

    x_r = {[CLS] Q [SEP] header [SEP]}    (4)

    h = Enc_R(x_r)    (5)

    I_row = argmax(h_row, Top_R),    I_col = argmax(h_col, Top_C)    (6)

The learning objective is to identify the header indices relevant to the question, increasing the score h_y of the answer's header index. The loss function, described in Equation 7, is optimized as the Cross-Entropy loss, where N is the number of tokens in the input x_r and y is the gold relevant header index.

    L_refinement(h, y) = -log [ exp(h_y) / Σ_{n=1}^{N} exp(h_n) ]    (7)
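As a concrete illustration of the two phases above, the sketch below linearizes a table's title and headers into the retriever input (Equation 1), ranks indexed tables by dot-product similarity (Equation 2), and slices a retrieved table down to its Top_C/Top_R relevant headers (Equation 6). This is a minimal toy, not the authors' implementation: the BERT-style encoders are abstracted away as pre-computed vectors, and the " | " header separator is an assumption.

```python
def header_representation_input(title, col_headers, row_headers):
    # Build x_t (Equation 1): title and headers only, no cell values.
    # The " | " separator between individual headers is an illustrative assumption.
    return ("[CLS] " + title +
            " [SEP] " + " | ".join(col_headers) +
            " [SEP] " + " | ".join(row_headers) + " [SEP]")

def top_k_tables(question_vec, table_vecs, k):
    # Rank tables by dot-product similarity sim(q, t) (Equation 2)
    # and return the indices of the Top_K closest tables.
    scores = [sum(q * t for q, t in zip(question_vec, tv)) for tv in table_vecs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

def refine_table(table, col_scores, row_scores, top_c, top_r):
    # Keep the Top_C / Top_R highest-scoring headers (Equation 6) and slice
    # the cells accordingly; in the paper the scores come from Enc_R, here
    # they are passed in directly.
    cols = sorted(range(len(col_scores)), key=lambda i: -col_scores[i])[:top_c]
    rows = sorted(range(len(row_scores)), key=lambda i: -row_scores[i])[:top_r]
    cols.sort(); rows.sort()  # preserve the original header order
    return {"col_headers": [table["col_headers"][c] for c in cols],
            "row_headers": [table["row_headers"][r] for r in rows],
            "cells": [[table["cells"][r][c] for c in cols] for r in rows]}
```

Refining a 2x3 table with Top_C = 2 and Top_R = 1, for example, leaves only 1x2 cells, mirroring the cell-count reduction reported in Section 3.3.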
     Table 1
Comparison of retrieval accuracy of our THoRR method and the baselines. Fine-tuning denotes training with the
HiTab dataset; Zero-shot denotes evaluation on AIT-QA using the fine-tuned model.
                   Refinement                   HiTab Fine-tuning                                 AIT-QA Zero-shot
 Model           𝑇 𝑜𝑝_𝐶   𝑇 𝑜𝑝_𝑅   HIT@1    HIT@5 HIT@10 HIT@20           HIT@50     HIT@1    HIT@5 HIT@10 HIT@20           HIT@50
 DTR[3]            -        -       19.00    40.97     51.96      64.27     77.53      8.74    20.39     28.16     41.75         71.07
 DPR-table[4]      -        -       40.40    69.51     77.15      84.03     90.66     19.61    41.75     55.15     71.26         89.51
 THoRR             5        -      45.39     74.75     82.83      87.31    91.60     22.52     47.38     62.91     74.95         92.82
 (Ours)            5       10      43.50     71.84     79.55      84.03    88.07     21.75     44.27     59.03     69.51         84.85
                   7        -      45.77     75.51     83.59      88.07    92.49     23.50     48.54     64.47     76.89         94.76
                   7       10      43.88     72.60     80.30      84.79    88.95     22.72     45.44     60.58     71.46         86.80
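The HIT@K numbers in Table 1 can be computed as sketched below; a minimal illustration assuming each question has a single gold table and retrieval results are given as ranked lists of table ids (the function names are ours, not from the paper's code).

```python
def hit_at_k(ranked_table_ids, gold_table_id, k):
    # 1 if the gold table appears among the Top_K retrieved tables, else 0.
    return int(gold_table_id in ranked_table_ids[:k])

def hits_accuracy(rankings, gold_ids, k):
    # Ratio of questions whose gold table is within the Top_K results;
    # this is the 'Hits accuracy' (HIT@K) reported in Table 1.
    return sum(hit_at_k(r, g, k) for r, g in zip(rankings, gold_ids)) / len(gold_ids)
```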



3. EXPERIMENTS                                                       domain. In this process, a fine-tuned model using the HiTab
                                                                     [7] dataset is used to make predictions on the AIT-QA[8]
Dataset We conduct experiments on two complex table                  dataset without any additional training, and the results are
benchmark datasets. HiTab[7] is a Table QA dataset with              evaluated. Through this experiment, we aim to demonstrate
a hierarchical structure. This dataset consists of questions         that the proposed models can handle complex table retrieval
that require complex numerical calculations, including ta-           in previous unseen domains. Table 1 presents the results of
bles from Wikipedia and statistical reports. It contains a           this experiment, showing superior performance compared
total of 10,672 question-answer pairs, with 7,417 for train-         to the baselines and indicating well THoRR works on com-
ing, 1,671 for validation, and 1,584 for testing. There are          plex tables in different domains.
a total of 3,597 tables in this dataset. We use this dataset
for fine-tuning. AIT-QA[8] is a Table QA dataset specific to
the Airline industry, composed of tables extracted from the
                                                                     3.2. Retrieval Result
U.S. public SEC filings. It includes specialized vocabulary
terms for a specific domain and also has a hierarchical struc-
ture like HiTab[7]. It consists of 515 questions and answers,
with a total of 116 tables. In this paper, this dataset is used
to evaluate the zero-shot performance of the fine-tuning
model.
   Baseline In order to demonstrate the performance of our
method, we compare it with baseline methods. DTR[3] is
a table encoder that uses a table-specific structure. DPR-
table[4], on the other hand, processes tables linearly, similar
to understanding text passages. Both of these baselines
have been trained on the nq-dataset[5] and their pretrained
models are publicly available. We fine-tune these pre-train
models as backbones and compare them with our model.
                                                                                                  (a)
3.1. Main Result : THoRR
The experiments in this paper evaluate the proposed mod-
els, THoRR, in a two-phase process as shown in Figure 1
(THoRR:retrieval and THoRR:refinement). The performance
of the models is evaluated using the ’Hits accuracy’ as the
main evaluation metric. This metric measures the ratio of
correct answers included in the 𝑇 𝑜𝑝_𝐾 selected tables by
the models. Where, 𝑇 𝑜𝑝_𝐾 takes values 1, 5, 10, 20, 50 to
evaluate the accuracy of the models.
   Fine-tuning To compare fine-tuning experiments on the
complex table dataset, we train THoRR and baselines using
the HiTab [7] training set. Table 1 presents the performance
of the THoRR method compared to baseline models. The
experimental results indicate that the proposed models out-
performed baselines in most cases. When 𝑇 𝑜𝑝_𝐶 = 7 and                                            (b)
𝑇 𝑜𝑝_𝐾 <= 10, the proposed models exhibit an accuracy
improvement of more than 5% compared to the baseline’s               Figure 3: (a) Retriever accuracy with DPR-table’s chunking
best accuracy. The superior performance at a small 𝑇 𝑜𝑝_𝐾            method vs THoRR:retrival’s table header representation method.
indicates the importance in the RAG system, as it indicates          (b) Comprison between the number of chunks by the max token
                                                                     length.
effective utilization of a limited number of reference pieces
of information, which is common when the 𝑇 𝑜𝑝_𝐾 is less
than 10.                                                                We compare our proposed table header representation
   Zero-shot The zero-shot experiment intend to observe              method and chunking method in terms of retrieval accuracy.
how the model performs on complex table data from a new              Figure 3(a) illustrates the performance of [4] with the chunk-
ing method and our method. (”inf” refers to the use of the        learns over 26 million natural language questions and ta-
original table without chunking.) As shown in Figure 3(b),        bles. TURL[18] introducing a structure-aware Transformer
we observe a decrease in retrieval accuracy as the lower max      encoder and Masked Entity Recovery (MER) objective for
token length, indicating that the number of retrieval targets     pre-training. StruG[14] proposes a semi-supervised learn-
affects the performance significantly in retrieval tasks. Our     ing framework for learning the connection between text
approach demonstrates superior performance compared to            and SQL. MATE[15] demonstrates the efficient restriction of
methods that consider all values. This highlights the effec-      Transformer attention flow on tabular data, enabling train-
tiveness of our method, which relies solely on table headers      ing with larger sequence lengths. Tableformer[16] learns
for table representation, especially in retrieving large and      from tables using attention biases, making it better at un-
complex tables. Moreover, our method demonstrates su-             derstanding tabular data. TABBIE[17] introduces a method
perior performance compared to existing approaches that           to improve performance on table-based prediction tasks by
consider all values, thereby experimentally validating our        pre-training only tabular data.
heuristic assumption that headers are crucial elements in            Research on table retrieval includes methodologies such
table retrieval.                                                  as [19, 3, 4, 20, 21]. Table2vec[19] proposes a method for
                                                                  obtaining table embeddings by considering various table el-
3.3. Refinement Result                                            ements such as captions, headers, cells, and entities. DTR[9]
                                                                  introduces a table-specific model suitable for open-domain
                                                                  table question answering. DPR-table[4] linearizes tables
                                                                  to handle them similar to text passages, instead of using
                                                                  table-specific models. GTR[20] introduces a model that
                                                                  transforms tables into graphs, capturing both cell and lay-
                                                                  out structures. [21] introduces a method for enhancing
                                                                  the similarity between queries and tables for table retrieval,
                                                                  employing various semantic spaces and similarity measure-
                                                                  ment methods.


                                                                  5. Conclusion
                                                                  We propose the THoRR method, which uses the table head-
                                                                  ers to retrieve and help understand the complex and large
Figure 4: Comparison of human evaluation performance on           tables. We use the table header representations in the re-
TableQA and the number of refined table cells.                    triever that can retrieve tables without chunking them. Ad-
                                                                  ditionally, we propose a novel methodology for refining
                                                                  tables by detecting the table headers that are relevant to the
   In this section, we experiment with our refinement model
                                                                  questions within the table. This approach aims to simplify
to reduce cell information in mitigating hallucinations. In
                                                                  the tables in which an excessive amount of information is
Figure 4, the green line indicates a decreasing trend in the
                                                                  present, particularly in complex tables. THoRR is capable
number of cells in tables when using our model. Further-
                                                                  of handling large and complex tables without dividing them
more, Figure 4 illustrates the human evaluation accuracy on
the results obtained by feeding the refined tables into Llama2 [11]
7B-Chat, where "(-, -)" denotes the original table. We randomly
sample 300 questions from the HiTab test dataset for human
evaluation. Llama2 [11] 7B-Chat takes a gold table as input to
generate responses. If the generated response contains the answer
exactly and is correct, we mark it as correct; otherwise, we
consider it a hallucination. Three master's students in the field
of AI evaluated the generated results. To ensure the reliability of
the evaluations, one evaluator and two validators were assigned
roles in the evaluation process. As a result, by setting Top_C = 7
and Top_R = 10, we demonstrate that our refinement model reduces
the average number of table cells from 153.88 to 58.03, a 62.2%
decrease compared to the original table. Additionally, we observe a
9.33% reduction in hallucinations. This validates the effectiveness
of our refinement approach.

4. Related Works

Research on table encoders has focused on pre-training on tabular
data with table-specific architectures [12, 13, 14, 15, 16, 17].
TAPAS [12] introduces a pre-training method that applies Masked
Language Modeling to the cells of tabular data. TaBERT [13]
introduces a pre-training model that jointly learns representations
of textual and tabular data.

into smaller chunks, reducing the information required for
preventing hallucinations in the LLM generator. Furthermore, the
Table Refinement task is the first of its kind in this field;
therefore, it is expected to contribute significantly to future
research in this field. Our future work involves exploring methods
to detect table headers. Additionally, we aim to prevent the
potential loss of question-relevant information when fewer headers
are selected during the refinement phase.

References

 [1] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin,
     N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel,
     S. Riedel, D. Kiela, Retrieval-augmented generation for
     knowledge-intensive NLP tasks, in: Proceedings of the 34th
     International Conference on Neural Information Processing
     Systems, NIPS'20, Curran Associates Inc., Red Hook, NY, USA,
     2020.
 [2] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai,
     J. Sun, Q. Guo, M. Wang, H. Wang, Retrieval-augmented
     generation for large language models: A survey, ArXiv
     abs/2312.10997 (2023). URL:
     https://api.semanticscholar.org/CorpusID:266359151.
 [3] J. Herzig, T. Müller, S. Krichene, J. Eisenschlos, Open domain
     question answering over tables via dense retrieval, in:
     K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur,
     I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou
     (Eds.), Proceedings of the 2021 Conference of the North
     American Chapter of the Association for Computational
     Linguistics: Human Language Technologies, Association for
     Computational Linguistics, Online, 2021, pp. 512–519. URL:
     https://aclanthology.org/2021.naacl-main.43.
     doi:10.18653/v1/2021.naacl-main.43.
 [4] Z. Wang, Z. Jiang, E. Nyberg, G. Neubig, Table retrieval may
     not necessitate table-specific model design, in: W. Chen,
     X. Chen, Z. Chen, Z. Yao, M. Yasunaga, T. Yu, R. Zhang (Eds.),
     Proceedings of the Workshop on Structured and Unstructured
     Knowledge Integration (SUKI), Association for Computational
     Linguistics, Seattle, USA, 2022, pp. 36–46. URL:
     https://aclanthology.org/2022.suki-1.5.
     doi:10.18653/v1/2022.suki-1.5.
 [5] T. Kwiatkowski, J. Palomaki, O. Redfield, M. Collins,
     A. Parikh, C. Alberti, D. Epstein, I. Polosukhin, J. Devlin,
     K. Lee, K. Toutanova, L. Jones, M. Kelcey, M.-W. Chang,
     A. M. Dai, J. Uszkoreit, Q. Le, S. Petrov, Natural questions:
     A benchmark for question answering research, Transactions of
     the Association for Computational Linguistics 7 (2019)
     452–466. URL: https://aclanthology.org/Q19-1026.
     doi:10.1162/tacl_a_00276.
 [6] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan,
     P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell,
     S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan,
     R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter,
     C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess,
     J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever,
     D. Amodei, Language models are few-shot learners, in:
     Proceedings of the 34th International Conference on Neural
     Information Processing Systems, NIPS'20, Curran Associates
     Inc., Red Hook, NY, USA, 2020.
 [7] Z. Cheng, H. Dong, Z. Wang, R. Jia, J. Guo, Y. Gao, S. Han,
     J.-G. Lou, D. Zhang, HiTab: A hierarchical table dataset for
     question answering and natural language generation, in:
     S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of
     the 60th Annual Meeting of the Association for Computational
     Linguistics (Volume 1: Long Papers), Association for
     Computational Linguistics, Dublin, Ireland, 2022,
     pp. 1094–1110. URL: https://aclanthology.org/2022.acl-long.78.
     doi:10.18653/v1/2022.acl-long.78.
 [8] Y. Katsis, S. Chemmengath, V. Kumar, S. Bharadwaj, M. Canim,
     M. Glass, A. Gliozzo, F. Pan, J. Sen, K. Sankaranarayanan,
     S. Chakrabarti, AIT-QA: Question answering dataset over
     complex tables in the airline industry, in: A. Loukina,
     R. Gangadharaiah, B. Min (Eds.), Proceedings of the 2022
     Conference of the North American Chapter of the Association
     for Computational Linguistics: Human Language Technologies:
     Industry Track, Association for Computational Linguistics,
     Hybrid: Seattle, Washington + Online, 2022, pp. 305–314. URL:
     https://aclanthology.org/2022.naacl-industry.34.
     doi:10.18653/v1/2022.naacl-industry.34.
 [9] V. Karpukhin, B. Oguz, S. Min, P. Lewis, L. Wu, S. Edunov,
     D. Chen, W.-t. Yih, Dense passage retrieval for open-domain
     question answering, in: B. Webber, T. Cohn, Y. He, Y. Liu
     (Eds.), Proceedings of the 2020 Conference on Empirical
     Methods in Natural Language Processing (EMNLP), Association
     for Computational Linguistics, Online, 2020, pp. 6769–6781.
     URL: https://aclanthology.org/2020.emnlp-main.550.
     doi:10.18653/v1/2020.emnlp-main.550.
[10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT:
     Pre-training of deep bidirectional transformers for language
     understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.),
     Proceedings of the 2019 Conference of the North American
     Chapter of the Association for Computational Linguistics:
     Human Language Technologies, Volume 1 (Long and Short Papers),
     Association for Computational Linguistics, Minneapolis,
     Minnesota, 2019, pp. 4171–4186. URL:
     https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
[11] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi,
     Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale,
     et al., Llama 2: Open foundation and fine-tuned chat models,
     arXiv preprint arXiv:2307.09288 (2023).
[12] J. Herzig, P. K. Nowak, T. Müller, F. Piccinno, J. Eisenschlos,
     TaPas: Weakly supervised table parsing via pre-training, in:
     D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.),
     Proceedings of the 58th Annual Meeting of the Association for
     Computational Linguistics, Association for Computational
     Linguistics, Online, 2020, pp. 4320–4333. URL:
     https://aclanthology.org/2020.acl-main.398.
     doi:10.18653/v1/2020.acl-main.398.
[13] P. Yin, G. Neubig, W.-t. Yih, S. Riedel, TaBERT: Pretraining
     for joint understanding of textual and tabular data, in:
     D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.),
     Proceedings of the 58th Annual Meeting of the Association for
     Computational Linguistics, Association for Computational
     Linguistics, Online, 2020, pp. 8413–8426. URL:
     https://aclanthology.org/2020.acl-main.745.
     doi:10.18653/v1/2020.acl-main.745.
[14] X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun,
     M. Richardson, Structure-grounded pretraining for text-to-SQL,
     in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur,
     I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou
     (Eds.), Proceedings of the 2021 Conference of the North
     American Chapter of the Association for Computational
     Linguistics: Human Language Technologies, Association for
     Computational Linguistics, Online, 2021, pp. 1337–1350. URL:
     https://aclanthology.org/2021.naacl-main.105.
     doi:10.18653/v1/2021.naacl-main.105.
[15] J. Eisenschlos, M. Gor, T. Müller, W. Cohen, MATE: Multi-view
     attention for table transformer efficiency, in: M.-F. Moens,
     X. Huang, L. Specia, S. W.-t. Yih (Eds.), Proceedings of the
     2021 Conference on Empirical Methods in Natural Language
     Processing, Association for Computational Linguistics, Online
     and Punta Cana, Dominican Republic, 2021, pp. 7606–7619. URL:
     https://aclanthology.org/2021.emnlp-main.600.
     doi:10.18653/v1/2021.emnlp-main.600.
[16] J. Yang, A. Gupta, S. Upadhyay, L. He, R. Goel, S. Paul,
     TableFormer: Robust transformer modeling for table-text
     encoding, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.),
     Proceedings of the 60th Annual Meeting of the Association for
     Computational Linguistics (Volume 1: Long Papers), Association
     for Computational Linguistics, Dublin, Ireland, 2022,
     pp. 528–537. URL: https://aclanthology.org/2022.acl-long.40.
     doi:10.18653/v1/2022.acl-long.40.
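The two quantities reported above can be illustrated with a minimal sketch. This is not the authors' code; the function names (`is_correct`, `reduction_rate`) are ours, and the exact-match rule below is only the mechanical part of the criterion (the final correctness judgment was made by the human evaluators).

```python
def is_correct(response: str, gold_answer: str) -> bool:
    """A response is a candidate for "correct" only if it contains the
    gold answer exactly; otherwise it is treated as a hallucination."""
    return gold_answer in response

def reduction_rate(original_cells: float, refined_cells: float) -> float:
    """Percentage decrease in the number of table cells after refinement."""
    return (original_cells - refined_cells) / original_cells * 100

# Average cell counts reported in the text: 153.88 before vs. 58.03 after.
print(round(reduction_rate(153.88, 58.03), 1))        # → 62.3
print(is_correct("The total was 42 units.", "42"))    # → True
```

Note that substring matching is deliberately strict: a paraphrased but correct answer would still be flagged for human review, which is why validators were assigned alongside the evaluator.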
[17] H. Iida, D. Thai, V. Manjunatha, M. Iyyer, TABBIE: Pretrained
     representations of tabular data, in: K. Toutanova,
     A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy,
     S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.),
     Proceedings of the 2021 Conference of the North American
     Chapter of the Association for Computational Linguistics:
     Human Language Technologies, Association for Computational
     Linguistics, Online, 2021, pp. 3446–3456. URL:
     https://aclanthology.org/2021.naacl-main.270.
     doi:10.18653/v1/2021.naacl-main.270.
[18] X. Deng, H. Sun, A. Lees, Y. Wu, C. Yu, TURL: Table
     understanding through representation learning, Proc. VLDB
     Endow. 14 (2020) 307–319. URL:
     https://doi.org/10.14778/3430915.3430921.
     doi:10.14778/3430915.3430921.
[19] L. Zhang, S. Zhang, K. Balog, Table2Vec: Neural word and
     entity embeddings for table population and retrieval, in:
     Proceedings of the 42nd International ACM SIGIR Conference on
     Research and Development in Information Retrieval, SIGIR'19,
     Association for Computing Machinery, New York, NY, USA, 2019,
     pp. 1029–1032. URL: https://doi.org/10.1145/3331184.3331333.
     doi:10.1145/3331184.3331333.
[20] F. Wang, K. Sun, M. Chen, J. Pujara, P. Szekely, Retrieving
     complex tables with multi-granular graph representation
     learning, SIGIR '21, Association for Computing Machinery,
     New York, NY, USA, 2021, pp. 1472–1482. URL:
     https://doi.org/10.1145/3404835.3462909.
     doi:10.1145/3404835.3462909.
[21] S. Zhang, K. Balog, Semantic table retrieval using keyword and
     table queries, ACM Trans. Web 15 (2021). URL:
     https://doi.org/10.1145/3441690. doi:10.1145/3441690.