=Paper= {{Paper |id=Vol-3656/paper6 |storemode=property |title=A Practical Entity Linking System for Tables in Scientific Literature |pdfUrl=https://ceur-ws.org/Vol-3656/paper6.pdf |volume=Vol-3656 |authors=Varish Mulwad,Tim Finin,Vijay S. Kumar,Jenny Weisenberg Williams,Sharad Dixit,Anupam Joshi |dblpUrl=https://dblp.org/rec/conf/aaai/MulwadFKWDJ23 }} ==A Practical Entity Linking System for Tables in Scientific Literature== https://ceur-ws.org/Vol-3656/paper6.pdf
A Practical Entity Linking System for Tables in Scientific Literature
Varish Mulwad1,∗, Tim Finin2, Vijay S. Kumar3, Jenny Weisenberg Williams3, Sharad Dixit3 and Anupam Joshi2
1 GE Research, John F. Welch Technology Center, Whitefield, Bengaluru, India
2 University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD, USA
3 GE Research, 1 Research Circle, Niskayuna, NY, USA


Abstract
Entity linking is an essential step towards constructing knowledge graphs that facilitate advanced question answering over scientific documents, including the retrieval of relevant information present in tables within these documents. This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base. It describes how we adapt this system for linking domain-specific entities, especially those embedded within tables drawn from COVID-19-related scientific literature. We describe the setup of an efficient offline instance of the system that enables our entity-linking approach to be more feasible in practice. As part of a broader approach to infer the semantic meaning of scientific tables, we leverage the structural and semantic characteristics of the tables to improve overall entity linking performance.

Keywords
entity linking, knowledge graph, tables, scientific documents



Third AAAI Workshop on Scientific Document Understanding, 2023
∗ Corresponding author.
Email: varish.mulwad@ge.com (V. Mulwad); finin@umbc.edu (T. Finin); v.kumar@ge.com (V. S. Kumar); weisenje@ge.com (J. W. Williams); sharad.dixit@ge.com (S. Dixit); joshi@umbc.edu (A. Joshi)
ORCID: 0000-0001-9113-5952 (V. Mulwad); 0000-0002-6593-1792 (T. Finin); 0000-0003-2234-1546 (V. S. Kumar); 0000-0002-8641-3193 (A. Joshi)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

The rapid pace of research in dynamic, fast-evolving scenarios, as recently exemplified by COVID-19 and the unprecedented volumes of scholarly literature on this subject [1], has necessitated more machine-driven, human-interpretable approaches to scientific knowledge discovery. Open datasets like CORD-19 [2] have motivated novel techniques and tools for keyword/semantic search and Q&A, recommendation, and summarization of scientific documents. As with the web, discovery from scientific literature is predominantly associated with searching over unstructured textual content. Domain-specific neural search engines [3, 4] typically produce ranked lists of matching articles in response to search requests, while mainstream information retrieval methods may also deliver direct short, targeted responses (drawn from text) to queries. To facilitate such a search, Sohrab et al. [5] introduced the BENNERD system and an annotated subset of CORD-19 articles to demonstrate the fundamental tasks of named entity recognition and entity linking for COVID-19-related entities found in the text.

Besides text, alternative modalities such as tables and charts have come to play a considerable role in how the scientific community succinctly conveys descriptive information in the literature. Our experience assembling a corpus of over 62,000 open-access coronavirus-related articles from PubMed Central [6] between 2020-21 yielded over 120,000 tables, underlining a wealth of latent knowledge embedded within these structured artifacts. The extraction and retrieval of relevant information from these scientific tables is becoming increasingly critical to emerging knowledge-driven applications. For example, consider a genomic surveillance scenario seeking information on treatment efficacies against the top prevalent COVID-19 variants in each US state. Better responses to such queries entail going beyond text and searching relevant portions of or entire scientific tables for vital knowledge nuggets, possibly fusing information from multiple source tables on the fly.

Although learning-based representational models for tabular data [7] show great promise for understanding relationally structured web tables, these models are typically not tuned to unconventional structural complexity. This is especially true for the dense and often implicit semantics and diffuse context inherent in scientific tables in highly specialized domains [8]. Representing scientific tables as semantically annotated linked data artifacts accounts for structural complexities and enables explicit reasoning over tabular content to infer their semantics and relevance to search queries. Hence, entity linking is fundamental to our end-to-end pipeline for constructing such knowledge graphs of tables drawn from scientific documents, as depicted in Figure 1.

This paper presents an entity linking system to automatically map the content of individual cells in scientific tables to appropriate entries in the Wikidata knowledge base [9]. To keep up with the scientific literature infodemic, we architected a more efficient local, offline linking system using periodic Wikidata knowledge dumps. While the ensuing efficiency gains make our system more feasible in practice, we discuss the implications for linking performance.

Figure 1: Entity linking and its role in constructing knowledge graphs from scientific tables

2. Entity Linking for Scientific Text and Tables

Given a mention of an entity in a document and a unique set of known entities defined in some knowledge base, entity linking refers to finding and assigning the entity ID corresponding to the mentioned entity. Entities play an essential role in text and are often used to describe what the text is about. Likewise, linking entity mentions in the header and body cells of tables, as well as linking entities in captions or other referring text, can help partly understand or infer the semantic meaning of tables. We developed a general-purpose linker to link entity mentions in text to items in (and to further extract useful information about items from) Wikidata. We describe the linker’s customization and inner workings for linking highly specialized, idiomatic content within header and body cells of tables drawn from a corpus of COVID-19-related scientific literature.

2.1. Wikidata: Reference Knowledge Base

Wikidata [9] is a collaboratively edited multilingual knowledge graph used to provide common data for Wikimedia projects, with currently about 1.2 billion facts on over 102 million items. Wikidata’s ontology has a fine-grained type system with more than two million types and about 11 thousand properties, including an item’s label, aliases, and description. Each Wikidata item has a unique identifier beginning with Q, like Q3519875 (“National Institute of Allergy and Infectious Diseases”), and each property has an identifier starting with P. The property P31 (instance of) links an item with its immediate types, P279 (subclass of) links a concept item to its immediate supertypes, and P1647 (subproperty of) links properties to their immediate super-properties.

An entity has just one label in a given language, its “canonical name”. An entity can have any number of aliases in a language and can have a short description in any language. Unlike other open knowledge graphs, Wikidata includes and links to specialized knowledge from additional domain-specific knowledge resources. These include the Unified Medical Language System (UMLS) [10] knowledge base and the Medical Subject Headings (MeSH) thesaurus [11], which bring together biomedical vocabularies and standards to enable interoperability.

Figure 2 shows an example of a simple scientific table with links to appropriate Wikidata items highlighting several high-level issues we addressed. One is that we must consider the “header” cells (whether for columns or rows) differently from the regular table body cells. Note that the third column’s header cell, Prevalence, has two good candidate links: the concept Q719602 (“number of disease cases in a given population at a specific time”) and the property P1193 (“portion in percent of a population with a given disease or disorder”). We give preference in such cases to using the property item over the concept item.

The middle header cell containing the text Lineage illustrates a second issue: A simple linker might choose the most common match for this based only on the text, Q1517820 (“line of ancestors and descendants of a person”). However, the cells in this column (e.g., B.1.1.7) are all easily matched to Wikidata items whose immediate type is Q104450895 (“variant of SARS-CoV-2”). Therefore, we need to do joint inference using both the header cell and a sample of its data cells to choose the best links for both.

The first column of the table highlights a third aspect of the task: mining additional knowledge from resources
connected to candidate Wikidata items. Wikidata items often link to other knowledge graphs, such as DBpedia [12], that contain additional useful information. DBpedia, for example, has a short paragraph describing its items and links to types in the Yago fine-grained type system [13].

Figure 2: Examples of header and cell annotations of links to Wikidata items and properties

2.2. Core Entity Linking Algorithm

Our entity linker takes a mention string (e.g., from a table header or cell) and begins by retrieving a pre-specified number of Wikidata items using the MediaWiki search API. This returns a ranked list containing each item’s Wikidata ID, label, aliases, and English language description. Next, we rerank candidates to promote those whose label (best) or alias (second best) exactly matches the mention string. For each candidate, we use a SPARQL query to retrieve its types, both immediate (P31) and inherited, via a chain of P279 links for concept super-classes and P1647 links for property super-properties.

For specific domains, our linker leverages the ultra-fine-grained Wikidata type system to infer additional domain types for an item by checking for specific domain-relevant properties. We identified a custom set of Wikidata item types and properties to support entity linking for the biomedical domain. For example, we infer the mesh item type if an item has a MeSH descriptor ID property (P486) that connects the item with a UMLS Medical Subject Heading.

When linking the text in a header cell, we give more weight to candidates that are Wikidata properties. For example, candidates for the text “location” include an item representing the geographic location (Q2221906) as well as the property location (P276). While either might be relevant, our annotation methodology strongly preferred the latter.

The linker’s filtering and ranking of candidate items are based initially on analyzing an item’s types. This type analysis is controlled by five lists of types that are part of the linker’s configuration for a domain and task. These are ordered from best to worst as follows: (1) Target types are those we want to find based on the mention type identified by an NLP system; (2) Near-miss types are close to the target types and often confused with the targets by an NLP system; (3) Good types are ones that are very relevant to the domain, such as a MeSH (Medical Subject Headings) term; (4) OK types include types that are acceptable and common in many domains, such as organizations, people, geo-political entities, and locations; and (5) Bad types are ones we are not interested in (e.g., fictional characters, journal articles, musical groups) and result in a candidate being immediately rejected.

The type names of interest are mapped to Wikidata types via the linker’s configuration dictionary. Extending this dictionary enabled us to easily customize our linker to specific domains, such as COVID-19-related scientific research. For our domain, examples of good types are Wikidata high-level classes corresponding to disease, protein, chemical compound, vaccine type, and type of statistic. OK types are those associated with the standard OntoNotes [14] types, such as person, event, facility, organization, and location. Entities of these types often occur in biomedical tables. Our bad types cover things like songs, works of art, sports organizations, fictional things, and other high-level types unlikely to be present in medical tables. For example, there exist 83 Wikidata items with the canonical name “virus”. These include Q808, the infectious agent, as well as films, songs, musical albums, rock groups, paintings, video games, musicians, professional wrestlers, and more.

Finally, we have a mapping of near-miss types that represent types that are easily confused. A classic example is the pair of OntoNotes types FAC (for facility) and LOC (for location), which most NLP systems easily confuse. An entity like Wuhan Institute of Virology can be marked as an ORG, LOC, or FAC, depending on its context. Since locations are a common type in tables for this domain, we can treat an item identified as a FAC or ORG by a language processor as possibly referring to a location. Additional ranking
for an item’s prominence is then done using its number of sitelinks, i.e., the number of links to other Wikimedia projects that contain information about the item.

Beyond type analysis-based filtering, the last step is the ranking of the final candidates using a context span or string, if provided. The similarity of the context and the item’s description is computed with embeddings from the spaCy [15] large language model and generates a score that is used along with the item’s rank in the candidate list to select and return the best link. This worked reasonably well for both well-structured text (e.g., table captions) and for collections of terms from the row and column headers and could be improved by using an embedding model fine-tuned on the biomedical domain.

3. Efficient Entity Linking at Large Scale

Our entity linker initially used the Wikidata and Wikimedia APIs to retrieve the initial ranked list of Wikidata candidate items and their type and supertype information. Since Wikidata is a public resource, the APIs are understandably rate-limited such that unreasonable access requests and query rates in excess of established limits may lead to IP address blacklisting [16]. The table in Figure 3 breaks down our average observed entity linking time to link a single exemplar mention string to a Wikidata entity while operating under the above limits. Accessing public Wikidata APIs, our linker can operate no faster than around 30 seconds per entity. For our dataset of 120,000+ tables (a rate reflective of the COVID-19 infodemic), annotating even just 10 cells per table at this rate could end up taking over a year.

Figure 3: Entity linking time using Wikidata APIs

Furthermore, when applying entity linking to infer table semantics (see next section), the linking of a single header cell could, in turn, translate to the linking of all other cells in the respective column or row, potentially placing far greater stress on the linker. As a result, while Wikidata APIs facilitated a proof of concept of our core entity linking algorithm, they cannot sustain a practical, scalable linking service capable of keeping up with contemporary scientific publication rates.

To address these API rate-limit bottlenecks, we initially set up a transient caching layer for cell entity linking results so that future requests to link the same mention string would be served from the cache, avoiding API invocations. However, this strategy was insufficient, so we decoupled our core entity linker from the public Wikidata altogether by architecting and progressively setting up a more efficient system using local periodic dumps of relevant Wikidata knowledge.

The system is offline because the linker no longer relies on Wikidata APIs. Wikidata’s complex software architecture [17] and its enormous size make it challenging to replicate locally in its entirety. That said, our entity linker does not need all the capabilities that Wikidata offers. We targeted emulation strategies addressing bottlenecks with cross-item graph search (via the Wikidata query service (WDQS) and Wikidata’s underlying RDF triple store) and full-text search over items and their properties (via the Action API and underlying CirrusSearch Wikibase extension). We leverage proven open-source storage technologies such as the Elasticsearch engine and the Redis key-value store to emulate underlying Wikidata capabilities, as depicted in Figure 4.

Figure 4: Functional architecture of an efficient ‘offline’ entity linker

We implemented this system by uploading partial JSON dumps of Wikidata items, their basic attributes (label, aliases, description), specific types, and ‘sitelinks’ counts¹ into a local Elasticsearch index. This resulted in a locally searchable collection of 95.8M items. Offline, we retrieved the current type hierarchy (by traversing P31 and P279 property relationships) and loaded the resulting dictionary, which maps each of Wikidata’s 2.6M types to its supertypes, into Redis. This reduced determining if an entity was an instance of a given type (direct or inherited) to a dictionary lookup.

In this efficient entity linking system, an initial candidate search is performed using an Elasticsearch multi-match query that compares a mention string against labels and aliases. In lieu of Wikidata’s CirrusSearch ranking mechanisms, we use an item’s sitelinks count (i.e., popularity) as a proxy for its prominence and rank candidates in descending order of their sitelinks counts. Once we have a ranked list of candidates for a mention, we query Redis using each candidate item’s entity ID and direct types as keys to retrieve associated inherited types. Type analysis and re-ranking then proceed as before.

¹ A Wikidata item’s sitelinks property is the number of other Wikimedia sites such as Wikipedia, Wikisource, and Wikivoyage in which it appears. It is commonly used as a metric for the item’s importance.

Figure 5: Replacing the entity linker’s use of public Wikidata APIs with efficient offline, local queries

Figure 5 shows a progression in replacing Wikidata API invocations with queries to these local knowledge stores. The resulting system trades linking accuracy for a three-fold improvement in linking efficiency, with the potential for even further speedups via parallel processing. The impact on entity linking performance is largely dictated by the quality of the initial ranked candidate list returned by our Elasticsearch query. We are exploring techniques like PageRank to better estimate an item’s relative importance.

4. Entity Linking to Infer Semantics of Tables

The meaning of text derives from its constituent words, which in turn are understood using grammatical knowledge and context provided by surrounding text. Inferring the intended meaning of tables additionally requires interpreting row/column headers and relations between them, besides linking cell values to entities. To improve entity linking performance for inferring the semantics of scientific tables, we supplement our core algorithm with other techniques (beyond the scope of this paper), as shown in Figure 1. These include:

     • Rule-based syntactic characterization: We categorize tables into types (e.g., horizontal) based on their structure,
     • Joint inference based on embeddings of Wikidata items: We use Wembedder-driven [18] clustering operations to compute compatibility between entities and to jointly assign entities to cells in a column, and
     • Specialists: We use pattern-based or machine-learning approaches to independently assess commonly encoded data types in table cells to avoid linking those cell values that are deemed to be specific kinds of literals (e.g., RNA/DNA sequences or Clinical Trial IDs).

Our entity linking system achieves a fair degree of accuracy in linking table cells to Wikidata items. We based our evaluations on a manually annotated subset of 47 tables extracted from 45 COVID-19-related articles drawn randomly from PubMed Central [6]. Of the 910 table cells (out of a total of 3600 manually annotated cells in these tables) expected to be mapped to a Wikidata item, our linker achieved a recall of 0.82 when the expected annotation was part of the linker’s initial candidate item set, and a precision of 0.51 over the subset of these cells with expected Wikidata annotations.

5. Discussion and Conclusions

Existing NLP tools for entity linking like spaCy [15] support a very limited entity type system, often based on just OntoNotes 5.0 types (e.g., PER, ORG, LOC, FAC) and do not cover specialized scientific entities. The SemTab challenge on Tabular Data to Knowledge Graph Matching focuses on three mapping tasks aimed at inferring the semantics of web tables [19]. While it recently included tables from biology literature, leading tabular entity linking systems [20] do not adequately cover domain-specific entities. Bespoke entity linking systems for COVID-19-related entities [5] link against UMLS and do not exploit the extensive type hierarchy or entity coverage of Wikidata.

Part of our goal is to fill this gap with a practical entity linking system that can not only be adapted for domain-specific entities but can also help infer table semantics with high accuracy by leveraging Wikidata’s rich type system. As entity linking of tables against Wikidata at large scale is bottlenecked by rate-limited APIs [21], we built an offline version of our linking system,
achieving a three-fold improvement in efficiency while incurring a tolerable reduction in linking performance.

Acknowledgments

This research is based on work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via [2021-21022600004]. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government.

References

 [1] H. Else, Covid in papers: a torrent of science, Nature (2020) 553–553.
 [2] L. L. Wang, K. Lo, Y. Chandrasekhar, R. Reas, J. Yang, D. Burdick, D. Eide, K. Funk, Y. Katsis, R. M. Kinney, et al., Cord-19: The covid-19 open research dataset, in: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020.
 [3] E. Zhang, N. Gupta, R. Nogueira, K. Cho, J. Lin, Rapidly deploying a neural search engine for the covid-19 open research dataset: Preliminary thoughts and lessons learned, in: ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID), 2020.
 [4] K. Hall, An nlu-powered tool to explore covid-19 scientific literature, https://ai.googleblog.com/2020/05/an-nlu-powered-tool-to-explore-covid-19.html, 2020. Accessed: 2022-11-02.
 [5] M. G. Sohrab, K. Duong, M. Miwa, G. Topić, I. Masami, T. Hiroya, BENNERD: A neural named entity linking system for COVID-19, in: Q. Liu, D. Schlangen (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 182–188. URL: https://aclanthology.org/2020.emnlp-demos.24. doi:10.18653/v1/2020.emnlp-demos.24.
 [6] National Library of Medicine, PMC open access subset, https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/, 2022. Accessed: 2022-11-02.
 [7] P. Yin, G. Neubig, W.-t. Yih, S. Riedel, Tabert: Pretraining for joint understanding of textual and tabular data, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 8413–8426.
 [8] V. Mulwad, T. Finin, A. Joshi, Interpreting medical tables as linked data for generating meta-analysis reports, in: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014), 2014, pp. 677–686. doi:10.1109/IRI.2014.7051955.
 [9] D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Communications of the ACM 57 (2014) 78–85.
[10] O. Bodenreider, The unified medical language system (umls): integrating biomedical terminology, Nucleic acids research 32 (2004) D267–D270.
[11] C. E. Lipscomb, Medical subject headings (mesh), Bulletin of the Medical Library Association 88 (2000) 265.
[12] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, S. Hellmann, Dbpedia-a crystallization point for the web of data, Journal of web semantics 7 (2009) 154–165.
[13] F. M. Suchanek, G. Kasneci, G. Weikum, Yago: a core of semantic knowledge, in: Proceedings of the 16th international conference on World Wide Web, 2007, pp. 697–706.
[14] R. Weischedel, S. Pradhan, L. Ramshaw, J. Kaufman, M. Franchini, M. El-Bachouti, N. Xue, M. Palmer, J. D. Hwang, C. Bonial, Ontonotes release 5.0, 2013. doi:10.35111/xmhb-2b84.
[15] M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, et al., spacy: Industrial-strength natural language processing in python (2020).
[16] Wikidata, Wikidata query service user manual, https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual, 2022. Accessed: 2022-11-02.
[17] Wikidata, Wikidata architecture, https://upload.wikimedia.org/wikipedia/commons/2/2e/Wikidata_Architecture_Overview_-_High_Level.svg, 2018. Accessed: 2022-11-02.
[18] F. Å. Nielsen, Wembedder: Wikidata entity embedding web service, preprint arXiv:1710.04099 (2017).
[19] E. Jiménez-Ruiz, O. Hassanzadeh, V. Efthymiou, J. Chen, K. Srinivas, Semtab 2019: Resources to benchmark tabular data to knowledge graph matching systems, in: Proceedings of the 17th International Conference European Semantic Web Conference, Springer, 2020, pp. 514–530.
[20] Y. Chabot, T. Labbé, J. Liu, R. Troncy, Dagobah: An end-to-end context-free tabular data semantic annotation system, in: The 18th International Semantic Web Conference, 2019, pp. 41–48.
[21] P. Nguyen, H. Takeda, Wikidata-lite for knowledge extraction and exploration, in: 2022 IEEE International Conference on Big Data (Big Data), IEEE, 2022, pp. 3684–3686.
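
The type-based filtering and ranking described in Section 2.2 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Candidate` class, the tier dictionary layout, the placeholder type names (`t:...`), the sitelinks numbers, and the tie-breaking order (tier first, then exact label or alias match, then sitelinks) are all assumptions made for illustration; only the five tiers, the immediate rejection of bad types, and Q808 come from the text.

```python
from dataclasses import dataclass, field

# Tier names ordered from best to worst, following the five lists in Section 2.2.
TIER_ORDER = ["target", "near_miss", "good", "ok", "bad"]


@dataclass
class Candidate:
    qid: str
    label: str
    aliases: list = field(default_factory=list)
    types: set = field(default_factory=set)  # immediate and inherited types
    sitelinks: int = 0


def best_tier(cand, tiers):
    """Return the index of the best tier matched by the candidate's types,
    or None when a bad type means the candidate is rejected."""
    if cand.types & tiers["bad"]:
        return None
    for rank, name in enumerate(TIER_ORDER[:-1]):
        if cand.types & tiers[name]:
            return rank
    # Assumption: candidates matching no configured tier are kept but ranked last.
    return len(TIER_ORDER) - 1


def match_rank(cand, mention):
    """0 for an exact label match, 1 for an exact alias match, 2 otherwise.
    Case-insensitive comparison is an assumption."""
    m = mention.casefold()
    if cand.label.casefold() == m:
        return 0
    if any(a.casefold() == m for a in cand.aliases):
        return 1
    return 2


def rank_candidates(mention, candidates, tiers):
    """Drop rejected candidates, then sort by tier, match quality, and sitelinks."""
    kept = []
    for c in candidates:
        tier = best_tier(c, tiers)
        if tier is not None:
            kept.append((tier, match_rank(c, mention), -c.sitelinks, c))
    kept.sort(key=lambda t: t[:3])
    return [c for *_, c in kept]


# Hypothetical configuration and candidates for the mention "virus".
tiers = {
    "target": {"t:infectious_agent"},
    "near_miss": set(),
    "good": {"t:disease"},
    "ok": {"t:organization"},
    "bad": {"t:song"},
}
candidates = [
    Candidate("Q_SONG", "Virus", [], {"t:song"}, 50),
    Candidate("Q808", "virus", [], {"t:infectious_agent"}, 200),
    Candidate("Q_ORG", "Virus Ltd", ["virus"], {"t:organization"}, 5),
]
ranked = rank_candidates("virus", candidates, tiers)
print([c.qid for c in ranked])
```

In this toy run the song is rejected outright and the infectious agent outranks the organization, which mirrors the behaviour the paper describes for the many items named “virus”.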
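
The offline lookups of Section 3 can be sketched in the same spirit. The snippet below is a sketch under stated assumptions, not the deployed system: a plain Python dictionary stands in for the Redis store, the index field names `label`, `aliases`, and `sitelinks` are assumed, and the type identifiers other than Q104450895 are placeholders. It builds an Elasticsearch request body for a multi-match candidate search sorted by sitelinks, and precomputes inherited supertypes so that a type check becomes a dictionary lookup.

```python
def candidate_query(mention, size=10):
    """Elasticsearch request body: match the mention against labels and aliases
    and order hits by descending sitelinks count (field names are assumptions)."""
    return {
        "size": size,
        "query": {
            "multi_match": {"query": mention, "fields": ["label", "aliases"]}
        },
        "sort": [{"sitelinks": {"order": "desc"}}],
    }


def build_supertype_closure(direct):
    """Map every type to the set of all its inherited supertypes.
    `direct` maps a type to its immediate supertypes (P279-style edges)."""
    closure = {}
    for start in direct:
        seen = set()
        stack = list(direct.get(start, ()))
        while stack:
            t = stack.pop()
            if t in seen:
                continue  # also guards against cycles in the hierarchy
            seen.add(t)
            stack.extend(direct.get(t, ()))
        closure[start] = seen
    return closure


def is_instance_of(direct_types, target, closure):
    """True if `target` is one of the item's direct types or an inherited one."""
    return any(t == target or target in closure.get(t, ()) for t in direct_types)


# Hypothetical fragment of a type hierarchy; only Q104450895 appears in the paper.
direct = {
    "Q104450895": ["T_virus_variant"],
    "T_virus_variant": ["T_biological_entity"],
}
closure = build_supertype_closure(direct)
body = candidate_query("B.1.1.7", size=5)
print(is_instance_of(["Q104450895"], "T_biological_entity", closure))
```

Precomputing the closure once, offline, is what turns the per-candidate SPARQL query of the online system into a constant-time lookup at linking time.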
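
The "over a year" estimate in Section 3 follows from simple arithmetic, reproduced here as a short sketch. The figures of 120,000 tables, 10 cells per table, and 30 seconds per entity come from the text; reading the reported three-fold efficiency improvement as a three-fold reduction in per-entity time is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def annotation_days(tables, cells_per_table, seconds_per_entity):
    """Sequential linking time in days for the given workload."""
    return tables * cells_per_table * seconds_per_entity / SECONDS_PER_DAY


online_days = annotation_days(120_000, 10, 30)
# Assumption: a three-fold efficiency gain divides the total time by three.
offline_days = online_days / 3

print(f"online: {online_days:.1f} days, offline: {offline_days:.1f} days")
```

Sequential linking through the public APIs comes to roughly 417 days, which is consistent with the paper's statement, and a three-fold speedup alone would still leave several months of sequential work, hence the interest in parallel processing.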