<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>for Tables in Scientific Literature</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anupam Joshi</string-name>
          <email>joshi@umbc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Varish Mulwad</string-name>
          <email>varish.mulwad@ge.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Finin</string-name>
          <email>finin@umbc.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vijay S. Kumar</string-name>
          <email>v.kumar@ge.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jenny Weisenberg Williams</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sharad Dixit</string-name>
          <email>sharad.dixit@ge.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>GE Research</institution>
          ,
          <addr-line>1 Research Circle, Niskayuna, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>GE Research, John F. Welch Technology Center</institution>
          ,
          <addr-line>Whitefield, Bengaluru</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Third AAAI Workshop on Scientific Document Understanding</institution>
          ,
          <addr-line>2023</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Maryland</institution>
          ,
          <addr-line>Baltimore County, 1000 Hilltop Circle, Baltimore, MD</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Entity linking is an essential step towards constructing knowledge graphs that facilitate advanced question answering over scientific documents, including the retrieval of relevant information present in tables within these documents. This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base. It describes how we adapt this system for linking domain-specific entities, especially those embedded within tables drawn from COVID-19-related scientific literature. We describe the setup of an efficient offline instance of the system that enables our entity-linking approach to be more feasible in practice. As part of a broader approach to infer the semantic meaning of scientific tables, we leverage the structural and semantic characteristics of the tables to improve overall entity linking performance.</p>
      </abstract>
      <kwd-group>
        <kwd>entity linking</kwd>
        <kwd>knowledge graph</kwd>
        <kwd>tables</kwd>
        <kwd>scientific documents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>The rapid pace of research in dynamic, fast-evolving scenarios, as recently exemplified by COVID-19 and the […]ject [1], has necessitated more machine-driven, human-interpretable approaches to scientific knowledge discovery. Open datasets like CORD-19 [2] have motivated novel techniques and tools for keyword/semantic search and Q&amp;A, recommendation, and summarization of scientific documents. As with the web, discovery from scientific literature is predominantly associated with searching over unstructured textual content. Domain-specific neural search engines [3, 4] typically produce ranked lists of matching articles in response to search requests, while mainstream information retrieval methods may also deliver short, targeted responses (drawn from text) directly to queries. To facilitate such a search, Sohrab et al. [5] introduced the BENNERD system and an annotated subset of CORD-19 articles to demonstrate the fundamental tasks of named entity recognition and entity linking for COVID-19-related entities found in the text.</p>
      <p>Besides text, alternative modalities such as tables and charts have come to play a considerable role in how the scientific community succinctly conveys descriptive information in the literature. Our experience assembling a corpus of over 62,000 open-access coronavirus-related ar[…] over 120,000 tables, underlining a wealth of latent knowledge embedded within these structured artifacts. The extraction and retrieval of relevant information from these scientific tables is becoming increasingly critical to emerging knowledge-driven applications. For example, consider a genomic surveillance scenario seeking information on treatment efficacies against the top prevalent COVID-19 variants in each US state. Better responses to such queries entail going beyond text and searching relevant portions of, or entire, scientific tables for vital knowledge nuggets, possibly fusing information from multiple source tables on the fly.</p>
      <p>Although learning-based representational models for tabular data [7] show great promise for understanding relationally structured web tables, these models are typically not tuned to unconventional structural complexity. This is especially true for the dense and often implicit semantics and diffuse context inherent in scientific tables in highly specialized domains [8]. Representing scientific tables as semantically annotated linked data artifacts accounts for structural complexities and enables explicit reasoning over tabular content to infer their semantics and relevance to search queries. Hence, entity linking is fundamental to our end-to-end pipeline for constructing such knowledge graphs of tables drawn from scientific documents, as depicted in Figure 1.</p>
      <p>This paper presents an entity linking system to automatically map the content of individual cells in scientific tables to appropriate entries in the Wikidata knowledge base [9]. To keep up with the scientific literature infodemic, we architected a more efficient local, offline linking system using periodic Wikidata knowledge dumps. While the ensuing efficiency gains make our system more feasible in practice, we discuss the implications for linking performance.</p>
      <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <sec id="sec-2-3">
        <title>2. Entity Linking for Scientific Text and Tables</title>
        <p>Given a mention of an entity in a document and a unique set of known entities defined in some knowledge base, entity linking refers to finding and assigning the entity ID corresponding to the mentioned entity. Entities play an essential role in text and are often used to describe what the text is about. Likewise, linking entity mentions in the header and body cells of tables, as well as linking entities in captions or other referring text, can help partly understand or infer the semantic meaning of tables. We developed a general-purpose linker to link entity mentions in text to items in (and to further extract useful information about items from) Wikidata. We describe the linker’s customization and inner workings for linking highly specialized, idiomatic content within header and body cells of tables drawn from a corpus of COVID-19-related scientific literature.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.1. Wikidata: Reference Knowledge Base</title>
        <p>Wikidata [9] is a collaboratively edited multilingual knowledge graph used to provide common data for Wikimedia projects, with currently about 1.2 billion facts on over 102 million items. Wikidata’s ontology has a fine-grained type system with more than two million types and about 11 thousand properties, including an item’s label, aliases, and description. Each Wikidata item has a unique identifier beginning with Q, like Q3519875 (“National Institute of Allergy and Infectious Diseases”), and each property has an identifier starting with P. The property P31 (instance of) links an item with its immediate types, P279 (subclass of) links a concept item to its immediate supertypes, and P1647 (subproperty of) links properties to their immediate super-properties.</p>
        <p>An entity has just one label in a given language, its “canonical name”. An entity can have any number of aliases in a language and can have a short description in any language. Unlike other open knowledge graphs, Wikidata includes and links to specialized knowledge from additional domain-specific knowledge resources. These include the Unified Medical Language System (UMLS) [10] knowledge base and the Medical Subject Headings (MeSH) thesaurus [11], which bring together biomedical vocabularies and standards to enable interoperability.</p>
        <p>Figure 2 shows an example of a simple scientific table with links to appropriate Wikidata items, highlighting several high-level issues we addressed. One is that we must consider the “header” cells (whether for columns or rows) differently from the regular table body cells. Note that the third column’s header cell, Prevalence, has two good candidate links: the concept Q719602 (“number of disease cases in a given population at a specific time”) and the property P1193 (“portion in percent of a population with a given disease or disorder”). We give preference in such cases to using the property item over the concept item.</p>
        <p>The middle header cell containing the text Lineage illustrates a second issue: a simple linker might choose the most common match for this based only on the text, Q1517820 (“line of ancestors and descendants of a person”). However, the cells in this column (e.g., B.1.1.7) are all easily matched to Wikidata items whose immediate type is Q104450895 (“variant of SARS-CoV-2”). Therefore, we need to do joint inference using both the header cell and a sample of its data cells to choose the best links for both.</p>
        <p>The first column of the table highlights a third aspect of the task: mining additional knowledge from resources
connected to candidate Wikidata items. Wikidata items often link to other knowledge graphs, such as DBpedia [12], that contain additional useful information. DBpedia, for example, has a short paragraph describing its items and links to types in the Yago fine-grained type system [13].</p>
      </sec>
      <sec id="sec-2-4-2">
        <title>2.2. Core Entity Linking Algorithm</title>
        <p>Our entity linker takes a mention string (e.g., from a table header or cell) and begins by retrieving a pre-specified number of Wikidata items using the MediaWiki search API. This returns a ranked list containing each item’s Wikidata ID, label, aliases, and English-language description. Next, we rerank candidates to promote ones that resulted in an exact match of the mention string with a Wikidata item’s label (best) or alias (second best). For each candidate, we use a SPARQL query to retrieve its types, both immediate (P31) and inherited, via a chain of P279 links for concept super-classes and P1647 links for property super-properties.</p>
        <p>For specific domains, our linker leverages the ultra-fine-grained Wikidata type system to infer additional domain types for an item by checking for specific domain-relevant properties. We identified a custom set of Wikidata item types and properties to support entity linking for the biomedical domain. For example, we infer the mesh item type if an item has a MeSH descriptor ID property (P486) that connects the item with a UMLS Medical Subject Heading.</p>
        <p>When linking the text in a header cell, we give more weight to candidates that are Wikidata properties. For example, candidates for the text “location” include an item representing the geographic location (Q2221906) as well as the property location (P276). While either might be relevant, our annotation methodology strongly preferred the latter.</p>
        <p>The linker’s filtering and ranking of candidate items are based initially on analyzing an item’s types. This type of analysis is controlled by five lists of types that are part of the linker’s configuration for a domain and task. These are ordered from best to worst as follows: (1) Target types are those we want to find based on the mention type identified by an NLP system; (2) Near-miss types are close to the target types and often confused with the targets by an NLP system; (3) Good types are ones that are very relevant to the domain, such as a MeSH term (Medical Subject Heading); (4) OK types include types that are acceptable and common in many domains, such as organizations, people, geo-political entities, and locations; and (5) Bad types are ones we are not interested in (e.g., fictional characters, journal articles, musical groups) and result in a candidate being immediately rejected.</p>
        <p>The type names of interest are mapped to Wikidata types via the linker’s configuration dictionary. Extending this dictionary enabled us to easily customize our linker to specific domains, such as COVID-19-related scientific research. For our domain, examples of good types are Wikidata high-level classes corresponding to disease, protein, chemical compound, vaccine type, and type of statistic. OK types are those associated with the standard OntoNotes [14] types, such as person, event, facility, organization, and location. Entities of these types often occur in biomedical tables. Our bad types cover things like songs, works of art, sports organizations, fictional things, and other high-level types unlikely to be present in medical tables. For example, there exist 83 Wikidata items with the canonical name “virus”. These include Q808, the infectious agent, as well as films, songs, musical albums, rock groups, paintings, video games, musicians, professional wrestlers, and more.</p>
        <p>Finally, we have a mapping of near-miss types that represent types that are easily confused. A classic example is the OntoNotes types FAC (for facility) and LOC (for location), which are easily confused by most NLP systems. An entity like Wuhan Institute of Virology can be marked as an ORG, LOC, or FAC, depending on its context. Since locations are a common type in tables for this domain, we can treat an item identified as a FAC or ORG by a language processor as possibly referring to a location. Additional ranking
for an item’s prominence is then done using its number of sitelinks, i.e., the number of links to other Wikimedia projects that contain information about the item.</p>
        <p>Beyond type analysis-based filtering, the last step is the ranking of the final candidates using a context span or string, if provided. The similarity of the context and the item’s description is computed with embeddings from the spaCy [15] large language model and generates a score that is used along with the item’s rank in the candidate list to select and return the best link. This worked reasonably well for both well-structured text (e.g., table captions) and for collections of terms from the row and column headers, and could be improved by using an embedding model fine-tuned on the biomedical domain.</p>
      </sec>
      <sec id="sec-2-5">
        <title>3. Efficient Entity Linking at Large Scale</title>
        <p>Our entity linker initially used the Wikidata and Wikimedia APIs to retrieve the initial ranked list of Wikidata candidate items and their type and supertype information. Since Wikidata is a public resource, the APIs are understandably rate-limited such that unreasonable access requests and query rates in excess of established limits may lead to IP address blacklisting [16]. The table in Figure 3 breaks down our average observed entity linking time to link a single exemplar mention string to a Wikidata entity while operating under the above limits. Accessing public Wikidata APIs, our linker can operate no faster than around 30 seconds per entity. For our dataset of 120,000+ tables (a rate reflective of the COVID-19 infodemic), annotating even just 10 cells per table at this rate could end up taking over a year.</p>
        <p>Furthermore, when applying entity linking to infer table semantics (see next section), the linking of a single header cell could, in turn, translate to the linking of all other cells in the respective column or row, potentially placing far greater stress on the linker. As a result, while Wikidata APIs facilitated a proof of concept of our core entity linking algorithm, they cannot sustain a practical, scalable linking service capable of keeping up with contemporary scientific publication rates.</p>
        <p>To address these API rate-limit bottlenecks, we initially set up a transient caching layer for cell entity linking results so that future requests to link the same mention string would be served from the cache, avoiding API invocations. However, this strategy was insufficient, so we decoupled our core entity linker from the public Wikidata altogether by architecting and progressively setting up a more efficient system using local periodic dumps of relevant Wikidata knowledge.</p>
        <p>The system is offline because the linker no longer relies on Wikidata APIs. Wikidata’s complex software architecture [17] and its enormous size make it challenging to replicate locally in its entirety. That said, our entity linker does not need all the capabilities that Wikidata offers. We targeted emulation strategies addressing bottlenecks with cross-item graph search (via the Wikidata query service (WDQS) and Wikidata’s underlying RDF triple store) and full-text search over items and their properties (via the Action API and underlying CirrusSearch Wikibase extension). We leverage proven open-source storage technologies such as the Elasticsearch engine and the Redis key-value store to emulate underlying Wikidata capabilities, as depicted in Figure 4.</p>
      </sec>
      <sec id="sec-2-6">
        <p>We implemented this system by uploading partial JSON dumps of Wikidata items, their basic attributes (label, aliases, description), specific types, and ‘sitelinks’ counts<sup>1</sup> into a local Elasticsearch index. This resulted in a locally searchable collection of 95.8M items. Offline, we retrieved the current type hierarchy (by traversing P31 and P279 property relationships) and loaded the resulting dictionary, mapping each of Wikidata’s 2.6M types to its supertypes, into Redis. This reduced determining if an entity was an instance of a given type (direct or inherited) to a dictionary lookup.</p>
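        <p>As a rough illustration of how precomputing the type hierarchy turns an inherited-type check into a lookup, the sketch below builds the set of all supertypes for each type once and then answers instance-of questions from that mapping. It is a minimal sketch under stated assumptions: a plain Python dictionary stands in for Redis, the type identifiers are placeholders rather than real Wikidata IDs, and the function names are hypothetical.</p>
        <preformat>
```python
from collections import deque

# Toy "subclass of" (P279) edges: type -> immediate supertypes.
# The identifiers are placeholders, not real Wikidata IDs.
IMMEDIATE_SUPERTYPES = {
    "T:sars_cov_2_variant": ["T:virus_variant"],
    "T:virus_variant": ["T:taxon_variant", "T:biological_entity"],
    "T:taxon_variant": ["T:biological_entity"],
    "T:biological_entity": [],
}


def all_supertypes(type_id, edges):
    """Breadth-first walk up the hierarchy; returns every inherited supertype."""
    seen = set()
    queue = deque(edges.get(type_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already expanded; also guards against cycles
        seen.add(current)
        queue.extend(edges.get(current, []))
    return seen


# Computed once, offline. A dict stands in for the key-value store here.
SUPERTYPE_CLOSURE = {t: all_supertypes(t, IMMEDIATE_SUPERTYPES) for t in IMMEDIATE_SUPERTYPES}


def is_instance_of(direct_types, target_type, closure=SUPERTYPE_CLOSURE):
    """True if any direct (P31) type equals the target or inherits from it."""
    return any(t == target_type or target_type in closure.get(t, set()) for t in direct_types)


print(is_instance_of(["T:sars_cov_2_variant"], "T:biological_entity"))  # True
```
        </preformat>
        <p>The cost of the graph traversal is paid once when the dump is processed, so each check at linking time needs only one lookup per direct type.</p>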
        <p>In this efficient entity linking system, an initial candidate search is performed using an Elasticsearch multi-match query that compares a mention string against labels and aliases. In lieu of Wikidata’s CirrusSearch ranking mechanisms, we use an item’s sitelinks count (i.e., popularity) as a proxy for its prominence and rank candidates in descending order of their sitelinks counts. Once we have a ranked list of candidates for each item, we query Redis using the item’s entity ID and direct types as keys to retrieve associated inherited types. Type analysis and re-ranking then proceed as before.</p>
        <p>Figure 5 shows a progression in replacing Wikidata API invocations with queries to these local knowledge stores. The resulting system trades some linking accuracy for a threefold improvement in linking efficiency, with the potential for even further speedups via parallel processing. The impact on entity linking performance is largely dictated by the quality of the initial ranked candidate list returned by our Elasticsearch query. We are exploring techniques like PageRank to better estimate an item’s relative importance.</p>
        <p><sup>1</sup>A Wikidata item’s sitelinks property is the number of other Wikimedia sites, such as Wikipedia, Wikisource, and Wikivoyage, in which it appears. It is commonly used as a metric for the item’s importance.</p>
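        <p>The offline candidate search described in this section might be expressed along the following lines. This is a hedged sketch rather than the system's actual query: the index field names (label, aliases, sitelinks), the function names, and the toy candidate documents are all assumptions. The request body uses the standard Elasticsearch multi_match query and sort clauses; no client call is made, so the block runs without a search cluster.</p>
        <preformat>
```python
def build_candidate_query(mention, size=10):
    """Elasticsearch request body: match labels and aliases, order by sitelinks.

    The field names are assumed; they depend on how the dump was indexed.
    """
    return {
        "size": size,
        "query": {"multi_match": {"query": mention, "fields": ["label", "aliases"]}},
        "sort": [{"sitelinks": {"order": "desc"}}],
    }


def rank_by_sitelinks(hits):
    """The same ordering applied in memory to already-retrieved candidates."""
    return sorted(hits, key=lambda doc: doc["sitelinks"], reverse=True)


# Toy candidate documents with placeholder IDs and made-up counts.
toy_hits = [
    {"id": "X1", "label": "virus", "sitelinks": 3},
    {"id": "X2", "label": "virus", "sitelinks": 150},
    {"id": "X3", "label": "virus", "sitelinks": 12},
]

print(build_candidate_query("virus", size=5))
print([doc["id"] for doc in rank_by_sitelinks(toy_hits)])  # ['X2', 'X3', 'X1']
```
        </preformat>
        <p>Sorting purely on sitelinks discards the text-relevance score, which is one reason the quality of the initial candidate list matters so much for overall linking performance.</p>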
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Entity Linking to Infer Semantics of Tables</title>
      <p>The meaning of text derives from its constituent words, which in turn are understood using grammatical knowledge and context provided by surrounding text. Inferring the intended meaning of tables additionally requires interpreting row/column headers and relations between them, besides linking cell values to entities. To improve entity linking performance for inferring the semantics of scientific tables, we supplement our core algorithm with other techniques (beyond the scope of this paper), as shown in Figure 1. These include:</p>
      <p>• Rule-based syntactic characterization: We categorize tables into types (e.g., horizontal) based on their structure,</p>
      <p>• Joint inference based on embeddings of Wikidata items: We use Wembedder-driven [18] clustering operations to compute compatibility between entities and to jointly assign entities to cells in a column, and</p>
      <p>• Specialists: We use pattern-based or machine-learning approaches to independently assess commonly encoded data types in table cells to avoid linking those cell values that are deemed to be specific kinds of literals (e.g., RNA/DNA sequences or Clinical Trial IDs).</p>
      <p>Our entity linking system achieves a fair degree of accuracy in linking table cells to Wikidata items. We based our evaluations on a manually annotated subset of 47 tables extracted from 45 COVID-19-related articles drawn randomly from PubMed Central [6]. Of the 910 table cells (out of a total of 3600 manually annotated cells in these tables) expected to be mapped to a Wikidata item, our linker achieved a recall of 0.82 when the expected annotation was part of the linker’s initial candidate item set, and a precision of 0.51 over the subset of these cells with expected Wikidata annotations.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Discussion and Conclusions</title>
      <sec id="sec-4-1">
        <p>Existing NLP tools for entity linking like spaCy [15] support a very limited entity type system, often based on just OntoNotes 5.0 types (e.g., PER, ORG, LOC, FAC), and do not cover specialized scientific entities. The SemTab challenge on Tabular Data to Knowledge Graph Matching focuses on three mapping tasks aimed at inferring the semantics of web tables [19]. While it recently included tables from biology literature, leading tabular entity linking systems [20] do not adequately cover domain-specific entities. Bespoke entity linking systems for COVID-19-related entities [5] link against UMLS and do not exploit the extensive type hierarchy or entity coverage of Wikidata.</p>
        <p>Part of our goal is to fill this gap with a practical entity linking system that can not only be adapted for domain-specific entities but can also help infer table semantics with high accuracy by leveraging Wikidata’s rich type system. As entity linking of tables against Wikidata at large scale is bottlenecked by rate-limited APIs [21], we built an offline version of our linking system, achieving a threefold improvement in efficiency while accepting a tolerable reduction in linking performance.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <p>This research is based on work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via [2021-21022600004]. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government.</p>
      </sec>
      <sec id="sec-5-2">
        <title>References</title>
        <p>[1] H. Else, Covid in papers: a torrent of science, Nature (2020) 553–553.</p>
        <p>[2] L. L. Wang, K. Lo, Y. Chandrasekhar, R. Reas, J. Yang, D. Burdick, D. Eide, K. Funk, Y. Katsis, R. M. Kinney, et al., Cord-19: The covid-19 open research dataset, in: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020.</p>
        <p>[3] E. Zhang, N. Gupta, R. Nogueira, K. Cho, J. Lin, Rapidly deploying a neural search engine for the covid-19 open research dataset: Preliminary thoughts and lessons learned, in: ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID), 2020.</p>
        <p>[4] K. Hall, An nlu-powered tool to explore covid-19 scientific literature, https://ai.googleblog.com/2020/05/an-nlu-powered-tool-to-explore-covid-19.html, 2020. Accessed: 2022-11-02.</p>
        <p>[5] M. G. Sohrab, K. Duong, M. Miwa, G. Topić, I. Masami, T. Hiroya, BENNERD: A neural named entity linking system for COVID-19, in: Q. Liu, D. Schlangen (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 182–188. URL: https://aclanthology.org/2020.emnlp-demos.24. doi:10.18653/v1/2020.emnlp-demos.24.</p>
        <p>[6] National Library of Medicine, PMC open access subset, https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/, 2022. Accessed: 2022-11-02.</p>
        <p>[7] P. Yin, G. Neubig, W.-t. Yih, S. Riedel, Tabert: Pretraining for joint understanding of textual and tabular data, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 8413–8426.</p>
        <p>[8] V. Mulwad, T. Finin, A. Joshi, Interpreting medical tables as linked data for generating meta-analysis reports, in: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014), 2014, pp. 677–686. doi:10.1109/IRI.2014.7051955.</p>
        <p>[9] D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Communications of the ACM 57 (2014) 78–85.</p>
        <p>[10] O. Bodenreider, The unified medical language system (umls): integrating biomedical terminology, Nucleic acids research 32 (2004) D267–D270.</p>
        <p>[11] C. E. Lipscomb, Medical subject headings (mesh), Bulletin of the Medical Library Association 88 (2000) 265.</p>
        <p>[12] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, S. Hellmann, Dbpedia-a crystallization point for the web of data, Journal of web semantics 7 (2009) 154–165.</p>
        <p>[13] F. M. Suchanek, G. Kasneci, G. Weikum, Yago: a core of semantic knowledge, in: Proceedings of the 16th international conference on World Wide Web, 2007, pp. 697–706.</p>
        <p>[14] R. Weischedel, S. Pradhan, L. Ramshaw, J. Kaufman, M. Franchini, M. El-Bachouti, N. Xue, M. Palmer, J. D. Hwang, C. Bonial, Ontonotes release 5.0, 2013. doi:10.35111/xmhb-2b84.</p>
        <p>[15] M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, et al., spacy: Industrial-strength natural language processing in python (2020).</p>
        <p>[16] Wikidata, Wikidata query service user manual, https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual, 2022. Accessed: 2022-11-02.</p>
        <p>[17] Wikidata, Wikidata architecture, https://upload.wikimedia.org/wikipedia/commons/2/2e/Wikidata_Architecture_Overview_-_High_Level.svg, 2018. Accessed: 2022-11-02.</p>
        <p>[18] F. Å. Nielsen, Wembedder: Wikidata entity embedding web service, preprint arXiv:1710.04099 (2017).</p>
        <p>[19] E. Jiménez-Ruiz, O. Hassanzadeh, V. Efthymiou, J. Chen, K. Srinivas, Semtab 2019: Resources to benchmark tabular data to knowledge graph matching systems, in: Proceedings of the 17th International Conference European Semantic Web Conference, Springer, 2020, pp. 514–530.</p>
        <p>[20] Y. Chabot, T. Labbé, J. Liu, R. Troncy, Dagobah: An end-to-end context-free tabular data semantic annotation system, in: The 18th International Semantic Web Conference, 2019, pp. 41–48.</p>
        <p>[21] P. Nguyen, H. Takeda, Wikidata-lite for knowledge extraction and exploration, in: 2022 IEEE International Conference on Big Data (Big Data), IEEE, 2022, pp. 3684–3686.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>