<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Practical Entity Linking System for Tables in Scientific Literature</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Varish</forename><surname>Mulwad</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">GE Research</orgName>
								<orgName type="institution">John F. Welch Technology Center</orgName>
								<address>
									<settlement>Whitefield</settlement>
									<region>Bengaluru</region>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tim</forename><surname>Finin</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">University of Maryland</orgName>
								<address>
									<addrLine>Baltimore County, 1000 Hilltop Circle</addrLine>
									<settlement>Baltimore</settlement>
									<region>MD</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vijay</forename><forename type="middle">S</forename><surname>Kumar</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">GE Research</orgName>
								<address>
									<addrLine>1 Research Circle</addrLine>
									<settlement>Niskayuna</settlement>
									<region>NY</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jenny</forename><forename type="middle">Weisenberg</forename><surname>Williams</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">GE Research</orgName>
								<address>
									<addrLine>1 Research Circle</addrLine>
									<settlement>Niskayuna</settlement>
									<region>NY</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sharad</forename><surname>Dixit</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">GE Research</orgName>
								<address>
									<addrLine>1 Research Circle</addrLine>
									<settlement>Niskayuna</settlement>
									<region>NY</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Anupam</forename><surname>Joshi</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">University of Maryland</orgName>
								<address>
									<addrLine>Baltimore County, 1000 Hilltop Circle</addrLine>
									<settlement>Baltimore</settlement>
									<region>MD</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Practical Entity Linking System for Tables in Scientific Literature</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">2868A89E52EEF119D1D98C4A223A00CE</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>entity linking</term>
					<term>knowledge graph</term>
					<term>tables</term>
					<term>scientific documents</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Entity linking is an essential step towards constructing knowledge graphs that facilitate advanced question answering over scientific documents, including the retrieval of relevant information present in tables within these documents. This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base. It describes how we adapt this system for linking domain-specific entities, especially those embedded within tables drawn from COVID-19-related scientific literature. We describe the setup of an efficient offline instance of the system that enables our entity-linking approach to be more feasible in practice. As part of a broader approach to infer the semantic meaning of scientific tables, we leverage the structural and semantic characteristics of the tables to improve overall entity linking performance.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The rapid pace of research in dynamic, fast-evolving scenarios, as recently exemplified by COVID-19 and the unprecedented volumes of scholarly literature on this subject <ref type="bibr" target="#b0">[1]</ref>, has necessitated more machine-driven, human-interpretable approaches to scientific knowledge discovery. Open datasets like CORD-19 <ref type="bibr" target="#b1">[2]</ref> have motivated novel techniques and tools for keyword/semantic search and Q&amp;A, recommendation, and summarization of scientific documents. As with the web, discovery from scientific literature is predominantly associated with searching over unstructured textual content. Domain-specific neural search engines <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref> typically produce ranked lists of matching articles in response to search requests, while mainstream information retrieval methods may also deliver direct short, targeted responses (drawn from text) to queries. To facilitate such a search, Sohrab et al. <ref type="bibr" target="#b4">[5]</ref> introduced the BENNERD system and an annotated subset of CORD-19 articles to demonstrate the fundamental tasks of named entity recognition and entity linking for COVID-19-related entities found in the text.</p><p>Third AAAI Workshop on Scientific Document Understanding, 2023. * Corresponding author. Email: varish.mulwad@ge.com (V. Mulwad); finin@umbc.edu (T. Finin); v.kumar@ge.com (V. S. Kumar); weisenje@ge.com (J. W. Williams); sharad.dixit@ge.com (S. Dixit); joshi@umbc.edu (A. Joshi). ORCID: 0000-0001-9113-5952 (V. Mulwad); 0000-0002-6593-1792 (T. Finin); 0000-0003-2234-1546 (V. S. Kumar); 0000-0002-8641-3193 (A. Joshi)</p><p>Besides text, alternative modalities such as tables and charts have come to play a considerable role in how the scientific community succinctly conveys descriptive information in the literature. 
Our experience assembling a corpus of over 62,000 open-access coronavirus-related articles from PubMed Central <ref type="bibr" target="#b5">[6]</ref> between 2020-21 yielded over 120,000 tables, underlining a wealth of latent knowledge embedded within these structured artifacts. The extraction and retrieval of relevant information from these scientific tables is becoming increasingly critical to emerging knowledge-driven applications. For example, consider a genomic surveillance scenario seeking information on treatment efficacies against the top prevalent COVID-19 variants in each US state. Better responses to such queries entail going beyond text and searching relevant portions of or entire scientific tables for vital knowledge nuggets, possibly fusing information from multiple source tables on the fly. Although learning-based representational models for tabular data <ref type="bibr" target="#b6">[7]</ref> show great promise for understanding relationally structured web tables, these models are typically not tuned to unconventional structural complexity. This is especially true for the dense and often implicit semantics and diffuse context inherent in scientific tables in highly specialized domains <ref type="bibr" target="#b7">[8]</ref>. Representing scientific tables as semantically annotated linked data artifacts accounts for structural complexities and enables explicit reasoning over tabular content to infer their semantics and relevance to search queries. 
Hence, entity linking is fundamental to our end-to-end pipeline for constructing such knowledge graphs of tables drawn from scientific documents, as depicted in Figure <ref type="figure">1</ref>.</p><p>Figure 1: Entity linking and its role in constructing knowledge graphs from scientific tables</p><p>This paper presents an entity linking system to automatically map the content of individual cells in scientific tables to appropriate entries in the Wikidata knowledge base <ref type="bibr" target="#b8">[9]</ref>. To keep up with the scientific literature infodemic, we architected a more efficient local, offline linking system using periodic Wikidata knowledge dumps. While the ensuing efficiency gains make our system more feasible in practice, we discuss the implications for linking performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Entity Linking for Scientific Text and Tables</head><p>Given a mention of an entity in a document and a unique set of known entities defined in some knowledge base, entity linking refers to finding and assigning the entity ID corresponding to the mentioned entity. Entities play an essential role in text and are often used to describe what the text is about. Likewise, linking entity mentions in the header and body cells of tables, as well as linking entities in captions or other referring text, can help partly understand or infer the semantic meaning of tables. We developed a general-purpose linker to link entity mentions in text to items in (and to further extract useful information about items from) Wikidata. We describe the linker's customization and inner workings for linking highly specialized, idiomatic content within header and body cells of tables drawn from a corpus of COVID-19-related scientific literature.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Wikidata: Reference Knowledge Base</head><p>Wikidata <ref type="bibr" target="#b8">[9]</ref> is a collaboratively edited multilingual knowledge graph used to provide common data for Wikimedia projects, with currently about 1.2 billion facts on over 102 million items. Wikidata's ontology has a fine-grained type system with more than two million types and about 11 thousand properties, including an item's label, aliases, and description. Each Wikidata item has a unique identifier beginning with Q, like Q3519875 ("National Institute of Allergy and Infectious Diseases"), and each property has an identifier starting with P. The property P31 (instance of) links an item with its immediate types, P279 (subclass of) links a concept item to its immediate supertypes, and P1647 (subproperty of) links properties to their immediate super-properties.</p><p>An entity has just one label in a given language, its "canonical name". An entity can have any number of aliases in a language and can have a short description in any language. Unlike other open knowledge graphs, Wikidata includes and links to specialized knowledge from additional domain-specific knowledge resources. These include the Unified Medical Language System (UMLS) <ref type="bibr" target="#b9">[10]</ref> knowledge base and the Medical Subject Headings (MeSH) thesaurus <ref type="bibr" target="#b10">[11]</ref>, which bring together biomedical vocabularies and standards to enable interoperability.</p><p>Figure <ref type="figure" target="#fig_0">2</ref> shows an example of a simple scientific table with links to appropriate Wikidata items highlighting several high-level issues we addressed. One is that we must consider the "header" cells (whether for columns or rows) differently from the regular table body cells. 
Note that the third column's header cell, Prevalence, has two good candidate links: the concept Q719602 ("number of disease cases in a given population at a specific time") and the property P1193 ("portion in percent of a population with a given disease or disorder"). We give preference in such cases to using the property item over the concept item.</p><p>The middle header cell containing the text Lineage illustrates a second issue: A simple linker might choose the most common match for this based only on the text, Q1517820 ("line of ancestors and descendants of a person"). However, the cells in this column (e.g., B.1.1.7) are all easily matched to Wikidata items whose immediate type is Q104450895 ("variant of SARS-CoV-2"). Therefore, we need to do joint inference using both the header cell and a sample of its data cells to choose the best links for both.</p><p>The first column of the table highlights a third aspect of the task: mining additional knowledge from resources connected to candidate Wikidata items. Wikidata items often link to other knowledge graphs, such as DBpedia <ref type="bibr" target="#b11">[12]</ref>, that contain additional useful information. DBpedia, for example, has a short paragraph describing its items and links to types in the Yago fine-grained type system <ref type="bibr" target="#b12">[13]</ref>.</p></div>
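<div xmlns="http://www.tei-c.org/ns/1.0"><p>The header-cell preference described above can be illustrated with a minimal sketch. It assumes candidates arrive in search order and that the only signal used is the identifier prefix (P for properties, Q for items); the Candidate class and the rank_for_header function are hypothetical names introduced for illustration, not part of our linker's published interface.</p>
<p><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    wikidata_id: str  # e.g. "Q719602" (item) or "P1193" (property)
    label: str


def is_property(candidate: Candidate) -> bool:
    """Wikidata property identifiers start with 'P', item identifiers with 'Q'."""
    return candidate.wikidata_id.startswith("P")


def rank_for_header(candidates: list[Candidate]) -> list[Candidate]:
    """Move properties ahead of items; sorted() is stable, so the
    original search order is kept within each group."""
    return sorted(candidates, key=lambda c: 0 if is_property(c) else 1)


# The two candidates discussed for the header cell "Prevalence".
candidates = [
    Candidate("Q719602", "prevalence"),
    Candidate("P1193", "prevalence"),
]
best = rank_for_header(candidates)[0]
print(best.wikidata_id)  # P1193
```
]]></p>
<p>Because the sort is stable, the preference only breaks ties between a property and a concept; it does not otherwise disturb the ranking returned by the search step.</p></div>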
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Core Entity Linking Algorithm</head><p>Our entity linker takes a mention string (e.g., from a table header or cell) and begins by retrieving a pre-specified number of Wikidata items using the MediaWiki search API. This returns a ranked list containing each item's Wikidata ID, label, aliases, and English language description. Next, we rerank candidates to promote ones that resulted in an exact match of their mention string with a Wikidata item's label (best) or alias (second best). For each candidate, we use a SPARQL query to retrieve its types, both immediate (P31) and inherited, via a chain of P279 links for concept super-classes and P1647 links for property super-properties.</p><p>For specific domains, our linker leverages the ultra-fine-grained Wikidata type system to infer additional domain types for an item by checking for specific domain-relevant properties. We identified a custom set of Wikidata item types and properties to support entity linking for the biomedical domain. For example, we infer the mesh item type if an item has a MeSH descriptor ID property (P486) that connects the item with a UMLS Medical Subject Heading.</p><p>When linking the text in a header cell, we give more weight to candidates that are Wikidata properties. For example, candidates for the text "location" include an item representing the geographic location (Q2221906) as well as the property location (P276). While either might be relevant, our annotation methodology strongly preferred the latter.</p><p>The linker's filtering and ranking of candidate items are based initially on analyzing an item's types. This type analysis is controlled by five lists of types that are part of the linker's configuration for a domain and task. 
These are ordered from best to worst as follows: (1) Target types are those we want to find based on the mention type identified by an NLP system; (2) Near-miss types are close to the target types and often confused with the targets by an NLP system; (3) Good types are ones that are very relevant to the domain, such as a MeSH term (Medical Subject Heading); (4) OK types include types that are acceptable and common in many domains, such as organizations, people, geo-political entities, and locations; and (5) Bad types are ones we are not interested in (e.g., fictional characters, journal articles, musical groups) and result in a candidate being immediately rejected.</p><p>The type names of interest are mapped to Wikidata types via the linker's configuration dictionary. Extending this dictionary enabled us to easily customize our linker to specific domains, such as COVID-19-related scientific research. For our domain, examples of good types are Wikidata high-level classes corresponding to disease, protein, chemical compound, vaccine type, and type of statistic. OK types are those associated with the standard OntoNotes <ref type="bibr" target="#b13">[14]</ref> types, such as person, event, facility, organization, and location. Entities of these types often occur in biomedical tables. Our bad types cover things like songs, works of art, sports organizations, fictional things, and other high-level types unlikely to be present in medical tables. For example, there exist 83 Wikidata items with the canonical name "virus". These include Q808, the infectious agent, as well as films, songs, musical albums, rock groups, paintings, video games, musicians, professional wrestlers, and more.</p><p>Finally, we have a mapping of near-miss types that represent types that are easily confused. A classic example is the pair of OntoNotes types FAC (for facility) and LOC (for location), which most NLP systems easily confuse. 
An entity like Wuhan Institute of Virology can be marked as an ORG, LOC, or FAC, depending on its context. Since locations are a common type in tables for this domain, we can treat an item identified as a FAC or ORG by a language processor as possibly referring to a location. Additional ranking for an item's prominence is then done using its number of sitelinks, i.e., the number of links to other Wikimedia projects that contain information about the item.</p><p>Beyond type analysis-based filtering, the last step is the ranking of the final candidates using a context span or string, if provided. The similarity of the context and the item's description is computed with embeddings from the spaCy <ref type="bibr" target="#b14">[15]</ref> large language model and generates a score that is used along with the item's rank in the candidate list to select and return the best link. This worked reasonably well for both well-structured text (e.g., table captions) and for collections of terms from the row and column headers and could be improved by using an embedding model fine-tuned on the biomedical domain.</p></div>
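<div xmlns="http://www.tei-c.org/ns/1.0"><p>The type-driven filtering and ranking described in this section can be sketched as follows. This is an illustration under stated assumptions rather than our implementation: the way the three signals are combined (type tier first, then exact label or alias match, then sitelinks count) is an assumed ordering, and the type names, the film item ID, and the sitelinks numbers are made-up placeholders. Only Q808 comes from the text above.</p>
<p><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Optional

# Tiers from best to worst; "bad" types reject a candidate outright.
TIER_ORDER = ["target", "near_miss", "good", "ok"]


@dataclass
class Candidate:
    wikidata_id: str
    label: str
    aliases: list[str] = field(default_factory=list)
    types: set[str] = field(default_factory=set)  # immediate and inherited types
    sitelinks: int = 0


def match_rank(mention: str, cand: Candidate) -> int:
    """0 = exact label match (best), 1 = exact alias match, 2 = neither."""
    m = mention.casefold()
    if cand.label.casefold() == m:
        return 0
    if any(a.casefold() == m for a in cand.aliases):
        return 1
    return 2


def tier_rank(cand: Candidate, config: dict[str, set[str]]) -> Optional[int]:
    """Index of the best matching tier, or None if the candidate is rejected."""
    if cand.types & config.get("bad", set()):
        return None
    for rank, tier in enumerate(TIER_ORDER):
        if cand.types & config.get(tier, set()):
            return rank
    return len(TIER_ORDER)  # no configured type matched; kept, but ranked last


def rank_candidates(mention, candidates, config):
    """Drop rejected candidates, then sort by (tier, match quality, -sitelinks)."""
    scored = []
    for cand in candidates:
        tier = tier_rank(cand, config)
        if tier is None:
            continue
        scored.append((tier, match_rank(mention, cand), -cand.sitelinks, cand))
    scored.sort(key=lambda t: t[:3])
    return [t[3] for t in scored]


# Hypothetical configuration and candidates for the mention "virus".
config = {"good": {"type:infectious_agent"}, "bad": {"type:film"}}
candidates = [
    Candidate("Q_FILM_EXAMPLE", "Virus", types={"type:film"}, sitelinks=12),
    Candidate("Q808", "virus", types={"type:infectious_agent"}, sitelinks=200),
]
ranked = rank_candidates("virus", candidates, config)
print([c.wikidata_id for c in ranked])  # ['Q808']
```
]]></p>
<p>Keeping the five type lists in a plain dictionary mirrors the configuration-driven design described above: adapting the sketch to another domain would only require changing the contents of config.</p></div>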
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Efficient Entity Linking at Large Scale</head><p>Our entity linker initially used the Wikidata and Wikimedia APIs to retrieve the initial ranked list of Wikidata candidate items and their type and supertype information. Since Wikidata is a public resource, the APIs are understandably rate-limited such that unreasonable access requests and query rates in excess of established limits may lead to IP address blacklisting <ref type="bibr" target="#b15">[16]</ref>. The table in Figure <ref type="figure">3</ref> breaks down our average observed entity linking time to link a single exemplar mention string to a Wikidata entity while operating under the above limits. Accessing public Wikidata APIs, our linker can operate no faster than around 30 seconds per entity. For our dataset of 120,000+ tables (a rate reflective of the COVID-19 infodemic), annotating even just 10 cells per table at this rate could end up taking over a year.</p></div>
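<div xmlns="http://www.tei-c.org/ns/1.0"><p>The estimate above follows from simple arithmetic, reproduced here as a short check. The three constants are taken from the text; the calculation assumes strictly sequential linking at the 30-second lower bound.</p>
<p><![CDATA[
```python
SECONDS_PER_ENTITY = 30   # lower bound per link when using the public APIs
TABLES = 120_000          # tables in the corpus
CELLS_PER_TABLE = 10      # cells annotated per table in this estimate

total_seconds = TABLES * CELLS_PER_TABLE * SECONDS_PER_ENTITY
total_days = total_seconds / (60 * 60 * 24)
print(f"{total_seconds:,} seconds, about {total_days:.0f} days")
# 36,000,000 seconds, about 417 days
```
]]></p></div>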
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 3: Entity linking time using Wikidata APIs</head><p>Furthermore, when applying entity linking to infer table semantics (see next section), the linking of a single header cell could, in turn, translate to the linking of all other cells in the respective column or row, potentially placing far greater stress on the linker. As a result, while Wikidata APIs facilitated a proof of concept of our core entity linking algorithm, they cannot sustain a practical, scalable linking service capable of keeping up with contemporary scientific publication rates.</p><p>To address these API rate-limit bottlenecks, we initially set up a transient caching layer for cell entity linking results so that future requests to link the same mention string would be served from the cache, avoiding API invocations. However, this strategy was insufficient, so we decoupled our core entity linker from the public Wikidata altogether by architecting and progressively setting up a more efficient system using local periodic dumps of relevant Wikidata knowledge.</p><p>The system is offline because the linker no longer relies on Wikidata APIs. Wikidata's complex software architecture <ref type="bibr" target="#b16">[17]</ref> and its enormous size make it challenging to replicate locally in its entirety. That said, our entity linker does not need all the capabilities that Wikidata offers. We targeted emulation strategies addressing bottlenecks with cross-item graph search (via the Wikidata query service (WDQS) and Wikidata's underlying RDF triple store) and full-text search over items and their properties (via the Action API and underlying CirrusSearch Wikibase extension). We leverage proven open-source storage technologies such as the Elasticsearch engine and the Redis key-value store to emulate underlying Wikidata capabilities, as depicted in Figure <ref type="figure" target="#fig_1">4</ref>. 
We implemented this system by uploading partial JSON dumps of Wikidata items, their basic attributes (label, aliases, description), specific types, and 'sitelinks' counts<ref type="foot" target="#foot_0">1</ref> into a local Elasticsearch index. This resulted in a locally searchable collection of 95.8M items. Offline, we retrieved the current type hierarchy (by traversing P31 and P279 property relationships) and loaded the resulting dictionary, which maps each of Wikidata's 2.6M types to its supertypes, into Redis. This reduced determining whether an entity was an instance of a given type (direct or inherited) to a dictionary lookup.</p><p>In this efficient entity linking system, an initial candidate search is performed using an Elasticsearch multi-match query that compares a mention string against labels and aliases. In lieu of Wikidata's CirrusSearch ranking mechanisms, we use an item's sitelinks count (i.e., popularity) as a proxy for its prominence and rank candidates in descending order of their sitelinks counts. Once we have a ranked list of candidates for each item, we query Redis using the item's entity ID and direct types as keys to retrieve associated inherited types. Type analysis and re-ranking then proceed as before.</p><p>Figure <ref type="figure" target="#fig_2">5</ref> shows a progression in replacing Wikidata API invocations with queries to these local knowledge stores. The resulting system trades some linking accuracy for a threefold improvement in linking efficiency, with the potential for even further speedups via parallel processing. The impact on entity linking performance is largely dictated by the quality of the initial ranked candidate list returned by our Elasticsearch query. We are exploring techniques like PageRank to better estimate an item's relative importance.</p></div>
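<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two offline lookups described in this section can be sketched as follows. The sketch makes several assumptions that are not taken from our system: the index field names (label, aliases, sitelinks) are illustrative, an in-memory dictionary stands in for the Redis store, and the type identifiers T_VARIANT and T_TAXON are hypothetical. Only the multi-match query structure, the sort on sitelinks, and the item Q104450895 correspond to the description above.</p>
<p><![CDATA[
```python
def build_candidate_query(mention: str, size: int = 20) -> dict:
    """Elasticsearch request body: match the mention against labels and
    aliases, returning the most-linked items first."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": mention,
                "fields": ["label", "aliases"],  # assumed field names
            }
        },
        "sort": [{"sitelinks": {"order": "desc"}}],  # assumed field name
    }


def build_supertype_closure(subclass_of: dict[str, set[str]]) -> dict[str, set[str]]:
    """Precompute, for every type, the set of all its transitive supertypes."""
    closure = {}
    for start in subclass_of:
        seen = set()
        stack = list(subclass_of[start])
        while stack:
            current = stack.pop()
            if current not in seen:  # also guards against cycles
                seen.add(current)
                stack.extend(subclass_of.get(current, ()))
        closure[start] = seen
    return closure


def is_instance_of(direct_types, target, closure) -> bool:
    """True if any direct type is the target or has it as a supertype."""
    return any(t == target or target in closure.get(t, set()) for t in direct_types)


# Hypothetical fragment of the P279 (subclass of) hierarchy.
subclass_of = {
    "Q104450895": {"T_VARIANT"},  # variant of SARS-CoV-2
    "T_VARIANT": {"T_TAXON"},
}
closure = build_supertype_closure(subclass_of)  # the mapping loaded into the store
print(is_instance_of({"Q104450895"}, "T_TAXON", closure))  # True
```
]]></p>
<p>Precomputing the closure once is what turns each type check at linking time into a dictionary lookup instead of a graph query.</p></div>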
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Entity Linking to Infer Semantics of Tables</head><p>The meaning of text derives from its constituent words, which in turn are understood using grammatical knowledge and context provided by surrounding text. Inferring the intended meaning of tables additionally requires interpreting row/column headers and relations between them, besides linking cell values to entities. To improve entity linking performance for inferring the semantics of scientific tables, we supplement our core algorithm with other techniques (beyond the scope of this paper), as shown in Figure <ref type="figure">1</ref>. These include:</p><p>• Rule-based syntactic characterization: We categorize tables into types (e.g., horizontal) based on their structure,</p><p>• Joint inference based on embeddings of Wikidata items: We use Wembedder-driven <ref type="bibr" target="#b18">[18]</ref> clustering operations to compute compatibility between entities and to jointly assign entities to cells in a column, and</p><p>• Specialists: We use pattern-based or machine-learning approaches to independently assess commonly encoded data types in table cells to avoid linking those cell values that are deemed to be specific kinds of literals (e.g., RNA/DNA sequences or Clinical Trial IDs).</p><p>Our entity linking system achieves a fair degree of accuracy in linking table cells to Wikidata items. We based our evaluations on a manually annotated subset of 47 tables extracted from 45 COVID-19-related articles drawn randomly from PubMed Central <ref type="bibr" target="#b5">[6]</ref>. Of the 910 table cells (out of a total of 3600 manually annotated cells in these tables) expected to be mapped to a Wikidata item, our linker achieved a recall of 0.82 when the expected annotation was part of the linker's initial candidate item set, and a precision of 0.51 over the subset of these cells with expected Wikidata annotations.</p></div>
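<div xmlns="http://www.tei-c.org/ns/1.0"><p>A pattern-based specialist of the kind listed above might look like the following sketch. The ClinicalTrials.gov identifier shape (NCT followed by eight digits) is standard; the nucleotide heuristic (ten or more unambiguous base letters) and the function name literal_kind are assumptions made for illustration and are not our specialists' actual rules.</p>
<p><![CDATA[
```python
import re

# ClinicalTrials.gov registry numbers: "NCT" followed by eight digits.
NCT_ID = re.compile(r"NCT\d{8}")
# Assumed heuristic: ten or more unambiguous DNA/RNA base letters, nothing else.
NUCLEOTIDE_SEQ = re.compile(r"[ACGTU]{10,}", re.IGNORECASE)


def literal_kind(cell_text: str):
    """Return the kind of literal a cell holds, or None if the cell
    should go on to entity linking."""
    text = cell_text.strip()
    if NCT_ID.fullmatch(text):
        return "clinical_trial_id"
    if NUCLEOTIDE_SEQ.fullmatch(text.replace(" ", "")):
        return "nucleotide_sequence"
    return None


for cell in ["NCT04280705", "ATGGCTAGCTAGGATCC", "Prevalence"]:
    print(cell, "->", literal_kind(cell))
```
]]></p>
<p>Cells for which a specialist returns a literal kind would be skipped by the linker, which avoids spurious matches of such strings against Wikidata labels.</p></div>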
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion and Conclusions</head><p>Existing NLP tools for entity linking like spaCy <ref type="bibr" target="#b14">[15]</ref> support a very limited entity type system, often based on just OntoNotes 5.0 types (e.g., PER, ORG, LOC, FAC) and do not cover specialized scientific entities. The SemTab challenge on Tabular Data to Knowledge Graph Matching focuses on three mapping tasks aimed at inferring the semantics of web tables <ref type="bibr" target="#b19">[19]</ref>. While it recently included tables from biology literature, leading tabular entity linking systems <ref type="bibr" target="#b20">[20]</ref> do not adequately cover domain-specific entities. Bespoke entity linking systems for COVID-19-related entities <ref type="bibr" target="#b4">[5]</ref> link against UMLS and do not exploit the extensive type hierarchy or entity coverage of Wikidata.</p><p>Part of our goal is to fill this gap with a practical entity linking system that can not only be adapted for domain-specific entities but can also help infer table semantics with high accuracy by leveraging Wikidata's rich type system. As entity linking of tables against Wikidata at large scale is bottlenecked by rate-limited APIs <ref type="bibr" target="#b21">[21]</ref>, we built an offline version of our linking system, achieving a three-fold improvement in efficiency at the cost of a tolerable reduction in linking performance.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Examples of header and cell annotations of links to Wikidata items and properties</figDesc><graphic coords="3,89.29,84.19,416.68,120.11" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Functional architecture of an efficient 'offline' entity linker</figDesc><graphic coords="4,302.62,355.51,203.37,75.54" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Replacing the entity linker's use of public Wikidata APIs with efficient offline, local queries</figDesc><graphic coords="5,89.29,84.19,416.70,110.75" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">A Wikidata item's sitelinks property is the number of other Wikimedia sites such as Wikipedia, Wikisource, and Wikivoyage in which it appears. It is commonly used as a metric for the item's importance.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research is based on work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via . The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Covid in papers: a torrent of science</title>
		<author>
			<persName><forename type="first">H</forename><surname>Else</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="page" from="553" to="553" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Cord-19: The covid-19 open research dataset</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chandrasekhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Reas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Burdick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Eide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Funk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Katsis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Kinney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020</title>
				<meeting>the 1st Workshop on NLP for COVID-19 at ACL 2020</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Rapidly deploying a neural search engine for the covid-19 open research dataset: Preliminary thoughts and lessons learned</title>
		<author>
			<persName><forename type="first">E</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACL 2020 Workshop on Natural Language Processing for COVID-19</title>
				<imprint>
			<publisher>NLP-COVID</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Hall</surname></persName>
		</author>
		<ptr target="https://ai.googleblog.com/2020/05/an-nlu-powered-tool-to-explore-covid-19.html" />
		<title level="m">An NLU-powered tool to explore COVID-19 scientific literature</title>
				<imprint>
			<date type="published" when="2020">2020. 2022-11-02</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">BENNERD: A neural named entity linking system for COVID-19</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Sohrab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Duong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Miwa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Topić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ikeda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Takamura</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-demos.24</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-demos.24" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
				<editor>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schlangen</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="182" to="188" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/" />
		<title level="m">PMC open access subset</title>
				<imprint>
			<date type="published" when="2022-11-02">2022. 2022-11-02</date>
		</imprint>
		<respStmt>
			<orgName>National Library of Medicine</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">TaBERT: Pretraining for joint understanding of textual and tabular data</title>
		<author>
			<persName><forename type="first">P</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="8413" to="8426" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Interpreting medical tables as linked data for generating meta-analysis reports</title>
		<author>
			<persName><forename type="first">V</forename><surname>Mulwad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Finin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joshi</surname></persName>
		</author>
		<idno type="DOI">10.1109/IRI.2014.7051955</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)</title>
				<meeting>the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="677" to="686" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Wikidata: a free collaborative knowledgebase</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="78" to="85" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The Unified Medical Language System (UMLS): integrating biomedical terminology</title>
		<author>
			<persName><forename type="first">O</forename><surname>Bodenreider</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="D267" to="D270" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Medical Subject Headings (MeSH)</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Lipscomb</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bulletin of the Medical Library Association</title>
		<imprint>
			<biblScope unit="volume">88</biblScope>
			<biblScope unit="page">265</biblScope>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">DBpedia - A crystallization point for the Web of Data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="154" to="165" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">YAGO: a core of semantic knowledge</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kasneci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th international conference on World Wide Web</title>
				<meeting>the 16th international conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="697" to="706" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">R</forename><surname>Weischedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pradhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ramshaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kaufman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Franchini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>El-Bachouti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Hwang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bonial</surname></persName>
		</author>
		<idno type="DOI">10.35111/xmhb-2b84</idno>
		<ptr target="https://doi.org/10.35111/xmhb-2b84" />
	</analytic>
	<monogr>
		<title level="m">OntoNotes Release 5.0</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Honnibal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Montani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Van Landeghem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Boyd</surname></persName>
		</author>
		<title level="m">spaCy: Industrial-strength natural language processing in Python</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><surname>Wikidata</surname></persName>
		</author>
		<ptr target="https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual" />
		<title level="m">Wikidata query service user manual</title>
				<imprint>
			<date type="published" when="2022-11-02">2022. 2022-11-02</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<ptr target="https://upload.wikimedia.org/wikipedia/commons/" />
		<title level="m">Wikidata architecture</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<ptr target="/2e/" />
		<title level="m">Wikidata_Architecture_Overview_-_High_Level</title>
				<imprint>
			<date type="published" when="2018">2018. 2022-11-02</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">Å</forename><surname>Nielsen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1710.04099</idno>
		<title level="m">Wembedder: Wikidata entity embedding web service</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Resources to benchmark tabular data to knowledge graph matching systems</title>
		<author>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Hassanzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Efthymiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Srinivas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th International Conference European Semantic Web Conference</title>
				<meeting>the 17th International Conference European Semantic Web Conference</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="514" to="530" />
		</imprint>
	</monogr>
	<note>SemTab 2019</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Dagobah: An end-to-end context-free tabular data semantic annotation system</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chabot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Labbé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 18th International Semantic Web Conference</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="41" to="48" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Wikidata-lite for knowledge extraction and exploration</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Takeda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2022 IEEE International Conference on Big Data (Big Data)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3684" to="3686" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
