<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">From web-tables to a knowledge graph: prospects of an end-to-end solution</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Alexey</forename><surname>Shigarov</surname></persName>
							<email>shigarov@icc.ru</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Matrosov Institute for System Dynamics and Control Theory</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">the Russian Academy of Sciences</orgName>
								<address>
									<addrLine>134 Lermontov St</addrLine>
									<postCode>664033</postCode>
									<settlement>Irkutsk</settlement>
									<country key="RU">Russian Federation</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nikita</forename><surname>Dorodnykh</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Matrosov Institute for System Dynamics and Control Theory</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">the Russian Academy of Sciences</orgName>
								<address>
									<addrLine>134 Lermontov St</addrLine>
									<postCode>664033</postCode>
									<settlement>Irkutsk</settlement>
									<country key="RU">Russian Federation</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alexander</forename><surname>Yurin</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Matrosov Institute for System Dynamics and Control Theory</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">the Russian Academy of Sciences</orgName>
								<address>
									<addrLine>134 Lermontov St</addrLine>
									<postCode>664033</postCode>
									<settlement>Irkutsk</settlement>
									<country key="RU">Russian Federation</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrey</forename><surname>Mikhailov</surname></persName>
							<email>mikhailov@icc.ru</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Matrosov Institute for System Dynamics and Control Theory</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">the Russian Academy of Sciences</orgName>
								<address>
									<addrLine>134 Lermontov St</addrLine>
									<postCode>664033</postCode>
									<settlement>Irkutsk</settlement>
									<country key="RU">Russian Federation</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Viacheslav</forename><surname>Paramonov</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Matrosov Institute for System Dynamics and Control Theory</orgName>
								<orgName type="department" key="dep2">Siberian Branch</orgName>
								<orgName type="institution">the Russian Academy of Sciences</orgName>
								<address>
									<addrLine>134 Lermontov St</addrLine>
									<postCode>664033</postCode>
									<settlement>Irkutsk</settlement>
									<country key="RU">Russian Federation</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">From web-tables to a knowledge graph: prospects of an end-to-end solution</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">82C8D023D5431289E002B3CACDF8AADD</idno>
					<idno type="arXiv">arXiv:2003.02320.</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T02:14+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>table understanding</term>
					<term>semantic table interpretation</term>
					<term>web-tables</term>
					<term>data extraction</term>
					<term>knowledge graph population</term>
					<term>semantic web</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Web stores a large volume of web-tables with semi-structured data. The Semantic Web community considers them as a valuable source for the knowledge graph population. Interrelated named entities can be extracted from web-tables and mapped to a knowledge graph. It generally requires reconstructing the semantics missing in web-tables to interpret them according to their meaning. This paper discusses prospects of an end-to-end solution for the knowledge graph population by entities extracted from web-tables of predefined types. The discussion covers theoretical foundations both for transforming data from web-tables to entity sets (table analysis) and for mapping entities, attributes, and relations to a knowledge graph (semantic table annotation). Unlike general-purpose text mining and web-scraping tools, we aim at developing a solution that takes into account the relational nature of the information represented in web-tables. In contrast to the table-specific proposals, our approach implies both the table analysis and the semantic table annotation.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The Web stores a large volume of tables. Explorations of Web crawls have discovered hundreds of millions of web-tables containing relational data <ref type="bibr" target="#b1">[1,</ref><ref type="bibr" target="#b2">2]</ref>. At least billions of valuable facts can be extracted from web-tables. All of this makes web-tables an attractive data source in various applications, such as knowledge base construction <ref type="bibr" target="#b3">[3,</ref><ref type="bibr" target="#b4">4]</ref>, question-answering systems, and table augmentation <ref type="bibr">[5,</ref><ref type="bibr" target="#b6">6,</ref><ref type="bibr" target="#b7">7]</ref>. However, in general, web-tables are not interpretable by computer programs. Their original representation does not provide all the explicit semantics required to interpret them according to their meaning. This hinders the wide usage of such tabular data in practice.</p><p>Reconstruction of the semantics missing in tables is commonly referred to as "table understanding". This problem was first formulated by M. Hurst in 2000 <ref type="bibr" target="#b8">[8,</ref><ref type="bibr" target="#b9">9]</ref>. Over the last two decades, hundreds of papers devoted to its issues were published <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b7">7,</ref><ref type="bibr" target="#b10">10,</ref><ref type="bibr" target="#b11">11]</ref>. The literature survey shows that this topic continues to develop rapidly in several communities such as document understanding, semantic web, and end-user programming. The last 3 years were marked by an extraordinary growth in proposals based on a novel apparatus, namely, deep learning, word and entity embedding, and knowledge graphs. There are several challenges for the expert community. 
One of them is to develop a common theoretical and technological basis applicable to various digital environments and formats for representing tabular data on the Web (such as print-oriented documents, spreadsheets, and web-pages). Our approach addresses this challenge, namely the extraction and semantic interpretation of data from web-tables represented in HTML format (Fig. <ref type="figure" target="#fig_0">1</ref>).</p><p>The recent surveys of the thematic literature <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b7">7,</ref><ref type="bibr" target="#b10">10,</ref><ref type="bibr" target="#b11">11]</ref> note that the problem of table understanding remains open. The review <ref type="bibr" target="#b6">[6]</ref> revealed that the majority of works focus mainly on the tasks of discrimination and semantic interpretation of web-tables. Roldán et al. <ref type="bibr" target="#b7">[7]</ref> indicated that none of the known solutions is complete: they do not provide all steps of table understanding. This is also confirmed by Burdick et al. <ref type="bibr" target="#b10">[10]</ref>. As reported in <ref type="bibr" target="#b7">[7]</ref>, many table design properties are not taken into account by the state-of-the-art solutions. This often hinders their practical application.</p><p>The existing models of table representation do not completely reflect the complexity of the structure of real tables. One of the commonly used assumptions is that "all cell values are atomic", i.e. any non-blank cell contains only one functional data item. To the best of our knowledge, all competitive solutions follow this simplification. However, a real cell can have several data items with the same or different functions. The latter should be taken into account in order to extend the range of cases for table processing.</p><p>The novelty of our proposal is established by the following. 
First, we propose an end-to-end solution covering the stages from extracting data from syntactically tagged tables to their semantic interpretation (i.e. mapping extracted data and metadata to a cross-domain knowledge graph). Second, we take into account structured cells whose content should be decomposed into several atomic data items with different functional roles. Third, our proposal can show the applicability of some promising techniques (cell embeddings, contextualized word embeddings, entity embedding) to the tasks of table understanding.</p><p>Unlike the general-purpose text mining and web-scraping tools, our solution takes into account the relational nature of the information represented in web-tables. In contrast to similar proposals that target data extraction from web-tables, we cover a wider range of cases by involving the structured content of a cell. Moreover, the competitive techniques are limited to either the data extraction stage or the semantic table interpretation stage, whereas our approach covers both of them. Therefore, the expected results could be applied in the knowledge base population.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Data extraction from web-tables</head><p>The approach to data extraction from web-tables includes two stages: (i) classifying web-tables by predefined types; (ii) extracting entity sets from web-tables, using algorithms appropriate to the corresponding types.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Web-table classification</head><p>Several taxonomies of web-table types were published in the last decade <ref type="bibr" target="#b12">[12,</ref><ref type="bibr" target="#b13">13,</ref><ref type="bibr" target="#b14">14]</ref>. All taxonomies describe three common types of web-tables with relational data (Fig. <ref type="figure" target="#fig_1">2</ref>). Eberius et al. refer to them as "vertical listing", "horizontal listing", and "matrix". This taxonomy is used by the latest proposals for table type classification <ref type="bibr" target="#b15">[15,</ref><ref type="bibr" target="#b16">16,</ref><ref type="bibr" target="#b17">17,</ref><ref type="bibr" target="#b18">18]</ref>. We also rely on this taxonomy, which will allow us to compare our results quantitatively with others.</p><p>We plan to develop a deep neural network model for classifying web-tables based on DeepTable<ref type="foot" target="#foot_0">1</ref> <ref type="bibr" target="#b17">[17]</ref>, an ad-hoc architecture that provides four main blocks: (i) an embedding layer for extracting vector representations of cell tokens; (ii) an LSTM (recurrent neural network) for identifying semantic dependencies between tokens in a cell; (iii) an MLP (multilayer perceptron) for identifying non-linear dependencies between all cells in a table; (iv) a softmax classification layer.</p><p>An open collection of tagged tables extracted from biomedical research papers (PubMed Central<ref type="foot">2</ref>) can be used as training data. To select a basic tool for contextualized vector representation of words, we propose to try several variations (ELMo<ref type="foot">3</ref> <ref type="bibr" target="#b19">[19]</ref>, fastText<ref type="foot">4</ref> <ref type="bibr">[20]</ref>, etc.). A classifier can be trained for each variation. This allows us to compare their accuracy and choose the best for this task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Transformation of entity sets from web-tables</head><p>We propose to develop algorithms that target the three table types of <ref type="bibr" target="#b14">[14]</ref>. The algorithms should analyze the logical structure of web-tables by using built-in rules and trained classifiers dealing with these types. It is important to note that web-tables mix data and metadata. Moreover, one cell may contain several values of both data and metadata. The extraction of the logical structure requires: (i) associating each cell value (data item) with its functional role (data or metadata); (ii) associating data values with metadata values; (iii) grouping data belonging to one record (entity).</p><p>After the table type classification, the extraction of data and metadata values becomes a "cornerstone" step. We plan to implement a classifier based on machine learning algorithms to assign functional roles to data items. A promising approach to encoding cell context in a vector representation, named "cell embedding", was recently proposed in <ref type="bibr" target="#b21">[21,</ref><ref type="bibr" target="#b22">22]</ref>. In our case, the cell context can be deduced automatically by using the built-in type-specific table structure analysis. This approach should allow cell classification that takes into account the properties of cell layout and formatting, as well as the semantic similarity of their text content. We plan to reduce the number of possible false-positive and false-negative errors by using some table-specific constraints. To associate data with metadata and group data values into records, we also suggest using rule-based analysis of the table structure. The extracted entity sets can be represented in a JSON notation compatible with TOMATE<ref type="foot">5</ref> <ref type="bibr" target="#b18">[18]</ref>, a recently published framework for the performance evaluation of web-table data extraction tools.</p></div>
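<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three requirements listed above (roles, data-to-metadata association, grouping into records) can be illustrated for the simplest case of a vertical listing. The following is only a minimal sketch under stated assumptions: the first row is taken as metadata, every later row as one record, and a structured cell is split on a semicolon. The function names, the separator rule, and the JSON layout are hypothetical and are not the TOMATE format.</p><p><code lang="python"><![CDATA[
```python
import json


def split_cell(text, separator=";"):
    """Split a structured cell into atomic data items (hypothetical rule)."""
    return [item.strip() for item in text.split(separator) if item.strip()]


def vertical_listing_to_entities(rows):
    """Assume the first row holds metadata and each later row is one record."""
    header, *body = rows
    entities = []
    for row in body:
        record = {}
        for attribute, cell in zip(header, row):
            # Every data item is associated with the attribute of its column.
            record[attribute] = split_cell(cell)
        entities.append(record)
    return entities


# Toy table; the values are illustrative only.
table = [
    ["University", "Location"],
    ["Durham University", "Durham; Stockton"],
    ["University of Oxford", "Oxford"],
]
entities = vertical_listing_to_entities(table)
print(json.dumps(entities, indent=2))
```
]]></code></p><p>In this sketch a cell with two locations yields two data items under the same attribute, which is the case that the "all cell values are atomic" assumption excludes.</p></div>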
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Semantic annotation of entity sets</head><p>There are 3 main approaches to the semantic table interpretation, namely: (i) ontology matching; (ii) entity lookup and wikification; (iii) vector representations of knowledge graphs (entity embedding). The recent study <ref type="bibr">[23]</ref> showed experimentally that a hybrid approach combining lookup services and entity embeddings is the most efficient. We plan to exploit such a hybrid using the available toolset (DBpedia Lookup<ref type="foot" target="#foot_1">6</ref>, DBpedia SPARQL Endpoint<ref type="foot" target="#foot_2">7</ref>, DBpedia Spotlight<ref type="foot" target="#foot_3">8</ref>, RDF2Vec<ref type="foot" target="#foot_4">9</ref>, KGloVe<ref type="foot" target="#foot_5">10</ref>, and Wikipedia2Vec<ref type="foot" target="#foot_6">11</ref>).</p><p>The end-to-end semantic table interpretation includes 3 stages:</p><p>• Entity linking - CEA (Cell-Entity Annotation).</p><p>• Attribute-concept matching - CTA (Column-Type Annotation).</p><p>• Relation extraction - CPA (Column-Property Annotation).</p><p>As a result, this enables knowledge graph augmentation. Fig. <ref type="figure" target="#fig_3">3</ref> shows two examples where web-tables from Wikipedia<ref type="foot" target="#foot_7">12</ref> <ref type="foot" target="#foot_8">13</ref> (Fig. <ref type="figure" target="#fig_3">3, a, b</ref>) are normalized and enriched by the semantic annotation (Fig. <ref type="figure" target="#fig_3">3, c, d</ref>), i.e. links to a knowledge graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Entity linking - CEA</head><p>The proposed solution should provide the following: (i) identifying a subject column containing names of entities listed in a table; (ii) looking up a set of candidate KG-instances for each entity; (iii) disambiguating entities in cases when several candidate KG-instances are associated with an entity. The subject column is selected among potential keys that contain entity mentions. We limit ourselves to the trivial case when there is only one candidate subject column. (Note that the general case requires the end-to-end semantic table interpretation.)</p><p>As the main tool for linking entities, we propose to use the vector representations of subsets from a knowledge graph. The initial lookup of candidate KG-instances can be performed by using SPARQL queries to the knowledge graph. Such queries are composed of surface forms contained in the text of cells. Each KG-instance can be encoded as a vector representation of the entity by the existing algorithms, such as RDF2Vec <ref type="bibr" target="#b24">[24]</ref>, KGloVe <ref type="bibr" target="#b25">[25]</ref>, or Wikipedia2Vec <ref type="bibr" target="#b26">[26]</ref>. The resulting vector model should allow us to use some semantic similarity metrics <ref type="bibr" target="#b27">[27]</ref> to rank candidate KG-instances by relevance to the entity.</p><p>The approach to entity disambiguation relies on the assumption proposed by <ref type="bibr" target="#b28">[28]</ref>, which implies that the most relevant KG-instances from the candidate sets have the highest semantic similarity values in pairwise matching. This can be explained by the following example from <ref type="bibr" target="#b28">[28]</ref>. Let a column contain 3 mentions: "USA", "China", and "India". 
They should be matched to 3 sets of candidate KG-instances respectively: "USA" → ["University of South Alabama (University)", "United States of America (Country)"], "China" → ["People's Rep. of China (Country)", "China (Band)", "China, Kagoshima (City)"], "India" → ["India (Country)", "India (George W. Bush's cat)", "India (Xandria album)"]. Among all pairs of KG-instances, "United States of America (Country)", "People's Rep. of China (Country)" and "India (Country)" would be the most semantically similar (they [...] the knowledge graph).</p><p>Thus, this approach should allow us to rank candidate KG-instances and select from them the reference KG-instances for specific mentions. For example, the table shown in Fig. <ref type="figure" target="#fig_3">3</ref>, a contains the surface form "London" that can mean "Location" or "Degree powers". Obviously, in the context of the column [Oxford, Cambridge, London, Durham, Stockton] it should be assigned to the instance of "Location", while in the context of the column [Full, London, Taught] it corresponds to the instance of "Degree powers".</p></div>
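<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pairwise-similarity assumption can be sketched on the "USA", "China", "India" example. This is only an illustration under stated assumptions: the two-dimensional vectors are invented toy values (real ones would come from a model such as RDF2Vec), the candidate sets are shortened, cosine similarity stands in for the semantic similarity metric, and the exhaustive search over all combinations is a simplification that would not scale to long columns.</p><p><code lang="python"><![CDATA[
```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two vectors given as tuples."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(candidates, vectors):
    """Pick one candidate per mention so that the sum of pairwise
    similarities between the picked candidates is maximal."""
    mentions = list(candidates)
    best_choice, best_score = None, float("-inf")
    for choice in product(*(candidates[m] for m in mentions)):
        score = sum(
            cosine(vectors[choice[i]], vectors[choice[j]])
            for i in range(len(choice))
            for j in range(i + 1, len(choice))
        )
        if score > best_score:
            best_choice, best_score = choice, score
    return dict(zip(mentions, best_choice))


# Invented toy embeddings; they are not produced by any real model.
vectors = {
    "United States of America (Country)": (0.9, 0.1),
    "University of South Alabama (University)": (0.1, 0.9),
    "People's Rep. of China (Country)": (1.0, 0.2),
    "China (Band)": (-0.5, 0.5),
    "India (Country)": (0.8, 0.0),
    "India (Xandria album)": (-0.6, 0.4),
}
candidates = {
    "USA": ["University of South Alabama (University)",
            "United States of America (Country)"],
    "China": ["People's Rep. of China (Country)", "China (Band)"],
    "India": ["India (Country)", "India (Xandria album)"],
}
result = disambiguate(candidates, vectors)
print(result)
```
]]></code></p><p>With these toy vectors the three country instances form the most mutually similar combination, so each mention is resolved to its country reading.</p></div>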
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Attribute-concept matching - CTA</head><p>In practice, many tables are not accompanied by metadata (named attributes). Generally, to map a column to a KG-class, it is first necessary to associate the entities listed in the column with the reference KG-instances. After that, it is possible to form an index of all candidate KG-classes to which the reference KG-instances belong. Among them, the KG-class that is most relevant to all column values is selected. For example, three columns of the table in Fig. <ref type="figure" target="#fig_3">3</ref>, a should be matched to KG-classes (Fig. <ref type="figure" target="#fig_3">3, b</ref>) as follows:</p><p>"University" -&gt; o:EducationalInstitution</p><p>"Location" -&gt; o:Location</p><p>"Degree powers" -&gt; o:DegreePowers</p><p>The rest of the columns correspond to KG-properties of o:EducationalInstitution (KG-class) as follows:</p><p>"Established" -&gt; o:Established</p><p>"Num. of students" -&gt; o:NumOfStudents</p><p>"Tuition fee" -&gt; o:Tuition(£)</p><p>In the cases when entity linking (CEA-stage) fails, we propose to use ANN-models to predict the KG-class of a column based on ColNet algorithms <ref type="bibr" target="#b29">[29,</ref><ref type="bibr" target="#b30">30]</ref>. To map a column of literal values (NUMERIC, DATE, CURRENCY, etc.) to a KG-datatype, it is enough to recognize standard named entities. This can be achieved by using regular expressions and NER-models available in popular NLP-libraries (e.g. Stanford CoreNLP<ref type="foot">14</ref>, AllenNLP<ref type="foot">15</ref>).</p></div>
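<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two CTA cases described in this section can be sketched as follows. This is a hedged illustration, not the proposed implementation: "most relevant KG-class" is approximated here by simple majority voting over the classes of the reference KG-instances, the class labels o:Organisation and o:Band are hypothetical, and the regular expressions cover only a few literal shapes chosen for the example.</p><p><code lang="python"><![CDATA[
```python
import re
from collections import Counter


def vote_column_class(instance_classes):
    """Return the KG-class shared by the most reference KG-instances.

    instance_classes holds one list of KG-classes per linked cell.
    """
    votes = Counter(cls for classes in instance_classes for cls in set(classes))
    return votes.most_common(1)[0][0] if votes else None


# Illustrative patterns only; real columns need far richer rules or NER.
LITERAL_PATTERNS = [
    ("CURRENCY", re.compile(r"^[£$€]\s?\d[\d,]*(\.\d+)?$")),
    ("DATE", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("NUMERIC", re.compile(r"^-?\d[\d,]*(\.\d+)?$")),
]


def guess_literal_type(values):
    """Return the first datatype whose pattern matches every value, else None."""
    for name, pattern in LITERAL_PATTERNS:
        if values and all(pattern.match(v.strip()) for v in values):
            return name
    return None


column_class = vote_column_class([
    ["o:EducationalInstitution", "o:Organisation"],
    ["o:EducationalInstitution"],
    ["o:Band"],
])
fee_type = guess_literal_type(["£9,250", "£9,000"])
year_type = guess_literal_type(["1096", "1209"])
print(column_class, fee_type, year_type)
```
]]></code></p><p>A column whose values match none of the patterns returns None in this sketch, which is the point where an NER-model or a learned column classifier would take over.</p></div>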
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Relation extraction - CPA</head><p>To map pairs of columns (&lt; 𝐸, 𝑃 &gt;, where 𝐸 is a subject column and 𝑃 is a non-subject column) to KG-properties, we plan to use entity relatedness metrics <ref type="bibr" target="#b31">[31]</ref>. It is assumed that these metrics will allow ranking the index of candidate KG-properties and choosing the most relevant ones. For example, two web-tables shown in Fig. <ref type="figure" target="#fig_3">3</ref>  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Knowledge graph augmentation</head><p>An entity set represented as linked data (RDF-triples with URI-references to concepts in a knowledge graph) should be suitable for further interpretation. In particular, other facts (RDF-triples) can be inferred from them and asserted to the knowledge graph. Such restored semantics would enable populating the existing knowledge graphs with new entities extracted from web-tables. For example, Fig. <ref type="figure" target="#fig_4">4</ref> shows the terminological level (TBox) of a knowledge graph constructed by using 49 tables scraped from Wikipedia pages in "Category: Universities in the United Kingdom". The facts extracted from these tables can be asserted into the ABox component of the knowledge graph.</p><p>We plan to demonstrate the applicability of the proposed solution by an illustrative example of populating a domain-specific knowledge graph (ABox component) in the area of industrial safety expertise. This should cover aligning entity set records with the structure of the knowledge graph (row-to-instance matching) and synthesizing new KG-instances and KG-properties. Tables extracted from real reports on industrial safety expertise may be used as a source of domain data.</p></div>
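<div xmlns="http://www.tei-c.org/ns/1.0"><p>How an annotated record could be turned into facts for the ABox component can be sketched as below. The sketch is illustrative only: the helper names and the CamelCase naming rule are hypothetical, the prefixes o and r follow the convention of Fig. 3, and the triples are plain tuples rather than output of any RDF library.</p><p><code lang="python"><![CDATA[
```python
def slug(text):
    """Build a CamelCase local name from a surface form (hypothetical rule)."""
    return "".join(word.capitalize() for word in text.split())


def record_to_triples(record, subject_attr, kg_class, properties):
    """Emit (subject, predicate, object) tuples for one annotated record.

    properties maps a column name to the KG-property chosen for it.
    """
    subject = "r:" + slug(record[subject_attr])
    triples = [(subject, "rdf:type", kg_class)]
    for attribute, kg_property in properties.items():
        for value in record.get(attribute, []):
            triples.append((subject, kg_property, "r:" + slug(value)))
    return triples


record = {"University": "Durham University", "Location": ["Durham", "Stockton"]}
triples = record_to_triples(
    record,
    subject_attr="University",
    kg_class="o:EducationalInstitution",
    properties={"Location": "o:located_in"},
)
for triple in triples:
    print(triple)
```
]]></code></p><p>Each value of a multi-valued cell becomes its own triple here, so a record with two locations contributes two o:located_in facts for the same subject.</p></div>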
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>Our previous work <ref type="bibr" target="#b32">[32,</ref><ref type="bibr" target="#b33">33,</ref><ref type="bibr" target="#b34">34]</ref> was aimed at data extraction from spreadsheets driven by user-defined rules. We proposed end-user programming as the main approach. This allowed us to support specific tricks of table layout, formatting, and content. However, scaling such solutions may be challenging when ambiguous tricks are applied within source tables. Nonetheless, a solution for the Web should be easily scalable. This is possible when there are predefined types of web-tables. The latter are needed to classify web-tables and select type-specific algorithms of analysis and interpretation. Thus, our previous approach is suitable for spreadsheet sources, but not for the Web.</p><p>The current proposal aims to fill this gap by developing a scalable solution for web-tables. The expected results contribute to the following: (i) data extraction, including algorithms for classifying web-tables by the types of a taxonomy and extracting entity sets from tables of predefined types; (ii) semantic table annotation, including algorithms for mapping entities, attributes, and relations to concepts of an external knowledge graph; (iii) open software implementing the extraction and semantic annotation of tabular data in applications of the knowledge graph population.</p><p>To the best of our knowledge, all existing proposals for data extraction from web-tables exploit a specific constraint: "any cell contains only one atomic data item". This constraint can be eliminated in the proposed solution. We argue that the structured content of a cell can be decomposed into several data items. Moreover, all proposals implement the semantic table interpretation only for entity sets, not pivots. We plan to study both kinds of tabular data. 
We think this can expand the range of cases to be processed.</p><p>We propose to apply the state-of-the-art methods and tools, including contextualized word embeddings, vector representations of knowledge graphs, entity lookup services, as well as metrics of semantic similarity and entity relatedness. The applicability of some of these tools to the considered issues remains poorly studied. The expected results could demonstrate the promise of these techniques.</p><p>The expected results could be useful for making software for tabular data extraction and integration more intelligent in scientific and industrial applications. They can be of particular interest in areas with intensive use of tabular data (e.g., finance, government statistics, and business management) to form linked open data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Steps for the knowledge graph population with entities extracted from web-tables.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Web-tables taxonomy: vertical listing (a), horizontal listing (b), and matrix (c).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Example of the semantic table annotation: original web-tables (vertical listing, a, and matrix, c) and their normalized and annotated forms (b, d); the prefix o ("ontology") denotes a KG-class or KG-property of entities defined in the terminological component (TBox), while r ("resource") denotes a KG-instance from the assertion component (ABox).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: An example of the terminological level (TBox) of a knowledge graph constructed from tables of Wikipedia pages in "Category: Universities in the United Kingdom".</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>The two highly-rated conferences, "Int. Semantic Web Conf." and "Int. Conf. Document Analysis and Recognition", recently conducted competitions related to this problem. The complexity of the problem is determined by two factors: (i) a wide variety of tricks to present table layout, formatting, and content; (ii) limited representation formats (such as Excel or HTML) that do not provide all semantics needed for data interpretation. Generally, the solution requires all stages of the table understanding: (i) table detection or discrimination; (ii) table structure recognition and cleaning; (iii) role and structural analysis (i.e. extracting interrelated data and metadata values from the content); (iv) semantic interpretation (i.e. matching the semantic table structure with an external dictionary).</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/marhabibi/deeptable</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_1">https://lookup.dbpedia.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_2">https://dbpedia.org/sparql</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_3">https://www.dbpedia-spotlight.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_4">http://rdf2vec.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_5">https://datalab.rwth-aachen.de/embedding/kglove</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_6">https://wikipedia2vec.github.io/wikipedia2vec</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_7">https://en.wikipedia.org/wiki/List_of_universities_in_England</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_8">https://en.wikipedia.org/wiki/Rankings_of_universities_in_the_United_Kingdom</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported by the Russian Science Foundation (Grant No. 18-71-10001).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://github.com/allenai/allennlp" />
		<title level="m">DurhamUniversity, o:located_in, r:Durham&gt; &lt;r:DurhamUniversity, o:located_in, r:Stockton&gt; &lt;&lt;r:UniversityOfOxford, o:ranked_in, r:THE&gt;, o:positioned_at</title>
				<meeting><address><addrLine>positioned_at</addrLine></address></meeting>
		<imprint>
			<date>14</date>
		</imprint>
	</monogr>
	<note>UniversityOfOxford, o:ranked_in, r:Guardian&gt;</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">WebTables: exploring the power of tables on the web</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Halevy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.14778/1453856.1453916</idno>
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endowment</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="538" to="549" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Uncovering the relational web</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Halevy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 11th Int. W. on Web and Databases</title>
				<meeting>11th Int. W. on Web and Databases</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Knowledge graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Blomqvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cochez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>d'Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Melo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gutierrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Labra Gayo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kirrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Neumaier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Polleres</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-C</forename><surname>Ngonga Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Rashid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schmelzeisen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sequeda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zimmermann</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Information extraction meets the semantic web: a survey</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Martinez-Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Lopez-Arevalo</surname></persName>
		</author>
		<idno type="DOI">10.3233/SW-180333</idno>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="255" to="335" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ten years of WebTables</title>
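		<author>
			<persName><forename type="first">M</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Halevy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lee</surname></persName>
		</author>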
		<author>
			<persName><forename type="first">J</forename><surname>Madhavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wu</surname></persName>
		</author>
		<idno type="DOI">10.14778/3229863.3240492</idno>
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endowment</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="2140" to="2149" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Web table extraction, retrieval, and augmentation: a survey</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<idno type="DOI">10.1145/3372117</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Intell. Syst. Technol</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">On extracting data from tables that are encoded using HTML</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Roldán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jiménez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Corchuelo</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.knosys.2019.105157</idno>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">190</biblScope>
			<biblScope unit="page">105157</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Hurst</surname></persName>
		</author>
		<title level="m">The interpretation of tables in texts</title>
				<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Layout and language: challenges for table understanding on the web</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hurst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Int. W. on Web Document Analysis</title>
				<meeting>Int. W. on Web Document Analysis</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="27" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Table extraction and understanding for scientific and enterprise applications</title>
		<author>
			<persName><forename type="first">D</forename><surname>Burdick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Danilevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Evfimievski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Katsis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.14778/3415478.3415563</idno>
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endowment</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="3433" to="3436" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Table understanding approaches for extracting knowledge from heterogeneous tables</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bonfitto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Casiraghi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mesiti</surname></persName>
		</author>
		<idno type="DOI">10.1002/widm.1407</idno>
	</analytic>
	<monogr>
		<title level="j">WIREs Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page">e1407</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Web-scale table census and classification</title>
		<author>
			<persName><forename type="first">E</forename><surname>Crestan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pantel</surname></persName>
		</author>
		<idno type="DOI">10.1145/1935826.1935904</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. 4th ACM Int. Conf. on Web Search and Data Mining</title>
				<meeting>4th ACM Int. Conf. on Web Search and Data Mining</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="545" to="554" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Web table taxonomy and formalization</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">R</forename><surname>Lautert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Scheidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">F</forename><surname>Dorneles</surname></persName>
		</author>
		<idno type="DOI">10.1145/2536669.2536674</idno>
	</analytic>
	<monogr>
		<title level="j">ACM SIGMOD Record</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="page" from="28" to="33" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Building the Dresden Web Table Corpus: a classification approach</title>
		<author>
			<persName><forename type="first">J</forename><surname>Eberius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Braunschweig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hentsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Thiele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ahmadov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
		<idno type="DOI">10.1109/BDC.2015.30</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE/ACM 2nd Int. S. on Big Data Computing</title>
				<meeting>IEEE/ACM 2nd Int. S. on Big Data Computing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="41" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A large public corpus of web tables containing time and context metadata</title>
		<author>
			<persName><forename type="first">O</forename><surname>Lehmberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ritze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Meusel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<idno type="DOI">10.1145/2872518.2889386</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. 25th Int. Conf. on World Wide Web</title>
				<meeting>25th Int. Conf. on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="75" to="76" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Understanding the semantic structures of tables with a hybrid deep neural network architecture</title>
		<author>
			<persName><forename type="first">K</forename><surname>Nishida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sadamitsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Higashinaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Matsuo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 31st AAAI Conf. on Artificial Intelligence</title>
				<meeting>31st AAAI Conf. on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="168" to="174" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">DeepTable: a permutation invariant neural network for table orientation classification</title>
		<author>
			<persName><forename type="first">M</forename><surname>Habibi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Starlinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Leser</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10618-020-00711-x</idno>
	</analytic>
	<monogr>
		<title level="j">Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="1963" to="1983" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">TOMATE: A heuristic-based approach to extract data from HTML tables</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Roldán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jiménez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Szekely</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Corchuelo</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ins.2021.04.087</idno>
	</analytic>
	<monogr>
		<title level="j">Information Sciences</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Deep contextualized word representations</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. NAACL</title>
				<meeting>NAACL</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.04606</idno>
		<title level="m">Enriching word vectors with subword information</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Tabular cell classification using pre-trained cell embeddings</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ghasemi-Gol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pujara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Szekely</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICDM.2019.00033</idno>
	</analytic>
	<monogr>
		<title level="m">2019 IEEE International Conference on Data Mining (ICDM)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="230" to="239" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Learning cell embeddings for understanding table layouts</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ghasemi-Gol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pujara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Szekely</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10115-020-01508-6</idno>
	</analytic>
	<monogr>
		<title level="j">Knowledge and Information Systems</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Matching web tables with knowledge base entities: from entity lookups to entity embeddings</title>
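		<author>
			<persName><forename type="first">V</forename><surname>Efthymiou</surname></persName>
		</author>
		<author>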
			<persName><forename type="first">O</forename><surname>Hassanzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rodriguez-Muro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Christophides</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-68288-4_16</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2017</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">10587</biblScope>
			<biblScope unit="page" from="260" to="277" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">RDF2Vec: RDF graph embeddings for data mining</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ristoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-46523-4_30</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2016</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="498" to="514" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Global RDF vector space embeddings</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cochez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ristoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">P</forename><surname>Ponzetto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-68288-4_12</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2017</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="190" to="207" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia</title>
		<author>
			<persName><forename type="first">I</forename><surname>Yamada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sakuma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Shindo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Takeda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Takefuji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Matsumoto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 2020 Conf. Empirical Methods in Natural Language Processing: System Demonstrations</title>
				<meeting>2020 Conf. Empirical Methods in Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="23" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Exploiting semantic similarity for named entity disambiguation in knowledge graphs</title>
		<author>
			<persName><forename type="first">G</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Iglesias</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2018.02.011</idno>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<biblScope unit="page" from="8" to="24" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">DoSeR - a knowledge-base-agnostic framework for entity disambiguation using semantic embeddings</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zwicklbauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Seifert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Granitzer</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-34129-3_12</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web. Latest Advances and New Domains</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="182" to="198" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">ColNet: Embedding the semantics of web tables for column type prediction</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sutton</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v33i01.330129</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. AAAI Conf. Artificial Intelligence</title>
				<meeting>AAAI Conf. Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="29" to="36" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Learning semantic annotations for tabular data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sutton</surname></persName>
		</author>
		<idno type="DOI">10.24963/ijcai.2019/289</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. 28th Int. Joint Conf. Artificial Intelligence</title>
				<meeting>28th Int. Joint Conf. Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="2088" to="2094" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">On computing entity relatedness in Wikipedia, with applications</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ponza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ferragina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakrabarti</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.knosys.2019.105051</idno>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">188</biblScope>
			<biblScope unit="page">105051</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Table understanding using a rule engine</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shigarov</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2014.08.045</idno>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Rule-based spreadsheet data transformation from arbitrary to relational tables</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shigarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mikhailov</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.is.2017.08.004</idno>
	</analytic>
	<monogr>
		<title level="j">Inform. Syst</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page" from="123" to="136" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">TabbyXL: software platform for rule-based spreadsheet data extraction and transformation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shigarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Khristyuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mikhailov</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.softx.2019.100270</idno>
	</analytic>
	<monogr>
		<title level="j">SoftwareX</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">100270</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
