<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Wikidata-Driven CEA and CTA for Life Sciences Table Matching extending DREIFLUSS</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vishvapalsinhji</forename><surname>Parmar</surname></persName>
							<email>vishvapalsinhji.parmar@uni-passau.de</email>
							<affiliation key="aff0">
								<orgName type="department">Chair of Data and Knowledge Engineering</orgName>
								<orgName type="institution">University of Passau</orgName>
								<address>
									<settlement>Passau</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alsayed</forename><surname>Algergawy</surname></persName>
							<email>alsayed.algergawy@uni-passau.de</email>
							<affiliation key="aff0">
								<orgName type="department">Chair of Data and Knowledge Engineering</orgName>
								<orgName type="institution">University of Passau</orgName>
								<address>
									<settlement>Passau</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Wikidata-Driven CEA and CTA for Life Sciences Table Matching extending DREIFLUSS</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">4C973E6C16B4A48A44907BABBF23759F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:03+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Table Matching</term>
					<term>Cell Entity Annotation (CEA)</term>
					<term>Column Type Annotation (CTA)</term>
					<term>Knowledge Discovery</term>
					<term>Data Integration</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In our previous work for the SemTab 2023 challenge, we presented DREIFLUSS, a minimalist approach utilizing machine learning models and sampling techniques to tackle Column Property Annotation (CPA) and Column Type Annotation (CTA) tasks. Building on this groundwork, this paper shifts focus for the SemTab 2024 challenge by harnessing the semantic capabilities of the Wikidata knowledge graph to address Cell Entity Annotation (CEA) and CTA tasks. Our approach leverages optimized preprocessing and querying techniques with the Wikidata API, leading to significant improvements in the accuracy and efficiency of table annotations. We achieved F1 scores of 93.20% for CEA and 61.50% for CTA on the tBiodivL-Horizontal dataset, along with an F1 score of 92.50% for CEA on the tBiomedL-Horizontal dataset. These results highlight the promise of knowledge graph-based methods in refining table-matching processes, laying the groundwork for future research that combines machine learning techniques with knowledge graph-driven strategies to achieve more robust annotation outcomes.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Matching tables to knowledge graphs, a vital aspect of data integration and knowledge discovery, has gained significant attention due to the proliferation of digital information. It involves harmonizing information across different tables, which is crucial for extracting valuable insights. Millions of high-quality tables are available on the Internet, a number that continues to rise due to advancements in automated data extraction and the growing reliance on structured data across various sectors, including business, academia, and government <ref type="bibr" target="#b0">[1]</ref>. As a result, effective table matching is more important than ever.</p><p>The SemTab Challenge has emerged as a leading competition that pushes the frontiers of table understanding and annotation. In the 2023 edition, we introduced DREIFLUSS, a minimalist approach that utilized machine learning models and strategic sampling techniques to address the tasks of Column Property Annotation (CPA) and CTA <ref type="bibr" target="#b1">[2]</ref>. This approach demonstrated the effectiveness of using data-driven techniques to achieve high accuracy in semantic table annotations. Building on this foundation, the 2024 SemTab challenge presented an opportunity to explore a different dimension of table annotation by leveraging the semantic richness of knowledge graphs. In this work, we extend the DREIFLUSS methodology by utilizing the Wikidata knowledge graph to tackle the tasks of CEA and CTA. Unlike the previous machine learning-based approach <ref type="bibr" target="#b1">[2]</ref>, this paper focuses on using the Wikidata API to extract and integrate semantic labels, which significantly enhances the precision and efficiency of table annotations.</p><p>By employing a knowledge graph-driven strategy, our approach showcases the potential of semantic resources like Wikidata in refining table matching processes. 
This shift allows for the exploration of new methods in table annotation, underscoring the importance of adaptability and scalability in today's data-driven landscape <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. The insights gained from this exploration also lay the groundwork for future research that combines knowledge graph-based techniques with machine learning approaches to further improve table annotation outcomes. The rapid growth of structured data on the web presents both immense opportunities for knowledge discovery and significant challenges. Each table often comes with a unique structure, schema, and notation, requiring advanced methods for understanding and harmonization. Competitions like SemTab play a vital role in addressing these challenges by advancing the capabilities of table understanding and annotation. The critical tasks of CTA and CEA are central to achieving comprehensive table comprehension, efficient data integration, and effective knowledge discovery.</p><p>To address these needs, our current methodology leverages pre-existing semantic resources, specifically focusing on Wikidata to enhance the table annotation process. This approach demonstrates the advantages of using a knowledge graph-based strategy to improve annotation accuracy and efficiency. Moreover, it provides inspiration for future work that could integrate machine learning models with semantic resources to develop more robust and adaptable solutions for table annotation challenges. This work focuses on datasets from Life Sciences, such as those in biodiversity and biomedicine, where accurate table annotation is critical for knowledge discovery and data integration in domains like healthcare and biology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Since its inception in 2019, the SemTab challenge has been instrumental in advancing the field of semantic table interpretation, which focuses on understanding and annotating tabular data with semantic information. In the inaugural year, Oliveira and d'Aquin introduced "ADOG" <ref type="bibr" target="#b4">[5]</ref>, a system that utilizes ontologies for data annotation. Complementing this, Cremaschi et al. presented "MantisTable" <ref type="bibr" target="#b5">[6]</ref>, an innovative system for automatic semantic table interpretation. Another significant contribution was made by Thawani et al., who focused on CTA and CPA tasks, developing a method to link entities to knowledge graphs for inferring column types and properties <ref type="bibr" target="#b6">[7]</ref>.</p><p>The challenge evolved in 2020 with Huynh et al.'s enhanced version of "DAGOBAH" <ref type="bibr" target="#b7">[8]</ref>, which highlighted scalable annotations for large datasets. Concurrently, Abdelmageed and Schindler introduced "JenTab" <ref type="bibr" target="#b8">[9]</ref>, a system designed to align tabular data with knowledge graphs, bridging the gap between structured and unstructured data. By 2021, the challenge saw refinements in previous systems, with "DAGOBAH" <ref type="bibr" target="#b9">[10]</ref> being optimized for more efficient semantic annotations, and "MantisTable V" <ref type="bibr" target="#b10">[11]</ref> offering a novel approach to table interpretation. Systems like "s-elBat" by Cremaschi et al. <ref type="bibr" target="#b11">[12]</ref> further explored the challenges of interpreting real-world, messy data. The 2022 edition of the challenge introduced specialized datasets such as "SOTAB" <ref type="bibr" target="#b12">[13]</ref> and "MammoTab" <ref type="bibr" target="#b13">[14]</ref>, which closely aligned with the 2023 tasks focusing on Schema.org annotations. 
In the SemTab 2023 challenge, we introduced DREIFLUSS, a minimalist approach for table matching that leveraged machine learning techniques to perform CTA and CPA tasks using knowledge graphs such as Schema.org and DBpedia <ref type="bibr" target="#b1">[2]</ref>. While this approach was effective, it operated within the constraints of a limited number of labels, with Schema.org and DBpedia offering a label set ranging from 46 to 105. For the SemTab 2024 challenge, we have shifted our focus towards the CEA and CTA tasks using the much larger and more semantically rich Wikidata knowledge graph. Given Wikidata's vast label set and comprehensive coverage, we developed a new approach that tackles these tasks by querying the Wikidata API.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Tasks</head><p>The second round of the SemTab challenge, and specifically its Accuracy track, comprises several tasks. Of these, we focus on two core tasks: CEA and CTA. These tasks aim to enhance table comprehension by assigning specific labels to cells and columns, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Cell Entity Annotation (CEA)</head><p>CEA involves linking cell values to specific entities from a knowledge base, such as people, places, or organizations. This process enriches the semantic understanding of the table's content, improving data retrieval, integration, and knowledge discovery. Properly annotating cells with relevant entities is crucial for tables with ambiguous or abbreviated terms, which could otherwise lead to misinterpretation. CEA enhances the quality and utility of structured data by ensuring that each cell is connected to a contextually accurate entity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Column Type Annotation (CTA)</head><p>CTA categorizes columns by associating each one with a semantic label that describes its content or purpose. The labels are derived from a knowledge graph, such as DBpedia and Schema.org in our previous work and Wikidata in this one. CTA facilitates efficient data integration and enables downstream applications to understand table structure and semantics, proving essential for tasks like data cleaning, schema matching, and query optimization. By providing insights into each column's intended purpose, CTA improves data understanding and analysis.</p><p>Together, CEA and CTA tasks aim to enhance table matching and comprehension. These tasks add semantic richness to tables, aiding in data integration, knowledge discovery, and other applications. The following sections will explore the datasets used for CEA and CTA, the experimental setup, the results obtained, and the effectiveness of our approach in addressing these tasks within the SemTab challenge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Dataset</head><p>The SemTab 2024 competition<ref type="foot" target="#foot_0">2</ref> features three distinct challenge tracks, with our focus on the Accuracy track. Within this track, various datasets are provided, including WikidataTables2024R1(R2), tBiodiv(L), and tBiomed(L), each consisting of two rounds. Our experiments specifically target the datasets from Round 2, namely tBiodivL<ref type="foot" target="#foot_1">3</ref> and tBiomedL<ref type="foot" target="#foot_2">4</ref>, both of which are publicly available on Zenodo. Applying our approach to these large datasets demonstrates its scalability.</p><p>For our study, we focused on the CEA and CTA tasks using these datasets. Each dataset is organized into two main subdirectories: entity and horizontal. Our experiments were conducted using the horizontal subdirectory, which is further divided into three subfolders: gt (ground truth), tables, and targets. The gt folder contains the ground truth annotations, the tables folder includes all possible ground truth annotations for the tables, and the targets folder lists all the targets requiring annotation (those without existing ground truth).</p><p>Both the biodiversity and biomedical datasets are provided in CSV format. For the Round 2 CEA and CTA tasks, the tBiomedL dataset includes 5,496 tables, while the tBiodivL dataset contains 1,616 tables. The target datasets, also in CSV format, were utilized for evaluation purposes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Methodology</head><p>To address the CEA and CTA tasks, we followed a detailed pipeline. The complete implementation, including code, is available on our GitHub repository<ref type="foot" target="#foot_3">5</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">CEA</head><p>For the CEA task, we began by loading the target CSV file into a DataFrame to streamline processing. The target file includes three columns: the table name, column index, and row index. To perform the annotation, the specified cells were extracted from each table using the provided column and row indices, and these cell values were added to the DataFrame. The values vary in form, encompassing strings, paragraphs, and numeric data. Because some cells contain multiple values, we used only the first value from each cell to simplify the annotation process and mitigate potential ambiguities. The updated DataFrame, as shown in Table <ref type="table" target="#tab_0">1</ref>, reflects these adjustments.</p><p>The CEA process aims to link cell values from tabular data to corresponding entities in the Wikidata knowledge graph. This involves assigning a unique Wikidata Entity URI to each cell value, thereby enhancing the semantic enrichment and interoperability of the data.</p><p>The methodology for CEA includes the following steps:</p><p>1. Data Loading and Preparation: The CSV file was imported into a DataFrame with columns for the table name, column index, and row index. Cells were extracted based on these indices and added to the DataFrame.</p><p>2. Handling Multiple Values: Since some cells contained more than one value, we opted to use only the first value from each cell to streamline the annotation process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.1.">Rate Limiting and Caching</head><p>To adhere to the Wikidata API's rate limits, a RateLimiter class was created. This class ensures that API requests do not exceed the maximum allowed frequency, preventing throttling or denial of service. The rate limiter monitors recent API call timestamps and calculates the necessary wait time before making additional requests.</p><p>A caching mechanism was also employed using a Python defaultdict to store results from previous queries. This approach minimizes redundant API calls, thereby enhancing the overall efficiency of the annotation process.</p></div>
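<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interplay of the rate limiter and the cache can be illustrated with a minimal sketch. It is not the code from our repository: the sliding-window strategy, the constructor parameters, the injectable clock and sleep arguments, and the helper cached_lookup are assumptions made for illustration only.</p>
```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_calls calls within any window of `period` seconds."""

    def __init__(self, max_calls, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget timestamps that have already left the sliding window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)


# Results of earlier queries; missing keys default to None (assumed layout).
cache = defaultdict(lambda: None)


def cached_lookup(label, lookup, limiter, store=cache):
    """Return the stored result for label, calling lookup only on a cache miss."""
    if label in store:  # `in` does not trigger the default factory
        return store[label]
    limiter.wait()
    store[label] = lookup(label)
    return store[label]
```
<p>Only cache misses pass through the limiter, so repeated cell values cost neither an API call nor any waiting time.</p></div>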
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.2.">Entity Identification and URI Construction</head><p>To identify the corresponding Wikidata entity for each cell value, we defined the function get_wikidata_id(category_label). This function performs the following steps:</p><p>1. Checks if the entity ID for the given category label is available in the cache. If found, it returns the cached ID.</p><p>2. If the entity ID is not cached, it invokes the rate limiter's wait() method to comply with API rate limits.</p><p>3. Sends a GET request to the Wikidata API using the requests library with the appropriate search parameters, including the action type, format, language, and label.</p><p>4. If the response status is 200 (OK), it parses the JSON response to extract the entity ID. A valid ID is cached and returned; if not found or if the response is malformed, appropriate error messages are logged.</p><p>Upon obtaining a valid Wikidata ID, the construct_entity_uri(wikidata_id) function constructs the corresponding Wikidata Entity URI.</p></div>
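<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the repository code: it uses the standard-library urllib module in place of the requests library so that it has no third-party dependencies, it assumes the wbsearchentities action with English as the search language, and the fetch and wait parameters as well as the User-Agent string are hypothetical hooks added for testability.</p>
```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
_id_cache = {}


def fetch_json(params):
    """Send a GET request to the Wikidata API and return the parsed JSON."""
    url = WIKIDATA_API + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (assumption); replace with a descriptive one.
    request = urllib.request.Request(url, headers={"User-Agent": "cea-sketch/0.1"})
    # urlopen raises an HTTPError (a subclass of OSError) for non-success statuses.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def get_wikidata_id(category_label, fetch=fetch_json, wait=lambda: None):
    """Return the entity ID of the top search hit for a label, or None."""
    if category_label in _id_cache:          # step 1: cache lookup
        return _id_cache[category_label]
    wait()                                   # step 2: hook for RateLimiter.wait
    params = {                               # step 3: search parameters
        "action": "wbsearchentities",        # assumed search action
        "format": "json",
        "language": "en",                    # assumed language
        "search": category_label,
    }
    try:
        hits = fetch(params).get("search", [])
    except (OSError, ValueError) as error:   # network failure or malformed JSON
        print(f"lookup failed for {category_label!r}: {error}")
        return None
    if not hits:                             # step 4: nothing found
        return None
    _id_cache[category_label] = hits[0]["id"]
    return _id_cache[category_label]


def construct_entity_uri(wikidata_id):
    """Build the entity URI for a Wikidata ID such as 'Q42'."""
    return f"http://www.wikidata.org/entity/{wikidata_id}"
```
<p>Taking the first search hit is the simplest possible disambiguation strategy; it keeps the sketch short but ignores the table context of a cell.</p></div>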
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.3.">Processing and Annotation of Tabular Data</head><p>The primary function for annotating tabular data is fetch_and_assign_wikidata_uri(category_label), which integrates the above steps to fetch and assign the Wikidata URI for each cell value. This function ensures that each value is a string, removes any leading or trailing whitespace, and then uses get_wikidata_id to retrieve the entity ID. If a valid ID is found, the corresponding URI is constructed; otherwise, None is returned.</p><p>To efficiently apply this function across the dataset, the process_row(row) function processes each row of the DataFrame. The parallel_apply(df, func, workers) function employs the ThreadPoolExecutor from Python's concurrent.futures module to enable parallel processing. This parallelization accelerates the annotation process by distributing the workload across multiple threads. The parallel_apply function was configured to use up to 20 worker threads to balance performance and resource utilization.</p><p>Finally, the annotated DataFrame, annotated_target_df, is produced by applying the process_row function to the input dataset (table_biodiv_cea_target) using parallel execution. </p></div>
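<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of this parallel annotation step is given below. To stay dependency-free it operates on a plain list of row dictionaries rather than a DataFrame, and the lookup argument stands in for get_wikidata_id; both choices, together with the "cell" key, are assumptions made for illustration.</p>
```python
from concurrent.futures import ThreadPoolExecutor


def fetch_and_assign_wikidata_uri(category_label, lookup):
    """Normalize a cell value and return its entity URI, or None if unresolved."""
    label = str(category_label).strip()
    wikidata_id = lookup(label) if label else None
    if wikidata_id is None:
        return None
    return f"http://www.wikidata.org/entity/{wikidata_id}"


def parallel_apply(rows, func, workers=20):
    """Apply func to every row with a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(func, rows))


if __name__ == "__main__":
    # Hypothetical label-to-ID table standing in for the Wikidata lookup.
    fake_ids = {"Marchamp": "Q1", "Kinly": "Q2"}
    rows = [{"cell": " Marchamp "}, {"cell": "unknown"}, {"cell": "Kinly"}]

    def process_row(row):
        return fetch_and_assign_wikidata_uri(row["cell"], fake_ids.get)

    print(parallel_apply(rows, process_row, workers=4))
```
<p>Threads suit this workload because each task spends most of its time waiting for the network. A shared rate limiter still caps the overall request rate, so adding workers beyond that cap yields no further speed-up.</p></div>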
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">CTA</head><p>The CTA process enhances the semantic understanding of dataset columns by mapping them to appropriate types or classes in the Wikidata knowledge graph.</p><p>For the CTA task, we started with a CSV file containing two columns: the first specifying the table name and the second providing the column index within the table. This file was loaded into a DataFrame for further processing. An example of the target dataset is shown in Table <ref type="table">2</ref>.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 2</head><label>2</label><figDesc>Example of the CTA Target Dataset</figDesc><table><row><cell>Table Name</cell><cell>Column Index</cell></row><row><cell>EGN060702I0010</cell><cell>1</cell></row><row><cell>EGN060702I0010</cell><cell>3</cell></row><row><cell>EGN060702I0031</cell><cell>1</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>To perform the annotation, we extracted the specified columns from the indicated tables using the provided column indices. These columns were added to the DataFrame under a new column header, clean_column_values. The values in this column were cleaned to retain only unique entries, with multiple values separated by the delimiter "||". An example of the cleaned DataFrame is shown in Table <ref type="table" target="#tab_1">3</ref>.</p></div>
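<div xmlns="http://www.tei-c.org/ns/1.0"><p>The extraction and cleaning of a target column can be sketched as follows. The helper names read_column and join_unique_values, the assumption that each table has a header row, and the choice to keep values in order of first appearance are illustrative and not taken from the repository.</p>
```python
import csv
import io


def read_column(csv_text, column_index, has_header=True):
    """Return the values of one column from CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    return [row[column_index] for row in reader if len(row) > column_index]


def join_unique_values(values, delimiter="||"):
    """Drop blanks and duplicates, then join the remaining values."""
    seen = set()
    unique = []
    for value in values:
        text = str(value).strip()
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return f" {delimiter} ".join(unique)


if __name__ == "__main__":
    table = "id,place\n1,Marchamp\n2,Saint-Maurice-de-Gourdans\n3,Marchamp\n"
    print(join_unique_values(read_column(table, 1)))
```
<p>Deduplicating before any lookup keeps the number of API requests proportional to the number of distinct values in a column rather than to its length.</p></div>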
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1.">Caching and Rate Limiting</head><p>To optimize performance and avoid excessive requests, a local cache (wikidata_cache) was implemented. This cache consists of two components: id_cache for storing label-to-ID mappings and related_cache for storing related entity IDs. A rate-limiting decorator was applied to ensure that no more than 10 requests per second are made, adhering to Wikidata's API rate limits and improving overall efficiency.</p></div>
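<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to write such a decorator is sketched below. It enforces a minimum interval between consecutive calls, which is a simple way to stay under a per-second limit; the name rate_limited and the injectable clock and sleep arguments are assumptions for illustration, and the repository may use a different strategy.</p>
```python
import functools
import threading
import time


def rate_limited(max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Decorator that spaces calls at least 1 / max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second
    lock = threading.Lock()  # makes the spacing hold across threads
    last_call = [None]       # time of the previous call, None before the first

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                now = clock()
                if last_call[0] is not None:
                    remaining = min_interval - (now - last_call[0])
                    if remaining > 0:
                        sleep(remaining)
                last_call[0] = clock()
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Assumed layout of the two-part cache described above.
wikidata_cache = {"id_cache": {}, "related_cache": {}}


@rate_limited(10)
def query_api(params):
    """Placeholder for a function that sends one request to the Wikidata API."""
    return params
```
<p>Holding the lock only while waiting, and releasing it before the wrapped function runs, lets slow requests overlap while their start times remain spaced.</p></div>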
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2.">Entity Identification and Relation Mapping</head><p>The function get_wikidata_id is used to retrieve the Wikidata ID for each label in the clean_column_values. If the ID is not already present in the cache, the function sends a request to the Wikidata API and updates the cache with the result. Additionally, the function get_related_ids retrieves related IDs based on properties such as P31 (instance of) and P279 (subclass of), which are crucial for determining the semantic type or class of the column values.</p></div>
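<div xmlns="http://www.tei-c.org/ns/1.0"><p>The retrieval of related IDs can be sketched as below, assuming the wbgetentities action is used to fetch an entity's claims. The split into extract_related_ids and get_related_ids, the fetch parameter, and the sample payload are illustrative assumptions rather than the repository code.</p>
```python
def extract_related_ids(entity, properties=("P31", "P279")):
    """Collect the target IDs of the given properties from one entity record."""
    related = []
    claims = entity.get("claims", {})
    for prop in properties:
        for statement in claims.get(prop, []):
            # Statements of type "no value" or "unknown value" carry no datavalue.
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if datavalue is None:
                continue
            target = datavalue.get("value", {}).get("id")
            if target:
                related.append(target)
    return related


def get_related_ids(wikidata_id, fetch, related_cache):
    """Return the P31/P279 targets of an entity, consulting the cache first."""
    if wikidata_id not in related_cache:
        params = {
            "action": "wbgetentities",  # assumed API action
            "ids": wikidata_id,
            "props": "claims",
            "format": "json",
        }
        payload = fetch(params)
        entity = payload.get("entities", {}).get(wikidata_id, {})
        related_cache[wikidata_id] = extract_related_ids(entity)
    return related_cache[wikidata_id]


if __name__ == "__main__":
    # Hand-written sample payload in the shape returned for an entity's claims.
    sample = {"entities": {"Q42": {"claims": {"P31": [
        {"mainsnak": {"snaktype": "value",
                      "datavalue": {"value": {"id": "Q5"}}}}]}}}}
    print(get_related_ids("Q42", lambda params: sample, {}))
```
<p>Keeping the JSON parsing separate from the network call makes the parsing logic testable offline.</p></div>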
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.3.">Processing and Annotation of Columns</head><p>The process_cell function processes each entry in the clean_column_values column. This function splits the values, filters out irrelevant entries, and deduplicates them. For each unique label, it retrieves the Wikidata ID and associated subclass IDs. These subclass IDs are then aggregated, and the most frequently occurring ones are selected as the final column type annotation.</p></div>
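<div xmlns="http://www.tei-c.org/ns/1.0"><p>The aggregation step amounts to a frequency vote over the types of the individual labels, sketched below. The parameters get_id, get_related, and top_n are hypothetical stand-ins, and because the filtering criteria for irrelevant entries are not specified above, the sketch only drops empty strings.</p>
```python
from collections import Counter


def process_cell(clean_column_values, get_id, get_related, top_n=1, delimiter="||"):
    """Return the most frequent related type IDs for a delimited column string."""
    # Split, strip, and deduplicate while keeping the original order.
    labels = []
    for part in clean_column_values.split(delimiter):
        label = part.strip()
        if label and label not in labels:
            labels.append(label)

    counts = Counter()
    for label in labels:
        wikidata_id = get_id(label)
        if wikidata_id is None:
            continue  # label could not be resolved to an entity
        # Each label votes at most once for each of its types.
        counts.update(set(get_related(wikidata_id)))

    return [type_id for type_id, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical IDs used purely for illustration.
    ids = {"A": "Q1", "B": "Q2", "C": "Q3"}
    related = {"Q1": ["T1"], "Q2": ["T1", "T2"], "Q3": ["T1"]}
    print(process_cell("A || B || C || D", ids.get, related.get))
```
<p>A plain frequency vote favours broad types shared by many values, so a more specific type that applies to only part of the column can lose to a more general one.</p></div>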
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.4.">Cache Management</head><p>To maintain efficiency and reduce redundant API requests, the cache is saved to a file at the end of the script execution using the save_cache function. When the script is restarted, the load_cache function reloads the cache, preserving previously obtained results and ensuring more efficient subsequent executions. In summary, the CTA process involves extracting, cleaning, and annotating column data using the Wikidata knowledge graph, with caching and rate limiting employed to optimize performance and resource utilization.</p></div>
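<div xmlns="http://www.tei-c.org/ns/1.0"><p>Cache persistence of this kind can be sketched with a JSON file, as below. The file format, the empty-cache default, and the write-then-rename step are assumptions made for illustration; the repository may serialize the cache differently.</p>
```python
import json
import os


def load_cache(path):
    """Load the cache from disk, or start with an empty one if no file exists."""
    if not os.path.exists(path):
        return {"id_cache": {}, "related_cache": {}}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def save_cache(cache, path):
    """Write the cache to a temporary file, then move it into place."""
    temporary_path = path + ".tmp"
    with open(temporary_path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
    # Replacing the file in one step avoids leaving a half-written cache behind.
    os.replace(temporary_path, path)
```
<p>A typical run would call load_cache once at start-up and save_cache once at the end, so that a restarted run repeats only the lookups that were never completed.</p></div>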
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Results</head><p>We evaluated the performance of our methodology by applying it to the CEA and CTA tasks on the tBiodivL and tBiomedL datasets. This evaluation utilized the target datasets provided by the SemTab organizers<ref type="foot" target="#foot_4">6</ref>. Our results underscore the effectiveness of our approach, particularly regarding F1 and Precision scores.</p><p>For the SemTab 2024 competition, we focused on two primary datasets: tBiodiv-Large-Relational and tBiomed-Large-Relational. Our methodology demonstrated strong performance, achieving F1 scores between 61% and 93% across both CTA and CEA tasks. These results are summarized in Table <ref type="table">4</ref>.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 4</head><label>4</label><figDesc>Precision, recall, and F1 scores for CEA and CTA tasks on tBiodiv-Large-Relational and tBiomed-Large-Relational datasets.</figDesc><table><row><cell>Dataset</cell><cell>Task</cell><cell>F1 Score</cell><cell>Precision</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Discussion</head><p>Our results demonstrate the effectiveness of our proposed methodology for CEA and CTA on the SemTab 2024 datasets. The methodology utilized pre-existing semantic resources from Wikidata to enhance table annotation tasks, showcasing significant improvements in both accuracy and efficiency.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1.">Performance Insights</head><p>The CEA task achieved high F1 scores, reaching 93.20% for the tBiodiv-Large-Relational dataset and 92.50% for the tBiomed-Large-Relational dataset, indicating that cell values were linked to Wikidata entities accurately. These scores reflect the robustness of our system in identifying and annotating cell values, which is crucial for integrating and enriching tabular data with semantic information.</p><p>In contrast, the CTA task scored lower, with an F1 score of 61.50% on the tBiodiv-Large-Relational dataset. Although this is well below the CEA results, it shows that the approach can classify a substantial share of column types correctly. The weaker CTA performance could be attributed to the complexity and diversity of column types across different datasets, which may affect the consistency of the annotations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2.">Methodological Contributions</head><p>Our approach leverages the rich semantic labels provided by Wikidata, enhancing the accuracy of table annotations by providing standardized and comprehensive semantic details. The integration of these labels allows for more precise and meaningful annotations, which improve the interoperability and usability of the annotated data.</p><p>The implementation of rate limiting and caching mechanisms has proven essential in managing API usage and optimizing performance. By reducing redundant API requests and adhering to rate limits, our system efficiently handles large-scale data processing, which is critical for real-world applications involving extensive datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.3.">Future Work</head><p>Future research could focus on integrating additional knowledge graphs or domain-specific ontologies to overcome the limitations of relying solely on Wikidata. Enhancing the performance of the CTA task may benefit from the development of more advanced classification models or the inclusion of richer features from the datasets. Expanding the methodology to accommodate multilingual and domain-specific datasets could further broaden its applicability across diverse contexts and industries. Additionally, the current approach will be extended into a more comprehensive framework based on our previous work <ref type="bibr" target="#b1">[2]</ref>, allowing for scalability and the potential incorporation of machine learning techniques.</p><p>In conclusion, our methodology offers a scalable and effective way to integrate semantic information into tabular data. The positive results achieved in both CEA and CTA tasks demonstrate the potential of combining pre-existing semantic resources with innovative processing techniques to enhance data interoperability and knowledge discovery.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Target DataFrame for CEA after Adding Cell Values</figDesc><table><row><cell>Table Name</cell><cell>Column Index</cell><cell>Row Index</cell><cell>Cell Values</cell><cell>First Cell Value</cell></row><row><cell>EGN060702I0010</cell><cell>1</cell><cell>0</cell><cell>Marchamp, Kinly</cell><cell>Marchamp</cell></row><row><cell>EGN060702I0010</cell><cell>1</cell><cell>1</cell><cell>Saint-Maurice-de-Gourdans,...</cell><cell>Saint-Maurice-de-Gourdans</cell></row><row><cell>EGN060702I0010</cell><cell>1</cell><cell>2</cell><cell>Nivigne et Suran, ...</cell><cell>Nivigne et Suran</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>Example of the DataFrame After Fetching and Cleaning Column Values</figDesc><table><row><cell>Table Name</cell><cell>Column Index</cell><cell>clean_column_values</cell></row><row><cell>EGN060702I0010</cell><cell>1</cell><cell>Marchamp || Saint-Maurice-de-Gourdans</cell></row><row><cell>EGN060702I0031</cell><cell>1</cell><cell>Category:Judiciary of Iran || Category:Judiciary of Ukraine</cell></row><row><cell>EGN060702I0072</cell><cell>2</cell><cell>Wikipedia:Vital articles/Level/4 || Wikipedia:Vital articles/Level/5</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://sem-tab-challenge.github.io/2024/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://zenodo.org/records/10283083</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://zenodo.org/records/10283119</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://github.com/DKEPassau/CEACTA24</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://sem-tab-challenge.github.io/2024/results.html</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Table understanding: Problem overview</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">O</forename><surname>Shigarov</surname></persName>
		</author>
		<idno type="DOI">10.1002/widm.1482</idno>
		<ptr target="https://doi.org/10.1002/widm.1482" />
	</analytic>
	<monogr>
		<title level="j">WIREs Data Mining Knowl. Discov</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">DREIFLUSS: A minimalist approach for table matching</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">R</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Algergawy</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3557/paper4.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab 2023, co-located with the 22nd International Semantic Web Conference, ISWC 2023</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">V</forename><surname>Efthymiou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Cutrona</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Hassanzadeh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Sequeda</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Srinivas</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Abdelmageed</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hulsebos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Khatiwada</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Korini</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Kruit</surname></persName>
		</editor>
		<meeting>the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab 2023, co-located with the 22nd International Semantic Web Conference, ISWC 2023<address><addrLine>Athens, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">November 6-10, 2023. 2023</date>
			<biblScope unit="volume">3557</biblScope>
			<biblScope unit="page" from="50" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Dbpedia -A crystallization point for the web of data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.websem.2009.07.002</idno>
		<ptr target="https://doi.org/10.1016/j.websem.2009.07.002" />
	</analytic>
	<monogr>
		<title level="j">J. Web Semant</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="154" to="165" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Schema.org: Evolution of structured data on the web</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">V</forename><surname>Guha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Brickley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Macbeth</surname></persName>
		</author>
		<idno type="DOI">10.1145/2857274.2857276</idno>
		<ptr target="https://doi.org/10.1145/2857274.2857276" />
	</analytic>
	<monogr>
		<title level="j">ACM Queue</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page">10</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">ADOG - Annotating data with ontologies and graphs</title>
		<author>
			<persName><forename type="first">D</forename><surname>Oliveira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>d'Aquin</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2553/paper1.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2553</biblScope>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">MantisTable: an automatic approach for the semantic table interpretation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cremaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Avogadro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chieregato</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2553/paper3.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2553</biblScope>
			<biblScope unit="page" from="15" to="24" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Entity linking to knowledge graphs to infer column types and properties</title>
		<author>
			<persName><forename type="first">A</forename><surname>Thawani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zafar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">T</forename><surname>Divvala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Qasemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Szekely</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pujara</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2553/paper4.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2553</biblScope>
			<biblScope unit="page" from="25" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">DAGOBAH: enhanced scoring algorithms for scalable annotations of tabular data</title>
		<author>
			<persName><forename type="first">V</forename><surname>Huynh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chabot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Labbé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Monnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2775/paper3.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">2775</biblScope>
			<biblScope unit="page" from="27" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">JenTab: Matching tabular data to knowledge graphs</title>
		<author>
			<persName><forename type="first">N</forename><surname>Abdelmageed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2775/paper4.pdf" />
	</analytic>
	<monogr>
		<title level="s">CEUR Workshop Proceedings</title>
		<imprint>
			<biblScope unit="volume">2775</biblScope>
			<biblScope unit="page" from="40" to="49" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">DAGOBAH: table and graph contexts for efficient semantic annotation of tabular data</title>
		<author>
			<persName><forename type="first">V</forename><surname>Huynh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chabot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Deuzé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Labbé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Monnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3103/paper2.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">3103</biblScope>
			<biblScope unit="page" from="19" to="31" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">MantisTable V: A novel and efficient approach to semantic table interpretation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Avogadro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cremaschi</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3103/paper7.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">3103</biblScope>
			<biblScope unit="page" from="79" to="91" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">s-elBat: A semantic interpretation approach for messy taBle-s</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cremaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Avogadro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chieregato</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3320/paper7.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3320</biblScope>
			<biblScope unit="page" from="59" to="71" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">SOTAB: the WDC schema.org table annotation benchmark</title>
		<author>
			<persName><forename type="first">K</forename><surname>Korini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Peeters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3320/paper1.pdf" />
	</analytic>
	<monogr>
		<title level="s">CEUR Workshop Proceedings</title>
		<imprint>
			<biblScope unit="volume">3320</biblScope>
			<biblScope unit="page" from="14" to="19" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">MammoTab: A giant and comprehensive dataset for semantic table interpretation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Marzocchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cremaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Avogadro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmonari</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3320/paper3.pdf" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3320</biblScope>
			<biblScope unit="page" from="28" to="33" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
