<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Semantic Web Conference (ISWC), November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>CTA for Life Sciences Table Matching extending DREIFLUSS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vishvapalsinhji Parmar</string-name>
          <email>vishvapalsinhji.parmar@uni-passau.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alsayed Algergawy</string-name>
          <email>alsayed.algergawy@uni-passau.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Data and Knowledge Engineering, University of Passau, Passau</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Table Matching, Cell Entity Annotation (CEA)</institution>
          ,
          <addr-line>Column Type Annotation (CTA), Knowledge Discovery</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>1</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In our previous work for the SemTab 2023 challenge, we presented DREIFLUSS, a minimalist approach utilizing machine learning models and sampling techniques to tackle the Column Property Annotation (CPA) and Column Type Annotation (CTA) tasks. Building on this groundwork, this paper shifts focus for the SemTab 2024 challenge by harnessing the semantic capabilities of the Wikidata knowledge graph to address the Cell Entity Annotation (CEA) and CTA tasks. Our approach leverages optimized preprocessing and querying techniques with the Wikidata API, leading to significant improvements in the accuracy and efficiency of table annotations. We achieved F1 scores of 93.20% for CEA and 61.50% for CTA on the tBiodivL-Horizontal dataset, along with an F1 score of 92.50% for CEA on the tBiomedL-Horizontal dataset. These results highlight the promise of knowledge graph-based methods in refining table-matching processes, laying the groundwork for future research that combines machine learning techniques with knowledge graph-driven strategies to achieve more robust annotation outcomes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Matching tables to knowledge graphs, a vital aspect of data integration and knowledge discovery,
has gained significant attention due to the proliferation of digital information. It involves
harmonizing information across different tables, which is crucial for extracting valuable insights.
Millions of high-quality tables are available on the Internet, a number that continues to rise
due to advancements in automated data extraction and the growing reliance on structured
data across various sectors, including business, academia, and government [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Effective table
matching is therefore more important than ever.
      </p>
      <p>
        The SemTab Challenge has emerged as a leading competition that pushes the frontiers of
table understanding and annotation. In the 2023 edition, we introduced DREIFLUSS, a minimalist
approach that utilized machine learning models and strategic sampling techniques to address
the tasks of Column Property Annotation (CPA) and CTA [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This approach demonstrated
the effectiveness of using data-driven techniques to achieve high accuracy in semantic table
annotations. Building on this foundation, the 2024 SemTab challenge presented an opportunity
to explore a different dimension of table annotation by leveraging the semantic richness of
knowledge graphs. In this work, we extend the DREIFLUSS methodology by utilizing the
Wikidata knowledge graph to tackle the tasks of CEA and CTA. Unlike the previous machine
learning-based approach [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], this paper focuses on using the Wikidata API to extract and
integrate semantic labels, which significantly enhances the precision and efficiency of table
annotations.
      </p>
      <p>
        By employing a knowledge graph-driven strategy, our approach showcases the potential of
semantic resources like Wikidata in refining table matching processes. This shift allows for the
exploration of new methods in table annotation, underscoring the importance of adaptability
and scalability in today’s data-driven landscape [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. The insights gained from this exploration
also lay the groundwork for future research that combines knowledge graph-based techniques
with machine learning approaches to further improve table annotation outcomes. The rapid
growth of structured data on the web presents both immense opportunities for knowledge
discovery and significant challenges. Each table often comes with a unique structure, schema,
and notation, requiring advanced methods for understanding and harmonization. Competitions
like SemTab play a vital role in addressing these challenges by advancing the capabilities of
table understanding and annotation. The critical tasks of CTA and CEA are central to
achieving thorough table comprehension, efficient data integration, and effective knowledge
discovery.
      </p>
      <p>To address these needs, our current methodology leverages pre-existing semantic resources,
specifically focusing on Wikidata to enhance the table annotation process. This approach
demonstrates the advantages of using a knowledge graph-based strategy to improve annotation
accuracy and efficiency. Moreover, it provides inspiration for future work that could integrate
machine learning models with semantic resources to develop more robust and adaptable
solutions for table annotation challenges. This work focuses on datasets from Life Sciences, such as
those in biodiversity and biomedicine, where accurate table annotation is critical for knowledge
discovery and data integration in domains like healthcare and biology.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Since its inception in 2019, the SemTab challenge has been instrumental in advancing the field
of semantic table interpretation, which focuses on understanding and annotating tabular data
with semantic information. In the inaugural year, Oliveira and d’Aquin introduced “ADOG”
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a system that utilizes ontologies for data annotation. Complementing this, Cremaschi et al.
presented “MantisTable” [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], an innovative system for automatic semantic table interpretation.
Another significant contribution was made by Thawani et al., who focused on CTA and CPA
tasks, developing a method to link entities to knowledge graphs for inferring column types and
properties [7].
      </p>
      <p>The challenge evolved in 2020 with Huynh et al.’s enhanced version of “DAGOBAH” [8],
which highlighted scalable annotations for large datasets. Concurrently, Abdelmageed and
Schindler introduced “JenTab” [9], a system designed to align tabular data with knowledge
graphs, bridging the gap between structured and unstructured data. By 2021, the challenge saw
refinements of previous systems, with “DAGOBAH” [10] being optimized for more efficient
semantic annotations, and “MantisTable V” [11] offering a novel approach to table interpretation.</p>
      <p>
        Systems like “s-elBat” by Cremaschi et al. [12] further explored the challenges of interpreting
real-world, messy data. The 2022 edition of the challenge introduced specialized datasets
such as “SOTAB” [13] and “MammoTab” [14], which closely aligned with the 2023 tasks focusing
on Schema.org annotations. In the SemTab 2023 challenge, we introduced DREIFLUSS, a
minimalist approach for table matching that leveraged machine learning techniques to perform
CTA and CPA tasks using knowledge graphs such as Schema.org and DBpedia [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. While this
approach was effective, it operated within the constraints of a limited number of labels, with
Schema.org and DBpedia offering a label set ranging from 46 to 105. For the SemTab 2024
challenge, we have shifted our focus towards the CEA and CTA tasks using the much larger
and more semantically rich Wikidata knowledge graph. Given Wikidata’s vast label set and
comprehensive coverage, we developed a new approach that tackles these tasks by querying
the Wikidata API.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Tasks</title>
      <p>The second round of the SemTab challenge, and more specifically its Accuracy Track, comprises several
tasks. Of these, we focus on two core tasks: CEA and CTA. These tasks aim to enhance
table comprehension by assigning specific labels to cells and columns, respectively.</p>
      <sec id="sec-3-1">
        <title>3.1. Cell Entity Annotation (CEA)</title>
        <p>CEA involves linking cell values to specific entities from a knowledge base, such as people,
places, or organizations. This process enriches the semantic understanding of the table’s
content, improving data retrieval, integration, and knowledge discovery. Properly annotating
cells with relevant entities is crucial for tables with ambiguous or abbreviated terms, which
could otherwise lead to misinterpretation. CEA enhances the quality and utility of structured
data by ensuring that each cell is connected to a contextually accurate entity.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Column Type Annotation (CTA)</title>
        <p>CTA focuses on categorizing columns by associating them with specific semantic labels that
describe their content or purpose. This process involves attributing appropriate labels to
columns based on their content, using labels derived from knowledge graphs such as DBpedia
and Schema.org. CTA facilitates efficient data integration and enables downstream applications
to understand table structure and semantics, proving essential for tasks like data cleaning,
schema matching, and query optimization. By providing insights into each column’s intended
purpose, CTA improves data understanding and analysis.</p>
        <p>Together, CEA and CTA tasks aim to enhance table matching and comprehension. These
tasks add semantic richness to tables, aiding in data integration, knowledge discovery, and
other applications. The following sections will explore the datasets used for CEA and CTA, the
experimental setup, the results obtained, and the effectiveness of our approach in addressing
these tasks within the SemTab challenge.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Dataset</title>
      <p>The SemTab 2024 competition<sup>2</sup> features three distinct challenge tracks; our focus is on
the Accuracy track. Within this track, various datasets are provided, including
WikidataTables2024R1(R2), tBiodiv(L), and tBiomed(L), each consisting of two rounds. Our experiments
specifically target the datasets from Round 2, namely tBiodivL<sup>3</sup> and tBiomedL<sup>4</sup>, both of which
are publicly available on Zenodo. Because these datasets are large, they also show that our approach
is feasible at scale.</p>
      <p>For our study, we focused on the CEA and CTA tasks using these datasets. Each dataset is
organized into two main subdirectories: entity and horizontal. Our experiments were conducted
using the horizontal subdirectory, which is further divided into three subfolders: gt (ground
truth), tables, and targets. The gt folder contains the ground truth annotations, the tables folder
includes all possible ground truth annotations for the tables, and the targets folder lists all the
targets requiring annotation (those without existing ground truth).</p>
      <p>Both the biodiversity and biomedical datasets are provided in CSV format. For the Round
2 CEA and CTA tasks, the tBiomedL dataset includes 5,496 tables, while the tBiodivL dataset
contains 1,616 tables. The target datasets, also in CSV format, were utilized for evaluation
purposes.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Methodology</title>
      <p>To address the CEA and CTA tasks, we followed a detailed pipeline. The complete
implementation, including code, is available on our GitHub repository<sup>5</sup>.</p>
      <p>5.1. CEA</p>
      <p>For the CEA task, we began by loading the CSV file into a DataFrame to streamline processing.
The dataset includes three columns: the table name, column index, and row index. To perform
the annotation, specific cells were extracted from the table using the provided column and row
indices, and these cell values were incorporated into the DataFrame. These values may vary,
encompassing strings, paragraphs, and numeric data. Given that some cells contain multiple
values, we decided to use only the first value from each cell to simplify the annotation process
and mitigate potential ambiguities. The updated DataFrame, as shown in Table 1, reflects these
adjustments.</p>
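      <p>The preparation step described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library, so plain lists and dictionaries stand in for the DataFrame. The function names, the in-memory table layout, and the "," separator used to pick the first value are assumptions made for the example and are not taken from the released code.</p>
      <preformat><![CDATA[
```python
import csv
import io


def first_value(cell, separator=","):
    """Keep only the first value of a multi-valued cell (the separator is an assumption)."""
    return str(cell).split(separator)[0].strip()


def load_cea_targets(target_file, tables, separator=","):
    """Attach the referenced cell value to every (table, column, row) target.

    `tables` maps a table name to its rows (a list of lists of strings).
    """
    targets = []
    for table_name, col, row in csv.reader(target_file):
        cell = tables[table_name][int(row)][int(col)]
        targets.append({
            "table": table_name,
            "col": int(col),
            "row": int(row),
            "value": first_value(cell, separator),
        })
    return targets


# Tiny in-memory example; real targets and tables would be read from disk.
example_tables = {"T1": [["species", "habitat"], ["Quercus robur, Q. petraea", "forest"]]}
example_targets = io.StringIO("T1,0,1\nT1,1,1\n")
example_rows = load_cea_targets(example_targets, example_tables)
```
]]></preformat>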
      <p>The CEA process aims to link cell values from tabular data to corresponding entities in the
Wikidata knowledge graph. This involves assigning a unique Wikidata Entity URI to each cell
value, thereby enhancing the semantic enrichment and interoperability of the data.</p>
      <p>The methodology for CEA includes the following steps:
1. Data Loading and Preparation: The CSV file was imported into a DataFrame with
columns for the table name, column index, and row index. Cells were extracted based on
these indices and added to the DataFrame.
2. Handling Multiple Values: Since some cells contained more than one value, we opted
to use only the first value from each cell to streamline the annotation process.</p>
      <p><sup>2</sup> https://sem-tab-challenge.github.io/2024/
<sup>3</sup> https://zenodo.org/records/10283083
<sup>4</sup> https://zenodo.org/records/10283119
<sup>5</sup> https://github.com/DKEPassau/CEACTA24</p>
      <sec id="sec-5-1">
        <title>5.1.1. Rate Limiting and Caching</title>
        <p>To adhere to the Wikidata API’s rate limits, a RateLimiter class was created. This class ensures
that API requests do not exceed the maximum allowed frequency, preventing throttling or denial
of service. The rate limiter monitors recent API call timestamps and calculates the necessary
wait time before making additional requests.</p>
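        <p>One possible shape for such a rate limiter is a sliding window over recent call timestamps. The sketch below is an assumption about how the class could look, not the released implementation: the constructor parameters, the injectable clock and sleep functions (added so the behaviour can be checked without real waiting), and the lock for use from several threads are all illustrative choices.</p>
        <preformat><![CDATA[
```python
import threading
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock            # injectable so tests need not really wait
        self.sleep = sleep
        self.calls = deque()          # timestamps of recent calls
        self.lock = threading.Lock()  # the limiter may be shared by worker threads

    def wait(self):
        """Block until another call is allowed, then record it."""
        with self.lock:
            while True:
                now = self.clock()
                # Forget timestamps that have left the sliding window.
                while self.calls and now - self.calls[0] >= self.period:
                    self.calls.popleft()
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                # Sleep until the oldest recorded call leaves the window.
                self.sleep(self.period - (now - self.calls[0]))
```
]]></preformat>
        <p>A caller would create one limiter, for example RateLimiter(max_calls=10, period=1.0), and invoke wait() before each request.</p>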
        <p>A caching mechanism was also employed using a Python defaultdict to store results from
previous queries. This approach minimizes redundant API calls, thereby enhancing the overall
efficiency of the annotation process.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.1.2. Entity Identification and URI Construction</title>
        <p>To identify the corresponding Wikidata entity for each cell value, we defined the function
get_wikidata_id(category_label). This function performs the following steps:
1. Checks if the entity ID for the given category label is available in the cache. If found, it
returns the cached ID.
2. If the entity ID is not cached, it invokes the rate limiter’s wait() method to comply with
API rate limits.
3. Sends a GET request to the Wikidata API using the requests library with the appropriate
search parameters, including the action type, format, language, and label.
4. If the response status is 200 (OK), it parses the JSON response to extract the entity ID. A
valid ID is cached and returned; if not found or if the response is malformed, appropriate
error messages are logged.</p>
        <p>Upon obtaining a valid Wikidata ID, the construct_entity_uri(wikidata_id) function
constructs the corresponding Wikidata Entity URI.</p>
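        <p>The lookup steps above could be sketched as shown below. The text does not name the API action, so the use of wbsearchentities, the choice of the first search hit, the optional session and limiter parameters, and the printed messages are assumptions made for illustration. The session parameter only needs a requests-style get method, which allows a stub to replace the network call.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
id_cache = defaultdict(lambda: None)  # label -> entity ID


def get_wikidata_id(category_label, session=None, limiter=None):
    """Return the ID of the first search hit for a label, or None."""
    if category_label in id_cache:
        return id_cache[category_label]
    if limiter is not None:
        limiter.wait()  # respect the API rate limit
    if session is None:
        import requests  # third-party; imported lazily so a stub can replace it
        session = requests
    params = {
        "action": "wbsearchentities",  # assumed search action
        "format": "json",
        "language": "en",
        "search": category_label,
    }
    response = session.get(WIKIDATA_API, params=params, timeout=10)
    if response.status_code != 200:
        print(f"Request for {category_label!r} failed: {response.status_code}")
        return None
    try:
        hits = response.json().get("search", [])
    except ValueError:
        print(f"Malformed response for {category_label!r}")
        return None
    if not hits:
        print(f"No entity found for {category_label!r}")
        return None
    id_cache[category_label] = hits[0]["id"]
    return id_cache[category_label]


def construct_entity_uri(wikidata_id):
    """Build the Wikidata entity URI for an ID such as 'Q42'."""
    return f"http://www.wikidata.org/entity/{wikidata_id}"
```
]]></preformat>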
      </sec>
      <sec id="sec-5-3">
        <title>5.1.3. Processing and Annotation of Tabular Data</title>
        <p>The primary function for annotating tabular data is
fetch_and_assign_wikidata_uri(category_label), which integrates the above steps to
fetch and assign the Wikidata URI for each cell value. This function ensures that each value is a
string, removes any leading or trailing whitespace, and then uses get_wikidata_id to retrieve
the entity ID. If a valid ID is found, the corresponding URI is constructed; otherwise, None is
returned.</p>
        <p>To efficiently apply this function across the dataset, the process_row(row) function processes
each row of the DataFrame. The parallel_apply(df, func, workers) function employs the
ThreadPoolExecutor from Python’s concurrent.futures module to enable parallel
processing. This parallelization accelerates the annotation process by distributing the workload across
multiple threads. The parallel_apply function was configured to use up to 20 worker threads
to balance performance and resource utilization.</p>
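        <p>A thread-pool helper of this kind might look like the sketch below. It is only an illustration: it takes a list of row dictionaries instead of a DataFrame, and the process_row body is a placeholder for the real per-row annotation step.</p>
        <preformat><![CDATA[
```python
from concurrent.futures import ThreadPoolExecutor


def parallel_apply(rows, func, workers=20):
    """Apply `func` to every row using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(func, rows))


def process_row(row):
    """Placeholder for the per-row annotation step (here: normalise the value)."""
    return str(row["value"]).strip()


cleaned = parallel_apply([{"value": " Quercus robur "}, {"value": 42}], process_row, workers=2)
```
]]></preformat>
        <p>Threads suit this workload because each row mostly waits on network responses, so the calls overlap even though the Python code itself does not run in parallel.</p>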
        <p>Finally, the annotated DataFrame, annotated_target_df, is produced by applying the
process_row function to the input dataset (table_biodiv_cea_target) using parallel
execution.</p>
        <p>5.2. CTA</p>
        <p>The CTA process enhances the semantic understanding of dataset columns by mapping them to
appropriate types or classes in the Wikidata knowledge graph.</p>
        <p>For the CTA task, we started with a CSV file containing two columns: the first specifying the
table name and the second providing the column index within the table. This file was loaded
into a DataFrame for further processing. An example of the target dataset is shown in Table 2.</p>
        <p>To perform the annotation, we extracted the specified columns from the indicated tables
using the provided column indices. These columns were added to the DataFrame under a new
column header, clean_column_values. The values in this column were cleaned to retain only
unique entries, with multiple values separated by the delimiter ”||”. An example of the cleaned
DataFrame is shown in Table 3.</p>
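        <p>The cleaning step can be illustrated with a small helper. The sketch assumes that empty cells are dropped and that values keep their first-seen order; neither detail is stated in the text, and the helper function itself is hypothetical, borrowing its name from the column header.</p>
        <preformat><![CDATA[
```python
def clean_column_values(values, delimiter="||"):
    """Join the unique, non-empty values of a column, keeping first-seen order."""
    seen = {}
    for value in values:
        text = str(value).strip()
        if text:
            seen.setdefault(text, None)  # dicts preserve insertion order
    return delimiter.join(seen)


column = ["Homo sapiens", "Mus musculus", "Homo sapiens", " ", "Mus musculus"]
cleaned_column = clean_column_values(column)
```
]]></preformat>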
      </sec>
      <sec id="sec-5-4">
        <title>5.2.1. Caching and Rate Limiting</title>
        <p>To optimize performance and avoid excessive requests, a local cache (wikidata_cache) was
implemented. This cache consists of two components: id_cache for storing label-to-ID
mappings and related_cache for storing related entity IDs. A rate-limiting decorator was applied
to ensure that no more than 10 requests per second are made, adhering to Wikidata’s API rate
limits and improving overall efficiency.</p>
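        <p>A decorator that enforces such a limit could be written as follows. This sketch spaces calls at least one tenth of a second apart, which is one simple way to stay at or below 10 requests per second; the decorator name, the injectable clock and sleep functions, and the query_api stand-in are hypothetical.</p>
        <preformat><![CDATA[
```python
import functools
import threading
import time

# Two-part cache as described in the text: label-to-ID mappings and related IDs.
wikidata_cache = {"id_cache": {}, "related_cache": {}}


def rate_limited(max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Decorator that spaces calls at least 1/max_per_second seconds apart."""
    min_interval = 1.0 / max_per_second
    lock = threading.Lock()
    last_call = [None]  # mutable cell shared by all calls

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                if last_call[0] is not None:
                    remaining = min_interval - (clock() - last_call[0])
                    if remaining > 0:
                        sleep(remaining)
                last_call[0] = clock()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rate_limited(10)
def query_api(label):
    """Stand-in for a real API call."""
    return label.upper()
```
]]></preformat>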
      </sec>
      <sec id="sec-5-5">
        <title>5.2.2. Entity Identification and Relation Mapping</title>
        <p>The function get_wikidata_id is used to retrieve the Wikidata ID for each label in the
clean_column_values. If the ID is not already present in the cache, the function sends a
request to the Wikidata API and updates the cache with the result. Additionally, the function
get_related_ids retrieves related IDs based on properties such as P31 (instance of) and P279
(subclass of), which are crucial for determining the semantic type or class of the column values.</p>
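        <p>As an illustration of the relation lookup, the sketch below extracts P31 and P279 targets from an entity record. It assumes the record has the JSON shape returned by the wbgetentities action with props=claims; the text does not say which endpoint the implementation uses, and the helper name extract_related_ids is hypothetical.</p>
        <preformat><![CDATA[
```python
def extract_related_ids(entity, properties=("P31", "P279")):
    """Collect the target IDs of the given properties from one entity record."""
    related = []
    claims = entity.get("claims", {})
    for prop in properties:
        for statement in claims.get(prop, []):
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if datavalue is None:
                continue  # 'novalue' and 'somevalue' snaks carry no target
            value = datavalue.get("value", {})
            if isinstance(value, dict) and "id" in value:
                related.append(value["id"])
    return related


# Trimmed-down record in the assumed wbgetentities shape.
sample_entity = {
    "claims": {
        "P31": [{"mainsnak": {"snaktype": "value",
                              "datavalue": {"value": {"id": "Q16521"},
                                            "type": "wikibase-entityid"}}}],
        "P279": [{"mainsnak": {"snaktype": "novalue"}}],
    }
}
related_ids = extract_related_ids(sample_entity)
```
]]></preformat>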
      </sec>
      <sec id="sec-5-6">
        <title>5.2.3. Processing and Annotation of Columns</title>
        <p>The process_cell function processes each entry in the clean_column_values column. This
function splits the values, filters out irrelevant entries, and deduplicates them. For each unique
label, it retrieves the Wikidata ID and associated subclass IDs. These subclass IDs are then
aggregated, and the most frequently occurring ones are selected as the final column type
annotation.</p>
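        <p>The aggregation step amounts to a frequency vote over the type IDs collected for a column. In the sketch below, counting each type at most once per label and the top_n parameter are assumptions; the function name is hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def most_common_types(type_ids_per_label, top_n=1):
    """Return the `top_n` type IDs that occur for the most labels in a column."""
    counts = Counter()
    for type_ids in type_ids_per_label:
        counts.update(set(type_ids))  # count each type once per label
    return [type_id for type_id, _ in counts.most_common(top_n)]


# Three labels from one column, each with the type IDs found for it.
column_types = most_common_types([["Q16521"], ["Q16521", "Q7432"], ["Q5"]])
```
]]></preformat>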
      </sec>
      <sec id="sec-5-7">
        <title>5.2.4. Cache Management</title>
        <p>To maintain efficiency and reduce redundant API requests, the cache is saved to a file at the
end of the script execution using the save_cache function. When the script is restarted, the
load_cache function reloads the cache, preserving previously obtained results and making
subsequent executions faster.</p>
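        <p>Cache persistence of this kind can be sketched with a JSON file. The file format, the path argument, and the empty two-part default are assumptions, since the text only states that the cache is written to and reloaded from a file.</p>
        <preformat><![CDATA[
```python
import json
import os
import tempfile


def save_cache(cache, path):
    """Write the cache to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)


def load_cache(path):
    """Reload a saved cache, or start with an empty one if no file exists yet."""
    if not os.path.exists(path):
        return {"id_cache": {}, "related_cache": {}}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as folder:
    cache_path = os.path.join(folder, "wikidata_cache.json")
    empty = load_cache(cache_path)
    save_cache({"id_cache": {"example label": "Q42"}, "related_cache": {}}, cache_path)
    restored = load_cache(cache_path)
```
]]></preformat>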
        <p>In summary, the CTA process involves extracting, cleaning, and annotating column data
using the Wikidata knowledge graph, with caching and rate limiting employed to optimize
performance and resource utilization.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>We evaluated the performance of our methodology by applying it to the CEA and CTA tasks on
the tBiodivL and tBiomedL datasets. This evaluation used the target datasets provided by
the SemTab organizers<sup>6</sup>. Our results underscore the effectiveness of our approach, particularly
regarding F1 and Precision scores.</p>
      <p>For the SemTab 2024 competition, we focused on two primary datasets:
tBiodiv-Large-Relational and tBiomed-Large-Relational. Our methodology demonstrated strong performance,
achieving F1 scores between 61% and 93% across both CTA and CEA tasks. These results are
summarized in Table 4.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion</title>
      <p>Our results demonstrate the effectiveness of our proposed methodology for CEA and CTA on the
SemTab 2024 datasets. The methodology used pre-existing semantic resources from Wikidata
to enhance table annotation tasks, showcasing significant improvements in both accuracy and
efficiency.</p>
      <sec id="sec-7-1">
        <title>7.1. Performance Insights</title>
        <p>The CEA task achieved high F1 scores, reaching 93.20% for the
tBiodiv-Large-Relational dataset and 92.50% for the tBiomed-Large-Relational dataset, indicating high precision
in linking cell values to Wikidata entities. These scores reflect the robustness of our system
in identifying and annotating cell values accurately, which is crucial for integrating and enriching
tabular data with semantic information.</p>
        <p>In contrast, the CTA task showed a broader range of F1 scores, with the
tBiodiv-Large-Relational dataset reaching 61.50%. While this score is lower than for CEA, it still represents
a significant achievement in classifying column types. The variability in CTA performance
could be attributed to the complexity and diversity of column types across different datasets,
which may affect the consistency of the annotations.</p>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Methodological Contributions</title>
        <p>Our approach leverages the rich semantic labels provided by Wikidata, enhancing the accuracy
of table annotations by providing standardized and comprehensive semantic details. The
integration of these labels allows for more precise and meaningful annotations, which improve
the interoperability and usability of the annotated data.</p>
        <p><sup>6</sup> https://sem-tab-challenge.github.io/2024/results.html</p>
        <p>The implementation of rate limiting and caching mechanisms has proven essential in
managing API usage and optimizing performance. By reducing redundant API requests and adhering
to rate limits, our system efficiently handles large-scale data processing, which is critical for
real-world applications involving extensive datasets.</p>
      </sec>
      <sec id="sec-7-3">
        <title>7.3. Future Work</title>
        <p>
          Future research could focus on integrating additional knowledge graphs or domain-specific
ontologies to overcome the limitations of relying solely on Wikidata. Enhancing the performance
of the CTA task may benefit from the development of more advanced classification models or
the inclusion of richer features from the datasets. Expanding the methodology to accommodate
multilingual and domain-specific datasets could further broaden its applicability across diverse
contexts and industries. Additionally, the current approach will be extended into a more
comprehensive framework based on our previous work [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], allowing for scalability and the
potential incorporation of machine learning techniques.
        </p>
        <p>In conclusion, our methodology presents a sound approach to table annotation,
offering a scalable and effective way to integrate semantic information into tabular
data. The positive results achieved in both CEA and CTA tasks demonstrate the potential of
combining pre-existing semantic resources with innovative processing techniques to enhance
data interoperability and knowledge discovery.</p>
        <p>[7] A. Thawani, M. Hu, E. Hu, H. Zafar, N. T. Divvala, A. Singh, E. Qasemi, P. A. Szekely,
J. Pujara, Entity linking to knowledge graphs to infer column types and properties,
volume 2553 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 25–32. URL: https://ceur-ws.org/Vol-2553/paper4.pdf.</p>
        <p>[8] V. Huynh, J. Liu, Y. Chabot, T. Labbé, P. Monnin, R. Troncy, DAGOBAH: enhanced scoring
algorithms for scalable annotations of tabular data, volume 2775 of CEUR Workshop
Proceedings, CEUR-WS.org, 2020, pp. 27–39. URL: https://ceur-ws.org/Vol-2775/paper3.pdf.</p>
        <p>[9] N. Abdelmageed, S. Schindler, Jentab: Matching tabular data to knowledge graphs, volume
2775 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 40–49. URL: https://ceur-ws.org/Vol-2775/paper4.pdf.</p>
        <p>[10] V. Huynh, J. Liu, Y. Chabot, F. Deuzé, T. Labbé, P. Monnin, R. Troncy, DAGOBAH: table
and graph contexts for efficient semantic annotation of tabular data, volume 3103 of CEUR
Workshop Proceedings, CEUR-WS.org, 2021, pp. 19–31. URL: https://ceur-ws.org/Vol-3103/paper2.pdf.</p>
        <p>[11] R. Avogadro, M. Cremaschi, Mantistable V: A novel and efficient approach to semantic
table interpretation, volume 3103 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp.
79–91. URL: https://ceur-ws.org/Vol-3103/paper7.pdf.</p>
        <p>[12] M. Cremaschi, R. Avogadro, D. Chieregato, s-elbat: A semantic interpretation approach
for messy table-s, volume 3320 of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp.
59–71. URL: https://ceur-ws.org/Vol-3320/paper7.pdf.</p>
        <p>[13] K. Korini, R. Peeters, C. Bizer, SOTAB: the WDC schema.org table annotation benchmark,
volume 3320 of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 14–19. URL: https://ceur-ws.org/Vol-3320/paper1.pdf.</p>
        <p>[14] M. Marzocchi, M. Cremaschi, R. Pozzi, R. Avogadro, M. Palmonari, Mammotab: A giant and
comprehensive dataset for semantic table interpretation, volume 3320 of CEUR Workshop
Proceedings, CEUR-WS.org, 2022, pp. 28–33. URL: https://ceur-ws.org/Vol-3320/paper3.pdf.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Shigarov</surname>
          </string-name>
          ,
          <article-title>Table understanding: Problem overview</article-title>
          ,
          <source>WIREs Data Mining Knowl. Discov</source>
          .
          <volume>13</volume>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.1002/widm.1482.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Algergawy</surname>
          </string-name>
          ,
          <article-title>DREIFLUSS: A minimalist approach for table matching</article-title>
          , in: V. Efthymiou, E. Jiménez-Ruiz, J. Chen, V. Cutrona, O. Hassanzadeh, J. Sequeda, K. Srinivas, N. Abdelmageed, M. Hulsebos, A. Khatiwada, K. Korini, B. Kruit (Eds.),
          <source>Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab 2023, co-located with the 22nd International Semantic Web Conference, ISWC 2023, Athens, Greece, November 6-10, 2023</source>
          , volume
          <volume>3557</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>60</lpage>
          . URL: https://ceur-ws.org/Vol-3557/paper4.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <article-title>DBpedia - A crystallization point for the web of data</article-title>
          ,
          <source>J. Web Semant</source>
          .
          <volume>7</volume>
          (
          <year>2009</year>
          )
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
          . URL: https://doi.org/10.1016/j.websem.2009.07.002.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Macbeth</surname>
          </string-name>
          ,
          <article-title>Schema.org: Evolution of structured data on the web</article-title>
          ,
          <source>ACM Queue</source>
          <volume>13</volume>
          (
          <year>2015</year>
          )
          <fpage>10</fpage>
          . URL: https://doi.org/10.1145/2857274.2857276.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aquin</surname>
          </string-name>
          ,
          <article-title>ADOG - annotating data with ontologies and graphs</article-title>
          , volume
          <volume>2553</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . URL: https://ceur-ws.org/Vol-2553/paper1.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chieregato</surname>
          </string-name>
          ,
          <article-title>Mantistable: an automatic approach for the semantic table interpretation</article-title>
          , volume
          <volume>2553</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>24</lpage>
          . URL: https://ceur-ws.org/Vol-2553/paper3.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>