<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Results of SemTab 2022</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nora Abdelmageed</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiaoyan Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Cutrona</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasilis Efthymiou</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oktie Hassanzadeh</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Madelon Hulsebos</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ernesto Jiménez-Ruiz</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Sequeda</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kavitha Srinivas</string-name>
        </contrib>
        <aff id="aff6">
          <institution>SUPSI</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff7">
          <institution>FORTH-ICS</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff8">
          <institution>IBM Research</institution>
          ,
          <country country="US">US</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>City, University of London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Friedrich Schiller University Jena</institution>
          ,
          <addr-line>Jena</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>SIRIUS, University of Oslo</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Manchester</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>data.world</institution>
          ,
          <country country="US">US</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>SemTab 2022 was the fourth edition of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, successfully co-located with the 21st International Semantic Web Conference (ISWC) and the 17th Ontology Matching (OM) Workshop. SemTab provides a common framework for the systematic evaluation of state-of-the-art systems. In this paper, we give an overview of the 2022 edition of the challenge and summarize the results.</p>
      </abstract>
      <kwd-group>
        <kwd>Tabular data</kwd>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Matching</kwd>
        <kwd>SemTab Challenge</kwd>
        <kwd>Semantic Table Interpretation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>
        Tabular data are the most frequent input to data analytics pipelines, thanks to their high storage
and processing efficiency. The tabular format also allows users to represent information in a
compact way, exploiting the clear structure defined by rows and columns. However, such a
clear structure does not imply a clear understanding of the semantic structure (e.g., relationships
between columns) or of the meaning of the content (e.g., whether the data are about a specific topic).
This lack of understanding hinders data analytics processes, requiring additional effort to properly
understand the data first. Gaining semantic understanding is valuable for many applications,
including data cleaning, data mining, data integration, data analysis and machine learning, and
knowledge discovery. For example, semantic understanding can help in assessing what kind
of transformations are most appropriate for a dataset, or which datasets can be integrated to
enable new analytics (e.g., marketing analysis) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>In addition to their efficiency, the huge availability of tabular data on the Web makes Web
tables a valuable source to consider for data miners (e.g., open data CSV files). Adding semantic
information to Web tables is useful for a wide range of applications, including web search,
question answering, and knowledge base construction.</p>
      <p>Tabular data to Knowledge Graph (KG) matching is the process of clarifying the semantic
meaning of a table by mapping its elements (i.e., cells, columns, rows) to semantic tags (i.e.,
entities, classes, properties) from KGs (e.g., Wikidata, DBpedia). The task becomes harder
when table metadata (e.g., table captions, table descriptions, or column names) are missing,
incomplete, or ambiguous.</p>
      <p>The tabular data to KG matching process is typically broken down into the following tasks:
• cell to KG entity matching (CEA task),
• column to KG class matching (CTA task), and
• column pair to KG property matching (CPA task).</p>
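      <p>As a purely illustrative sketch (the table, coordinates, and chosen identifiers below are our own toy example, not drawn from any SemTab dataset), the three tasks can be pictured as mappings from table positions to Wikidata identifiers:</p>

```python
# Toy illustration of the three matching tasks against Wikidata.
# The table and target selections are hypothetical; the Q/P identifiers
# are real Wikidata IDs used only as examples.
table = [
    ["Germany", "Berlin"],  # row 0
    ["France", "Paris"],    # row 1
]

# CEA: each target cell (row, column) is matched to a KG entity.
cea = {
    (0, 0): "Q183",  # Germany
    (0, 1): "Q64",   # Berlin
    (1, 0): "Q142",  # France
    (1, 1): "Q90",   # Paris
}

# CTA: each target column is matched to a KG class.
cta = {0: "Q6256", 1: "Q515"}  # country, city

# CPA: each target column pair is matched to a KG property.
cpa = {(0, 1): "P36"}  # capital
```
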
      <p>
        Over the last decade, several approaches have made advances in addressing one or more of the above
tasks, also constructing benchmark datasets ([
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ]). The creation of SemTab1 [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ] aimed
at putting this significant body of work into a common framework, enabling the systematic
evaluation of state-of-the-art systems. The ambition is for SemTab to become the reference
challenge in the Semantic Web community, in the same way the OAEI2 is for the Ontology
Matching community.3
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The Challenge</title>
      <p>The SemTab 2022 challenge was organised into two tracks: the Accuracy Track,
the standard track proposed in previous editions; and the Datasets Track, which focuses
on applications in real-world settings to which the output of matching systems can contribute. The
Datasets Track was also open to submissions of novel benchmark datasets. SemTab 2022 also
featured an Artifacts Availability Badge that was applicable to both tracks.</p>
      <sec id="sec-2-1">
        <title>2.1. Accuracy Track</title>
        <p>
          The Accuracy Track included 3 rounds, running from May 26 to October 15. Different target
KGs were used across rounds (see Table 1):
• Wikidata (WD, W) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]: https://zenodo.org/record/6643443
• Schema.org (SCH, S) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]: https://gittables.github.io/downloads/schema_20210528.pkl
• DBpedia (DBP, D) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]: http://downloads.dbpedia.org/wiki-archive/ (version 2016-10 &amp;
2022-03)
1http://www.cs.ox.ac.uk/isg/challenges/sem-tab/
2http://oaei.ontologymatching.org/
3http://ontologymatching.org/
        </p>
        <p>The different rounds of SemTab 2022 were organised to evaluate participating systems
on various datasets of variable difficulty. Unlike the previous editions of SemTab, where the
rounds were run with the support of AIcrowd, this year we asked the participants to submit their
solutions once per week via a Google Form. We created a submission form for each round and
evaluated the submitted results at the beginning of each week during each round.</p>
        <sec id="sec-2-1-4">
          <title>2.1.1. Datasets</title>
          <p>
            The datasets used to run the SemTab 2022 rounds are reported in Table 1, with some
statistics available in Table 2. Unlike the previous editions, where the ground truth was hidden
from the participants, this year we provided partial ground truth data to the participants during
the challenge itself. The teams could use these ground-truth labels to evaluate their methods
locally. Thus, we report dataset statistics per split. All the datasets are available in Zenodo as
follows:
• Tough Tables (2T) [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]: a dataset featuring high-quality, manually curated tables with
non-obviously linkable cells, i.e., cells whose values are ambiguous names, typos, or misspelled
entity names. These challenges are particularly relevant for the annotation of structured
legacy sources against existing KGs.
          </p>
          <p>
            Link: https://doi.org/10.5281/zenodo.7419275
• HardTables (HT) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]: datasets with tables generated using an improved version of our
data generator, which creates realistic-looking tables using SPARQL queries [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. It is the
largest dataset used in SemTab.
          </p>
          <p>
            Link: https://doi.org/10.5281/zenodo.7416036
• BiodivTab [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]: a dataset with tables from real-world biodiversity research datasets.
          </p>
          <p>Original tables have been adapted for the SemTab challenge. This year featured DBpedia
as the target KG instead of Wikidata.</p>
          <p>
            Link: https://doi.org/10.5281/zenodo.7319654
• GitTables [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]: a large-scale corpus of tables extracted from CSV files on GitHub. Its main
purpose is to facilitate learning table representation models and applications
in, e.g., data management. A subset of tables has been curated for benchmarking column
type detection methods in SemTab. The GitTables set for this SemTab edition was larger
than in 2021, to enable the training of potential data-driven methods.
          </p>
          <p>Link: https://zenodo.org/record/7091019
4Target column train and test splits for CTA are distributed across all tables.</p>
        </sec>
        <!-- Residue of Table 2 (per-split statistics: # Tables, Avg. # Rows, Avg. # Cols, # CTA Targets, # CPA Targets for the Validation and Test splits); the table layout is not recoverable from the extraction. -->
        <sec id="sec-2-1-12">
          <title>2.1.2. Participation</title>
        </sec>
        <sec id="sec-2-1-13">
          <title>2.1.3. Evaluation measures</title>
          <p>As in the previous editions, systems have been evaluated on a single annotation for each provided
target, for all the tasks; i.e., in CEA, target cells are to be annotated with a single entity from the
target KG, and in CTA, target columns are to be annotated with a single type from the target KG (as
fine-grained as possible).</p>
          <p>Systems are scored with the standard Precision, Recall, and F1-score:</p>
          <p>Precision = |Correct Annotations| / |System Annotations|   (1)
Recall = |Correct Annotations| / |Target Annotations|   (2)</p>
          <p>where target annotations refer to the target cells for CEA, the target columns for CTA, and the
target column pairs for CPA. We consider an annotation as correct when it is included within the
ground truth set (a target cell usually has multiple annotations in the ground truth, because of
redirect and same-as links in KGs).</p>
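          <p>The standard scoring described above can be sketched in a few lines of Python; the target identifiers below are hypothetical, and the ground truth maps each target to the set of accepted annotations (e.g., including redirects):</p>

```python
# Sketch of the standard Precision/Recall/F1 evaluation.
# system: dict target -> single annotation; gt: dict target -> set of accepted IDs.
def evaluate(system, gt):
    correct = sum(1 for t, a in system.items() if a in gt.get(t, set()))
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A system annotates 2 of 3 target cells, both correctly (IDs are hypothetical):
p, r, f1 = evaluate(
    {"cell-0-0": "Q183", "cell-1-0": "Q142"},
    {"cell-0-0": {"Q183"}, "cell-1-0": {"Q142", "Q999999"}, "cell-2-0": {"Q38"}},
)
# -> precision 1.0, recall 2/3
```
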
          <p>
            Given the fine-grained type hierarchy in Wikidata, we adopted approximations of Precision and
Recall in the CTA evaluation [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. Approximations adapt their numerators to consider partially
correct annotations, i.e., annotations that are ancestors or descendants of the ground truth (GT)
classes. The correctness score cscore(𝑎) of a CTA annotation 𝑎 considers the distance between the
annotation and the GT classes in the type hierarchy, and it is defined as
          </p>
          <p>cscore(𝑎) = 0.8^𝑑(𝑎), if 𝑎 is in the GT, or an ancestor of the GT, with 𝑑(𝑎) ≤ 5;
cscore(𝑎) = 0.7^𝑑(𝑎), if 𝑎 is a descendant of the GT, with 𝑑(𝑎) ≤ 3;
cscore(𝑎) = 0, otherwise;</p>
          <p>where 𝑑(𝑎) is the shortest distance from 𝑎 to one of the GT classes (as for CEA, CTA GT columns
may also have multiple classes). For example, 𝑑(𝑎) = 0 if 𝑎 is a class in the ground truth (cscore(𝑎) = 1),
and 𝑑(𝑎) = 2 if 𝑎 is a grandchild of a class in the ground truth (cscore(𝑎) = 0.7² = 0.49). Types
in the higher level(s) of the KG type hierarchy are not considered in the GT (e.g., Q35120
[entity] in Wikidata). Given the correctness score cscore, the approximated Precision (AP),
Recall (AR), and F1-score (AF1) for the CTA evaluation are as follows:</p>
          <p>AP = Σ cscore(𝑎) / |System Annotations|,   AR = Σ cscore(𝑎) / |Target Annotations|,   AF1 = 2 × AP × AR / (AP + AR)   (3)
2.1.4. Results
that the dataset is almost the same as in SemTab 2020.5 The BiodivTab and GitTables datasets
brought additional complexity in Round 3, highlighting that real-world tables are challenging.
CEA task. Results for the CEA task are reported in Figure 1 for all the datasets. In Round 1,
almost all the systems performed well on the HardTables dataset (automatically generated).
Starting from Round 2, datasets have been selected to increase the number of tricky cases to solve.
We used a new version of HardTables, built mainly from tables that we knew participants had failed to
annotate in previous rounds and editions, in addition to Tough Tables. As expected, performance
on HardTables decreases, showing that systems still fail to annotate tricky cases (even when they
have already been seen); comparing results on Tough Tables with previous SemTab editions,
more systems achieved an F1-score over 0.8, possibly because of the availability of the validation
set. In this round, systems focused on Wikidata-based datasets; indeed, just 4 out of 8 systems
submitted a solution for the DBpedia-based Tough Tables dataset. In Round 3, the complexity
brought by the (relatively small) tables in the BiodivTab dataset still represents a new problem
to solve, showing a reduced performance by all systems. However, the scores are generally
improved compared to the last year’s version of such a dataset [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. This year, the top F1-score is
91% by KGCODE-Tab compared to 60% by JenTab last year for CEA task.
          </p>
          <p>
            CTA task. In Figure 2, the results in the CTA task resemble the trend already seen from the CEA
results. This is an indicator that most of the systems solve the CTA task based on annotations
found in the CEA. Additional challenges have been included in Round 3 with the GitTables
dataset, where we can see a critical performance drop for all the involved systems. The column
type annotations from DBpedia seem harder than annotations from Schema.org. However, the
performance of systems for both ontologies improved compared to last year [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]: the top F1-score
for the CTA task for Schema.org is 66% compared to 19% last year, and for DBpedia 59% versus
0.04%. BiodivTab seems tough for s-elbat and SemInt, and moderate for JenTab. However,
other systems achieved very good results. It is worth emphasising that, given the general picture
provided by the results in CTA, more research is needed to make existing systems able to deal
with real-world tables, where the cells may be missing a correspondence to the target KG.
CPA task. Results for the CPA task are plotted in Figure 3. Currently, only HardTables provides
a GT for CPA. Results are overall positive for all the systems, with a slight overall decrease
from Round 1 to Round 2 for all participants. Again, this behavior can be explained by the fact
that most systems use the CEA results to solve the CPA task, and the CEA scores for this dataset
are high, overall.
5Minor changes to adapt Tough Tables to SemTab 2022: some Wikidata targets have been updated to the Wikidata
version adopted in SemTab 2022; the original dataset has been split into validation and test sets.
          </p>
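          <p>As a minimal sketch of the CTA scoring above (the hierarchy distance and the relation to the GT are assumed to be given; computing them from the real Wikidata class hierarchy is out of scope here):</p>

```python
# cscore as defined above: 0.8^d for GT classes and ancestors (d <= 5),
# 0.7^d for descendants (d <= 3), 0 otherwise.
def cscore(d, relation):
    if relation == "gt_or_ancestor" and d <= 5:
        return 0.8 ** d
    if relation == "descendant" and d <= 3:
        return 0.7 ** d
    return 0.0

# Approximated Precision, Recall, and F1 over a list of per-annotation scores.
def approx_f1(scores, n_system, n_target):
    ap = sum(scores) / n_system if n_system else 0.0
    ar = sum(scores) / n_target if n_target else 0.0
    return 2 * ap * ar / (ap + ar) if ap + ar else 0.0

# An exact hit scores 1.0; a grandchild of a GT class scores 0.7^2 = 0.49.
scores = [cscore(0, "gt_or_ancestor"), cscore(2, "descendant")]
```
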
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Datasets Track</title>
        <p>
          This new track aims at addressing applications in real-world settings that take advantage of the
output of matching systems. Challenging dataset proposals have also been accepted and may
be included in future editions of SemTab.
2.2.1. Results
We opened the call for this track on 25 August 2022. We received four submissions: three of
them were accepted, and the fourth was accepted as a poster (2 pages) since it is still at a
preliminary stage. We give an overview of each of them as follows:
• Wikary [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] introduces a new gold standard for the Semantic Table Interpretation tasks.
        </p>
        <p>
          It proposes a promising expansion from binary to n-ary tuples to reflect the conditional
properties of certain (subject, predicate, object) statements. This will likely be helpful for
matching tables with n-ary entities and lower KG coverage. Wikary contributes a large,
diverse, multilingual, and metadata-rich set of Wikipedia tables that are matched to n-ary
statements and qualifiers from Wikidata. A subset of the dataset underwent a manual
quality evaluation, in which annotators were asked to annotate a sample of tables.
• MammoTab [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] is a benchmark with a large number of tables extracted from Wikipedia,
annotated with Wikidata entities, semantic types (classes), and NIL (cells not matched
to any KG entity). In comparison with state-of-the-art datasets, the annotations cover
more key challenges, including disambiguation, homonyms, NIL mentions, literals versus
named entities, missing context, etc. The annotations can be used for the CEA and CTA
tasks of SemTab, but not for CPA. The dataset is generated automatically, allowing it to
be of a large scale (almost a million tables). The authors reported evaluation results on one
of the top-performing SemTab participating systems, MTab [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], showing that this dataset
is more challenging than previous SemTab benchmarks.
• SOTAB [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] is a benchmark dataset for the CTA and CPA tasks of SemTab, generated
from the WDC Schema.org table corpus. The dataset consists of 88.5k tables, annotated
with Schema.org as the target KG, exploiting the already existing annotations of the WDC
table corpus. The subset of 88.5k tables was selected from the entire corpus, considering
the language of the tables (targeting mostly English tables), columns with missing values,
columns with heterogeneous formatting (e.g., date values expressed in different formats
in the same table column), and corner cases (i.e., columns containing cell values that are
tough to disambiguate). The dataset is split into the usual train/validation/test sets, and
is further divided into additional test sets targeting different challenges: for
example, there is a test set just for the missing-values challenge, another for
corner cases, and another for heterogeneous formats. In addition to these splits,
the authors offer a small subset of the training data to check the effectiveness of systems
trained with fewer examples.
• FCTables [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] is an interesting dataset with the potential to be used in the SemTab
challenge. It presents the construction of a benchmark from Food Composition Tables (FCTs)
drawn from various data sources. The authors took the initiative of publishing such tables in CSV
format, unlike the previous sources (INFOODS and LanguaL), which both provide tables
in PDF. The benchmark covers various countries and languages, and the authors highlighted
potential applications for it. The benchmark is currently accepted as a
poster paper (2 pages) since it is unclear whether the dataset is ready to be included in
the challenge; for example, the annotations for CTA, CEA, and CPA are still in progress.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Artifacts Availability Badge</title>
        <p>In 2021, SemTab included a new track focusing on system usability. The main goal of this
track was to mitigate a pain point in the community: the lack of publicly available, easy-to-use,
and generic solutions addressing the needs of a variety of applications and settings. In 2022, the
usability track was replaced by the Artifacts Availability Badge, which applies to both tracks.
This badge is given to submissions (regardless of the track) if all dependencies are verified to be
accessible and sufficiently reusable. The goal of this badge is to motivate authors to publish and
document their code and data, so that others can (re)use these artifacts and potentially reproduce
the results.</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Evaluation measures</title>
          <p>The criteria used to assess submissions for eligibility of the Artifacts Availability Badge are:
• Publicly accessible data (if applicable).
• Publicly accessible source code.
• Clear documentation of the code and data.</p>
          <p>• Open-source dependencies.
2.3.2. Results
Almost all core participants obtained good results here, performing well on at least
two criteria (open-source code and clear documentation). We report the evaluation details in
Table 5. In general, tool requirements vary in complexity, but they are reasonable overall
(e.g., pre-processing required, like creating new indexes or embeddings). JenTab provides its
pre-computed lookups and indices, thus, it has a checkmark under open-source data. Considering
the other criteria, the evaluation panel concluded that most of the tools are pre-configured and can
potentially be used out of the box: for example, JenTab has been packaged with Docker to ease
deployment on local premises. In addition, JenTab is the only system released as open source
under a permissive license (Apache 2.0).</p>
          <p>For the Datasets Track, all submissions have open-source data and code and vary in the details
of the provided documentation. FCTables is an exception, since the released code under its
GitHub repository is for the table parsing component only, and not for their entire pipeline.
2.4. Awards
As in previous editions, the best systems were awarded across the different tracks:
• Accuracy Track: KGCODE-TAB and DAGOBAH (joint first prize) were the highest performing
systems in most of the tasks, showing appreciable improvements over previous years.
• Dataset Track: SOTAB won the first prize.
• Artifacts Availability Badge: SOTAB and Wikary received the badge in the Datasets Track,
while JenTab was the only Accuracy Track system to receive it.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Lessons Learned and Future Work</title>
      <p>Challenging HT Datasets. Since the first SemTab challenge, we have been using the same automated
dataset generation process, with some variations to make it more challenging. This year, however,
we applied a filtering step to control the balance between easy and hard cases. Unlike
the previous editions of the challenge, the average F1-score is below 90% for the HT
datasets (see Table 4). This confirms the effectiveness of our filtering step, which makes the datasets
considerably harder to solve.</p>
      <p>Ground-truth Quality. An accurate evaluation of systems is important for any benchmark or
challenge and relies on the quality of the ground truth annotations. SemTab features diverse sets
of tables that all are extracted and annotated differently. Some annotation procedures may yield
inconsistent or erroneous annotations, which introduces noise in the development and evaluation
of systems. Moreover, obtaining perfect ground truth for tables is hard as cell entities (CEA),
column types (CTA), and column pair relations (CPA) could possibly correspond to multiple
labels due to, for example, synonymy or hierarchy relations. In line with these challenges, a
few participants indicated that some ground truth annotations in SemTab datasets might be
questionable as well. This motivates an effort to improve the ground truth annotations for future
editions, possibly in collaboration with the community, as suggested by a participant.
Artifacts Availability Badge. The introduction of the Artifacts Availability Badge extends the
Usability Track from SemTab 2021. It encouraged the participating systems to provide publicly
accessible resources. Our goal was exactly to emphasize this, despite the competitive nature that
a challenge may have. We note that the current conditions for receiving this badge may have been
too restrictive, so we are considering providing more badges of a narrower scope in the future
(e.g., one badge for publicly accessible code, one for reproducible results, etc).
Dataset track. We believe that the call for the dataset track attracted more attention from
the community, inviting them to introduce their own datasets. Compared to last year's "Applications
Track", we received more contributions: four datasets, whereas last year we had only two
submissions. Not all of them are ready to be used in the challenge, but they show a promising
interest within the community. Contributions from the community, like Wikary and SOTAB,
help in extending the SemTab benchmark with new challenges that are hard to reproduce in
synthetic datasets like HT. Thus, this new track has been an important addition to SemTab.
Visibility &amp; Increased Impact. SemTab has gained increasing and broader attention from the
community. This year, before the official start of the challenge, we presented SemTab at the
Knowledge Graph Construction (KGC)6 Workshop, co-hosted with the ESWC conference. In
addition, SemTab has contributed for four years to the Ontology Matching (OM)7 workshop,
co-hosted with the ISWC conference. Such dissemination activities, in combination with the
new Datasets Track, resulted in more contributions to SemTab 2022 overall. The 2022
proceedings contain 12 papers (vs. 8 in 2021), four of them belonging to the Datasets Track. This
shows the diversity of artifacts from SemTab 2022. We also plan to continue the challenge and to
propose a workshop specific to KG matching, either by organizing our own workshop or by joining
one of the existing ones.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>We thank the challenge participants, the ISWC &amp; OM organisers, and our sponsor IBM Research,
who played a key role in the success of SemTab. We also thank Paul Groth and Çağatay
Demiralp for their contribution to GitTables, and Sirko Schindler and Birgitta König-Ries for their
contribution to BiodivTab. This work was also supported by the SIRIUS Centre for Scalable Data
Access (Research Council of Norway), Samsung Research UK, the EPSRC projects UK FIRES
and ConCur, and the HFRI project ResponsibleER (No 969). Finally, we would like to acknowledge that
the organization was greatly simplified by using the EasyChair conference management system
and the CEUR-WS.org open-access publication service.
6https://kg-construct.github.io/workshop/2022/
7http://www.om2022.ontologymatching.org/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. D.</given-names>
            <surname>Paoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Košmerlj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Perales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <article-title>Semantically-Enabled Optimization of Digital Marketing Campaigns</article-title>
          , in: International Semantic Web Conference (ISWC), Springer,
          <year>2019</year>
          , pp.
          <fpage>345</fpage>
          -
          <lpage>362</lpage>
          . URL: https://doi.org/10.1007/978-3-030-30796-7_22. doi:10.1007/978-3-030-30796-7_22
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Limaye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Annotating and searching web tables using entities, types and relationships</article-title>
          ,
          <source>VLDB Endowment 3</source>
          (
          <year>2010</year>
          )
          <fpage>1338</fpage>
          -
          <lpage>1347</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ritze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lehmberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>Matching HTML Tables to DBpedia</article-title>
          , in:
          <source>Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics (WIMS)</source>
          , ACM,
          <year>2015</year>
          , pp. 10:1-10:6
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lehmberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ritze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Meusel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>A large public corpus of web tables containing time and context metadata</article-title>
          ,
          , in:
          <source>WWW</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez-Muro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Christophides</surname>
          </string-name>
          ,
          <article-title>Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings</article-title>
          , in:
          <source>ISWC</source>
          , volume
          <volume>10587</volume>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>277</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <article-title>SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems</article-title>
          , in:
          <source>The Semantic Web: ESWC</source>
          , Springer International Publishing,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <article-title>Results of SemTab 2020</article-title>
          , in:
          <source>Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 19th International Semantic Web Conference (ISWC 2020)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abdelmageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <article-title>Results of SemTab 2021</article-title>
          , in:
          <source>Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 20th International Semantic Web Conference (ISWC 2021), Virtual conference, October 27, 2021</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . URL: http://ceur-ws.org/Vol-3103/paper0.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: a free collaborative knowledge base</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Macbeth</surname>
          </string-name>
          ,
          <article-title>Schema.org: Evolution of Structured Data on the Web</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>59</volume>
          (
          <year>2016</year>
          )
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          . URL: https://doi.org/10.1145/2844544. doi:10.1145/2844544.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <article-title>DBpedia: A Nucleus for a Web of Open Data</article-title>
          , in: The Semantic Web, Springer Berlin Heidelberg,
          <year>2007</year>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>Tough Tables: Carefully Evaluating Entity Linking for Tabular Data</article-title>
          , in:
          <source>19th International Semantic Web Conference (ISWC)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>328</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Abdelmageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>König-Ries</surname>
          </string-name>
          ,
          <article-title>BiodivTab: Semantic Table Annotation Benchmark Construction, Analysis, and New Additions</article-title>
          , in:
          <source>Proceedings of the 17th International Workshop on Ontology Matching co-located with the 21st International Semantic Web Conference (ISWC 2021)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Demiralp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <article-title>GitTables: A large-scale corpus of relational tables</article-title>
          ,
          <source>arXiv preprint arXiv:2106.07258</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>KGCODE-Tab Results for SemTab 2022</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>V.-P.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Labbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>From Heuristics to Language Models: A Journey Through the Universe of Semantic Table Interpretation with DAGOBAH</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chieregato</surname>
          </string-name>
          ,
          <article-title>s-elBat: a Semantic Interpretation Approach for Messy taBle-s</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jiomekong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A. F.</given-names>
            <surname>Tagne</surname>
          </string-name>
          ,
          <article-title>Towards an Approach based on Knowledge Graph Refinement for Tabular Data to Knowledge Graph Matching</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>N.</given-names>
            <surname>Abdelmageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <article-title>JenTab: Do CTA solutions affect the entire scores?</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>W.</given-names>
            <surname>Baazouzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kachroudi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Faiz</surname>
          </string-name>
          ,
          <article-title>Yet Another Milestone for Kepler-aSI at SemTab 2022</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mertens</surname>
          </string-name>
          ,
          <article-title>A low-resource approach to SemTab 2022</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dalal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <article-title>SemInt at SemTab 2022</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>I.</given-names>
            <surname>Mazurek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wiewel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kruit</surname>
          </string-name>
          ,
          <article-title>Wikary: A Dataset of N-ary Wikipedia Tables Matched to Qualified Wikidata Statements</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Marzocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Yamada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kertkeidkachorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ichise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takeda</surname>
          </string-name>
          ,
          <article-title>SemTab 2021: Tabular Data Annotation with MTab Tool</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>K.</given-names>
            <surname>Korini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Peeters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>SOTAB: The WDC Schema.org Table Annotation Benchmark</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jiomekong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Etoga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Foko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Folefac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tsague</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Camara</surname>
          </string-name>
          ,
          <article-title>A large scale corpus of food composition tables</article-title>
          , in:
          <source>Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</source>
          , CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>