<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>s-elBat: a Semantic Interpretation Approach for Messy taBle-s</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Cremaschi</string-name>
          <email>marco.cremaschi@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Avogadro</string-name>
          <email>roberto.avogadro@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Chieregato</string-name>
          <email>david.chieregato@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Semantic Table Interpretation</institution>
          ,
          <addr-line>Tabular Data, SemTab Challenge, Knowledge Graph</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Milan - Bicocca</institution>
          ,
          <addr-line>viale Sarca 336, Edificio U14, 20126, Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes s-elBat, a Semantic Table Interpretation approach. The approach inherits and improves part of the techniques of MantisTable, an approach used and tested in previous editions of the SemTab challenge. s-elBat adds an innovative and optimised lookup approach for generating candidate entities for the annotation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>CEUR Workshop Proceedings</p>
      <p><sup>2</sup> askwonder.com/research/number-google-sheets-users-worldwide-eoskdoxav</p>
      <p>© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        The table-to-KG matching problem, also referred to as Semantic Table Interpretation (STI),
provides explicit semantic annotations (e.g., identifying and annotating entities in cells, their
types/classes and the connections/properties between entities), thus capturing knowledge from
tables [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. STI has recently attracted considerable attention in the research community [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ] and
is a key step to enrich data [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] and construct and extend Knowledge Graphs (KGs) from
semi-structured data [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>
        The input of STI is i) a well-formed and normalised relational table (i.e., a table with headers
and simple values, thus excluding nested and figure-like tables), and ii) a Knowledge Graph (KG)
(e.g., Wikidata, DBpedia) which describes real-world entities in the domain of interest (i.e., a
set of types, datatypes, predicates, instances, and the relations among them). The output is a
semantically annotated table, obtained by mapping its elements (i.e., cells/mentions, columns,
rows) to semantic tags (i.e., entities, types, properties) from KGs as shown in Figure 1. This
process is typically broken down into the following tasks: (i) cell/mentions to KG entity matching
(CEA task), (ii) column to KG type matching (CTA task), and (iii) column pair to KG property
matching (CPA task) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Figure 1: an example table and its semantic annotations. The table has six columns: Title (subject column, S-column), Director and Domestic Distributor (named-entity columns, NE-columns), and Date, length in min and Worldwide gross (literal columns, LIT-columns). Its rows are: (Jurassic World, Colin Trevorrow, 12/06/2015, Universal Pictures, 124, 1670400637); (Superman Returns, Bryan Singer, 21/06/2006, Warner Bros., 154, 391081192); (Batman Begins, Christopher Nolan, 15/06/2005, Warner Bros., 140, 371853783); (Avatar, James Cameron, 18/12/2009, 20th Century Fox, 162, 2744336793).</p>
      <p>The figure illustrates the three main tasks. CEA (Cell Entity Annotation, entity reconciliation): Jurassic World is annotated with Q3512046 (Jurassic World), Colin Trevorrow with Q5145625 (Colin Trevorrow), and Universal Pictures with Q3512046 (Universal Pictures). CTA (Column Type Annotation, identification of types): Q11424 (Film) for Title, Q5 (Human) for Director, Q1762059 (Film production company) for Domestic Distributor, and the datatype INTEGER for the literal values 124 and 1670400637. CPA (Column Property Annotation, identification of relationships): P57 (director), P577 (publication date), P750 (distributed by), P2047 (duration) and P2142 (box office).</p>
      <p>As depicted in Figure 1, the majority of entities in the Title column are of type Film, and
publication_date can be identified as the property connecting entities in the Title column
with the values in the Date column.</p>
      <p>Unfortunately, explicit situations like the ones in the example are not so common; therefore,
we need to set up strategies and algorithms to address several issues. An excellent STI approach
must consider and adequately balance the different features of a table (or a set of tables). The
annotation involves several key challenges: i) disambiguation: the types of the entities described
in a table are not known in advance, and those entities may correspond to more than one type
in the KG; ii) homonymy: different entities may share the same
name and type; iii) matching: the mention in the table may be syntactically different from
the label of the entity in a KG (e.g., because of acronyms, aliases, and typos); iv) NIL-mentions: the
approach must also consider strings that refer to entities for which a representation has not
yet been created within the KG, namely NIL-mentions; v) literal and named-entity: in a table,
there can be columns that contain named-entity mentions (NE-columns) and columns containing
strings (L-columns); vi) missing context: it is often easier to extract the context from textual
documents than from tables due to the amount of content to be processed. For instance, the
header, the first row of a table, which usually contains descriptive attributes for the columns,
may or may not be present; vii) amount of data: the approach must consider both large tables with
many rows and columns and tables with very few mentions; viii) different domains: the tables
within a set can belong to very general or very specific domains.</p>
      <p>
        s-elBat<sup>3</sup> is an approach that employs several techniques to address all of these challenges. It
is a new approach that inherits and improves what has been proposed by the MantisTable [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
MantisTable SE [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] STI approaches. The experience acquired with these tools
and their participation in the various editions of the SemTab challenge<sup>4</sup> led to the definition of
new techniques for i) an efficient lookup approach, through the use of indexes on optimised
data structures, ii) an Information Retrieval-based Entity Linking, augmented with a type-based
filtering feature, and iii) a feature-vector-based entity disambiguation approach.
      </p>
      <p>
        The rest of the paper is organised as follows. In Section 2 we describe s-elBat in detail.
Section 3 introduces the Gold Standards, the configuration parameters, and the evaluation
results. Finally, we conclude this paper and discuss future directions in Section 4.
      </p>
      <p><sup>3</sup> The name comes from taBle-s and Semantic Entity Linking to BAtch Table.</p>
      <p><sup>4</sup> www.cs.ox.ac.uk/isg/challenges/sem-tab/</p>
    </sec>
    <sec>
      <title>2. s-elBat approach</title>
      <p>
        s-elBat provides an iterative process that performs Entity Linking (EL) on tables. Given a KG
containing a set of entities E and a collection of named-entity mentions M, the goal of EL is
to map each entity mention m ∈ M to its corresponding entity e ∈ E in the KG. A typical EL
service consists of the following modules [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]:
      </p>
      <p>1. Entity Retrieval (ER). In this module, for each entity mention m ∈ M, irrelevant entities
in the KG are filtered out to return a set C of candidate entities, i.e., the entities that mention m
may refer to. To achieve this goal, state-of-the-art techniques have been used, such as
name dictionary-based techniques, surface form expansion from the local document, and
methods based on search engines.</p>
      <p>2. Entity Disambiguation (ED). In this module, the entities in the set C are more accurately
ranked to select the correct entity among the candidate ones. In practice, this is a
re-ranking activity that considers other information (e.g., contextual information) besides
the simple textual mention m used in the ER module.</p>
      <p>In s-elBat, these modules are integrated in a pipeline composed of seven sequential phases:
Preprocessing and Data Preparation, Entity Retrieval, Cell Entity Annotation (CEA), Column
Property Annotation (CPA), Column Type Annotation (CTA), Revision, and Export. The overall
framework is described in Figure 2.</p>
      <p>Figure 2: the overall s-elBat framework (surviving labels: Dataset, Preprocessing).</p>
      <sec id="sec-1-1">
        <title>2.1. Preprocessing and Data preparation</title>
        <p>
          During this phase, as a first step, the content of every table cell is converted to lowercase. The next step
performs a column classification, tagging every column as either an L-column (a column containing strings) or an
NE-column (a column containing named-entity mentions) [
          <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
          ]. The potential
subject column (S-column, the main column, the one all the others refer to) is then identified [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
With the s-elBat approach, the selected subject column is not decisive for the final annotation but
can positively influence the execution time. Finally, the cells of the NE-columns are extracted
to generate the set M of mentions for the next step.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>2.2. Entity Retrieval</title>
        <p>
          According to some state-of-the-art experiments [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], the role of the ER module is critical since
it should ensure the presence of the correct entity in the returned set to let the ED module find
it. For this reason, s-elBat integrates LamAPI (LAbel Matching API), a comprehensive tool for
Information Retrieval (IR)-based ER, augmented with type-based filtering features [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The
current version of LamAPI integrates the data of DBpedia (v. 2016-10 and v. 2022.03.01) and
Wikidata (v. 20220708). The KGs are indexed with ElasticSearch<sup>5</sup>, an engine that can search and
analyse huge volumes of data in near real-time. The ElasticSearch index was configured using
the IB similarity as similarity function<sup>6</sup> with the default values of the hyperparameters. LamAPI
is highly modular, so it is possible to integrate any indexing engine (e.g., Apache Solr, Apache
Lucene, ArangoDB). The entity’s identifier and label are indexed. For each entity, the index also stores the length of the label in
characters, its number of tokens, its n-grams, and a value representing
the entity’s popularity.
        </p>
        <p><sup>5</sup> www.elastic.co</p>
        <p><sup>6</sup> www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html</p>
        <p>Among the services provided by LamAPI to search and retrieve information from a KG, we describe
Lookup and Types, which are the services relevant to the STI tasks.</p>
        <p>Lookup: given a string as input, it retrieves a set of candidate entities C from the reference KG.
The request can be qualified by setting the following attributes:</p>
        <p>limit: an integer value that specifies the number of entities to retrieve. The default value is 100;
empirically, this limit provides a good level of coverage.</p>
        <p>kg: specifies which KG and version to use. The default is dbpedia_2022_03_01; the other
possible values are dbpedia_2016_10 and wikidata_latest.</p>
        <p>fuzzy: a boolean value. When true, it matches tokens inside a string with an edit distance
(Levenshtein distance) less than or equal to 2. This gives a greater tolerance for spelling
errors. When false, the fuzzy operator is not applied to the input.</p>
        <p>ngrams: a boolean value. When true, it enables the search over n-grams. After many empirical
experiments, we set the ‘n’ of n-grams to 3. A lower value can introduce bias in the search,
while a higher value may be less effective at tolerating spelling errors. With n
equal to 3, “albert einstein” is split into [’alb’, ’lbe’, ’ber’, ’ert’, ...]. When false, the n-gram
search is not applied.</p>
        <p>types: this parameter allows the specification of a list of types (e.g., rdf:type for DBpedia and
Property:P31 [instance of] for Wikidata) associated with the input string to filter the
retrieved entities. This attribute plays a key role in re-ranking the candidates, allowing a
more accurate search based on input types.</p>
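        <p>As an illustration of the attributes above, the following minimal sketch builds the parameter set of a Lookup request and reproduces the 3-gram split of “albert einstein”. It is not the LamAPI client: the function names, the “name” key used for the input string, the false defaults chosen for fuzzy and ngrams, and the per-token n-gram split are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Optional

# KG identifiers listed in the text for the "kg" attribute.
KNOWN_KGS = ("dbpedia_2022_03_01", "dbpedia_2016_10", "wikidata_latest")


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Overlapping character n-grams, computed per whitespace token (assumption)."""
    grams: List[str] = []
    for token in text.split():
        if len(token) <= n:
            grams.append(token)  # short tokens are kept whole
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def build_lookup_params(name: str,
                        limit: int = 100,
                        kg: str = "dbpedia_2022_03_01",
                        fuzzy: bool = False,
                        ngrams: bool = False,
                        types: Optional[List[str]] = None) -> Dict[str, object]:
    """Hypothetical helper: collect the Lookup attributes into one dictionary."""
    if kg not in KNOWN_KGS:
        raise ValueError(f"unknown kg: {kg}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    params: Dict[str, object] = {
        "name": name,      # key name is an assumption
        "limit": limit,
        "kg": kg,
        "fuzzy": fuzzy,
        "ngrams": ngrams,
    }
    if types:
        params["types"] = list(types)  # optional type filter
    return params


if __name__ == "__main__":
    print(char_ngrams("albert einstein"))
    print(build_lookup_params("albert einstein", kg="wikidata_latest", types=["Q5"]))
```
]]></preformat>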
        <p>Types: given the unique id of an entity as input, it retrieves all the types of which the entity is
an instance. For DBpedia entities, the service returns direct types, transitive types, and Wikidata
types of the related entity, while for Wikidata, it returns only the list of concepts/types for the
input entity.</p>
        <p>For each mention m ∈ M, the approach performs a search using the LamAPI Lookup service
to retrieve a set of candidate entities C. During the service invocation, some heuristics are applied to handle
possibly misspelt input. In detail, two different requests are made: i) using the mention as it is, and ii)
using the mention after removing repeated letters and brackets. For instance,
for the mention “pariss”, the second query is created using “paris”. Repeated characters
are a frequent mistake, and the 3-gram search implemented by ElasticSearch is badly
affected by this kind of mistake. At the same time, the fuzzy matching easily compensates for
double characters that may now be missing. Brackets affect the edit distance, and their content is
frequently irrelevant.</p>
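        <p>The two-request heuristic can be sketched as follows. This is an illustrative reading of the description, not the s-elBat code: the function names are hypothetical, and the choices to drop the bracketed content together with the brackets and to collapse only letters (not digits) are assumptions.</p>
        <preformat><![CDATA[
```python
import re
from typing import List

# A bracketed group such as "(2009 film)", with the whitespace before it.
_BRACKETS = re.compile(r"\s*[\(\[\{][^\)\]\}]*[\)\]\}]")
# A letter followed by one or more copies of itself.
_REPEATED_LETTER = re.compile(r"([^\W\d_])\1+")


def clean_mention(mention: str) -> str:
    """Remove bracketed content and collapse repeated letters."""
    without_brackets = _BRACKETS.sub("", mention).strip()
    return _REPEATED_LETTER.sub(r"\1", without_brackets)


def lookup_queries(mention: str) -> List[str]:
    """Return the strings to send to the lookup: the raw mention, then the cleaned one."""
    queries = [mention]
    cleaned = clean_mention(mention)
    if cleaned and cleaned != mention:
        queries.append(cleaned)
    return queries


if __name__ == "__main__":
    for text in ("pariss", "avatar (2009 film)", "rome"):
        print(text, "->", lookup_queries(text))
```
]]></preformat>
        <p>Collapsing every doubled letter also shortens correct spellings (for example “illinois” becomes “ilinois”); as noted above, the fuzzy matching is expected to absorb this.</p>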
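        <p>For the selection of the best set C of candidates, LamAPI computes a similarity between the
mention m ∈ M and the label l(e) (i.e., the rdfs:label of e) of each entity e ∈ C:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[\mathrm{sim}(m, l(e)) = 1 - \frac{\mathrm{Levenshtein}(m, l(e))}{\max\{\mathrm{length}(m), \mathrm{length}(l(e))\}}]]></tex-math>
        </disp-formula>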
        <p>
          A threshold determines whether to remove an entity e from the candidate set C. To set
the threshold empirically, four Gold Standards (GSs) have been selected (2T [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], SemTab 2020
R4 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], HardTable 2021 R2, and HardTable 2021 R3 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]). The distribution of string similarity
values between the entity labels within the GSs and the corresponding mentions in the tables was
analysed (Table 1). The analysis indicated that a threshold of 0.40 is a good choice.
        </p>
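        <p>A minimal sketch of the similarity of Equation 1 and of the threshold-based filtering is given below. The Levenshtein distance is implemented with the standard dynamic-programming recurrence; the candidate representation (a dictionary with “id” and “label” keys) and the choice to keep candidates whose similarity equals the threshold are assumptions.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the shorter row
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def label_similarity(mention: str, label: str) -> float:
    """1 - Levenshtein(m, l(e)) / max(length(m), length(l(e)))."""
    longest = max(len(mention), len(label))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(mention, label) / longest


def filter_candidates(mention: str,
                      candidates: List[Dict[str, str]],
                      threshold: float = 0.40) -> List[Dict[str, str]]:
    """Keep the candidates whose label similarity reaches the threshold."""
    return [c for c in candidates
            if label_similarity(mention, c["label"]) >= threshold]


if __name__ == "__main__":
    candidates = [{"id": "Q90", "label": "paris"},
                  {"id": "example-2", "label": "london"}]
    print(filter_candidates("pariss", candidates))
```
]]></preformat>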
      </sec>
      <sec id="sec-1-3">
        <title>2.3. Cell Entity Annotation</title>
        <p>During this phase, for each pair of candidates associated with two cells on the same row but
from different columns, the respective properties are extracted by using LamAPI.</p>
        <p>For each candidate entity e, a feature vector with the following items is created:</p>
        <p>• string similarity: the score is based on the Levenshtein edit distance; it is calculated between
the mention and the label associated with the candidate entity.</p>
        <p>• jaccard: the score is the same as string similarity but calculated with the Jaccard distance
instead of Levenshtein.</p>
        <p>• object: this score is set only if there is a property between two candidate entities. In
this case, the subject entity considered in this pair receives a boost equal to the string
similarity score of that entity.</p>
        <p>• relation: like the previous object score, the relation score is set only if there is at least one
property between the considered entities. It is the exact counterpart of the object score;
in this case, the object entity receives the boost.</p>
        <p>
          • literal: this score is applied to relations between the cell from the subject column and
the cell from an L-column. In this case, date, number and string values are compared as
explained in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
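        <p>Given a vector of features, the final score for each entity e is computed as the weighted sum of its features, where #f is the number of features, f<sub>i</sub>(e) is the i-th feature of e and w<sub>i</sub> is its weight:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[\mathrm{score}(e) = \sum_{i=1}^{\#f} w_i \, f_i(e)]]></tex-math>
        </disp-formula>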
        <p>The weights w<sub>i</sub> are set as follows: string similarity 10, jaccard 8, object 3, relation 4, literal 7.
The final score is used to rank the candidates. Considering Table 2, the
candidates for the mention “Jurassic World” are reported in Listing 1. The entity with the
highest score is Entity:Q3512046 [Jurassic World]; it is used to annotate the mention.</p>
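        <p>The weighted sum of Equation 2 with the weights listed above can be sketched as follows. The dictionary keys, the default of 0 for features that are not set, and the candidate representation are assumptions; the feature values in the usage example are invented for illustration.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

# Weights reported in the text for the five CEA features.
WEIGHTS: Dict[str, float] = {
    "string_similarity": 10,
    "jaccard": 8,
    "object": 3,
    "relation": 4,
    "literal": 7,
}


def final_score(features: Dict[str, float]) -> float:
    """Weighted sum of the features; a feature that is not set counts as 0."""
    return sum(weight * features.get(name, 0.0)
               for name, weight in WEIGHTS.items())


def rank_candidates(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (entity id, score) pairs sorted by decreasing score."""
    scored = [(entity, final_score(feats)) for entity, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative feature values, not taken from the paper.
    example = {
        "candidate-a": {"string_similarity": 1.0, "jaccard": 1.0, "object": 1.0},
        "candidate-b": {"string_similarity": 0.5, "jaccard": 0.5},
    }
    print(rank_candidates(example))
```
]]></preformat>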
      </sec>
      <sec id="sec-1-4">
        <title>2.4. Column Property Annotation</title>
        <p>In this phase, the information collected during the previous phase is used to aggregate predicates
by frequency. The CPA is relatively fast because all the necessary information was already
gathered in the previous phase using LamAPI. The first step consists of creating, for every column, a dictionary
containing all the winning properties and the related frequencies. In the second
step, the most frequent property is selected for the CPA annotation. For the “director” column, the most
frequent property is Property:P57 [director] (Listing 2).</p>
        <p>Listing 2: Set of properties for the “director” column (Table 2).</p>
        <preformat>"P57":
  "label": "director"
  "confidence": 1.0
"P58":
  "label": "screenwriter"
  "confidence": 0.75
"P162":
  "label": "producer"
  "confidence": 0.5
"P161":
  "label": "cast member"
  "confidence": 0.25</preformat>
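        <p>The frequency-based aggregation can be sketched as below. The input format (one set of property identifiers per row) is an assumption, as is the normalisation of the confidence as the fraction of rows in which a property occurs; with four rows this normalisation yields the values 1.0, 0.75, 0.5 and 0.25 of the “director” example.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def aggregate_properties(row_properties: List[Iterable[str]]) -> Dict[str, float]:
    """Map each property to the fraction of rows in which it was found."""
    counts: Counter = Counter()
    for properties in row_properties:
        counts.update(set(properties))  # count a property once per row
    total_rows = len(row_properties)
    if total_rows == 0:
        return {}
    return {prop: count / total_rows for prop, count in counts.most_common()}


def best_property(confidences: Dict[str, float]) -> Optional[str]:
    """Select the most frequent property, or None when nothing was found."""
    if not confidences:
        return None
    return max(confidences, key=confidences.get)


if __name__ == "__main__":
    # Properties found between the subject cell and the "director" cell, row by row.
    rows = [{"P57", "P58", "P162", "P161"},
            {"P57", "P58", "P162"},
            {"P57", "P58"},
            {"P57"}]
    confidences = aggregate_properties(rows)
    print(confidences, best_property(confidences))
```
]]></preformat>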
      </sec>
      <sec id="sec-1-5">
        <title>2.5. Column Type Annotation</title>
        <p>In this phase, the information collected during the CEA phase is used. To get the CTA annotation
for a given column, all the cells of that column are iterated. During the process, the approach
creates a dictionary with the frequency of all the classes of the winning entities obtained in the
previous step. The type with the maximum frequency is selected as the annotation for the
column under analysis. An example for Wikidata is shown in Listing 3.</p>
        <p>Listing 3: Example of the structure storing the most frequent classes for each column in Table 2.</p>
        <preformat>"movie_table":
  "0":
  "1":
  "3":</preformat>
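        <p>A minimal sketch of the majority vote over the types of the winning entities follows. The input format (one list of type identifiers per annotated cell) and the function names are assumptions; “type-x” is a placeholder identifier used only for the example.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import Dict, List, Optional


def type_frequencies(cell_types: List[List[str]]) -> Dict[str, int]:
    """Count, for one column, in how many cells each type occurs."""
    counts: Counter = Counter()
    for types in cell_types:
        counts.update(set(types))  # a type counts once per cell
    return dict(counts)


def column_type(cell_types: List[List[str]]) -> Optional[str]:
    """Return the type with the maximum frequency, or None for an empty column."""
    frequencies = type_frequencies(cell_types)
    if not frequencies:
        return None
    return max(frequencies, key=frequencies.get)


if __name__ == "__main__":
    # Types of the winning entities of a "Title" column, one list per cell.
    title_column = [["Q11424"], ["Q11424"], ["Q11424", "type-x"], ["Q11424"]]
    print(type_frequencies(title_column), column_type(title_column))
```
]]></preformat>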
      </sec>
      <sec id="sec-1-6">
        <title>2.6. Revision</title>
        <p>The revision process consists of setting constraints on the types and predicates obtained in the
first execution of the annotation process. As this implies recomputing all the phases, only
low-confidence mentions are considered, in order to keep the computation efficient. An experiment
was conducted to identify the best criterion for classifying these mentions.
The experiment was carried out using the datasets available from the previous editions of the
SemTab Challenge. For every dataset considered to revise the CEA results, all the mentions are
checked against the corresponding ground truth, and errors are noted. Figure 3 represents the results of
this analysis as a chart. The x-axis is the cutoff
threshold, while the y-axis represents the number of wrong mentions that are not considered
for the revision step; e.g., a threshold of 0.6 would bring, on average, an inner error
under 5%. Clearly, the number of wrong mentions must be minimised while keeping the
computation efficient. For the SemTab challenge, the threshold was set at 0.4, with an inner
error under 2%.</p>
        <p>Additional features to the previous feature vector have been added to obtain a better ranker.
The new features were not known prior to the CEA phase; in detail:
• cta: This score is related to the types of candidates. Considering a candidate entity this
score consists of the intersection between the types of the candidate entity with the types
found on the whole column.
• cpa: In the same way as the previous score, the CPA considers the predicates when the
mention is the subject and uses the scores collected from the CPA phase.</p>
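        <p>The selection of the mentions to revise and the cta feature can be sketched as follows. The text does not give the exact formulas, so this is only one possible reading: mentions whose confidence falls below the cutoff are revised, and the cta score is taken as the fraction of the candidate's types that also appear among the column types. Both choices, and all names, are assumptions.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Set


def mentions_to_revise(confidences: Dict[str, float],
                       cutoff: float = 0.4) -> List[str]:
    """Return the mentions whose confidence is below the cutoff threshold."""
    return [mention for mention, confidence in confidences.items()
            if confidence < cutoff]


def cta_feature(candidate_types: Set[str], column_types: Set[str]) -> float:
    """Share of the candidate's types that are also types of the column."""
    if not candidate_types:
        return 0.0
    return len(candidate_types & column_types) / len(candidate_types)


if __name__ == "__main__":
    print(mentions_to_revise({"jurassic world": 0.9, "avatr": 0.3}))
    print(cta_feature({"Q11424", "Q5"}, {"Q11424"}))
```
]]></preformat>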
        <p>The key challenges presented in Section 1 are managed as follows: i) disambiguation: the
disambiguation is managed by the ED module presented in Section 2; ii) homonymy: the
homonym cases are generally resolved with the row context; the scores “object”, “relation”, and
“literal” presented in the CEA phase help the resolution; iii) matching: LamAPI manages most
of the matching issues that are encountered during the annotation process; iv) NIL-mentions:
annotations with a confidence lower than 1 are highly likely to be NIL-mentions; v) literal and
named-entity: the data preparation phase manages the column classification; vi) missing context:
when the header context is missing, it is possible to use other kinds of context, such as the
column context used in the CTA score; vii) amount of data: the experiments in Section 3 show that
the execution time of the proposed annotation process grows nearly linearly with the size of the data; viii) different
domains: the approach was validated with general-purpose datasets. In the future, it may
be fine-tuned for better performance on domain-specific tasks, for example, by reducing the
possible candidate set.</p>
      </sec>
      <sec id="sec-1-7">
        <title>2.7. Export</title>
        <p>In this phase, the objective is to export the annotated mentions. For every mention, the system
needs to decide whether the given annotation is correct or not based on its confidence. This can
lead to three possible scenarios: i) there is a clear winning candidate: if there is a score difference
higher than 0.3 between the first-ranked candidate and the second one, the system is confident
enough to annotate with the first-ranked candidate; ii) the final score is lower than 1, so the
mention is considered a “No-annotation” because the system cannot be confident enough to
provide an annotation; iii) the first two candidates that may be considered as winning have
final scores that are too close to decide which one is correct. This can lead to unresolved
mentions. In this case, three possible resolution methods are considered: a) the first candidate is
taken; this can lead to a higher number of wrong annotations, but it is a fast way to annotate
more mentions; b) the mentions are not annotated; this leads to a higher precision at the cost of recall; c) a ranking
system based on the KG data is used to decide which annotation is the winning one. In Wikidata,
when annotating general-purpose data, one possible criterion is to take the lowest identifier
because more popular entities were generally created before newer ones.</p>
        <p>Listing 4: New candidate entities for the cell “Jurassic World” of the movies table (Table 2).</p>
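        <p>The three export scenarios can be sketched as a single decision function. The order in which the conditions are checked (minimum score first, then the margin), the handling of a single candidate, and the function and parameter names are assumptions; the thresholds 0.3 and 1 and the lowest-identifier criterion come from the description above.</p>
        <preformat><![CDATA[
```python
from typing import List, Optional, Tuple


def wikidata_number(entity_id: str) -> int:
    """Numeric part of a Wikidata identifier, e.g. 'Q3512046' -> 3512046."""
    return int(entity_id.lstrip("Q"))


def export_decision(ranked: List[Tuple[str, float]],
                    margin: float = 0.3,
                    min_score: float = 1.0,
                    tie_break: str = "lowest_id") -> Optional[str]:
    """Pick the entity to export from (id, score) pairs sorted by decreasing score.

    Returns None when the mention is exported as a "No-annotation".
    """
    if not ranked:
        return None
    best_id, best_score = ranked[0]
    if best_score < min_score:
        return None  # scenario ii: the system is not confident enough
    if len(ranked) == 1 or best_score - ranked[1][1] > margin:
        return best_id  # scenario i: clear winning candidate
    # Scenario iii: the first two candidates are too close.
    second_id = ranked[1][0]
    if tie_break == "first":
        return best_id  # method a
    if tie_break == "skip":
        return None  # method b
    if tie_break == "lowest_id":
        return min(best_id, second_id, key=wikidata_number)  # method c
    raise ValueError(f"unknown tie_break: {tie_break}")


if __name__ == "__main__":
    print(export_decision([("Q200", 5.0), ("Q100", 4.9)]))
```
]]></preformat>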
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Validation</title>
      <p>on 2T and HardTable datasets. Overall, these results show that the s-elBat tool achieves a significant
performance across multiple datasets from different domains and generation methods. The
results show that the methodology presented in this paper is a general-purpose approach that
can be applied to any dataset.</p>
      <p>Experiments were carried out to determine the computational efficiency of the different
phases and to validate the assumption that the computation time grows linearly with
the data size. The data from the previous editions of the challenge are used to validate this
assumption. Figure 4 shows how different datasets with an increasing number
of mentions have a nearly linear trend in execution time.</p>
      <p>Table 4 reports the complete data about execution time. This analysis
shows that the most computationally expensive phase is the candidate generation. More in
detail, the candidate generation accounts for at least 96% of the whole
processing time for any dataset, while the rest of the phases, aggregated as “computation time”,
use less than 4%.</p>
      <p>A further contribution of this paper is the definition of a format for a generic API
specification useful for STI tasks. Listing 5 reports the JSON format specification; after
the “name”, “dataset”, and “header” properties, there is the “rows” array. Each element of this array is
an object whose first element, “idRow”, is a numbered identifier for the row and whose other
element, “data”, contains the row content. The key “semanticAnnotations” makes it possible to specify
prior knowledge regarding the annotation of the table. For example, the “cta” key can be filled
if the column types are already known. In the same way, the “cpa” and “cea” keys can be
populated before the computation.</p>
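      <p>As an illustration, the sketch below builds a payload with the keys named above and serialises it to JSON. Only the key names come from the description; the numbering of “idRow” from 1, the empty lists used for the annotation keys, the dataset name and the helper name are assumptions, since the internal structure of Listing 5 is not reproduced here.</p>
      <preformat><![CDATA[
```python
import json
from typing import Dict, List, Sequence


def build_table_payload(name: str,
                        dataset: str,
                        header: Sequence[str],
                        rows: Sequence[Sequence[str]]) -> Dict[str, object]:
    """Hypothetical helper: wrap a table in the key structure described in the text."""
    return {
        "name": name,
        "dataset": dataset,
        "header": list(header),
        "rows": [{"idRow": index, "data": list(row)}
                 for index, row in enumerate(rows, start=1)],
        # Prior knowledge, if any; left empty when nothing is known in advance.
        "semanticAnnotations": {"cea": [], "cta": [], "cpa": []},
    }


if __name__ == "__main__":
    payload = build_table_payload(
        name="movie_table",
        dataset="example_dataset",
        header=["Title", "Director"],
        rows=[["Jurassic World", "Colin Trevorrow"],
              ["Avatar", "James Cameron"]],
    )
    print(json.dumps(payload, indent=2))
```
]]></preformat>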
      <p>The experiments were conducted using 16 parallel processes: i) the ER is performed on a
server with 40 CPU(s) Intel Xeon 4114 CPU @ 2.20GHz and 40GB RAM; ii) the ED is performed
on a server with 32 CPU(s) Intel Xeon E5-2650 @ 2.00GHz and 94GB RAM.</p>
      <p>The tool and all the resources used for the experiments are released following the FAIR
Guiding Principles<sup>7</sup>. s-elBat is released under the Apache 2.0 licence<sup>8</sup>.</p>
    </sec>
    <sec>
      <title>4. Conclusion and Future Works</title>
      <p>s-elBat is a new approach that inherits and improves what was proposed by MantisTable. The
results show an improvement in terms of the quality of the annotations and scalability. The
formalisation of an STI API specification is an interesting addition to the state of the art. Regarding
future developments, we want to find a way to improve the computation time of entity
retrieval. Another potentially interesting line of research would be to analyse how the features used
for entity retrieval and disambiguation impact the results on different datasets.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgement</title>
      <p>This work has received funding from the European Union’s Horizon Europe research and
innovation programme under grant agreement No 101070284 - enRichMyData.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Parreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <article-title>Multi-level semantic labelling of numerical values</article-title>
          ,
          <source>in: The Semantic Web - ISWC 2016</source>
          , Springer International Publishing, Cham,
          <year>2016</year>
          , pp.
          <fpage>428</fpage>
          -
          <lpage>445</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Marzocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>Mammotab: a giant and comprehensive dataset for semantic table interpretation</article-title>
          ,
          <source>Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab2022</source>
          (in press).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>De Paoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Spahiu</surname>
          </string-name>
          ,
          <article-title>A fully automated approach to a complete semantic table interpretation</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>112</volume>
          (
          <year>2020</year>
          )
          <fpage>478</fpage>
          -
          <lpage>500</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <article-title>SemTab 2019: Resources to benchmark tabular data to knowledge graph matching systems</article-title>
          ,
          <source>in: The Semantic Web</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>514</fpage>
          -
          <lpage>530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <article-title>Results of SemTab 2020</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>2775</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abdelmageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <article-title>Results of SemTab 2021</article-title>
          ,
          <source>in: 20th International Semantic Web Conference</source>
          , volume
          <volume>3103</volume>
          , CEUR Proceedings
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciavotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>De Paoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>ASIA: a tool for assisted semantic interpretation and annotation of tabular data</article-title>
          ,
          <source>in: Proceedings of the ISWC 2019 Satellite Tracks</source>
          , volume
          <volume>2456</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org
          ,
          <year>2019</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciavotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>De Paoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Košmerlj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <article-title>EW-Shopp project: Supporting event and weather-based data analytics and marketing along the shopper journey</article-title>
          ,
          <source>in: Advances in Service-Oriented and Cloud Computing</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>187</fpage>
          -
          <lpage>191</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <article-title>Machine knowledge: Creation and curation of comprehensive knowledge bases</article-title>
          ,
          <source>Found. Trends Databases</source>
          <volume>10</volume>
          (
          <year>2021</year>
          )
          <fpage>108</fpage>
          -
          <lpage>490</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kejriwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs: Fundamentals, techniques, and applications</article-title>
          , MIT Press,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barazzetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chieregato</surname>
          </string-name>
          ,
          <article-title>MantisTable SE: an efficient approach for the semantic table interpretation</article-title>
          , in: SemTab@ISWC,
          <year>2020</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <article-title>Entity linking with a knowledge base: Issues, techniques, and solutions</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>27</volume>
          (
          <year>2015</year>
          )
          <fpage>443</fpage>
          -
          <lpage>460</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hachey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nothman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Honnibal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Curran</surname>
          </string-name>
          ,
          <article-title>Evaluating entity linking with Wikipedia</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>194</volume>
          (
          <year>2013</year>
          )
          <fpage>130</fpage>
          -
          <lpage>150</lpage>
          . Artificial Intelligence, Wikipedia and Semi-Structured Resources.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>D'Adda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>De Paoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>LamAPI: a comprehensive tool for string-based entity retrieval with type-base filters</article-title>
          ,
          <source>in: 17th ISWC workshop on ontology matching (OM)</source>
          , in press.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>Tough tables: Carefully evaluating entity linking for tabular data</article-title>
          ,
          <source>in: The Semantic Web - ISWC 2020</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>328</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <article-title>MantisTable V: a novel and efficient approach to semantic table interpretation</article-title>
          , in: SemTab@ISWC,
          <year>2021</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>