<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Kepler-aSI at SemTab 2021</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wiem Baazouzi</string-name>
          <email>wiem.baazouzi@ensi-uma.tn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marouen Kachroudi</string-name>
          <email>marouen.kachroudi@fst.rnu.tn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sami Faiz</string-name>
          <email>sami.faiz@insat.rnu.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université de Tunis El Manar, École Nationale d'Ingénieurs de Tunis, Laboratoire de Télédétection et Systèmes d'Information à Référence Spatiale</institution>
          ,
          <addr-line>99/UR/11-11, 2092, Tunis, Tunisie</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université de Tunis El Manar, Faculté des Sciences de Tunis</institution>
          ,
          <addr-line>Informatique Programmation Algorithmique et Heuristique, LR11ES14, 2092, Tunis, Tunisie</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université de la Manouba, École Nationale des Sciences de l'Informatique, Laboratoire de Recherche en Génie Logiciel, Applications Distribuées, Systèmes Décisionnels et Imagerie Intelligente</institution>
          ,
          <addr-line>LR99ES26, Manouba 2010, Tunis, Tunisie</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>In this paper, we present our system Kepler-aSI for the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2021). The system is participating for the second time in this campaign, bringing improvements and new technical aspects. Kepler-aSI analyzes tabular data in order to detect correct matches in Wikidata and DBPedia. It should be noted that each data resource and each round of the campaign imposes a certain number of constraints, requiring advanced techniques. The aforementioned task turns out to be difficult for machines, requiring additional effort to deploy cognitive capacity in the matching methods. Kepler-aSI still relies on SPARQL queries to semantically annotate tables against Knowledge Graphs (KG), in order to solve the critical problems of the matching tasks. The results obtained during the evaluation phase are encouraging and show the strengths of the proposed system.</p>
      </abstract>
      <kwd-group>
        <kwd>Tabular Data</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Kepler-aSI</kwd>
        <kwd>SPARQL</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        It is evident that the World Wide Web encompasses and conveys very large
volumes of textual information, in several forms: unstructured text and semi-structured,
model-based web pages (which represent data in the widely recognized
key-value notation and in lists). In this broad context, methods aiming to
extract information from these resources and convert it into a structured form have
been the subject of several works [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. As an observation, it is evident that there
is a lack of understanding of the semantic structure, which can hamper the
process of data analysis. This observation reveals a gap between the amount of
available data and the semantic structure needed to analyze it.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Copyright Notice</title>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-3">
      <title>Introduction (continued)</title>
      <p>
        Indeed, acquiring this semantic reconciliation will therefore be very useful for
data integration, data cleansing, data mining, machine learning and knowledge
discovery tasks. For example, understanding the data can help assess the
appropriate types of transformation. Depending on the use and deployment scenario,
tabular data are carefully conveyed to the Web in various formats. The majority
of these datasets are available in tabular form (e.g., CSV (Comma-Separated
Values)). The main reason for the popularity of this format is its simplicity:
many common office tools are available to facilitate their generation and use.
Tables on the Web are a very valuable data source. Thus, injecting semantic
information into arrays on the web has the potential to boost a wide range of
applications, such as web searching, answering queries, and building Knowledge
Bases (KB). Research reports that there are various issues with tabular data
available on the Web, such as learning with limited labeled data, defining or
updating ontologies, exploiting prior knowledge, and/or scaling up existing
solutions. Therefore, this task is often difficult in practice, due to missing, incomplete
or ambiguous metadata (e.g., table and column names). In recent years, we have
identified several works that can be mainly classified as supervised (in the form
of annotated tables to carry out the learning task) [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7">3–7</xref>
        ] or unsupervised (tables
whose data is not dedicated to learning) [
        <xref ref-type="bibr" rid="ref7 ref8">8, 7</xref>
        ]. To solve these problems, we
propose a global approach named Kepler-aSI, which addresses the challenge of
matching tabular data to knowledge graphs. This method is based on previous
work, which deals with ontology alignment [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref15 ref9">9–15</xref>
        ].
      </p>
      <p>This year’s SemTab campaign differs from the last two sessions 4 5 in that
it deals with Wikidata and DBPedia. In this challenge, the input is a CSV file,
but three different challenges had to be met:
1. CTA: a class of the Wikidata (or possibly DBPedia) ontology had to
be assigned to a column (Column-Type Annotation).
2. CEA: a Wikidata or DBPedia entity had to be matched to the different
cells (Cell-Entity Annotation).
3. CPA: a KG (Wikidata or DBPedia) property had to be assigned to the
relationship between two columns (Column-Property Annotation).</p>
      <p>
        Data annotation is a fundamental process in tabular data analysis [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ], since it
allows one to infer the meaning of other information, and then to deduce the meaning of
tabular data with respect to a Knowledge Graph. The data we used were based both
on Wikidata and DBPedia. It should be noted that, in a broader context, the
data used and manipulated follow the triple representation format: subject (S),
predicate (P) and object (O). This notation ensures semantic navigability
through the data and makes all data manipulation more fluid, explicit and
reliable. Indeed, Cell Entity Annotation (CEA) matches a cell to a KG entity. At
this level, we have to annotate each individual element of the subject (S) and
the object (O). Column Property Annotation (CPA) assigns a KG property to
      </p>
      <sec id="sec-3-1">
        <title>4 https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2019/</title>
      </sec>
      <sec id="sec-3-2">
        <title>5 https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2020/</title>
        <p>the relationship between two columns. The task is to find out which property
connects the two columns in either Wikidata or DBPedia. Column Type
Annotation (CTA) assigns a semantic type to a column. Our goal is to
design a fast and efficient approach to annotate tabular data with entities from
Wikidata or DBPedia. Our approach combines a multitude of NLP, search
and filter strategies, based on text preprocessing techniques. Experiments
carried out in the context of SemTab 2021 for all tasks have shown encouraging
results.</p>
        <p>Kepler-aSI approach: in this section, we describe in detail the different stages
of our system, while presenting some basic notions to highlight the technical
issues identified.</p>
        <sec id="sec-3-2-1">
          <title>Key notions</title>
          <p>– Tabular Data: S is a two-dimensional tabular structure made up of an
ordered set of N rows and M columns, as depicted by Figure 1. ni is a row of the
table (i = 1 ... N), mj is a column of the table (j = 1 ... M). The intersection
of a row ni and a column mj is ci,j, which is the value of the cell Si,j.
The table contents can have different types (string, date, float, number, etc.).
• Target Table (S): N × M.
• Subject Cell: S(i,0) (i = 1, 2 ... N).
• Object Cell: S(i,j) (i = 1, 2 ... N), (j = 1, 2 ... M).</p>
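<p>A minimal sketch of the notions above, assuming a plain list-of-rows encoding of the target table S (the data values and labels are made up for illustration):</p>

```python
# Illustrative encoding of the target table S described above: an ordered set
# of rows, where column 0 of each row holds the subject cell S(i,0) and the
# remaining columns hold the object cells S(i,j). Values are invented.
table = [
    ["Tunisia", "Tunis", "11818619"],   # row 1: subject cell + attribute cells
    ["France", "Paris", "67413000"],    # row 2
]

subject_cells = [row[0] for row in table]                       # S(i,0)
object_cells = [row[j] for row in table for j in range(1, len(row))]  # S(i,j)

print(subject_cells)  # → ['Tunisia', 'France']
```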
          <p>[Figure 1: schematic of the target table S, with rows Row1 ... RowM and columns Col0 ... ColN; Sj,i denotes the cell at row j, column i.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Key notions (continued)</title>
      <p>
– Knowledge Graph: Knowledge Graphs have been a focus of research
since 2012, resulting in a wide variety of published descriptions and
definitions. The lack of a common core is a fact also noted by
Paulheim [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] in 2015. In his survey of Knowledge Graph
refinement, Paulheim listed the minimum set of characteristics that must be present to
distinguish Knowledge Graphs from other knowledge collections, which basically
restricts the term to any graph-based knowledge representation. In the
online review [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], the authors agreed that a more precise definition was hard
to find at that point. This statement points out the need for a closer
investigation and deeper reflection in this area. Färber et al. defined a Knowledge
Graph as a Resource Description Framework (RDF) graph and stated that
the term KG was coined by Google to describe any graph-based Knowledge
Base (KB) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Although this definition is the only formal one, it contradicts
more general definitions, as it explicitly requires the RDF data model.
In the following, we present a detailed description of our contribution, namely
Kepler-aSI.
      </p>
      <sec id="sec-4-1">
        <title>System description</title>
        <p>In order to address the above-mentioned SemTab challenge tasks, Kepler-aSI is
designed according to the workflow depicted by Figure 2. There are three major
complementary modules, consisting respectively of Preprocessing,
Annotation Context and Tabular Data to KG Matching. These steps are
the same for each round, with only minimal changes depending on the
variations observed in each case.</p>
        <p>As shown in Figure 2, Preprocessing aims to prepare the data inside the
considered table, while Annotation Context seeks to create a list of terms denoting
the same context.</p>
        <p>Preprocessing: It should be noted that the content of each table can be
expressed in different types and formats, namely numeric, character
strings, binary data, date/time, boolean, addresses, etc. Indeed, given this great
diversity of data types, the preprocessing step is crucial. Therefore, the goal of
preprocessing is to ensure that the processing of each table is triggered without
errors. The effort is especially accentuated when the data contain spelling errors.
In other words, these issues must be resolved before we apply our approach. To
carry out this step properly, we used several techniques and libraries, such
as Textblob6, Pyspellchecker7, etc., to rectify and correct all the noisy textual</p>
        <sec id="sec-4-1-1">
          <title>6 https://textblob.readthedocs.io/en/dev/</title>
        </sec>
        <sec id="sec-4-1-2">
          <title>7 https://pypi.org/project/pyspellchecker/</title>
          <p>data in the considered tables. As an example, we detect punctuation,
parentheses, hyphens and apostrophes, as well as stop words, and remove them
using the Pandas8 library. As a classic treatment in this register, we ended this
phase by transforming all upper-case letters into lower case.</p>
          <p>Annotation context: This phase explicitly extracts the candidates
for the annotation process. The priming is carried out by an analysis of the
columns being processed, which aims to understand and delimit a set of regular
expressions covering a set of units: area, currency, density,
electric current, energy, flow rate, force, frequency, energy efficiency, unit
of information, length, mass, numbers, population density, power,
pressure, speed, temperature, time, torque, voltage and volume. This step
identifies multiple regex types (e.g. numbers,
geographic coordinates, addresses, codes, colors, URLs). Since all values of type text are
selected, preprocessing for natural languages was performed using the langrid9
library to detect the 26 languages present in our data. This is a novelty of this
year’s SemTab campaign, which makes the task more difficult through the
introduction of natural-language barriers. The langrid library is a stand-alone
language identification tool, performing identification over a large number of
languages (97 currently). In doing so, correction, data-type detection and
language detection are performed once.
This can considerably reduce the effort and cost of executing our approach
by avoiding the massive repetition of these treatments for all the table cells,
in each subtask.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Assigning a semantic type to a column (CTA)</title>
        <p>As depicted by Figure 3, the task is to annotate each entity column with elements from Wikidata (or
possibly DBPedia) as its type, identified during the preprocessing phase. Each
item is marked with its label in Wikidata or DBPedia. This treatment allows
the identification of semantics. The CTA task can be performed via the Wikidata or
DBPedia APIs, which allow us to search for an item according to its description.
The main pieces of information collected about a given entity and used in our approach
are: a list of instances (expressed by the instanceOf primitive and accessible
by the P31 code), the subclasses (expressed by the subclassOf primitive and
accessible by code P279) and overlaps (expressed by the partOf primitive and
accessible by code P361). At this point, we are able to process the CTA task
using a SPARQL query. The SPARQL query is our means of interrogation, fed with
the main information about the entity, which governs the choice of each data type:
a list of instances (P31), of subclasses (P279) or parts of a class
(P361). The result of the SPARQL query may return a single type, but in some
cases the result is more than one type; in that case, no annotation is produced
for the CTA task.</p>
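<p>The CTA decision rule described above can be sketched as follows; the query template is an illustrative stand-in for the actual SPARQL query, and `decide_cta` is a hypothetical helper that applies the single-type rule from the text:</p>

```python
# Illustrative SPARQL template for collecting candidate types of a labeled
# item via P31 (instance of); the real query also exploits P279 and P361.
CTA_TEMPLATE = """
SELECT DISTINCT ?class WHERE {
  ?item rdfs:label "%s"@en .
  ?item wdt:P31 ?class .
}
"""

def decide_cta(candidate_types):
    """Annotate only when the query returned exactly one type (as in the text);
    otherwise produce no annotation."""
    types = set(candidate_types)
    return types.pop() if len(types) == 1 else None

print(decide_cta({"Q5"}))            # → "Q5"
print(decide_cta({"Q5", "Q95074"}))  # → None
```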
        <sec id="sec-4-2-1">
          <title>8 https://pandas.pydata.org</title>
        </sec>
        <sec id="sec-4-2-2">
          <title>9 https://github.com/openlangrid</title>
          <p>Matching a cell to a KG entity (CEA): The CEA task aims to annotate the
cells of a given table with a specific entity listed in Wikidata or DBPedia. Figure 4
illustrates the CEA task, which can be performed on the same principle as the CTA
task. Our approach reuses the results of the CTA task by introducing
the necessary modifications to the SPARQL query. If the operation returns
more than one annotation, we run a process based on examining the context
of the considered column, relative to what was obtained with the CTA task, to
overcome the ambiguity problem.</p>
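<p>The context-based disambiguation above can be sketched as follows; `disambiguate` is a hypothetical helper and the entity/type identifiers are invented for illustration:</p>

```python
def disambiguate(candidates, column_type):
    """Keep the candidate whose type matches the column type found by CTA.
    candidates: (entity_id, type_id) pairs returned by the entity lookup."""
    matching = [entity for entity, etype in candidates if etype == column_type]
    return matching[0] if matching else None

# Two candidates sharing a label but with different types (illustrative ids).
cands = [("entityA", "typeCity"), ("entityB", "typeFilm")]
print(disambiguate(cands, "typeCity"))  # → "entityA"
```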
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>Matching a property to a KG entity (CPA)</title>
        <p>After having annotated the cell values as well as the different types of each of the considered entities, we
will identify the relationships between two cells appearing on the same row via a
property using a SPARQL query, as flagged by Figure 5. Indeed, the CPA task
looks for annotating the relationship between two cells in a row via a property.
Similarly, this latter task can be performed in an analogous manner to the CTA
and CEA tasks. The only difference in the CPA task is that the SPARQL query
must select both the entity and the corresponding attributes. The properties are
fairly easy to match since we have already determined them during CEA and
CTA task processing.</p>
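<p>The property lookup described above can be sketched as follows; the template and the entity ids in the example are illustrative, not the exact query used by the system:</p>

```python
# Illustrative CPA template: given two entities matched during CEA (subject
# and object of the same row), ask which property links them in the KG.
CPA_TEMPLATE = """
SELECT ?property WHERE {
  wd:%s ?property wd:%s .
}
"""

def build_cpa_query(subject_qid: str, object_qid: str) -> str:
    """Fill the template with the two entity ids of a row."""
    return CPA_TEMPLATE % (subject_qid, object_qid)

q = build_cpa_query("Q90", "Q142")
print("wd:Q90 ?property wd:Q142" in q)  # → True
```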
        <p>Kepler-aSI performance and results: In this section we present the results
of Kepler-aSI for the different matching tasks in the 3 rounds of SemTab 2021.
We would like to report that the results are presented according to two scenarios,
i.e., before and after the deadline (since the organizers allow participants a
period of one month before freezing the values). Values improved after the
deadline, as we finished investigating the data specifics and adjusted our filters
for candidate identification. These results highlight the strengths of Kepler-aSI,
with its encouraging performance despite the multiplicity of issues.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Round 1</title>
        <p>In this first round of this version of SemTab 2021, four tasks are presented:
CTA-WD, CEA-WD, CTA-DBP and CEA-DBP. Column Type Annotation
(CTA-WD) assigns a Wikidata semantic type (a Wikidata entity) to a column.
Cell Entity Annotation (CEA-WD) maps a cell to a KG entity. The processing
carried out to search for correspondences in Wikidata is carried out in a similar
way on DBPedia.</p>
        <p>Data for the CTA-WD and CEA-WD tasks were focused on Wikidata. As
we explained in section 1, Wikidata is structured according to the RDF
formalism, i.e., subject (S), predicate (P) and Object (O). Each element considered is
marked with a label in Wikidata, thus guaranteeing to take maximum advantage
of its semantics. The CTA-WD and CEA-WD task data contains 180 tables. In
Table 1, an example input table is provided. The first column contains an entity
label, while the other columns contain the associated attributes.</p>
        <p>Column Type Annotation (CTA-DBP) assigns a DBPedia semantic type
(a DBPedia entity) to a column. Cell Entity Annotation (CEA-DBP) matches a
cell to an entity in the Knowledge Graph. The CTA-DBP and CEA-DBP task
data also contain 180 tables. The results are summarized in Table 2.</p>
        <p>In Round 1, we focused particularly on the preprocessing phase in order to
choose and validate the spellchecker according to the textual information, which can
significantly improve the relative results of the CEA and CTA tasks. In
summary, our review resulted in the use of two correctors, namely Textblob and
Pyspellchecker. Both of these tools are intuitive, easy to use, and perform well
in terms of Natural Language Processing (NLP).</p>
        <p>
          During Round 1, the data size factor was significant. We recognize that this
round highlights the limits of machines in the face of such information volumes.
Therefore, we can conclude that, faced with this situation, the computing power
and the speed of access to the external resources representing the Knowledge
Graphs (i.e., Wikidata and DBPedia) are decisive. In addition, we consider that
the introduction of the cross-lingual aspect of this campaign has accentuated
the challenge and allowed us to approach real scenarios that test the
applicability of the different proposed approaches. Indeed, to
support the cross-lingual aspect we acted at the level of the SPARQL query, as
indicated in Code Listing 1.1, to automatically change the language label
and collect the candidates in any language. Thus, we have ensured the genericity
of our SPARQL query, based on previous contributions [
          <xref ref-type="bibr" rid="ref15 ref20 ref21">20, 15, 21</xref>
          ].
endpoint_url = "###########"
query = """
SELECT ?itemLabel ?class ?property
WHERE {
  ?item ?itemDescription "%s"@en .
  ?item wdt:P31 ?class .
}
"""
        <p>Code Listing 1.1. SPARQL query</p>
        <p>In Round 2, despite the distinction of the data and their grouping into two
different families, both had a biological flavor. Due to advances in biological research
techniques, new data are constantly being generated in the biomedical field and
routinely published in unstructured or tabular form. These data are not
easy to integrate semantically, due not only to their size, but also to the
complexity of the biological relationships maintained between the entities. A summary
of the metrics for this round is given in Table 3.</p>
        <p>Specifically, for tabular data annotation, the data representation can have a
significant impact on performance since each entity can be represented by
alphanumeric codes (e.g. chemical formulas or gene names) or even have multiple
synonyms. Therefore, the studied field would greatly benefit from automated
methods to map entities, entity types, and properties to existing datasets to
speed up the process of integrating new data into the domain. In this round the
focus was on Wikidata, through two test cases: BioTable and HardTable. The
different tasks (BioTable-CTA-WD, BioTable-CEA-WD and BioTable-CPA-WD
on the one hand, plus Hard-CTA-WD, Hard-CEA-WD and
Hard-CPA-WD) are all carried out on 110 tables.</p>
        <p>
          During Round 2, we focused on the disambiguation problem: we have to
decide what to do when several candidates are obtained after querying the KGs. Indeed, our
approach put in place during Round 1 was very useful and allowed us to reuse
certain achievements. At this stage, we affirm that the automatic
disambiguation of elements remains a tedious task, given the effort of semantic
analysis and interpretation it requires. Indeed, we have opted for the use
of an external resource, namely Uniprot10 [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. UniProt integrates, interprets
and standardizes data from multiple selected resources to add biological
knowledge and associated metadata to protein records, and acts as a central hub from
which users can connect to 180 other resources. UniProt was recognized as an
ELIXIR core data resource in 2017 [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] and received CoreTrustSeal certification
in 2020. The data resource fully supports the Findable, Accessible, Interoperable and
Reusable (FAIR) data principles [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], for example by
making data available in a number of community-recognized formats, such as text,
XML and RDF, through application programming interfaces (APIs) and
FTP (File Transfer Protocol) downloads, providing traceable identifiers for
protein sequences and protein sequence characteristics, and fully highlighting data
sources. The UniProt 2020 version contains over 189 million sequence records,
with over 292,000 proteomes (a proteome being the complete set of proteins
assumed to be expressed by an organism), derived from viral, bacterial, archaeal
and eukaryotic genomes, with complete sequences available via the UniProtKB
Proteomes portal11. In our case, Uniprot is used to support our disambiguation
process. In other words, if there is a multiplicity of candidates in the matching
process, or if there are no candidates, access to Uniprot allows us to overcome
this problem.
10 https://www.uniprot.org
        </p>
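<p>The fallback to Uniprot described above can be sketched as follows; `match_with_fallback` is a hypothetical helper, and the stub mapping stands in for the actual UniProt lookup:</p>

```python
def match_with_fallback(kg_candidates, external_lookup, label):
    """Keep the unique KG candidate; when there are zero or several candidates,
    defer to the external resource, as Kepler-aSI does with Uniprot."""
    if len(kg_candidates) == 1:
        return kg_candidates[0]        # unambiguous KG answer
    return external_lookup(label)      # 0 or >1 candidates: ask the fallback

# Stub standing in for the UniProt lookup (illustrative mapping).
lookup = {"insulin": "P01308"}.get

print(match_with_fallback(["candA", "candB"], lookup, "insulin"))  # → "P01308"
print(match_with_fallback(["candA"], lookup, "insulin"))           # → "candA"
```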
        <p>In doing so, we end up with the scenario represented by Figure 6. In fact,
the processing of Kepler-aSI logically ends at this stage, with the candidates
likely to meet the matching need. However, in some cases this answer
may require some refinement. In case of multiple answers, Uniprot can help us
decide, given its richness and ample descriptions. In addition, in the absence
of matching candidates (name differences, formulas, etc.), we can get the answer
from Uniprot. Steps 4 and 5 are in addition to the regular Kepler-aSI process,
ensuring the redirection to Uniprot and the collection of any responses.</p>
        <p>Round 3 has 3 main test families:
– BioDiv: represented by 50 tables;
– GitTables: represented by 1100 tables;
– HardTables: represented by 7207 tables.</p>
        <p>It should be noted that the stakes are the same for this round; moreover, the
evaluation is blind, i.e., the participants do not have access to the evaluation
platform and its options. In other words, there is no test opportunity to adjust
the parameters of the approach according to the characteristics of the input.
In this round too, we opted for Uniprot to carry out treatments similar to
those described in Round 2.
11 https://www.uniprot.org/proteomes/</p>
        <p>Out of the 7 proposed tasks, Kepler-aSI managed to process 3. In the
CTA-BioDiv task we are ranked first, for the GIT-DBP task we are ranked second,
and for CTA-HARD we are ranked sixth. For the other cases, our method produced
outputs containing duplications, and such correspondences do not allow
evaluation metrics to be computed in order to be ranked.</p>
        <p>Conclusion &amp; Future Work: To summarize and conclude, we have presented
in this paper the second version of our Kepler-aSI approach. Our system is
participating in the challenge for the second time; it is approaching maturity and
achieving very encouraging performance. We have succeeded in combining several
strategies and processing techniques, which is the strength of our system. We
boosted the preprocessing and spellchecking steps that got the system up and
running.</p>
        <p>In addition, despite the rather large data size, we managed to get
around this problem by using a kind of local dictionary, which allows us to reuse
already existing matches. Thus, we achieved a considerable saving of time, which
allowed us to adjust and rectify after each execution. We also participated in all
the tasks without exception, which allowed us to test our system on all facets,
i.e., to identify its strengths and weaknesses.</p>
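<p>The local dictionary above can be sketched as a simple cache over the remote queries; `cached_match` and the fake remote function are hypothetical stand-ins:</p>

```python
# Minimal sketch of the local dictionary: cache KG answers so that repeated
# cell values never trigger a second remote SPARQL query.
cache = {}

def cached_match(label, remote_query):
    if label not in cache:
        cache[label] = remote_query(label)  # one remote call per distinct label
    return cache[label]

calls = []
def fake_remote(label):
    """Stub standing in for the remote SPARQL lookup; records each call."""
    calls.append(label)
    return f"entity-for-{label}"

cached_match("Paris", fake_remote)
cached_match("Paris", fake_remote)  # served from the local dictionary
print(len(calls))  # → 1
```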
        <p>We tackled all of the proposed tasks. Our solution is based on a generic
SPARQL query using the cell contents as the description of a given item. In each
round, even after the time allocated by the organizers ran out, we continued
the work and the improvements, with the conviction that every effort counts
and brings us closer to a good mastery of the studied field.</p>
        <p>
          Kepler-aSI is a promising approach, but one that will be further improved.
First, we will apply additional methods to correct spelling mistakes and other
typos in the source data. Then, we will try to develop our system by integrating
new data processing techniques (some Big Data oriented paradigms). Indeed, a
parallel implementation will allow us to circumvent the data size problem, which
is the major bottleneck for our current machines. Eventually, the idea of moving to a
data representation using indexes [
          <xref ref-type="bibr" rid="ref25 ref26">25, 26</xref>
          ] would be a good track to investigate
in order to master the search space formed by the considered tabular data.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Colnet: Embedding the semantics of web tables for column type prediction</article-title>
          .
          <source>In: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          . Volume
          <volume>33</volume>
          . (
          <year>2019</year>
          )
          <fpage>29</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Malyshev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Krötzsch,
          <string-name>
            <given-names>M.</given-names>
            , González, L.,
            <surname>Gonsior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Bielefeldt</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Getting the most out of wikidata: Semantic technology usage in wikipedia's knowledge graph</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          , Springer (
          <year>2018</year>
          )
          <fpage>376</fpage>
          -
          <lpage>394</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alse</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Semantic labeling: a domainindependent approach</article-title>
          . In: International Semantic Web Conference, Springer (
          <year>2016</year>
          )
          <fpage>446</fpage>
          -
          <lpage>462</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Taheriyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambite</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>Learning the semantics of structured data sources</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>37</volume>
          (
          <year>2016</year>
          )
          <fpage>152</fpage>
          -
          <lpage>169</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ramnandan</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mittal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Assigning semantic labels to data sources</article-title>
          .
          <source>In: European Semantic Web Conference</source>
          , Springer (
          <year>2015</year>
          )
          <fpage>403</fpage>
          -
          <lpage>417</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambite</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lerman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muslea</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taheriyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mallick</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Semi-automatically mapping structured sources into the semantic web</article-title>
          .
          <source>In: Extended Semantic Web Conference</source>
          , Springer (
          <year>2012</year>
          )
          <fpage>375</fpage>
          -
          <lpage>390</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cremaschi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Paoli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rula</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spahiu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A fully automated approach to a complete semantic table interpretation</article-title>
          .
          <source>Future Generation Computer Systems</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Effective and efficient semantic table interpretation using TableMiner+</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <issue>6</issue>
          ) (
          <year>2017</year>
          )
          <fpage>921</fpage>
          -
          <lpage>957</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mephu Nguifo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>OACAS: Ontologies alignment using composition and aggregation of similarities</article-title>
          .
          <source>In: Proceedings of the 1st International Conference on Knowledge Engineering and Ontology Development (KEOD 2009)</source>
          , Madeira, Portugal (
          <year>2009</year>
          )
          <fpage>233</fpage>
          -
          <lpage>238</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Moussa</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>LDOA results for OAEI 2011</article-title>
          .
          <source>In: Proceedings of the 6th International Workshop on Ontology Matching (OM2011) Colocated with the 10th International Semantic Web Conference (ISWC2011)</source>
          , Bonn, Germany (
          <year>2011</year>
          )
          <fpage>148</fpage>
          -
          <lpage>155</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diallo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>OAEI 2017 results of KEPLER</article-title>
          .
          <source>In: Proceedings of the 12th International Workshop on Ontology Matching co-located with the 16th International Semantic Web Conference (ISWC 2017)</source>
          , Vienna, Austria, October 21, 2017. Volume 2032 of CEUR Workshop Proceedings, CEUR-WS.org (
          <year>2017</year>
          )
          <fpage>138</fpage>
          -
          <lpage>145</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Dealing with direct and indirect ontology alignment</article-title>
          .
          <source>Journal on Data Semantics</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ) (
          <year>2018</year>
          )
          <fpage>237</fpage>
          -
          <lpage>252</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diallo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>KEPLER at OAEI 2018</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Ontology Matching co-located with the 17th International Semantic Web Conference, OM@ISWC 2018</source>
          , Monterey, CA, USA, October 8, 2018. Volume 2288 of CEUR Workshop Proceedings, CEUR-WS.org (
          <year>2018</year>
          )
          <fpage>173</fpage>
          -
          <lpage>178</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Bridging the multilingualism gap in ontology alignment</article-title>
          .
          <source>International Journal of Metadata, Semantics and Ontologies</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ) (
          <year>2014</year>
          )
          <fpage>252</fpage>
          -
          <lpage>262</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Using linguistic resource for cross-lingual ontology alignment</article-title>
          .
          <source>International Journal of Recent Contributions from Engineering</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>2013</year>
          )
          <fpage>21</fpage>
          -
          <lpage>27</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Learning semantic annotations for tabular data</article-title>
          .
          <source>arXiv preprint arXiv:1906.00781</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Efthymiou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez-Muro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christophides</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Matching web tables with knowledge base entities: from entity lookups to entity embeddings</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          , Springer (
          <year>2017</year>
          )
          <fpage>260</fpage>
          -
          <lpage>277</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ehrlinger</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wöß</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Towards a definition of knowledge graphs</article-title>
          .
          <source>SEMANTiCS (Posters, Demos, SuCCESS)</source>
          <volume>48</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Färber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bartscherer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rettinger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO</article-title>
          .
          <source>Semantic Web</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ) (
          <year>2018</year>
          )
          <fpage>77</fpage>
          -
          <lpage>129</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>DAMO: Direct alignment for multilingual ontologies</article-title>
          .
          <source>In: Proceedings of the 3rd International Conference on Knowledge Engineering and Ontology Development (KEOD)</source>
          , 26-29 October, Paris, France (
          <year>2011</year>
          )
          <fpage>110</fpage>
          -
          <lpage>117</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>When external linguistic resource supports cross-lingual ontology alignment</article-title>
          .
          <source>In: Proceedings of the 5th International Conference on Web and Information Technologies (ICWIT 2013)</source>
          , 9-12 May, Hammamet, Tunisia (
          <year>2013</year>
          )
          <fpage>327</fpage>
          -
          <lpage>336</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Ruch</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teodoro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consortium</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          , et al.:
          <source>UniProt. Technical report</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Drysdale</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cook</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petryszak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baillie-Gerritsen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barlow</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gasteiger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gruhl</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanfear</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>The elixir core data resources: fundamental infrastructure for the life sciences</article-title>
          .
          <source>Bioinformatics</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bolleman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gehant</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Redaschi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>FAIR adoption, assessment and challenges at UniProt</article-title>
          .
          <source>Scientific Data</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          )
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diallo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Initiating cross-lingual ontology alignment with information retrieval techniques</article-title>
          .
          <source>In: Actes de la 6ème Édition des Journées Francophones sur les Ontologies (JFO'2016)</source>
          , Bordeaux, France (
          <year>2016</year>
          )
          <fpage>57</fpage>
          -
          <lpage>68</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zghal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kachroudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Damak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Alignement d'ontologies à base d'instances indexées</article-title>
          .
          <source>In: Actes de la 6ème Édition des Journées Francophones sur les Ontologies (JFO'2016)</source>
          , Bordeaux, France (
          <year>2016</year>
          )
          <fpage>69</fpage>
          -
          <lpage>74</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>