<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Semi-Automatic Mapping and Extraction of RDF Triples From Wikipedia Tables</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adriana Concha</string-name>
          <email>adrianaconcha.s@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidan Hogan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad de Chile</institution>
          ,
          <addr-line>Santiago, Chile</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Tables within Wikipedia articles contain a substantial volume of rich information. Unfortunately, this information is difficult to exploit due to the large number of tables on Wikipedia, as well as the diverse schemas and formats these tables use in order to be visually appealing to users. Consequently, manual extraction of structured information from these tables becomes impractical, and automating the process becomes very complex. To address this, our solution suggests clustering tables with similar headers and employing mapping languages for structured information extraction from these clusters. We compare mapping languages and processors for semantic relation extraction from Wikipedia tables, aiming to enhance integration with knowledge graphs like Wikidata. Our analysis, unique for such a data corpus, identifies Tarql as the most efficient processor, generating 984,260 triples from the top ten largest clusters by the number of tables, with 791,021 being novel in Wikidata. Assessing 500 relations, we achieve an average precision of 84.6%, improving on the precision of previous automated methods (81.5% and 70%). Excluding specific clusters could further improve precision.</p>
      </abstract>
      <kwd-group>
        <kwd>Web Tables</kwd>
        <kwd>Wikipedia</kwd>
        <kwd>Wikidata</kwd>
        <kwd>Mapping languages</kwd>
        <kwd>Knowledge graphs</kwd>
        <kwd>Information extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>https://aidanhogan.com/ (A. Hogan)</p>
      <p>CEUR Workshop Proceedings (ceur-ws.org)</p>
      <p>
        Wikipedia tables
have very diverse structures and formats. Previous research has attempted to address this problem
using various automated techniques such as machine learning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], probabilistic graphical models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
knowledge-based approaches [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], and table clustering [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Despite these efforts, the precision of the
triples extracted using such automated methods is below the typical precision expected for knowledge
graphs like DBpedia, Wikidata and YAGO. Therefore, further improvements are necessary to enhance
the accuracy and reliability of these extraction methods in order to increase the potential for successfully
incorporating the extracted triples into existing knowledge graphs.
      </p>
      <p>
        Based on previous research, we explore a different approach towards extracting a large corpus of
triples with higher precision from Wikipedia. Specifically, we propose a solution that involves applying
user-defined mappings over the tables of Wikipedia in order to extract triples. Given that there are
millions of such tables, it is unreasonable to assume that users will define a mapping for each table.
Thus we rather propose and explore a method that applies mappings over clusters of Wikipedia tables
grouped by similar headers [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Such clusters form naturally due to the use of templates for tables in
Wikipedia, as well as the editorial practices of copying-and-pasting similar table structures between
articles. In our proposal, one mapping can be defined for each cluster, which may contain thousands of
tables and result in multitudinous triples being extracted at (we hypothesize) high precision.
      </p>
      <p>
        A key part of this approach is to choose a suitable mapping language for the Wikipedia setting,
which has some idiosyncratic requirements not often considered. In this paper, we compare various
RDF mapping language processors, applying them over clusters of Wikipedia tables. We evaluate
the processors’ performance and the precision and novelty of the triples extracted in Wikidata. Our
contributions are twofold: firstly, we compare RDF mapping languages and processors in a real-world
setting, and secondly, we propose a novel framework for extracting triples from Wikipedia tables.
Motivating example: Figure 1 displays two Wikipedia tables that share the same schema. These
tables were grouped by Luzuriaga et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] into the same cluster of tables, alongside other tables
with the same schema. Table 1 provides a sample of post-processed tabular data available for this
cluster, where formatting is removed and links in Wikipedia tables are used by Luzuriaga et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to
identify relevant Wikidata entities. In this sample, the entity of the Wikipedia article where the table
is embedded is prepended as a protagonist column. For this cluster, we would like to define a single
mapping to be applied to each table of the cluster to extract RDF triples following the graph pattern
shown in Figure 2. To extract these relations, we need a mapping language and processor capable of
managing n-ary relations and being able to process literal values to filter noise within the rating column.
Other clusters will exhibit distinct requirements for a comprehensive extraction of triples.
      </p>
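      <p>To make the motivating example concrete, the following sketch illustrates what a Tarql mapping for such a cluster could look like. It is an illustrative sketch rather than our published mapping: the column names (?ArticleEntity, ?Source, ?Rating) and the use of the Wikidata properties P444 (review score) and P447 (review score by) are assumptions for exposition.</p>
      <preformat>
# Illustrative Tarql sketch; column names and property IDs are assumed.
PREFIX wd:  &lt;http://www.wikidata.org/entity/&gt;
PREFIX p:   &lt;http://www.wikidata.org/prop/&gt;
PREFIX ps:  &lt;http://www.wikidata.org/prop/statement/&gt;
PREFIX pq:  &lt;http://www.wikidata.org/prop/qualifier/&gt;
CONSTRUCT {
  ?album p:P444 _:score .      # n-ary review-score relation via a blank node,
  _:score ps:P444 ?Rating ;    # generated fresh for each table row
          pq:P447 ?source .    # qualifier: review score by
}
WHERE {
  BIND (IRI(CONCAT(STR(wd:), ?ArticleEntity)) AS ?album)
  BIND (IRI(CONCAT(STR(wd:), ?Source)) AS ?source)
  FILTER (REGEX(STR(?Rating), "^[0-9]"))  # drop noisy, non-numeric ratings
}
      </preformat>
      <p>In Tarql, each row of the input CSV becomes one solution whose variables are named after the column headers, and a blank node in the CONSTRUCT template is generated freshly per row, which is exactly what an n-ary relation requires.</p>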
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>We discuss previous works relating to Web tables and mapping languages.</p>
      <sec id="sec-2-1">
        <sec id="sec-2-1-1">
          <title>2.1. Web tables</title>
          <p>Extracting information from Web tables involves two main tasks: table detection and interpretation.</p>
          <p><bold>Table interpretation</bold> The goal of this step is to identify the entities and relationships present within
the tables. This involves parsing and normalizing the tables. Next, the entities and attributes within
the table cells are identified, and relations between previously identified entities and attributes are
extracted. Works typically focus on interpreting horizontal relational tables only (per, e.g., Figure 1).
Here we focus on some of the most relevant approaches proposed for Wikipedia tables.</p>
          <p>
            Limaye et al. (2010) [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] propose a method that applies machine learning techniques to annotate Web
tables in the YAGO knowledge base. To do this, they annotate the entities contained in each table
cell, associate each table column with a type, and extract relationships between pairs of columns. The
method proposed by Limaye et al. (2010) for a dataset of Wikipedia tables (excluding infoboxes), reaches
an accuracy of 83% for entity annotation, 56% for type annotation and 68% for relation annotation.
          </p>
          <p>
            Another method that infers types and relationships between pairs of columns is proposed by Mulwad
et al. (2013) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], through a probabilistic graphical model. Their approach outperformed the method
proposed by Limaye et al. (2010) in terms of accuracy, achieving a relation annotation F-score of 97%
compared to the 68% achieved by Limaye et al. However, it should be noted that the experiment was
conducted on a limited dataset of only 36 non-infobox tables extracted from Wikipedia.
          </p>
          <p>
            Muñoz et al. (2013) [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] propose extracting relations that exist in DBpedia between entities in different
columns of the same row and then extending that relation to other rows of the table. With this method,
24.4 million raw triples were extracted from Wikipedia’s tables with an estimated precision of 52%.
Muñoz et al. (2014) [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] extend the previous method, testing machine learning methods for classifying
correct/incorrect triples, extracting 7.9 million novel RDF triples over one million Wikipedia tables with
an estimated precision of 81.5% and an F-score of 79.4%.
          </p>
          <p>
            Luzuriaga et al. (2023) [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] extend the work of Muñoz et al. (2014) by first clustering tables according
to their content and structure, with the aim of increasing the quantity and precision of the relations
extracted. This method extracted 7.5 million novel triples for Wikidata over a more up-to-date collection
of 3.6 million Wikipedia tables, reaching a precision of 70%.
          </p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2. Mapping languages to RDF</title>
          <p>
            Iglesias-Molina et al. (2022) [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] categorize RDF mapping languages into three groups based on syntax:
RDF-based, SPARQL-based, and others. Our focus will be on the first two categories.</p>
            <p><bold>RDF-based mapping languages</bold> These mapping languages are defined as ontologies. One example
is R2RML [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]: a mapping language for defining custom mappings from relational data to RDF. The
mappings generated with this language enable viewing relational data as RDF, with a structure and
lexicon customized by the author of the mapping. Another key example is RML (RDF Mapping
Language) [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], inspired by R2RML, which is a language used to define mappings between non-RDF
sources and RDF. RML is more general than R2RML, and allows for defining mappings from several
semi-structured forms (JSON, XML, CSV) to RDF, rather than only from relational databases.
          </p>
          <p>Several tools implement the RML specification. The RML Implementation Report (https://rml.io/implementation-report/) evaluates some of
these tools, testing their coverage of RML’s features. The tools tested are RMLMapper [14], CARML (https://github.com/carml/carml),
RocketRML [15], SDM-RDFizer [16], RMLStreamer [17], Chimera [18], and Morph-KGC [19].</p>
          <p><bold>SPARQL-based mapping languages</bold> Several mapping languages are based on the SPARQL query
language, whose CONSTRUCT feature can convert tables (typically of solutions) into RDF graphs. Some
of them are Tarql (https://tarql.github.io/), SPARQL-Generate [20], SPARQL Anything [21], XSPARQL [22] and SMS2
(https://docs.stardog.com/virtual-graphs/mapping-data-sources#sms2-stardog-mapping-syntax-2).</p>
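          <p>As a minimal illustration of the SPARQL-based approach, a CONSTRUCT query maps each solution (row) of a table to triples; the prefix, property, and values below are hypothetical:</p>
          <preformat>
# Minimal SPARQL CONSTRUCT sketch: each row binding ?person and ?party
# yields one triple. All names here are hypothetical.
PREFIX ex: &lt;http://example.org/&gt;
CONSTRUCT { ?person ex:memberOfParty ?party . }
WHERE {
  # In Tarql, ?person and ?party would be bound from same-named CSV columns;
  # here a VALUES block stands in for the table of solutions:
  VALUES (?person ?party) {
    (ex:DrewHutton ex:Greens)
  }
}
          </preformat>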
        </sec>
        <sec id="sec-2-1-3">
          <title>2.3. Novelty</title>
          <p>Previous studies have primarily focused on developing automated techniques for interpreting web
tables, particularly infoboxes from Wikipedia or a limited number of Wikipedia tables that are not
infoboxes. However, these methods face a trade-off between the precision and recall of the relationships
extracted. Manual methods, while accurate, require significant human effort to map tables individually.
This process can be tedious and time-consuming, especially when dealing with a large number of tables.
Our goal is to extract RDF triples from Wikipedia tables with high precision at large scale. We propose
a novel method using hand-crafted RDF mappings applied to clusters of tables instead of individual
tables. We evaluate this method by comparing various mapping languages to RDF.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data corpus</title>
      <p>
        The data corpus, prepared by Luzuriaga [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] in 2019, includes 3,631,228 (non-infobox) Wikitables from
997,842 English Wikipedia articles. Among these articles, the one with the most tables contains 676
tables, while 51.6% have only a single table. A total of 3,514,373 tables (35,428,465 rows) have an associated
Wikidata entity for the article, which is added as a protagonist column that often has pertinent relations
to other columns [23] (per ArticleEntity in Table 1).
      </p>
      <p>
        The dataset consists of 1,169,682 clusters of tables that share the same schema. These schemas
are defined by a subset of attributes, ensuring that all tables within a cluster adhere to this common
schema [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. While most clusters (82.7%) contain only one table, some extensive clusters contain a
considerable number of tables. Table 2 lists the top 10 clusters by their number of tables, including
data about the columns, rows, and the number of rows with multiple entities across the columns (a
key target for extracting triples). This analysis reveals the presence of substantial clusters from which
a large amount of data can be extracted. For example, the largest cluster contains 81,277 tables and
1,202,635 rows, with 536,939 rows exhibiting multi-column entities. Moreover, the columns that feature
literal values can serve as objects in newly extracted relationships. As such, with a single mapping, it
may be feasible to extract more than half a million triples from this cluster.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Features of mapping languages</title>
      <p>We selected mapping languages and processors based on specific criteria. The chosen language must
accept CSV input and have an open-source processor (Table 3). For RML processors, we prioritized those
with top results in the RML Implementation report, focusing on tests related to blank node generation
for reliable extraction of n-ary relations (Table 4). We excluded the use of custom functions, such as
those in RML + FnO. Despite RMLStreamer’s success in tests, it had issues with CSV input length limits
and failed to use the bulk option for writing triples from a single input record.</p>
      <p>Analyzing these top 10 clusters, we identified specific requisites for extracting information from
horizontal relational tables, multivalued tables, and matrix tables. The features are defined as follows,
while Table 5 summarizes the ability of the selected tools to fulfill these requirements. We publish
mappings testing these features on GitHub (https://github.com/AdrianaConcha/Wikitables-RDF-mappings) for the top 10 largest clusters of Wikipedia tables, with
Wikidata as a target. To satisfy these requirements, we primarily use SPARQL
functions in SPARQL-based mapping languages and SQL in RML Views [24], implemented by
Morph-KGC. We also used built-in functions provided by RMLMapper and Morph-KGC.</p>
      <sec id="sec-4-1">
        <p>• F1: Extraction of triples between columns: A basic requirement is the ability to extract a
triple with a subject in one column and the object entity in another column within the same row
in horizontal tables using a single predicate type. For example, we wish to extract the relation
Drew Hutton[Q5307201] member of political party[P102] Greens[Q781486] from Figure 3a.
• F2: Extraction of relations considering all entities within a single cell: In multivalued
tables, we need to extract relations for all entities within a cell. This requires the tool to split
the string in the multivalued cell. For example, in Figure 3a, first row, we want to extract
the triples Margaret Reynolds[Q6759833] member of political party[P102] Labor[Q216082] and Mal
Colston[Q15506241] member of political party[P102] Labor[Q216082].
• F3: Extraction of relations considering the nth entity within a cell: In multivalued tables,
extracting relations for the nth entity in a cell requires splitting the cell content and selecting the
appropriate index or processing the output accordingly. For instance, in Figure 3c, we wish to
extract Vegard Ulvang[Q370499] participant in[P1344] 1992 Albertville[Q1042417] from the first entity
in the “Gold” column of the first row, ignoring the second entity (that indicates a country).
• F4: Extraction of relations between entities within a single cell: Similar to F3, this requires
splitting the cell content and processing the output to extract relations, for instance, extracting
the triple Vegard Ulvang[Q370499] country of citizenship[P27] Norway[Q20] from Figure 3c.
• F5: Extraction of n-ary relations without defining a template: Due to potentially missing
values in tables, ensuring a distinct key for each n-ary relation isn’t feasible. Hence, we evaluate the
processor’s capability to generate blank node identifiers without needing a predefined template.
• F6: Extraction of relations across different rows: This requirement involves extracting a
triple with a subject in one row and the object entity in the following row within the same column.
To achieve this, the tool needs to support join operations, provide row identifiers to sequence
rows, and allow arithmetic operations. For example, in Figure 3c, we aim to extract the triple
1992 Albertville[Q1042417] followed by[P156] 1994 Lillehammer[Q602473].
• F7: Extraction of relations in matrix tables: This requirement evaluates the ability of the
tools to extract n-ary relations for an entity in a cell, considering the headers of its respective
column and row, for example, extracting relations from Figure 3d.
• F8: Handling of literal values within the mapping: We often need to preprocess and manage
literal values (using built-in functions or other available functions), for example, to convert the
mm:ss values in the “Length” column of Figure 3b to seconds.
Note: we acknowledge that the RML specification stipulates that a term map (constant, column/reference or a template) must be
provided while generating blank nodes, but this constraint doesn’t align with our use case.</p>
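          <p>Several of these features can be expressed directly in SPARQL-based mapping languages. The fragment below sketches F8 (converting mm:ss lengths to seconds) with standard SPARQL string functions, and F2 (splitting a multivalued cell) using Jena ARQ’s apf:strSplit property function, which Tarql (built on ARQ) can use; the column names ?Length and ?Candidates are illustrative assumptions:</p>
          <preformat>
# F8 sketch: convert an "mm:ss" literal from a hypothetical "Length"
# column into a number of seconds (assumes the usual xsd: prefix).
BIND (xsd:integer(STRBEFORE(?Length, ":")) * 60
      + xsd:integer(STRAFTER(?Length, ":")) AS ?seconds)

# F2 sketch: bind ?candidate once per comma-separated token of a
# hypothetical multivalued "Candidates" column.
?candidate apf:strSplit (?Candidates ", ")
          </preformat>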
        <p>Figure 3: (a) horizontal relational multivalued table; (b) multivalued table; (c) multivalued
table with relations across rows; (d) matrix table.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Comparison of mapping languages and processors</title>
      <p>Using the mappings we previously defined for the top 10 clusters, we now compare the performance
and output of diferent languages and processors.</p>
      <p>Figure 4 compares execution times for the top ten largest clusters and each tested processor. The
timeout (TMO) of 3600 seconds is drawn at a lower value in the figure for visualization purposes. Tarql and CARML
had the fastest execution times in most experiments. However, CARML sometimes extracted fewer
triples due to the need for a blank node template, an issue also seen in SDM-RDFizer and Morph-KGC.</p>
      <sec id="sec-5-1">
        <p>The default function grel:string_split appears not to be working in RMLMapper v6.2.1. One of the tools provides some built-in functions for handling strings, but they were either not used or not functioning correctly during the experiments.</p>
        <p>(Figure 4 appears here: execution time in seconds for each processor: Tarql, SPARQL-Generate, SPARQL Anything, CARML, RMLMapper, SDM-RDFizer, Chimera, and Morph-KGC.)</p>
        <p>This issue with blank node templates in SDM-RDFizer and Morph-KGC mappings can be seen in Figure 5a. Additionally, SDM-RDFizer and Morph-KGC encountered illegal
characters in blank node templates, unsupported by RDFLib, necessitating data preprocessing.</p>
        <p>Figure 5a compares the number of triples extracted per processor for the top 10 clusters. Each stacked
bar also shows new triples not present in Wikidata. SPARQL-based mapping languages extracted the
most triples, with Tarql extracting 984,260 triples, including 791,021 novel ones.</p>
        <p>Figure 5b compares the number of relations extracted per processor for the top 10 clusters. The
number of relations is fewer than the number of triples since a relation may contain multiple triples.</p>
        <p>We also evaluated the precision of novel relations extracted from the top 10 clusters. These novel
relations are those not previously present in Wikidata. Within each of these clusters, we randomly
sampled 50 relations from the output of Tarql, the mapping language with the most extracted triples, to
manually assess their precision, resulting in a total of 500 relations evaluated. Precision is calculated as
the ratio of correct extracted relations to the total number of extracted relations. Table 6 presents the
results obtained. We classify relations as either correct or incorrect. Relations that require additional
qualifiers to fully convey the information are still deemed correct due to the open world assumption
in Wikidata, where missing information is treated as unknown rather than false. Consequently, such
relations are considered valid, with the potential for these qualifiers to be added in future.</p>
        <p>
          Some of the reasons why relations were considered as incorrect are as follows:
• Incorrect Wikipedia link: The link in the Wikipedia table (from which the Wikidata entity was
extracted by Luzuriaga [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]) is incorrect. For example, the triple Gerardo Flores[Q5550262] position
played on team[P413] defender[Q336286] extracted from Cluster 5 is correct for the subject Gerardo
Flores[Q5550269] the footballer, but the entity extracted was Gerardo Flores[Q5550262] the murderer.
• Subject or object doesn’t have an entity: The subject or object of the triple doesn’t have a
Wikidata entity, but there is a different entity in the same cell. For example, the song “Another
Day” from Cluster 4 doesn’t have a Wikidata entity. Therefore, the entity extracted in the “Title”
column in Figure 6 for that cell was M. J. Cole[Q708129]. When we executed the mapping for this
cluster we obtained the incorrect triple M. J. Cole[Q708129] form of creative work[P7937] song[Q7366].
• Incorrect assumption in mapping: When defining the mappings, we make some assumptions
about the articles from which the tables were extracted. While in most cases these assumptions
were accurate, in some cases they were not. For instance, we assume that articles in Cluster 9
refer to albums, but for a table from the “The Time of the Oath (song)” article, we obtained the
incorrect triple The Time of the Oath[Q760753] instance of[P31] album[Q482994].
• Columns with diverse entity types: A column may contain subtly different types of entities.
        </p>
        <p>For example, in Cluster 8, some tables indicate a navy instead of a location in the “State” column.
For example, from the table in Figure 7, we extracted the incorrect triple Rossia[Q690108] country
of registry[P8047] Soviet Navy[Q796754]. We also included incorrect facts in this category.</p>
        <p>For clusters 1, 3, and 7, all the sampled relations were correct, obtaining a precision of 100%. The
average precision of the sampled relations of the top ten largest clusters is 84.6%.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>
        In this work, we propose a novel semi-automatic method for extracting triples from Wikipedia tables at
large scale and with high precision. Experimental evaluations involved applying mappings to table
clusters, measuring triple and relation extraction, processor performance, and the novelty of extracted
triples not present in Wikidata. Our study identified Tarql as the most effective processor, generating
984,260 triples with 791,021 novel to Wikidata. Sampling 50 random relations extracted by Tarql from
each of the top ten largest clusters yielded an average precision of 84.6%, surpassing previous methods
which achieved 81.5% [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and 70% [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] precision. However, as the imprecision is primarily associated
with specific clusters, excluding these clusters could significantly improve the precision achieved.
      </p>
      <p>We presented a comparative analysis of RDF mapping languages and processors to determine their
suitability for extracting RDF triples from Wikipedia tables, building on previous work that merged tables
with identical headers. Our analysis of the Wikipedia table corpus provided insights into individual
tables and clusters, revealing substantial extractable information. The largest cluster, comprising 81,277
tables with 1,202,635 rows, included 536,939 rows suitable for relation extraction due to multi-column
entity relationships. We emphasized the importance of selecting appropriate languages and processors
for this data corpus. Through examples with various table schemas – horizontal relational, multivalued,
and matrix tables – we assessed the expressiveness and performance of SPARQL- and RML-based
mapping languages. SPARQL-based languages and the use of SQL in RML Views showed significant
advantages in processing, filtering, and handling complex schemas.</p>
      <p>These findings support our hypothesis that applying RDF mapping languages over clusters of tables
can extract significant volumes of high-precision novel triples for knowledge graphs like Wikidata.</p>
      <p>
        While this work has provided valuable insights into the extraction of RDF triples from Wikipedia
tables, there are some limitations to mention. First, our study used a 2019 dataset by Luzuriaga [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ];
updating the corpus could enhance the relevance and accuracy of extracted relations. Moreover,
improving the granularity of extracted information, such as entity positions within cells and associating
entities with table titles, could further enrich the data. Future work could focus on developing a
systematic approach for relation extraction using our findings, empowering expert users to apply
mappings across clusters with minimal manual effort. Integrating the novel relations into Wikidata and
post-processing the triples to verify entity types could further improve precision.
      </p>
      <p>Another potential direction involves leveraging large language models (LLMs) for relation extraction
from Wikipedia tables. Progress has been made in web table interpretation through instruction tuning
of LLMs, with a framework like TURL [25] – focused on relational web tables from Wikipedia – showing
promise. However, more research is needed to manage varying table schemas and address challenges
such as manipulating table data (e.g., handling literal values) and fine-tuning LLMs for this specific
corpus of Wikipedia tables [26]. Furthermore, employing LLMs to assist users in creating mappings
(per the approach used in this paper) is also a promising avenue for future exploration [27].</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by Fondecyt No. 1221926 and by ANID – Millennium Science Initiative
Program – Code ICN17_002.</p>
      <p>[14] A. Dimou, T. D. Nies, R. Verborgh, E. Mannens, R. V. de Walle, Automated Metadata Generation for
Linked Data Generation and Publishing Workflows, in: S. Auer, T. Berners-Lee, C. Bizer, T. Heath
(Eds.), Proceedings of the Workshop on Linked Data on the Web, LDOW 2016, co-located with
25th International World Wide Web Conference (WWW 2016), volume 1593 of CEUR Workshop
Proceedings, CEUR-WS.org, 2016. URL: https://ceur-ws.org/Vol-1593/article-04.pdf.
[15] U. Simsek, E. Kärle, D. Fensel, RocketRML - A NodeJS Implementation of a Use Case Specific RML
Mapper, in: D. Chaves-Fraga, P. Heyvaert, F. Priyatna, J. F. Sequeda, A. Dimou, H. Jabeen, D. Graux,
G. Sejdiu, M. Saleem, J. Lehmann (Eds.), Joint Proceedings of the 1st International Workshop
on Knowledge Graph Building and 1st International Workshop on Large Scale RDF Analytics
co-located with 16th Extended Semantic Web Conference (ESWC 2019), Portorož, Slovenia, June
3, 2019, volume 2489 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 46–53. URL: https://ceur-ws.org/Vol-2489/paper5.pdf.
[16] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M. Vidal, SDM-RDFizer: An RML
Interpreter for the Efficient Creation of RDF Knowledge Graphs, in: M. d’Aquin, S. Dietze,
C. Hauff, E. Curry, P. Cudré-Mauroux (Eds.), CIKM ’20: The 29th ACM International Conference on
Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, ACM, 2020,
pp. 3039–3046. URL: https://doi.org/10.1145/3340531.3412881. doi:10.1145/3340531.3412881.
[17] S. M. Oo, G. Haesendonck, B. D. Meester, A. Dimou, RMLStreamer-SISO: An RDF Stream Generator
from Streaming Heterogeneous Data, in: U. Sattler, A. Hogan, C. M. Keet, V. Presutti, J. P. A.
Almeida, H. Takeda, P. Monnin, G. Pirrò, C. d’Amato (Eds.), The Semantic Web - ISWC 2022
21st International Semantic Web Conference, Virtual Event, October 23-27, 2022, Proceedings,
volume 13489 of Lecture Notes in Computer Science, Springer, 2022, pp. 697–713. URL: https://doi.org/10.1007/978-3-031-19433-7_40. doi:10.1007/978-3-031-19433-7_40.
[18] M. Belcao, E. Falzone, E. Bionda, E. D. Valle, Chimera: A Bridge Between Big Data Analytics and
Semantic Technologies, in: A. Hotho, E. Blomqvist, S. Dietze, A. Fokoue, Y. Ding, P. M. Barnaghi,
A. Haller, M. Dragoni, H. Alani (Eds.), The Semantic Web - ISWC 2021 - 20th International Semantic
Web Conference, ISWC 2021, Virtual Event, October 24-28, 2021, Proceedings, volume 12922 of
Lecture Notes in Computer Science, Springer, 2021, pp. 463–479. URL: https://doi.org/10.1007/
978-3-030-88361-4_27. doi:10.1007/978-3-030-88361-4_27.
[19] J. Arenas-Guerrero, D. Chaves-Fraga, J. Toledo, M. S. Pérez, O. Corcho, Morph-KGC: Scalable
knowledge graph materialization with mapping partitions, Semantic Web (2022). doi:10.3233/SW-223135.
[20] M. Lefrançois, A. Zimmermann, N. Bakerally, A SPARQL extension for generating RDF from
heterogeneous formats, in: E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler, O. Hartig
(Eds.), The Semantic Web - 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28
- June 1, 2017, Proceedings, Part I, volume 10249 of Lecture Notes in Computer Science, 2017, pp.
35–50. URL: https://doi.org/10.1007/978-3-319-58068-5_3. doi:10.1007/978-3-319-58068-5_3.
[21] L. Asprino, E. Daga, A. Gangemi, P. Mulholland, Knowledge Graph Construction with a Façade: A
Unified Method to Access Heterogeneous Data Sources on the Web, ACM Trans. Internet Techn.
23 (2023) 6:1–6:31. URL: https://doi.org/10.1145/3555312. doi:10.1145/3555312.
[22] S. Bischof, S. Decker, T. Krennwallner, N. Lopes, A. Polleres, Mapping between RDF and XML
with XSPARQL, J. Data Semant. 1 (2012) 147–185. URL: https://doi.org/10.1007/s13740-012-0008-7.
doi:10.1007/S13740-012-0008-7.
[23] E. Crestan, P. Pantel, Web-scale table census and classification, in: I. King, W. Nejdl, H. Li (Eds.), Web Search and Web Data Mining (WSDM), ACM, 2011, pp. 545–554. doi:10.1145/1935826.1935904.
[24] J. Arenas-Guerrero, A. Alobaid, M. Navas-Loro, M. Pérez, O. Corcho, Boosting Knowledge Graph
Generation from Tabular Data with RML Views, in: Proceedings of the 20th Extended Semantic
Web Conference, volume 13870, Springer Nature Switzerland, 2023, pp. 484–501. URL: https://link.springer.com/chapter/10.1007/978-3-031-33455-9%5f29. doi:10.1007/978-3-031-33455-9_29.
[25] X. Deng, H. Sun, A. Lees, Y. Wu, C. Yu, TURL: Table Understanding through Representation Learning, CoRR abs/2006.14806 (2020). URL: https://arxiv.org/abs/2006.14806. arXiv:2006.14806.
[26] W. Lu, J. Zhang, J. Zhang, Y. Chen, Large Language Model for Table Processing: A Survey, CoRR
abs/2402.05121 (2024). URL: https://doi.org/10.48550/arXiv.2402.05121. doi:10.48550/arXiv.2402.05121. arXiv:2402.05121.
[27] M. Hofer, J. Frey, E. Rahm, Towards self-configuring Knowledge Graph Construction Pipelines
using LLMs - A Case Study with RML, in: D. Chaves-Fraga, A. Dimou, A. Iglesias-Molina,
U. Serles, D. V. Assche (Eds.), Proceedings of the 5th International Workshop on Knowledge Graph
Construction co-located with the 21st Extended Semantic Web Conference (ESWC 2024), Hersonissos,
Greece, May 27, 2024, volume 3718 of CEUR Workshop Proceedings, CEUR-WS.org, 2024. URL:
https://ceur-ws.org/Vol-3718/paper6.pdf.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Luzuriaga</surname>
          </string-name>
          , E. Muñoz,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rosales-Méndez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <article-title>Merging Web Tables for Relation Extraction With Knowledge Graphs</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>1803</fpage>
          -
          <lpage>1816</lpage>
          . URL: https://doi.org/10.1109/TKDE.2021.3101479. doi:10.1109/TKDE.2021.3101479.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          , P. van Kleef,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , C. Bizer,
          <article-title>DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          ,
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>167</fpage>
          -
          <lpage>195</lpage>
          . URL: https://doi.org/10.3233/SW-140134. doi:10.3233/SW-140134.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berberich</surname>
          </string-name>
          , G. Weikum,
          <article-title>YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia</article-title>
          ,
          <source>Artif. Intell.</source>
          <volume>194</volume>
          (
          <year>2013</year>
          )
          <fpage>28</fpage>
          -
          <lpage>61</lpage>
          . URL: https://doi.org/10.1016/j.artint.2012.06.001. doi:10.1016/j.artint.2012.06.001.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . doi:10.1145/2629489.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Limaye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <article-title>Annotating and Searching Web Tables Using Entities, Types and Relationships</article-title>
          , PVLDB
          <volume>3</volume>
          (
          <year>2010</year>
          )
          <fpage>1338</fpage>
          -
          <lpage>1347</lpage>
          . doi:10.14778/1920841.1921005.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mulwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>Semantic Message Passing for Generating Linked Data from Tables</article-title>
          , in: International Semantic Web Conference (ISWC), Springer,
          <year>2013</year>
          , pp.
          <fpage>363</fpage>
          -
          <lpage>378</lpage>
          . doi:10.1007/978-3-642-41335-3_23.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mileo</surname>
          </string-name>
          ,
          <article-title>Triplifying Wikipedia's Tables</article-title>
          ,
          in:
          <source>First International Conference on Linked Data for Information Extraction (LD4IE'13), Volume 1057</source>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>37</lpage>
          , Aachen, Germany,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mileo</surname>
          </string-name>
          ,
          <article-title>Using linked data to mine RDF from Wikipedia's tables</article-title>
          , in:
          <source>Web Search and Web Data Mining (WSDM)</source>
          , ACM,
          <year>2014</year>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>542</lpage>
          . doi:10.1145/2556195.2556266.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Luzuriaga</surname>
          </string-name>
          ,
          <article-title>Merging HTML Tables for Extracting Relations</article-title>
          , Master's thesis, Universidad de Chile, Santiago, Chile,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.-P.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Labbé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monnin</surname>
          </string-name>
          ,
          <article-title>From tabular data to knowledge graphs: A survey of semantic table interpretation tasks and methods</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>76</volume>
          (
          <year>2023</year>
          )
          100761. URL: https://www.sciencedirect.com/science/article/pii/S1570826822000452. doi:10.1016/j.websem.2022.100761.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Cimmino</given-names>
            <surname>Arriaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruckhaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. García</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>An Ontological Approach for Representing Declarative Mapping Languages</article-title>
          ,
          <source>Semantic Web</source>
          (
          <year>2022</year>
          ). doi:10.3233/SW-223224.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          , https://www.w3.org/TR/r2rml/,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Vander</given-names>
            <surname>Sande</surname>
          </string-name>
          , B. De Meester,
          <string-name>
            <given-names>P.</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          , T. Delva, RDF Mapping Language (RML), https://rml.io/specs/rml/,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>