<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Workshop on Knowledge Graph Construction, May</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Results for Knowledge Graph Creation Challenge 2024: SDM-RDFizer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enrique Iglesias</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria-Esther Vidal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>L3S Research Center</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leibniz University of Hannover</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TIB Leibniz Information Centre for Science and Technology</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>27</volume>
      <issue>2024</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The volume of data generated in recent years has increased drastically, necessitating a unified schema to integrate multiple data sources into a single format. The RDF Mapping Language (RML) was developed to define the structure of knowledge graphs (KGs). Over time, various extensions have been introduced to enhance RML's functionality, creating a need for a new specification that consolidates these extensions. Track 1 of the KGCW 2024 Challenge dataset addresses this need by providing a comprehensive set of test cases to ensure that knowledge graph creation engines comply with the updated RML specification. This paper reports on the conformance evaluation of SDM-RDFizer using this dataset, highlighting its capabilities and areas for improvement in achieving full RML compliance.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph Creation</kwd>
        <kwd>Data Integration System</kwd>
        <kwd>RDF Mapping Languages</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>evaluates the results using the Track 1 dataset.</p>
      <p>This paper is organized into three additional sections. Section 2 provides an overview of
SDM-RDFizer, including its techniques, data structures, and physical operators for optimizing KG
creation. Section 3 details the results of the challenge, including the dataset definition and the
improvements needed to pass the test cases. Finally, Section 4 presents the conclusions and
future steps for SDM-RDFizer.</p>
    </sec>
    <sec id="sec-2">
      <title>2. SDM-RDFizer</title>
      <p>
        SDM-RDFizer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is an RML-compliant KG creation engine. SDM-RDFizer comprises
two modules: Triples Maps Planning (TMP) and Triples Maps Execution (TME). Each
module has different data structures that optimize different aspects of the KG creation
process. TMP determines the execution order of the triples maps (TMs) to keep memory usage
to a minimum. TME generates the KG following the order established by TMP. Multiple novel
operators are defined to transform different types of TMs: the Simple Object Map (SOM) operator
executes rml:template and rml:reference, the Object Reference Map (ORM) operator executes parent
triples maps, and the Object Join Map (OJM) operator executes joins. Each generated triple is
compared against the corresponding Predicate Tuple Table (PTT) to determine whether it is a
duplicate, and the Dictionary Table (DT) compresses the resources stored in the PTTs. The
Predicate Join Tuple Table (PJTT) stores the results of executing a join. SDM-RDFizer is publicly
available on GitHub<sup>3</sup>.
      </p>
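      <p>As a rough illustration of the duplicate check described above, the following Python sketch keeps one set of (subject, object) pairs per predicate, playing the role of a PTT, and a dictionary that maps each resource to a small integer, playing the role of the DT. The class name, method names, and example resources are assumptions made for this example; they are not taken from the SDM-RDFizer code base.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


class TripleStoreSketch:
    """Hypothetical stand-in for the PTT and DT structures; names are illustrative."""

    def __init__(self):
        self.dt = {}                 # DT: resource string -> compact integer id
        self.ptt = defaultdict(set)  # PTT: predicate -> set of (subject id, object id)

    def encode(self, resource):
        # Assign the next free integer the first time a resource is seen.
        return self.dt.setdefault(resource, len(self.dt))

    def add(self, subject, predicate, obj):
        """Return True if the triple is new, False if it is a duplicate."""
        pair = (self.encode(subject), self.encode(obj))
        if pair in self.ptt[predicate]:
            return False
        self.ptt[predicate].add(pair)
        return True


store = TripleStoreSketch()
first = store.add("ex:alice", "foaf:knows", "ex:bob")
second = store.add("ex:alice", "foaf:knows", "ex:bob")  # same triple again
third = store.add("ex:alice", "foaf:name", "ex:bob")    # same pair, other predicate
```
]]></preformat>
      <p>In this sketch only the integer pairs are stored per predicate, so a repeated resource costs one dictionary entry regardless of how many triples mention it.</p>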
    </sec>
    <sec id="sec-3">
      <title>3. Test Cases of the Knowledge Graph Creation Challenge</title>
      <p>Track 1 of the KGCW 2024 Challenge<sup>4</sup> aims to inspire new methods and techniques for
incorporating the new RML specification into existing KG creation engines. This dataset comprises five
sets of test cases. RML-Core: This set includes basic test cases originally defined in the RML
test cases<sup>5</sup> to test the compliance of KG creation engines. These cases have been updated to
reflect the new specification and use CSV, JSON, and XML files, as well as relational databases (MySQL
and Postgres), as data sources. RML-FNML: This set contains test cases that use functions to
transform data, employing a series of pre-defined functions to execute these transformations.
RML-Star: This set incorporates RDF-Star<sup>6</sup> test cases, generated from the RML-Star test cases<sup>7</sup>
in accordance with the new specification. RML-IO: This set includes a wide variety of remote
data sources such as endpoints, compressed files, and JSON and XML files. It also defines
outputs for specific properties in various formats like Turtle, RDF/JSON, and JSON-LD, and in
compressed formats like Zip and Tar. The name reflects its focus on Input and Output. RML-CC:
This set covers RDF collections and containers.</p>
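      <p>This work presents the results of executing RML-Core, RML-FNML, RML-Star, and RML-IO
with SDM-RDFizer. RML-CC will be incorporated at a later date. Table 1 shows the total number
of test cases in the dataset and which cases were passed and failed by SDM-RDFizer. The full
results are available on GitHub<sup>8</sup>.</p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Set</th><th>Total</th><th>Passed</th><th>Failed</th></tr>
          </thead>
          <tbody>
            <tr><td>RML-Core</td><td>238</td><td>238</td><td>0</td></tr>
            <tr><td>RML-FNML</td><td>14</td><td>14</td><td>0</td></tr>
            <tr><td>RML-Star</td><td>18</td><td>18</td><td>0</td></tr>
            <tr><td>RML-IO</td><td>67</td><td>65</td><td>2</td></tr>
            <tr><td>RML-CC</td><td>29</td><td>not executed</td><td>not executed</td></tr>
            <tr><td>Total</td><td>366</td><td>335</td><td>2</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-3-1">
        <p><sup>3</sup> https://github.com/SDM-TIB/SDM-RDFizer</p>
        <p><sup>4</sup> https://zenodo.org/records/10973433</p>
        <p><sup>5</sup> https://rml.io/test-cases/</p>
        <p><sup>6</sup> https://www.w3.org/2021/12/rdf-star.html</p>
        <p><sup>7</sup> https://zenodo.org/records/6518802</p>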
        <sec id="sec-3-1-1">
          <title>3.1. Results of RML-Core</title>
          <p>RML-Core comprises test cases covering the fundamentals of RML, such as the definition of
classes, the use of rml:template and rml:reference, the execution of parent triples maps and
joins, and the definition of data types, languages, and graphs. SDM-RDFizer implements three
operators to execute different types of mappings.</p>
          <p>To extract data from various data sources, SDM-RDFizer employs several Python libraries: csv
for CSV files, json for JSON files, xml for XML files, mysql-connector for connecting to MySQL
databases, and psycopg2 for connecting to Postgres databases.</p>
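          <p>For the file-based sources, iterating over records with the standard-library modules named above could look like the sketch below. The sample data, the helper names, and the choice to read from in-memory strings are assumptions made to keep the example self-contained; the database connectors are omitted.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Tiny in-memory samples standing in for real source files (assumption).
CSV_DATA = "id,name\n1,Alice\n2,Bob\n"
JSON_DATA = '{"people": [{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]}'
XML_DATA = "<people><person><id>1</id><name>Alice</name></person></people>"


def csv_rows(text):
    # DictReader yields one dict per row, keyed by the header line.
    return list(csv.DictReader(io.StringIO(text)))


def json_rows(text, key):
    # A plain key lookup stands in for a JSONPath iterator here.
    return json.loads(text)[key]


def xml_rows(text, tag):
    # Each element matching `tag` becomes a dict of child tag -> text.
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in elem} for elem in root.iter(tag)]


rows_csv = csv_rows(CSV_DATA)
rows_json = json_rows(JSON_DATA, "people")
rows_xml = xml_rows(XML_DATA, "person")
```
]]></preformat>
          <p>Each helper returns a list of dictionaries, so the same mapping logic can be applied to a record whatever format it came from.</p>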
          <p>To parse the new specification, the parser query is updated to replace the rml prefix with its
new namespace, replace all mentions of the R2RML namespace with the RML namespace, and
update the definition of rml:logicalSource to include the use of the rml:path and rml:root clauses,
replacing rml:query with rml:iterator. The rdflib library is used to execute the parser query.
SDM-RDFizer passes all 238 test cases in this set.</p>
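          <p>The namespace changes described above can be pictured as a rewrite of the parser query text. The sketch below is a simplified illustration that assumes a one-to-one renaming of R2RML terms to RML terms, which does not hold for every term in the new specification. The function name and the sample query are hypothetical and are not the query used by SDM-RDFizer.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import re

OLD_RML = "http://semweb.mmlab.be/ns/rml#"  # legacy RML namespace
NEW_RML = "http://w3id.org/rml/"            # namespace of the new specification
R2RML_PREFIX = re.compile(r"PREFIX\s+rr:\s*<[^>]*>\s*")


def upgrade_query(query):
    """Rewrite a legacy parser query to the new RML namespace (simplified)."""
    # Point the rml: prefix at the new namespace.
    query = query.replace(OLD_RML, NEW_RML)
    # Drop the R2RML prefix declaration, which is no longer needed.
    query = R2RML_PREFIX.sub("", query)
    # Assumption: every rr:term has an rml:term with the same local name.
    return re.sub(r"\brr:(\w+)", r"rml:\1", query)


LEGACY_QUERY = """PREFIX rr: <http://www.w3.org/ns/r2rml#>
PREFIX rml: <http://semweb.mmlab.be/ns/rml#>
SELECT ?tm ?source WHERE {
  ?tm rml:logicalSource ?ls ;
      rr:subjectMap ?sm .
  ?ls rml:source ?source .
}"""

upgraded = upgrade_query(LEGACY_QUERY)
```
]]></preformat>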
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Results of RML-FNML</title>
          <p>
            RML-FNML consists of test cases that use functions to transform values, including tasks like
replacing strings, transforming strings to lower and upper case, concatenating strings, and
more. These test cases demonstrate the use of RML+FnO in the new specification. SDM-RDFizer
incorporates strategies from FunMap [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] to execute functions. FunMap is a TM translator that
converts TMs containing functions and their corresponding data sources into TMs without
functions, reflecting the execution of the functions.
          </p>
          <p>To parse TMs with functions, a new parser query is explicitly defined for extracting the functions
from the TMs. This allows for proper handling of nested functions, as each function is extracted
individually, and nested functions are called from the function’s parameters.
SDM-RDFizer passes all 14 test cases in this set.</p>
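          <p>The handling of nested functions can be sketched as a recursive evaluation in which each parameter is resolved before the enclosing function is applied. The term representation, the function registry, and the three toy functions below are assumptions made for illustration; they are neither the functions of the test cases nor the internal representation used by SDM-RDFizer.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical registry of transformation functions.
FUNCTIONS = {
    "upper": lambda value: value.upper(),
    "concat": lambda left, right: left + right,
    "replace": lambda value, old, new: value.replace(old, new),
}


def evaluate(term, row):
    """Evaluate a term against one source row.

    A term is a constant string, {"reference": column}, or
    {"function": name, "params": [terms...]}.
    """
    if isinstance(term, str):
        return term
    if "reference" in term:
        return row[term["reference"]]
    # Nested calls are resolved first because they appear as parameters.
    args = [evaluate(param, row) for param in term["params"]]
    return FUNCTIONS[term["function"]](*args)


row = {"first": "ada", "last": "lovelace"}
call = {
    "function": "concat",
    "params": [
        {"function": "upper", "params": [{"reference": "first"}]},
        {"function": "replace", "params": [{"reference": "last"}, "love", "LOVE"]},
    ],
}
result = evaluate(call, row)
```
]]></preformat>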
        </sec>
      </sec>
      <sec id="sec-3-2">
        <p><sup>8</sup> https://github.com/SDM-TIB/SDM-RDFizer/tree/master/kgcw_2024_challenge</p>
        <sec id="sec-3-2-1">
          <title>3.3. Results of RML-Star</title>
          <p>RML-Star comprises test cases that use RDF-Star, an extension of RDF that introduces a new term,
the quoted triple, which can be used as either the subject or the object of a triple. Accordingly,
RML-Star introduces rml:quotedTriplesMap, enabling the definition of quoted triples in a KG.
SDM-RDFizer implements a new operator to generate quoted triples, allowing for recursive
application since quoted triples can contain other quoted triples.</p>
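          <p>The recursive nature of quoted triples can be illustrated with a small serializer that writes nested triples using the double-angle-bracket syntax of RDF-star. Representing a quoted triple as a Python 3-tuple, as well as the function names and example IRIs, are assumptions of this sketch rather than details of the operator implemented in SDM-RDFizer.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def render(term):
    """Render a term; a 3-tuple is treated as a quoted triple."""
    if isinstance(term, tuple):
        # Recurse, since subject and object may be quoted triples themselves.
        return "<< " + " ".join(render(part) for part in term) + " >>"
    return term


def to_ntriples_star(subject, predicate, obj):
    """Serialize one asserted triple as a single line."""
    return " ".join(render(part) for part in (subject, predicate, obj)) + " ."


inner = ("<http://ex.org/alice>", "<http://ex.org/knows>", "<http://ex.org/bob>")
middle = (inner, "<http://ex.org/since>", '"2020"')
line = to_ntriples_star(middle, "<http://ex.org/source>", "<http://ex.org/survey>")
```
]]></preformat>
          <p>Here the asserted triple has a quoted triple as its subject, and that quoted triple in turn has another quoted triple as its subject, so the rendering function calls itself twice.</p>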
          <p>Another challenge in these test cases was using joins in the rml:subjectMap. These cases were
handled similarly to joins in the rml:objectMap, using the OJM operator for join execution and
PJTT for storing the results. The parser query was expanded to recognize rml:quotedTriplesMap.
SDM-RDFizer passes all 18 test cases in this set.</p>
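          <p>The join handling can be pictured as a hash join in which the parent side is indexed once and then probed by each child record. In the sketch below, a dictionary from join value to parent subjects stands in for a PJTT; the function names, the subject template, and the sample records are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def build_join_table(parent_rows, parent_key, subject_template):
    """Index parent subjects by join value (a stand-in for a PJTT)."""
    table = defaultdict(list)
    for row in parent_rows:
        table[row[parent_key]].append(subject_template.format(**row))
    return table


def join(child_rows, child_key, table):
    """Yield (child row, parent subject) pairs whose join values match."""
    for row in child_rows:
        # .get avoids creating empty entries for unmatched join values.
        for parent_subject in table.get(row[child_key], []):
            yield row, parent_subject


parents = [{"id": "10", "name": "L3S"}, {"id": "20", "name": "TIB"}]
children = [{"person": "alice", "org": "10"}, {"person": "bob", "org": "30"}]

pjtt = build_join_table(parents, "id", "http://ex.org/org/{id}")
matches = [(child["person"], subject) for child, subject in join(children, "org", pjtt)]
```
]]></preformat>
          <p>Because the index is built once, the same table can serve a join declared in a subject map or in an object map.</p>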
        </sec>
        <sec id="sec-3-2-2">
          <title>3.4. Results of RML-IO</title>
          <p>RML-IO consists of test cases that cover a wide range of remote data sources, including
compressed files, JSON and XML files, and data extracted from SPARQL endpoints. Some cases
introduce the concept of outputting certain triples to specific output files, which may need to
be compressed or translated into different formats, such as JSON-LD, Turtle, etc.
SDM-RDFizer uses the requests library to collect data from remote sources. For SPARQL
endpoints, the SPARQLWrapper library connects and executes SPARQL queries, with the results
converted to a format similar to CSV. When dealing with compressed files, SDM-RDFizer
downloads the file locally and decompresses it using the appropriate library for the format, such as
the zip library for Zip files.</p>
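          <p>Decompressing a Zip source before reading it can be sketched with Python's standard zipfile module. To stay self-contained, the example builds the archive in memory instead of downloading it, and it reads the archive from bytes rather than from a local file; both choices, along with the helper name and file name, are assumptions of this sketch.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
import zipfile


def rows_from_zip(payload, member):
    """Decompress one CSV member of a Zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        text = archive.read(member).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Build a small archive in memory; a real run would obtain these bytes
# from a remote source instead (assumption: e.g. via an HTTP download).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("people.csv", "id,name\n1,Alice\n")

rows = rows_from_zip(buffer.getvalue(), "people.csv")
```
]]></preformat>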
          <p>These test cases introduce the concept of defining an alternate output within the TM. These
can be specified in the rml:subjectMap, rml:predicateMap, rml:objectMap, rml:languageMap,
rml:datatypeMap, or rml:graphMap. Based on its location, triples will be outputted to the
alternate output file. If defined in the rml:subjectMap, all valid triples generated from this TM will be
sent to that particular output. If the alternate output is defined elsewhere, only triples with that
specific property will be outputted there. Any remaining triples not destined for an alternate
output will be sent to the original output file, which is defined at the start of the SDM-RDFizer
execution. When handling these alternative output files, SDM-RDFizer prioritizes generating
them over the standard output and can compress them. Finally, SDM-RDFizer converts the
output files into various RDF formats, such as RDF/XML, JSON-LD, etc., using the rdflib library.
SDM-RDFizer successfully executes 65 of the 67 test cases. The two failures occurred because
SDM-RDFizer cannot upload the generated triples into a SPARQL endpoint.</p>
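          <p>The routing rule described above can be sketched as a function that groups triples by destination file. The default file name, the parameter names, and the decision to write a triple to every output that matches it are assumptions of this sketch; only the subject-map and predicate-map cases are covered.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

DEFAULT_OUTPUT = "output.nt"  # hypothetical default file name


def route(triples, subject_target=None, predicate_targets=None):
    """Group triples by output file.

    subject_target: file declared on the subject map; receives every triple.
    predicate_targets: predicate -> file declared on a predicate map;
    receives only triples with that predicate.
    """
    predicate_targets = predicate_targets or {}
    outputs = defaultdict(list)
    for triple in triples:
        targets = []
        if subject_target is not None:
            targets.append(subject_target)
        if triple[1] in predicate_targets:
            targets.append(predicate_targets[triple[1]])
        # Triples without an alternate output fall back to the default file.
        for target in targets or [DEFAULT_OUTPUT]:
            outputs[target].append(triple)
    return dict(outputs)


triples = [("ex:a", "foaf:name", "Alice"), ("ex:a", "foaf:age", "30")]
by_predicate = route(triples, predicate_targets={"foaf:name": "names.nt"})
by_subject = route(triples, subject_target="dump.nt")
```
]]></preformat>
          <p>With a predicate-level declaration only the name triple is diverted and the age triple stays in the default file, whereas a subject-level declaration diverts both.</p>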
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>The KGCW 2024 Challenge Track 1 dataset evaluates the compliance of state-of-the-art engines
with the new formulation of RML. It comprises 366 test cases across five categories: RML-Core,
RML-IO, RML-FNML, RML-Star, and RML-CC. SDM-RDFizer successfully executes 335 of these
test cases, fully covering RML-Core, RML-FNML, RML-Star, and all but two from RML-IO.
To achieve this, SDM-RDFizer introduced a new parsing query, strategies from FunMap, and
an operator for generating inner triples for RML-Star. Moving forward, SDM-RDFizer aims
to address the remaining RML-CC cases by incorporating new methods. For this purpose, a
new operator will be defined, which will behave differently based on whether it is transforming
a list, bag, or sequence. Additionally, a new data structure will be developed for intermediate
results in RML-Star to avoid repeated inner triple generation, and an optimized parser will be
implemented to manage the increasing complexity.</p>
      <p>With these improvements, SDM-RDFizer is set to become a fully compliant RML engine.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the Federal Ministry for Economic Affairs and Energy
of Germany (BMWK) in the project CoyPu (project number 01MK21007[A-L]). The Leibniz
Association partially funds Maria-Esther Vidal in the "Leibniz Best Minds: Programme for Women
Professors", project TrustKG-Transforming Data in Trustable Insights with grant P99/2020.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vander Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          , E. Mannens, R. Van de Walle,
          <article-title>RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</article-title>
          ,
          <source>in: Workshop on Linked Data on the Web</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>De Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mannens</surname>
          </string-name>
          ,
          <article-title>An ontology to semantically declare and describe functions</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Delva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Arenas-Guerrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          , Ó. Corcho,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>RML-star: A declarative mapping language for RDF-star generation</article-title>
          , in: O. Seneviratne, C. Pesquita, J. Sequeda, L. Etcheverry (Eds.),
          <source>Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks: From Novel Ideas to Industrial Practice co-located with 20th International Semantic Web Conference (ISWC 2021), Virtual Conference, October 24-28, 2021</source>
          , volume
          <volume>2980</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2021</year>
          . URL: https://ceur-ws.org/Vol-2980/paper374.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Collarana</surname>
          </string-name>
          , M.-E. Vidal,
          <article-title>SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs</article-title>
          , in: CIKM,
          <year>2020</year>
          . doi:10.1145/3340531.3412881.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          , Ó. Corcho,
          <article-title>FunMap: Efficient execution of functional mappings for knowledge graph creation</article-title>
          , in: J. Z. Pan, V. A. M. Tamma, C. d'Amato, K. Janowicz, B. Fu, A. Polleres, O. Seneviratne, L. Kagal (Eds.),
          <source>The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference, Athens, Greece, November 2-6, 2020, Proceedings, Part I</source>
          , volume
          <volume>12506</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>276</fpage>
          -
          <lpage>293</lpage>
          . URL: https://doi.org/10.1007/978-3-030-62419-4_16. doi:10.1007/978-3-030-62419-4_16.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>