<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEMANTiCS</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Mapping Workbench: A collaborative platform for mapping complex XML data to RDF</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eugeniu Costetchi</string-name>
          <email>eugen@meaningfy.ws</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jana Ahmad</string-name>
          <email>jana.ahmad@meaningfy.ws</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Csongor I. Nyulas</string-name>
          <email>csongor.nyulas@meaningfy.ws</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rashif Rahman</string-name>
          <email>rashif.rahman@meaningfy.ws</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dumitru Prijilevschi</string-name>
          <email>dumitru.prijilevschi@meaningfy.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Meaningfy SARL</institution>
          ,
          <addr-line>61 Route de Fischbach, Lintgen, L-7447</addr-line>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>20</volume>
      <fpage>17</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Mapping Workbench (MWB) facilitates the transformation of XML data into RDF using the RML mapping language, based on a sound test-driven methodology. This paper presents MWB as a comprehensive solution for Semantic Engineers and Data Modellers, offering efficient data mapping, management and validation against ontologies and data shapes. We discuss MWB's key features, the mapping process, and its potential impact on the fields of knowledge graph construction, semantic data interoperability and data integration in general.</p>
      </abstract>
      <kwd-group>
        <kwd>XML</kwd>
        <kwd>RML</kwd>
        <kwd>RDF</kwd>
        <kwd>OWL</kwd>
        <kwd>Data Mapping</kwd>
        <kwd>Data Transformation</kwd>
        <kwd>Knowledge Graph Construction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Semantic data integration and transformation are crucial tasks in various domains involving
knowledge graph (KG) construction, including bioinformatics, healthcare, finance, government
open data and beyond. Mapping XML data against ontologies (and data shapes) is a common
requirement in such domains. It enables data harmonisation, interoperability, knowledge
sharing, and semantic data enrichment. While much attention has been given to the development of
declarative mapping languages like RML [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (an extension of<sup>1</sup> R2RML) and of transformation
engines that execute the transformation rules, little work has been done on tools that make
the development, testing, and management of mapping rules efficient and accurate.
      </p>
      <p>Mapping Workbench (MWB)<fn><label>2</label><p>Mapping Workbench Home Page, Mapping Workbench Demo Application</p></fn> addresses this need by providing a user-friendly graphical
user interface (GUI) and powerful functionality for developing and testing RML mapping rules,
managing the complexity of large data structures, and collaboratively editing and validating mappings with
domain experts, in the manner of an Integrated Development Environment (IDE). By adopting
a test-driven approach to knowledge engineering and incorporating the notion of Conceptual
Mapping (CM), MWB allows the correctness of both the mapping rules and the
transformed data to be evaluated against OWL/RDFS ontologies, SHACL data shapes and semi-automatically
generated validation rules derived from the CM.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        MWB is positioned as an IDE for RML mapping and therefore sits alongside existing web-based
GUI applications for RML. RMLEditor [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides a simple, graph-based visual
editing web app for domain experts to model knowledge from different data sources, using
RML under the hood. However, managing complex data and mappings can become challenging
with its basic GUI and limited RML features. Karma [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a more advanced “information
integration tool”, with comprehensive functionalities for loading data from multiple sources
and automatic model alignment. Its tabular approach, however, can complicate the creation of mappings
and make interlinking between tables unnecessarily complex. Ontopic<sup>3</sup> Studio is a “low-code”
front-end to the Ontop [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] Virtual Knowledge Graph (VKG) system, which exposes (primarily
relational) data dynamically as an active RDF graph ready for querying, without materializing
the transformation. It translates SPARQL queries into SQL using R2RML mappings and does not
support tree-structured input formats (XML, JSON). RMLx [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is yet another GUI
for RML, but it supports mapping through form-based input, which does little to reduce
mental effort (something GUIs should aim to do). Although MWB does not prioritize visual
editing, it aims to incorporate automation and complexity management features that will ease
the user’s cognitive load.
      </p>
    </sec>
    <sec id="sec-4">
      <title>What is Mapping Workbench?</title>
      <p>
        MWB is designed to simplify the complex task of converting XML data to RDF, which involves model
mapping between XML schemas [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and RDFS/OWL ontologies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It aims to improve efficiency
and accuracy in large-scale data mapping projects by bringing together all necessary resources
in one place. This integration streamlines the entire process of mapping development and
management, from the initial planning stages to the final deployment or dissemination.
By involving business stakeholders early on through writing Conceptual Mapping (CM) rules,
MWB ensures that domain interpretations and practical needs align smoothly with the eventual
technical implementation, which helps to minimize unnecessary revisions and costs.
      </p>
      <p>
        MWB also effectively handles the challenges posed by XML schemas that evolve across revisions,
ensuring through rigorous validation processes that mapping rules remain precise across data
versions. This collaborative platform encourages teamwork between
domain experts and Semantic Engineers, supported by role-based access controls that maintain
strict data security and integrity standards. Its structured four-stage mapping approach includes
(i) Conceptual Mapping using “ontology fragments”<fn><label>4</label><p>A custom dialect of SPARQL Path patterns [<xref ref-type="bibr" rid="ref8">8</xref>] that includes the classes of intermediary nodes.</p></fn>, (ii) Technical Mapping using RML rules
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], (iii) validation using SHACL shapes [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], SPARQL assertions [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and XPath queries [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and
(iv) the export of mapping packages or suites for seamless deployment into data transformation
workflows. Users of MWB benefit from its sound methodology [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which distinguishes
between Conceptual and Technical Mappings, as well as from automated quality checks and a user-friendly
interface conducive to agile workflows. These features collectively enhance efficiency, reduce
mapping time and costs, and increase satisfaction among stakeholders.
        <fn><label>3</label><p>https://ontop-vkg.org/</p></fn>
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. Features and Workflow</title>
      <p>
        The workflow begins with the Project Setup, which includes adding test data, ontologies
and other resources. There is a convenient interface for defining XML elements based on
their XPaths. For ontologies, MWB offers automatic detection of ontology
terms. The next step is Defining Conceptual Mappings. MWB provides a user-friendly
interface for determining correspondences between the elements of the input data and the target
terms from an ontology, allowing users to select models and view already created rules. In
the Technical Mapping Definition step, the user imports or writes RML rules implementing
what is designed and specified by the Conceptual Mapping rules. The user can also
transform (via RMLMapper [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]) one or more test files to observe the output in short validation
cycles. The Mapping Suite Validation in MWB involves a set of automatic tools that generate
the mapping results, analyse them, and display a set of reports. These reports include views
with statistics and messages to support experts in making decisions about the correctness of
the mappings. The reports are divided into:
        <list list-type="bullet">
          <list-item><p>SHACL Report: shows the validity of the data according to SHACL constraints.</p></list-item>
          <list-item><p>SPARQL Report: helps to understand the correctness of the defined CM rules.</p></list-item>
          <list-item><p>XPath Report: a detailed view that shows the coverage of the XPaths from the test data.</p></list-item>
        </list>
At this point, MWB facilitates an iterative process of changing the mappings, re-running the transformation, and
analysing the results. Finally, the mapping suite can be Exported in an archived (ZIP) format.
      </p>
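      <p>As an illustration of the idea behind the XPath Report, the following minimal Python sketch computes which element paths of a test document are covered by mapping rules. It is not MWB's implementation: the function names, the sample document and the list of mapped XPaths are hypothetical, and coverage is assumed here to mean the share of distinct element paths that appear in at least one rule.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET


def element_xpaths(xml_text):
    """Collect the distinct absolute element paths (no positions) of a document.

    Namespaced tags would appear in ElementTree's "{uri}local" form.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def xpath_coverage(xml_text, mapped_xpaths):
    """Compare the paths found in the test data with the paths used by rules."""
    found = element_xpaths(xml_text)
    covered = found & set(mapped_xpaths)
    return {
        "covered": sorted(covered),
        "not_covered": sorted(found - covered),
        "ratio": len(covered) / len(found) if found else 1.0,
    }


# Hypothetical test document and hypothetical XPaths taken from mapping rules.
SAMPLE = (
    "<notice><buyer><name>ACME</name><city>Lintgen</city></buyer>"
    "<title>Road works</title></notice>"
)
MAPPED = ["/notice/buyer/name", "/notice/title"]

report = xpath_coverage(SAMPLE, MAPPED)
print(report)
```
]]></preformat>
      <p>In this toy example two of the five distinct element paths are covered, so the uncovered paths point to elements for which a mapping rule may still be missing. A real report would probably also need to handle attributes and namespaces.</p>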
    </sec>
    <sec id="sec-6">
      <title>4. Innovative Aspects</title>
      <p>The platform offers several features not commonly found in other mapping
development tools. Notably, it integrates ontologies, sample test data, mapping rules,
and validation mechanisms into a single, cohesive platform. A robust methodology has
been developed for the creation and management of
mapping rules throughout the mapping lifecycle. This structured approach enhances the
accuracy and effectiveness of the mapping process. Moreover, the platform simplifies the mapping
configuration and execution processes, enabling users to define mappings, establish rules, and
transform data with minimal technical expertise. This user-friendly approach lowers
the barrier to entry for knowledge engineers and domain experts engaging with complex data
transformation tasks. A particularly novel aspect is the separation of mapping development
into two distinct layers: Conceptual Mapping and Technical Mapping. This dual-layer structure
keeps domain experts involved through an intuitive user interface, which facilitates their
assessment of mapping rules for domain-specific soundness. Furthermore, CMs provide a basis
for generating unit tests, thereby supporting the rigorous testing of technical rule
implementations. This separation not only enhances the manageability of mappings but also
ensures their relevance and accuracy by involving domain knowledge at every step.</p>
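      <p>How a Conceptual Mapping rule could serve as the basis for a unit test can be pictured with the following minimal Python sketch, which turns a rule into a SPARQL ASK assertion. The rule structure (the ConceptualRule class with its classes and properties fields), the ex: prefix and the example terms are assumptions made for illustration only; MWB's actual ontology-fragment dialect and the assertions it generates may differ.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ConceptualRule:
    """Hypothetical CM rule: an XPath plus a typed path through the ontology."""
    source_xpath: str
    classes: Tuple[str, ...]     # class of the start node and of each intermediary node
    properties: Tuple[str, ...]  # property leading from node i to node i + 1


def to_sparql_ask(rule: ConceptualRule, prefixes: Dict[str, str]) -> str:
    """Build an ASK query that holds if the output graph contains the rule's path."""
    if len(rule.classes) != len(rule.properties):
        raise ValueError("expected one class per property in the path")
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    lines.append("ASK WHERE {")
    for i, (cls, prop) in enumerate(zip(rule.classes, rule.properties)):
        lines.append(f"  ?n{i} a {cls} .")           # type of the current node
        lines.append(f"  ?n{i} {prop} ?n{i + 1} .")  # step to the next node
    lines.append("}")
    return "\n".join(lines)


# Hypothetical rule: the buyer name in the XML becomes the name of an organisation.
rule = ConceptualRule(
    source_xpath="/notice/buyer/name",
    classes=("ex:Notice", "ex:Organisation"),
    properties=("ex:hasBuyer", "ex:hasName"),
)
query = to_sparql_ask(rule, {"ex": "http://example.org/ns#"})
print(query)
```
]]></preformat>
      <p>Running such a query against the RDF produced for a test file that contains the source XPath gives a simple pass or fail signal for the corresponding technical rule.</p>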
    </sec>
    <sec id="sec-7">
      <title>5. Benefits and Future Directions</title>
      <p>MWB assures XML-to-RDF data mapping quality by measuring mapping validity, accuracy,
and coverage. It significantly speeds up the mapping process and reduces maintenance costs.
The platform identifies conceptual mapping issues early by involving domain experts, and it handles
the complexity of large schemas, contextual mappings, and evolving schema versions. MWB
provides a collaborative mapping environment for semantic engineers and domain experts, and
prepares data for advanced KG, ML and AI applications, unlocking new use cases.</p>
      <p>MWB aims to evolve into a dynamic Software as a Service (SaaS) platform that serves both
domain experts and semantic engineers. Planned enhancements include advanced RML
editing capabilities, generation of RML rules from the CM rules, further automation including
GenAI-assisted mapping, generalized support for tree-structured data by way of expanded
support for mapping JSON schemas, and continuous improvements in user interface and overall
user experience. These advancements aim to make data integration efforts easier,
strengthening MWB’s position as a top solution that addresses technical data mapping challenges and
promotes transparency and adherence to evolving data interoperability standards.</p>
    </sec>
    <sec id="sec-8">
      <title>6. Conclusion</title>
      <p>Mapping Workbench stands at the forefront of semantic data integration, offering a framework
to map complex XML data to RDF with unparalleled efficiency and accuracy. By centralising
resources and fostering collaboration between stakeholders, MWB bridges the gap between
conceptual understanding and technical implementation, significantly optimizing the mapping
development life-cycle. Its structured approach ensures high-quality mapping rules, validated
through mechanisms such as SHACL, SPARQL, and XPath, leading up to a seamless
deployment in data transformation pipelines.</p>
      <p>Embrace the future of semantic mapping with MWB, where efficiency, accuracy, and
innovation converge to redefine data integration.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>A.</given-names> <surname>Dimou</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Vander Sande</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Colpaert</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Verborgh</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Mannens</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Van de Walle</surname></string-name>,
          <article-title>RML: a generic language for integrated RDF mappings of heterogeneous data</article-title>, in:
          <string-name><given-names>C.</given-names> <surname>Bizer</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Heath</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Auer</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Berners-Lee</surname></string-name> (Eds.),
          <source>Proceedings of the 7th Workshop on Linked Data on the Web</source>, volume
          <volume>1184</volume> of
          <series>CEUR Workshop Proceedings</series>,
          <year>2014</year>. URL: http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>P.</given-names> <surname>Heyvaert</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Dimou</surname></string-name>,
          <string-name><given-names>A.-L.</given-names> <surname>Herregodts</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Verborgh</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Schuurman</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Mannens</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Van de Walle</surname></string-name>,
          <article-title>RMLEditor: A graph-based mapping editor for linked data mappings</article-title>, in:
          <string-name><given-names>H.</given-names> <surname>Sack</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Blomqvist</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>d'Aquin</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Ghidini</surname></string-name>,
          <string-name><given-names>S. P.</given-names> <surname>Ponzetto</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Lange</surname></string-name> (Eds.),
          <source>The Semantic Web. Latest Advances and New Domains</source>, Springer International Publishing, Cham,
          <year>2016</year>, pp.
          <fpage>709</fpage>-<lpage>723</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>C.</given-names> <surname>Knoblock</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Szekely</surname></string-name>,
          <string-name><given-names>J. L.</given-names> <surname>Ambite</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Goel</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Gupta</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Lerman</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Muslea</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Taheriyan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Mallick</surname></string-name>,
          <article-title>Semi-automatically mapping structured sources into the semantic web</article-title>, volume
          <volume>7295</volume>,
          <year>2012</year>. doi:
          <pub-id pub-id-type="doi">10.1007/978-3-642-30284-8_32</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>M.</given-names> <surname>Rodriguez-Muro</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Hardi</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Calvanese</surname></string-name>,
          <article-title>Quest: Efficient SPARQL-to-SQL for RDF and OWL</article-title>, volume
          <volume>914</volume> of
          <series>CEUR Workshop Proceedings</series>, RWTH, Aachen,
          <year>2012</year>, pp.
          <fpage>53</fpage>-<lpage>56</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>P.</given-names> <surname>Aryan</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Ekaputra</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Kurniawan</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Kiesling</surname></string-name>,
          <string-name><given-names>A. M.</given-names> <surname>Tjoa</surname></string-name>,
          <article-title>RMLx: Mapping interface for integrating open data with linked data exploration environment</article-title>,
          <year>2017</year>. doi:
          <pub-id pub-id-type="doi">10.31227/osf.io/qhdc9</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>T.</given-names> <surname>Bray</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Paoli</surname></string-name>,
          <string-name><given-names>C. M.</given-names> <surname>Sperberg-McQueen</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Maler</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Yergeau</surname></string-name>,
          <article-title>Extensible markup language (XML) 1.0 (fifth edition)</article-title>,
          <source>W3C Recommendation</source>,
          <year>2008</year>. Available at http://www.w3.org/TR/REC-xml/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>S.</given-names> <surname>Bechhofer</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>van Harmelen</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Hendler</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Horrocks</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>McGuinness</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Patel-Schneider</surname></string-name>,
          <string-name><given-names>L. A.</given-names> <surname>Stein</surname></string-name>,
          <article-title>OWL Web Ontology Language Reference</article-title>, Recommendation,
          <source>World Wide Web Consortium (W3C)</source>,
          <year>2004</year>. See http://www.w3.org/TR/owl-ref/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <article-title>SPARQL 1.1 Query Language</article-title>,
          <source>Technical Report, W3C</source>,
          <year>2013</year>. URL: http://www.w3.org/TR/sparql11-query.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <article-title>Shapes constraint language (SHACL)</article-title>,
          <source>Technical Report, W3C</source>,
          <year>2017</year>. URL: https://www.w3.org/TR/shacl/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>E.</given-names> <surname>Prud'hommeaux</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Seaborne</surname></string-name>,
          <article-title>SPARQL Query Language for RDF</article-title>,
          <source>W3C Recommendation</source>,
          <year>2008</year>. URL: http://www.w3.org/TR/rdf-sparql-query/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>J.</given-names> <surname>Clark</surname></string-name>,
          <string-name><given-names>S. J.</given-names> <surname>DeRose</surname></string-name>,
          <article-title>XML path language (XPath) version 1.0</article-title>, World Wide Web Consortium,
          <source>Recommendation REC-xpath-19991116</source>,
          <year>1999</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>E.</given-names> <surname>Costetchi</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Vassiliades</surname></string-name>,
          <string-name><given-names>C. I.</given-names> <surname>Nyulas</surname></string-name>,
          <article-title>Towards a mapping framework for the tenders electronic daily standard forms</article-title>, in:
          <source>KGCW@ESWC</source>,
          <year>2023</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>A.</given-names> <surname>Dimou</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>De Nies</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Verborgh</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Mannens</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Mechant</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Van de Walle</surname></string-name>,
          <article-title>Automated metadata generation for linked data generation and publishing workflows</article-title>, in:
          <source>LDOW2016</source>, CEUR-WS.org,
          <year>2016</year>, pp.
          <fpage>1</fpage>-<lpage>10</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>