Towards the Definition of a Language-Independent Mapping Template for Knowledge Graph Creation
                           Ana Iglesias-Molina                                                               David Chaves-Fraga
                    Ontology Engineering Group                                                       Ontology Engineering Group
               Universidad Politécnica de Madrid, Spain                                         Universidad Politécnica de Madrid, Spain
                        ana.iglesiasm@upm.es                                                              dchaves@fi.upm.es

                               Freddy Priyatna                                                                   Oscar Corcho
                    Ontology Engineering Group                                                       Ontology Engineering Group
               Universidad Politécnica de Madrid, Spain                                         Universidad Politécnica de Madrid, Spain
                         fpriyatna@fi.upm.es                                                              ocorcho@fi.upm.es

ABSTRACT
The use of knowledge graphs is spreading in the scientific community across different domains,
from social sciences to biomedicine. The creation of knowledge graphs usually requires the
integration of multiple heterogeneous data sources in different formats and schemas. One common
way to achieve this is using declarative mappings, which establish the relationships between the
source data and the ontology, improving relevant aspects such as maintainability, readability and
understandability. Learning how to use and create mappings is not an easy task, which hinders the
use of this technology by anyone outside the area. As a result, this task is usually carried out
by experts. To ease the mapping creation, several mapping editors have been developed, but their
success is limited. In this paper, we propose the use of a well-known tool commonly used in the
scientific community, the spreadsheet, to specify the mapping rules in a language-independent
way. Our aim is to ease the mapping creation and make it more accessible for the community. We
also show a real use case, in which using spreadsheets helps in the mapping creation process and
enables a handy way of editing and visualizing mapping rules.

CCS CONCEPTS
• Computing methodologies → Artificial intelligence; Knowledge representation and reasoning.

KEYWORDS
Knowledge graph, spreadsheet, declarative mapping

1    INTRODUCTION
The expansion of the Semantic Web technologies has reached users across several domains, such as
legal and biomedical. An increasing number of knowledge graphs from these areas are being created,
restructuring knowledge in a machine-readable way [4]. For their construction it is necessary to
integrate different data sources; once built, they enable search optimization and the application
of machine learning techniques to obtain new knowledge, among other possibilities. Some examples
are DBpedia [1] and Wikidata [18].
   There are multiple approaches to create knowledge graphs, from using ad-hoc tools to declarative
mappings. The latter define rules to establish relationships between the global schema and the
data sources. Examples of mapping languages are the W3C recommendation R2RML [7] and its extension
RML [9].
   The use of declarative mappings is often complicated for semantic web non-experts. That is one
of the reasons why the mapping creation is usually carried out by knowledge engineers. This poses
a barrier for potential users from other domains. To face this issue, several mapping editors have
been proposed. They aim at making the mapping creation and editing easier and more intuitive
[11, 16]. Despite these efforts, users prefer to use tools like OpenRefine1, which is
non-declarative, thus hindering the reproducibility and maintainability of the transformations
performed.
   Mapping languages consist of common elements to be created (e.g. the source data, subjects,
predicates and objects). In this paper we propose the use of spreadsheets to specify these
elements, the mapping rules, in a language-independent way, so they can be translated into the
most convenient specification [6]. Spreadsheets are a well-known tool commonly used in the
scientific community, versatile and easy to understand, which makes them a suitable target to
specify mapping rules. With this proposal, our aim is to lower the barrier of mapping creation
and motivate the scientific community to use this technology.
   This paper is organized as follows: Section 2 presents the related work done on mapping
creation. Section 3 shows the common mapping structure. Section 4 describes the spreadsheet
template we propose for the creation of mapping rules. Section 5 shows a real case in which we use
spreadsheets to create mappings. Finally, Section 6 presents the conclusions and areas for future
work.

2    RELATED WORK
A wide variety of mapping languages has been proposed over the last decades [8]. The W3C
Recommendation is R2RML [7], a declarative mapping language that allows the generation of adapters
to transform relational databases into RDF. There are other declarative languages that enable
dealing with more data formats, such as RML [9] (extension of R2RML for CSV, JSON and XML),
YARRRML [10] (a user-friendly serialization of RML), xR2RML [15] (for non-SQL databases) and
RMLC-Iterator [5] (for statistical data).
   There are not as many mapping editors as languages; in fact, the majority of them support R2RML
or RML. Some of the most used
Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 http://openrefine.org/
SciKnow’19, November, 2019,
                                                                                                                                                            Iglesias-Molina et al.


    <PERSON>
      rml:logicalSource [
        rml:source "/home/user/data/people.csv";
        rml:referenceFormulation ql:CSV;
      ];
      rr:subjectMap [
        rr:class ex:Person;
        rr:template "http://ex.com/Person/{name}";
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant ex:name ];
        rr:objectMap [ rml:reference "name" ];
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant ex:sport ];
        rr:objectMap [ rr:parentTriplesMap <SPORT>;
          rr:joinCondition [ rr:child "sport_id"; rr:parent "id"; ];
        ];
      ];

                               (a) Triples Map for PERSON

    <SPORT>
      rml:logicalSource [
        rml:source "/home/user/data/sports.csv";
        rml:referenceFormulation ql:CSV;
      ];
      rr:subjectMap [
        rr:class ex:Sport;
        rr:template "http://ex.com/Sport/{sport}";
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant ex:name ];
        rr:objectMap [ rml:reference "sport" ];
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant ex:code ];
        rr:objectMap [ rml:reference "id"; ];
      ];

                               (b) Triples Map for SPORT

Figure 1: RML mapping. Fig. 1a shows the triples map that generates instances of the class ex:Person, with two predicate-object maps,
the latter a join to the triples map shown in Fig. 1b, which creates the instances of the class ex:Sport and also has two predicate-object maps.
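
To make the rr:template values in Figure 1 concrete, the following minimal Python sketch expands a template such as http://ex.com/Person/{name} against one row of source data. The function name expand_template is hypothetical, and the percent-encoding done with urllib.parse.quote is only an approximation of the IRI-safe encoding that R2RML prescribes for template values; it is not taken from any particular RML engine.

```python
import re
from urllib.parse import quote


def expand_template(template, row):
    """Replace each {field} placeholder with the percent-encoded row value."""

    def substitute(match):
        field = match.group(1)
        # Assumption: a missing field raises KeyError; a real engine may
        # instead skip the row and generate no term.
        return quote(str(row[field]), safe="")

    return re.sub(r"\{([^{}]+)\}", substitute, template)


row = {"name": "Serena Williams", "birthdate": "19810926", "sport_id": "1"}
print(expand_template("http://ex.com/Person/{name}", row))
# prints http://ex.com/Person/Serena%20Williams
```

The same call with the template http://ex.com/Sport/{sport} and a row of the sports data would yield the subject of the SPORT triples map.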


    "name","birthdate","sport_id"
    "Serena Williams",19810926,1
    "Alexander Ovechkin",19850917,4
    "Emily Scarratt",19900208,3
    "Javier Fernández",19910415,2

                (a) people.csv

    "id","sport"
    1,"Tennis"
    2,"Ice skating"
    3,"Rugby"
    4,"Hockey"

                (b) sports.csv

Figure 2: CSV data example. Example of the source data in CSV format for the RML mapping example from Figure 1.

tools implement graphical visualization and editing of the mappings as graphs, such as Karma [13]
and Map-On [17] for R2RML, and RMLEditor [11] for RML. Others provide an environment to write
them, like OntopPro2, an extension of Protégé that allows mapping creation in their custom
language and import/export R2RML.
   The current mapping editors are language-oriented or create the mapping rules through graphical
visualization. Thus, the user either knows the language, or creates the mapping by building a
visual graph. Using spreadsheets enables a language-independent declarative approach to write the
mapping rules concisely, taking advantage of the functionalities of a spreadsheet. In other words,
the rules can be created specifying only the essential elements without knowing any mapping
language, and the repetitive elements can be autocompleted. Moreover, its compact structure allows
a quick visualization of all the rules.
   There are other approaches that use spreadsheets to capture knowledge of domain experts
[12, 19]. This kind of tool enables the specification of ontologies in tables and generates the
corresponding RDF. Similarly, with our proposal the mapping rules for data conversion are declared
in spreadsheets, to be later translated into different mapping languages.

3    STRUCTURE OF DECLARATIVE MAPPINGS
Mapping languages usually have a similar structure, as many of them are based on the W3C standard
(R2RML [7]). The earliest (e.g. R2O [2]) or the non-declarative languages (e.g. SPARQL-Generate
[14]) differ in structure, but they all share the same elements: identifier of data sources (URL,
path, table name) and the rules for generating the corresponding RDF triples. An RML mapping
example is shown in Figure 1. It organizes the transformation rules in two triples maps, one for
each data source (Figure 2) used to generate RDF triples.
   We define in more detail the essential elements that declarative mapping rules contain,
providing examples based on the RML mappings shown in Figure 1:
   • An element that specifies where the data sources are stored. In the case of RML, these
     elements are defined using the property rml:logicalSource.
   • A set of rules that defines the subjects and classes of the triples. In RML, the
     rr:subjectMap property is used to specify these characteristics.
   • Pairs (rr:predicateObjectMap property in RML) that specify rules for generating the predicate
     (rr:predicateMap) and object (rr:objectMap) of the triples.
   • Join condition to another triples map, where the subject of the referenced triples map is to
     be the object in the new triple. This is defined in RML using the rr:joinCondition property.
   As we show in the example mapping, these rules usually contain multiple and repetitive
elements. This characteristic makes it easy to make mistakes when writing them manually. Using a
spreadsheet template can ease this process for non-experts in mapping creation. It enables manual
writing, while helping with the repetitive parts through autocompletion functions. Moreover, all
the language’s syntax and formatting is later automatically written by the tool, not the user.
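
The essential elements described in Section 3 (source, subject map, predicate-object maps and join condition) can be illustrated by applying them by hand to the data of Figure 2. The Python sketch below is not an RML processor: the function names are hypothetical, the PERSON triples map of Figure 1 is hard-coded rather than parsed, and the join is a plain nested loop chosen for clarity rather than efficiency.

```python
import csv
import io
from urllib.parse import quote

PEOPLE_CSV = '''"name","birthdate","sport_id"
"Serena Williams",19810926,1
"Alexander Ovechkin",19850917,4
"Emily Scarratt",19900208,3
"Javier Fernández",19910415,2
'''

SPORTS_CSV = '''"id","sport"
1,"Tennis"
2,"Ice skating"
3,"Rugby"
4,"Hockey"
'''


def read_rows(text):
    """Logical source: parse CSV text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def subject_iri(prefix, value):
    """Subject map: build the subject IRI from the fixed part of a template."""
    return "<" + prefix + quote(value, safe="") + ">"


def person_triples(people, sports):
    """Apply the PERSON triples map of Figure 1 to the parsed rows."""
    triples = []
    for person in people:
        subject = subject_iri("http://ex.com/Person/", person["name"])
        # rr:class yields an rdf:type triple for every generated subject.
        triples.append((subject, "rdf:type", "ex:Person"))
        # Predicate-object map with a reference to the field "name".
        triples.append((subject, "ex:name", '"' + person["name"] + '"'))
        # Join condition: child field sport_id equals parent field id; the
        # object is the subject of the referenced SPORT triples map.
        for sport in sports:
            if person["sport_id"] == sport["id"]:
                obj = subject_iri("http://ex.com/Sport/", sport["sport"])
                triples.append((subject, "ex:sport", obj))
    return triples


triples = person_triples(read_rows(PEOPLE_CSV), read_rows(SPORTS_CSV))
for s, p, o in triples:
    print(s, p, o, ".")
```

Each of the four people yields three triples: the class, the name, and the link to the sport found through the join.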


2 https://github.com/ontop/ontop/wiki/ontopProUserManual


4    SPREADSHEET DESIGN
In this section we show the designed spreadsheet template3 that contains the essential elements to
create a mapping. It consists of at least four sheets: prefixes, source data, subject and
predicate-object maps; and optionally, a sheet with transformation functions.
   Prefixes sheet. In this sheet the namespace prefixes for URLs are specified. They can be found
at the beginning of most mapping languages, as they make it easier and shorter to write the
mappings. This sheet is composed of two columns: the prefix is defined in the column Prefix, and
the whole link is written in the column URI (Table 1).

Table 1: Prefix sheet. The whole link is written in the column URI, and its abbreviation in the
column Prefix.

      Prefix    URI
      rdf       http://www.w3.org/1999/02/22-rdf-syntax-ns#
      ex        http://ex.com/
      sql       http://w3.org/ns/sql#

   Source sheet. Here we specify where the data is taken from (Table 2). It consists of three
columns: ID, Feature and Value. The column Value contains the path to the source data, the format,
and optionally the iterator (the loop used to map the data of JSON and XML files). In Feature we
declare the type of information provided in Value. Finally, ID refers to its corresponding subject
in the Subject sheet.

Table 2: Source sheet. The information about the source data is specified, such as where the data
is stored and its format. The kind of information is defined in Feature, the information itself in
Value, and to which subject it refers in ID.

      ID        Feature    Value
      PERSON    source     /home/user/data/people.csv
      PERSON    format     CSV
      SPORT     source     /home/user/data/sports.csv
      SPORT     format     CSV

   Subject sheet. The subjects of the triples to generate and their corresponding classes are
defined in three columns (Table 3). In ID an identifier is specified for each subject so it can be
referred to from other sheets; in Class, the class to which the subject belongs; and in URI, the
template for the URI of the subjects that are to be created. In the last field, there is a
variable part between curly braces that refers to a field in the data (in the first line, name,
and in the second, sport).

Table 3: Subject sheet. The class of the subject is specified in Class, along with the URI that is
to be created in URI and a unique identifier in ID. In the URI column, the words between curly
braces refer to fields in the data.

      ID        Class        URI
      PERSON    ex:Person    http://ex.com/Person/{name}
      SPORT     ex:Sport     http://ex.com/Sport/{sport}

   Predicate-Object Maps sheet. In this sheet, the triples are defined through the predicates and
their corresponding objects (Table 4). The columns Predicate and Object are responsible for their
specification. The kind of data declared in Object is defined in DataType (e.g. string, float,
etc.). When there is a referencing object map, the triple is defined differently. Three fields
specify the join between the object of the new triple and the referenced subject: the ID of the
subject to join (ReferenceID), and the fields of the source data they share (InnerRef for the
field of the current triple, and OuterRef for the field of the referred subject). These fields are
left blank unless there is a join. When there is one, the aforementioned fields referring to the
object (Object and DataType) are not necessary. The last item to specify is which subject each
triple belongs to. For that purpose the column ID exists. It links each predicate-object pair to
its corresponding subject.
   Function sheet. Some languages support the use of transformation functions over the data (e.g.
FnO+RML), so the template allows including an additional sheet to detail these functions
(Table 5). The most used are the SQL and GREL functions, but any can be used. The functions are
referred to from the Predicate-Object Maps sheet or from another function row with the identifier
specified in FunctionID. The function to use is defined in Function, and the parameters in Params
(if there are several, they are written separated by commas).

5    USE CASE: THE BIO2RDF PROJECT
Bio2RDF [3] is an open source project, started in 2008, that integrates heterogeneous sources of
biomedical data into Linked Data. For each biological database in its catalogue, Bio2RDF provides
an ontology and a PHP script to transform data into RDF. With the aim of enhancing the
maintainability and understandability of the transformation, we show the first steps to change the
RDF transformation methodology from using ad-hoc PHP scripts to declarative mappings using
spreadsheets.
   In this use case, we create mappings for the datasets of the project that have their data
published as CSVs and relational databases. With the information provided by the PHP scripts and
the source data, the mapping rules are specified in the spreadsheets. Then, they are translated
into the most suitable mapping language depending on the format of the data source and on which
engine is used to build the knowledge graph. In this specific case, we translate them into R2RML
for relational databases and RML for CSVs.
   For most of the data sources more than one subject is created, or the database is distributed
in several files, or there is a high number of triples (predicate-object maps) to generate.
Moreover, there are joins between the subjects within the same dataset and in other datasets. The
need to represent so many mapping rules creates the necessity to visualize them quickly and to
write the repetitive parts of the mappings easily, which can be done thanks to the structure and
functions of the spreadsheets. Moreover, the fact that the spreadsheets are an intermediate step
in the mapping creation process makes it possible to write the transformation rules only once, and
translate them into one or more languages. The tool developed to perform the translation,
Mapeathor, is still under development, and

3 https://doi.org/10.5281/zenodo.3526141


Table 4: Predicate-Object Map sheet. This sheet specifies the predicates (Predicate), the objects (Object), the kind of data of the object
(DataType), the references to other subjects (ReferenceID, InnerRef, OuterRef) and the subject that forms the triple (ID).

      Predicate       Object         DataType    ReferenceID    InnerRef    OuterRef    ID
      ex:name         {name}         string                                             PERSON
      ex:birthdate    {birthdate}    date                                               PERSON
      ex:sport                                   SPORT          sport_id    id          PERSON
      ex:name         {sport}        string                                             SPORT
      ex:code         {id}           integer                                            SPORT
      ex:comment      <Fun1>                                                            SPORT
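
As an illustration of how rows of the Source, Subject and Predicate-Object Maps sheets could be turned into a mapping, the following Python sketch emits RML-like Turtle text from the example rows of Tables 2 to 4. It is a sketch under stated assumptions and not the Mapeathor implementation: all function and variable names are hypothetical, the sheets are hard-coded as lists of dictionaries instead of being read from a spreadsheet file, prefix declarations are not emitted, and the DataType column, the ex:birthdate row and the function row (<Fun1>) are left out.

```python
SOURCES = [
    {"ID": "PERSON", "Feature": "source", "Value": "/home/user/data/people.csv"},
    {"ID": "PERSON", "Feature": "format", "Value": "CSV"},
    {"ID": "SPORT", "Feature": "source", "Value": "/home/user/data/sports.csv"},
    {"ID": "SPORT", "Feature": "format", "Value": "CSV"},
]
SUBJECTS = [
    {"ID": "PERSON", "Class": "ex:Person", "URI": "http://ex.com/Person/{name}"},
    {"ID": "SPORT", "Class": "ex:Sport", "URI": "http://ex.com/Sport/{sport}"},
]
PREDICATE_OBJECTS = [
    {"Predicate": "ex:name", "Object": "{name}", "ID": "PERSON"},
    {"Predicate": "ex:sport", "ReferenceID": "SPORT", "InnerRef": "sport_id",
     "OuterRef": "id", "ID": "PERSON"},
    {"Predicate": "ex:name", "Object": "{sport}", "ID": "SPORT"},
    {"Predicate": "ex:code", "Object": "{id}", "ID": "SPORT"},
]


def object_map(row):
    """Render the rr:objectMap body for one Predicate-Object Maps row."""
    if row.get("ReferenceID"):
        # Join to another subject: Object and DataType are not needed.
        return ('rr:parentTriplesMap <%s>; rr:joinCondition '
                '[ rr:child "%s"; rr:parent "%s" ]'
                % (row["ReferenceID"], row["InnerRef"], row["OuterRef"]))
    # "{name}" in the sheet becomes the reference "name".
    return 'rml:reference "%s"' % row["Object"].strip("{}")


def triples_map(subject):
    """Render one triples map from the rows that share the subject's ID."""
    sid = subject["ID"]
    features = {r["Feature"]: r["Value"] for r in SOURCES if r["ID"] == sid}
    lines = [
        "<%s>" % sid,
        "  rml:logicalSource [",
        '    rml:source "%s";' % features["source"],
        "    rml:referenceFormulation ql:%s" % features["format"],
        "  ];",
        "  rr:subjectMap [",
        "    rr:class %s;" % subject["Class"],
        '    rr:template "%s"' % subject["URI"],
        "  ]",
    ]
    for row in PREDICATE_OBJECTS:
        if row["ID"] == sid:
            # Close the previous block with ";" before opening a new one.
            lines[-1] += ";"
            lines += [
                "  rr:predicateObjectMap [",
                "    rr:predicateMap [ rr:constant %s ];" % row["Predicate"],
                "    rr:objectMap [ %s ]" % object_map(row),
                "  ]",
            ]
    lines[-1] += " ."
    return "\n".join(lines)


mapping = "\n\n".join(triples_map(s) for s in SUBJECTS)
print(mapping)
```

Because the rows are plain data, the same lists could in principle feed a second rendering function for another target language, which is the point of keeping the template language-independent.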

Table 5: Function sheet. The function sql:upper is specified. It only takes one parameter, the
field sport from the source data.

      FunctionID    Function     Params
      <Fun1>        sql:upper    {sport}

it is available in GitHub4, along with the spreadsheet mappings created for this use case.

6    CONCLUSIONS AND FUTURE WORK
This paper shows a first approach to design a template spreadsheet able to specify the mapping
rules used to create knowledge graphs. The full design is described in detail to show all the
essential elements contained in a mapping file that can be specified in a spreadsheet in a
language-independent manner. Moreover, we present a real use case in which the use of spreadsheets
has facilitated the mapping construction and editing.

     Topics in Semantic Technologies: ISWC 2018 Satellite Events (Studies on the Semantic Web), Vol. 36. IOS Press, 235–244.
 [6] Oscar Corcho, Freddy Priyatna, and David Chaves-Fraga. 2019. Towards a New Generation of Ontology Based Data Access. Semantic Web Journal (2019).
 [7] Souripriya Das, Seema Sundara, and Richard Cyganiak. [n. d.]. R2RML: RDB to RDF Mapping Language. https://www.w3.org/TR/r2rml/
 [8] Ben De Meester, Pieter Heyvaert, Ruben Verborgh, and Anastasia Dimou. 2019. Mapping Languages: Analysis of Comparative Characteristics. In 1st International Workshop on Knowledge Graph Building.
 [9] Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. 2014. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In LDOW.
[10] Pieter Heyvaert, Ben De Meester, Anastasia Dimou, and Ruben Verborgh. 2018. Declarative Rules for Linked Data Generation at Your Fingertips!. In European Semantic Web Conference. Springer, 213–217.
[11] Pieter Heyvaert, Anastasia Dimou, Aron-Levi Herregodts, Ruben Verborgh, Dimitri Schuurman, Erik Mannens, and Rik Van de Walle. 2016. RMLEditor: a graph-based mapping editor for linked data mappings. In European Semantic Web Conference. Springer, 709–723.
[12] Simon Jupp, Matthew Horridge, Luigi Iannone, Julie Klein, Stuart Owen, Joost Schanstra, Katy Wolstencroft, and Robert Stevens. 2012. Populous: a tool for building OWL ontologies from templates. BMC bioinformatics 13, 1 (2012), S5.
[13] Craig A Knoblock, Pedro Szekely, José Luis Ambite, Aman Goel, Shubham Gupta, Kristina Lerman, Maria Muslea, Mohsen Taheriyan, and Parag Mallick. 2012. Semi-automatically mapping structured sources into the semantic web. In Extended Semantic Web Conference. Springer, 375–390.

   Both the template spreadsheet and tool developed to translate
the spreadsheets to different mapping languages are still under                                [14] Maxime Lefrançois, Antoine Zimmermann, and Noorani Bakerally. 2017. A
                                                                                                    SPARQL extension for generating RDF from heterogeneous formats. In European
development. Our objective is to keep on improving the template’s                                   Semantic Web Conference. Springer, 35–50.
structure in order to erase the existing influence of the current                              [15] Franck Michel, Loïc Djimenou, Catherine Faron Zucker, and Johan Montagnat.
                                                                                                    2015. Translation of relational and non-relational databases into RDF with
mapping languages, and make it language-independent. For that                                       xR2RML. In 11th International Confenrence on Web Information Systems and
purpose, it’s necessary to make a design able to contain the essen-                                 Technologies (WEBIST’15). 443–454.
tial information to express the mapping rules, and take for each                               [16] Kunal Sengupta, Peter Haase, Michael Schmidt, and Pascal Hitzler. 2013. Editing
                                                                                                    R2RML mappings made easy. (2013).
language the necessary elements in the translation.                                            [17] Álvaro Sicilia, German Nemirovski, and Andreas Nolle. 2017. Map-On: A web-
   Moreover, an evaluation has to be carried out to test that using                                 based editor for visual ontology mapping. Semantic Web 8, 6 (2017), 969–980.
spreadsheets really helps in the mapping creation process, and give                            [18] Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative
                                                                                                    knowledge base. Commun. ACM 57, 10 (2014), 78–85.
some guidelines on how the template can be improved. The tool has                              [19] Katy Wolstencroft, Stuart Owen, Matthew Horridge, Olga Krebs, Wolfgang
to be developed as well, as the template changes, with the aim of                                   Mueller, Jacky L Snoep, Franco du Preez, and Carole Goble. 2011. RightField:
being able to translate the spreadsheets to any mapping language.                                   embedding ontology annotation in spreadsheets. Bioinformatics 27, 14 (2011),
                                                                                                    2021–2022.

REFERENCES
 [1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak,
     and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The
     semantic web. Springer, 722–735.
 [2] Jesús Barrasa Rodríguez, Óscar Corcho, and Asunción Gómez-Pérez. 2004. R2O,
     an extensible and semantically based database-to-ontology mapping language.
     (2004).
 [3] François Belleau, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, and
     Jean Morissette. 2008. Bio2RDF: towards a mashup to build bioinformatics
     knowledge systems. Journal of biomedical informatics 41, 5 (2008), 706–716.
 [4] Christian Bizer, Tom Heath, and Tim Berners-Lee. 2011. Linked data: The story so
     far. In Semantic services, interoperability and web applications: emerging concepts.
     IGI Global, 205–227.
 [5] David Chaves-Fraga, Freddy Priyatna, Idafen Perez-Santana, and Oscar Corcho.
     2018. Virtual Statistics Knowledge Graph Generation from CSV files. In Emerging

4 https://github.com/oeg-upm/Mapeathor