Mapeathor: Simplifying the Specification of Declarative Rules for Knowledge Graph Construction

Ana Iglesias-Molina, Luis Pozo-Gilo, Daniel Doña, Edna Ruckhaus, David Chaves-Fraga, and Oscar Corcho

Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
{ana.iglesiasm,luis.pozo}@upm.es, ddona@delicias.dia.fi.upm.es, {eruckhaus,dchaves,ocorcho}@fi.upm.es

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. In recent years we have observed an increasing interest by the scientific community, from social sciences to biomedicine, in the generation and publication of RDF-based knowledge graphs. One possibility for creating knowledge graphs consists in using declarative mappings together with their associated parsers. These mappings describe the relationship between the source data and a reference ontology. However, the learning curve to create these mapping files is steep, hindering their use by a wider community. In this paper we present a user-friendly mapping-language-independent tool, Mapeathor, to declare transformation rules based on spreadsheets and translate them into two different mapping languages with the purpose of easing the mapping creation process.

Keywords: Knowledge Graph · Declarative mapping · Spreadsheet

1 Introduction

In the last few decades, we have seen a significant increase in the publication of data in a machine-understandable manner following Linked Data principles¹ (e.g., DBpedia², Wikidata³). Knowledge graph construction requires integrating different data sources in a structured way, usually following the schema of an ontology or group of ontologies. This facilitates the subsequent task of mining the knowledge graph with several applications, such as searching recommendations and learning implicit data patterns.

¹ https://5stardata.info/en/
² https://wiki.dbpedia.org/
³ https://www.wikidata.org/

Knowledge graphs can be built in diverse ways. One option is creating ad-hoc scripts to transform data, which requires the user to repeat the process of script writing in every specific use case. Another option is using tools like OpenRefine⁴ to perform data transformation through the creation of an RDF skeleton, which includes proprietary transformation rules and a functionality for knowledge graph construction. Lastly, there is an option to keep the transformation rules in specific files that can be later processed by engines that either transform the data to RDF or create a virtual knowledge graph that can be queried without transforming the source data. These rules can be written in a wide variety of languages (e.g., R2RML [2], RML [3]) that cover different users' needs (e.g., the source data format or the engine that will be used).

Although the use of these mapping files is more flexible and independent, since they can be processed by a wide variety of engines, their creation is still not easy for new users. Experts are usually needed to carry out these tasks, hindering the use of semantic web technologies across the scientific community. That is why it is necessary to lower the learning curve and improve mapping reuse and reproducibility.

Since mapping languages started to be used by the community, there have been multiple approaches for the development of editors to ease their specification. Most of them enable editing through graphical visualization [4, 6]; others provide a writing environment (e.g., the Protégé extension OntopPro). These editors are language-oriented: they help to create one kind of mapping, not taking into account the wide variety of mapping languages that currently exist. Moreover, when managing a considerable number of mapping rules, a graphical approach may not be easily handled.

Our work focuses on providing a straightforward way to create these mappings, specifying the transformation rules in spreadsheets, so they are later translated into one of the implemented mapping languages. The purpose of this proposal is to increase the interoperability between these languages [1] as well as to ease the creation process. To perform the translation of the mapping rules we developed Mapeathor⁵, a tool able to parse the spreadsheets and generate the corresponding mappings in two different languages. This work extends and improves on the work previously presented in [5], which introduced the first version of the spreadsheet design and the tool. The spreadsheet now includes more options to maintain the language's expressiveness, and the implementation has become simpler to use and more accurate in the translation.

This paper is structured as follows: Section 2 describes the design of the spreadsheet. Section 3 explains the functionalities of the tool and a real-world use case. Finally, Section 4 presents the main conclusions and future work.

2 Spreadsheet design

The rules required to generate a knowledge graph can be specified in multiple languages. The language is chosen by the user depending on the specific use case.
However, the rules themselves are equivalent across languages, so they can be written in a language-independent way; in this case, we chose a spreadsheet for rule specification. The spreadsheet template is devised to contain the rules in a compact and understandable way, in a format widely used by the scientific community. The design aims to be language-independent and to ease the writing process so the user does not have to learn a mapping language. In addition, the functionalities of a spreadsheet editor can be used to speed up the writing process. Reusing mappings for similar use cases is also easier in this specification format.

⁴ http://openrefine.org/
⁵ https://morph.oeg.fi.upm.es/demo/mapeathor

(a) Prefix sheet

    Prefix     URI
    noise      http://v.ciudadesabiertas.es/cont-acustica#
    noise-res  http://v.ciudadesabiertas.es/res/cont-acustica#
    sosa       http://www.w3.org/ns/sosa/

(b) Subject sheet

    ID           Class                  URI
    Station      noise:EstacionMedida   noise-res:estacion-medida/{id}
    Observation  noise:Observacion      noise-res:observacion/{idx}

(c) Source sheet

    ID           Feature   Value
    Station      query     SELECT id, name FROM Station
    Observation  source    data/station.json
    Observation  format    JSON
    Observation  iterator  $

(d) Predicate_Object sheet

    ID           Predicate              Object                DataType  ReferenceID  InnerRef        OuterRef
    Station      dcterms:identifier     {id}                  string
    Station      schema:name            {name}                string
    Station      geosparql:hasGeometry  noise-res:punto/{id}  iri
    Observation  sosa:resultTime        {resTime}             Time
    Observation  sosa:madeBySensor                                      Station      {madeBySensor}  {id}
    Observation  sosa:observedProperty  <Fun1>

(e) Function sheet

    FunctionID  Feature       Value
    <Fun1>      fno:executes  grel:replace
    <Fun1>      ex:param1     {obsProperty}
    <Fun1>      ex:param2     “”
    <Fun1>      ex:param3     “-”

Fig. 1: Example of a spreadsheet representing the (a) Prefix sheet, (b) Subject sheet, (c) Source sheet, (d) Predicate Object sheet and (e) Function sheet.
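
To make the role of the shared ID column concrete, the sheets of Figure 1 can be pictured as plain tables whose rows are grouped by ID. The Python sketch below is only an illustration under stated assumptions: the names SHEETS and rules_by_id are hypothetical, the dictionary layout is not Mapeathor's internal data model, and only a subset of the rows in Figure 1 is included.

```python
from collections import defaultdict

# A subset of the rows shown in Figure 1. The dictionary layout is an
# assumption made for illustration, not Mapeathor's internal representation.
SHEETS = {
    "Prefix": [
        {"Prefix": "sosa", "URI": "http://www.w3.org/ns/sosa/"},
    ],
    "Subject": [
        {"ID": "Station", "Class": "noise:EstacionMedida",
         "URI": "noise-res:estacion-medida/{id}"},
        {"ID": "Observation", "Class": "noise:Observacion",
         "URI": "noise-res:observacion/{idx}"},
    ],
    "Source": [
        {"ID": "Station", "Feature": "query",
         "Value": "SELECT id, name FROM Station"},
        {"ID": "Observation", "Feature": "source", "Value": "data/station.json"},
        {"ID": "Observation", "Feature": "format", "Value": "JSON"},
        {"ID": "Observation", "Feature": "iterator", "Value": "$"},
    ],
    "Predicate_Object": [
        {"ID": "Station", "Predicate": "dcterms:identifier",
         "Object": "{id}", "DataType": "string"},
        {"ID": "Station", "Predicate": "schema:name",
         "Object": "{name}", "DataType": "string"},
        {"ID": "Observation", "Predicate": "sosa:resultTime",
         "Object": "{resTime}", "DataType": "Time"},
        {"ID": "Observation", "Predicate": "sosa:madeBySensor",
         "ReferenceID": "Station", "InnerRef": "{madeBySensor}",
         "OuterRef": "{id}"},
    ],
}


def rules_by_id(sheets):
    """Group Subject, Source and Predicate_Object rows under their shared ID."""
    grouped = defaultdict(
        lambda: {"subject": None, "source": {}, "predicate_objects": []}
    )
    for row in sheets["Subject"]:
        grouped[row["ID"]]["subject"] = row
    for row in sheets["Source"]:
        # Each Source row contributes one Feature -> Value pair to its rule.
        grouped[row["ID"]]["source"][row["Feature"]] = row["Value"]
    for row in sheets["Predicate_Object"]:
        grouped[row["ID"]]["predicate_objects"].append(row)
    return dict(grouped)


if __name__ == "__main__":
    for rule_id, rule in rules_by_id(SHEETS).items():
        print(rule_id, rule["subject"]["Class"], sorted(rule["source"]))
```

Grouping by ID in this way reflects why each sheet carries that column: a translator needs the subject, the source description and all predicate-object rows of one rule together before it can emit a mapping for it.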

The spreadsheet contains the essential mapping elements structured in five different sheets: Prefix, Source, Subject, Predicate Object and Function.

Prefix sheet: This sheet contains the namespaces and corresponding prefixes used when declaring the transformation rules (Figure 1a). It is composed of two columns: Prefix for the prefix and URI for the corresponding namespace.

Subject sheet: This sheet defines the subjects to be generated and the key ID that links the information in the sheets (Figure 1b). It is organized in three columns: ID, Class and URI. URI defines the template URI for the subject, and Class specifies its class. ID contains a unique identifier for each subject's set of rules, which relates it to the information on these rules in the remaining sheets.

Source sheet: Here we specify where the data is retrieved from (Figure 1c). The information is organized in three columns: ID, Feature and Value. Feature declares the type of information provided in Value. Value can hold the path to the source data (with the feature source), the format (format), the iterator (iterator, the loop used to map the data from JSON and XML files), the database table (table), the SQL query (query) and the SQL version (SQLVersion). Any language option may be included. Finally, ID indicates the rule it refers to.

(a) Triples map for Station

    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix rml: <http://semweb.mmlab.be/ns/rml#> .
    @prefix ql: <http://semweb.mmlab.be/ns/ql#> .
    @prefix noise: <http://v.ciudadesabiertas.es/cont-acustica#> .
    @prefix noise-res: <http://v.ciudadesabiertas.es/res/cont-acustica#> .
    @prefix sosa: <http://www.w3.org/ns/sosa/> .

    <#Station>
      rr:logicalSource [
        rr:sqlQuery """SELECT id, name FROM Station""";
        rr:sqlVersion rr:SQL2008
      ];
      rr:subjectMap [
        rr:template "noise-res:estacion-medida/{id}";
        rr:class noise:EstacionMedida;
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant dcterms:identifier ];
        rr:objectMap [ rml:reference "id"; rr:datatype xsd:string ]
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant schema:name ];
        rr:objectMap [ rml:reference "name"; rr:datatype xsd:string ]
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant geosparql:hasGeometry ];
        rr:objectMap [ rr:template "noise-res:punto/{id}"; rr:termType rr:IRI ]
      ];.

(b) Triples map for Observation

    [Prefixes]

    <#Observation>
      rml:logicalSource [
        rml:source "data/station.json";
        rml:referenceFormulation ql:JSONPath;
        rml:iterator "$";
      ];
      rr:subjectMap [
        a rr:Subject;
        rr:termType rr:IRI;
        rr:template "noise-res:observacion/{idx}";
        rr:class noise:Observacion;
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant sosa:resultTime ];
        rr:objectMap [ rml:reference "resTime"; rr:datatype xsd:Time ]
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant sosa:madeBySensor ];
        rr:objectMap [
          rr:parentTriplesMap <#Station>;
          rr:joinCondition [ rr:child "madeBySensor"; rr:parent "id"; ];
        ];
      ];
      rr:predicateObjectMap [
        rr:predicateMap [ rr:constant sosa:observedProperty ];
        rr:objectMap <#Fun1>
      ];.

(c) Triples map for Fun1

    [Prefixes]

    <#Fun1>
      a rr:TriplesMap;
      a fnml:FunctionTermMap;
      fnml:functionValue [
        rml:logicalSource [
          rml:source "data/station.json";
          rml:referenceFormulation ql:JSONPath
        ];
        rr:predicateObjectMap [
          rr:predicate fno:executes ;
          rr:objectMap [ rr:constant grel:replace ]
        ];
        rr:predicateObjectMap [
          rr:predicate ex:param1 ;
          rr:objectMap [ rml:reference "obsProperty" ]
        ];
        rr:predicateObjectMap [
          rr:predicate ex:param2 ;
          rr:objectMap [ rr:constant " " ]
        ];
        rr:predicateObjectMap [
          rr:predicate ex:param3 ;
          rr:objectMap [ rr:constant "-" ]
        ];
      ].

Fig. 2: Output RML mapping file resulting from the translation of the rules shown in Figure 1. The following sets of rules are shown: (a) Station, (b) Observation and the function (c) Fun1.

Predicate Object sheet: This sheet defines the triples through the predicates and their corresponding objects (Figure 1d). The columns Predicate and Object specify the predicate and object in a rule. The XSD datatype of Object is defined in DataType.
When the object refers to a subject defined in another rule, the rule is written differently. Three fields allow the specification of the linking condition between the object of the triple and the referenced subject. They specify the ID of the target subject (ReferenceID) and the "join" fields in the source data (InnerRef for the field of the object of the current triple, and OuterRef for the field of the referenced subject). Lastly, the column ID indicates the rule it belongs to.

Function sheet: Some languages are able to process transformation functions over the data (e.g., FnO+RML), which can be detailed in this sheet (Figure 1e). Some well-known options are the SQL and GREL functions, but any option can be used. The functions are referenced in the Predicate Object sheet or in other function rows with the identifier specified in FunctionID. The column Feature is used to specify the type of information provided in Value, where the name of the function and the values of the parameters are written.

3 Demonstration

The spreadsheet containing the transformation rules is processed by the tool Mapeathor to create a mapping file. For example, Figure 2 depicts the mapping file written in the RML language that results when translating the rules in Figure 1. Currently, this tool translates Google spreadsheets and XLSX files to the following languages: the W3C recommendation R2RML [2], RML [3], and its serialization, YARRRML⁶. It can be used as a web service⁷ and as a CLI⁸.

⁶ https://rml.io/yarrrml/
⁷ https://morph.oeg.fi.upm.es/tool/mapeathor/swagger/
⁸ https://github.com/oeg-upm/Mapeathor

Currently, Mapeathor is being used to generate mappings for city open data publication related to traffic, public bus transport, budget and noise pollution in the context of the Ciudades Abiertas project. Six spreadsheets have been completed, containing 31 subjects and 104 predicate-object rules.
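
The correspondence between a Predicate Object row and the RML it produces can be sketched in a few lines of Python. This is a minimal illustration that mirrors the shape of the output in Figure 2, not Mapeathor's actual translation code: the helpers strip_braces, object_map and predicate_object_map are hypothetical names, and the rule that a cell holding exactly one {field} becomes an rml:reference is an assumption inferred from the figures.

```python
import re

# A cell that consists of exactly one {field} placeholder, e.g. "{id}".
REFERENCE = re.compile(r"^\{([^{}]+)\}$")


def strip_braces(cell):
    """Turn a cell such as '{id}' into the bare field name 'id'."""
    match = REFERENCE.match(cell)
    return match.group(1) if match else cell


def object_map(row):
    """Render the rr:objectMap for one Predicate_Object row (hypothetical helper)."""
    if row.get("ReferenceID"):
        # The object is the subject of another rule: ReferenceID names the
        # parent triples map, InnerRef/OuterRef give the child/parent join fields.
        return (
            f'rr:objectMap [ rr:parentTriplesMap <#{row["ReferenceID"]}>; '
            f'rr:joinCondition [ rr:child "{strip_braces(row["InnerRef"])}"; '
            f'rr:parent "{strip_braces(row["OuterRef"])}" ] ]'
        )
    obj = row["Object"]
    datatype = row.get("DataType") or ""
    if datatype == "iri":
        # An IRI-valued object keeps its template, as in Figure 2a.
        return f'rr:objectMap [ rr:template "{obj}"; rr:termType rr:IRI ]'
    match = REFERENCE.match(obj)
    if match:
        suffix = f"; rr:datatype xsd:{datatype}" if datatype else ""
        return f'rr:objectMap [ rml:reference "{match.group(1)}"{suffix} ]'
    # Anything else is treated as a constant literal in this sketch.
    return f'rr:objectMap [ rr:constant "{obj}" ]'


def predicate_object_map(row):
    """Render a complete rr:predicateObjectMap block for one row."""
    return (
        "rr:predicateObjectMap [\n"
        f'  rr:predicateMap [ rr:constant {row["Predicate"]} ];\n'
        f"  {object_map(row)}\n"
        "];"
    )


if __name__ == "__main__":
    join_row = {
        "ID": "Observation", "Predicate": "sosa:madeBySensor",
        "ReferenceID": "Station", "InnerRef": "{madeBySensor}", "OuterRef": "{id}",
    }
    print(predicate_object_map(join_row))
```

The join case is checked first because such rows leave Object empty and carry their meaning in ReferenceID, InnerRef and OuterRef. Function references and the R2RML or YARRRML outputs are left out of this sketch.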

The process of spreadsheet completion and mapping creation for the implemented languages will be shown in the demo with data from this real-world use case.

4 Conclusions and future work

This paper presents Mapeathor, a tool able to translate transformation rules specified in spreadsheets to three different mapping languages. The key part of the work is the spreadsheet containing the mapping rules, since it is designed to facilitate the specification process for the user. Currently, the tool is being tested in several use cases from the Ciudades Abiertas project. The purpose of this work is to create a framework to declare the transformation rules in a user-friendly, language-independent way and to be able to generate these rules in any mapping language. Future work includes conducting a user study to test the usefulness of this tool and find guidelines for improvement, extending the tool to cover more languages, and implementing changes that make rule specification more user-friendly.

Acknowledgements. The work presented in this paper is supported by the Spanish Ministerio de Economía, Industria y Competitividad and EU FEDER funds under the DATOS 4.0: RETOS Y SOLUCIONES - UPM Spanish national project (TIN2016-78011-C4-4-R).

References

1. Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a New Generation of Ontology Based Data Access. Semantic Web 11, 153–160 (2020)
2. Das, S., Sundara, S., Cyganiak, R.: R2RML: RDB to RDF Mapping Language, W3C Recommendation 27 September 2012, https://www.w3.org/TR/r2rml/
3. Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: LDOW (2014)
4. Heyvaert, P., Dimou, A., Herregodts, A.L., Verborgh, R., Schuurman, D., Mannens, E., Van de Walle, R.: RMLEditor: a graph-based mapping editor for linked data mappings. In: European Semantic Web Conference. pp. 709–723. Springer (2016)
5. Iglesias-Molina, A., Chaves-Fraga, D., Priyatna, F., Corcho, O.: Towards the definition of a language-independent mapping template for knowledge graph creation. In: Proceedings of the Third International Workshop on Capturing Scientific Knowledge. pp. 33–36 (2019)
6. Sicilia, Á., Nemirovski, G., Nolle, A.: Map-On: A web-based editor for visual ontology mapping. Semantic Web 8(6), 969–980 (2017)