Preface for the 5th Edition of the International Knowledge Graph Construction Workshop

David Chaves-Fraga¹,², Anastasia Dimou²,³,⁴, Ana Iglesias-Molina⁵, Umutcan Serles⁶ and Dylan Van Assche⁷

¹ Universidade de Santiago de Compostela, Departamento de Electrónica e Computación, Santiago de Compostela, Spain
² KU Leuven, Department of Computer Science, Sint-Katelijne-Waver, Belgium
³ Flanders Make – DTAI-FET
⁴ Leuven.AI – KU Leuven institute for AI, B-3000 Leuven, Belgium
⁵ Universidad Politécnica de Madrid, Campus de Montegancedo, Boadilla del Monte, Spain
⁶ Semantic Technology Institute Innsbruck, Universität Innsbruck, Austria
⁷ IDLab, Dept of Electronics and Information Systems, Ghent University – imec, Belgium

More and more knowledge graphs are constructed for private use, e.g., the Amazon Product Graph [1] or the Fashion Knowledge Graph by Zalando¹, or for public use, e.g., DBpedia² or Wikidata³. While techniques to automatically construct knowledge graphs (KGs) from existing Web objects exist (e.g., scraping Web tables), there is still room for improvement. So far, constructing knowledge graphs has been considered an engineering task; however, more scientifically robust methods keep emerging. These methods were widely questioned for their verbosity, low performance or difficulty of use, while the variety and complexity of the data sources cause further syntactic and semantic interoperability issues. Declarative methods (mapping languages) for describing rules to construct knowledge graphs, and approaches to execute those rules, keep emerging. Nevertheless, constructing knowledge graphs is still not a straightforward task: several challenges remain, and the barriers to constructing knowledge graphs have not been lowered enough for easy and broad adoption by industry. These reasons and the well-populated Knowledge Graph Construction W3C Community Group⁴ show that there are still open questions that require further investigation to come up with groundbreaking solutions.
Addressing challenges related to knowledge graph construction requires well-founded research, including the investigation of concepts and the development of tools as well as methods for their evaluation. R2RML was recommended by the W3C in 2012, and since then, different extensions, alternatives and implementations have been proposed [2, 3, 4]. Certain approaches followed the ETL-like paradigm, e.g., SDM-RDFizer [5], RocketRML [6], and FunMap [7], while others followed the query-answering paradigm, e.g., Ultrawrap [8], Morph-RDB [9] and Ontop [10]. Besides R2RML-based extensions, alternatives were proposed, e.g., SPARQL-Generate [11] and ShExML [12], as well as methods to perform data transformations while constructing knowledge graphs, e.g., FnO [13] and FunUL [14].

Fifth International Workshop On Knowledge Graph Construction, co-located with ESWC 2024, 27th May 2024, Crete, Greece
david.chaves@upm.es (D. Chaves-Fraga); anastasia.dimou@kuleuven.be (A. Dimou); ana.iglesiasm@upm.es (A. Iglesias-Molina); umutcan.serles@sti2.at (U. Serles); dylan.van.assche@ugent.be (D. V. Assche)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
¹ https://engineering.zalando.com/posts/2018/03/semantic-web-technologies.html
² https://www.dbpedia.org/resources/knowledge-graphs/
³ https://www.wikidata.org/wiki/Wikidata:Main_Page
⁴ http://w3.org/community/kg-construct

The fifth edition of the knowledge graph construction workshop⁵ has a special focus on novel techniques, frameworks, architectures, and tools for the new extensions of RML, such as RDF Collections and Containers and RDF-Star support, and for the 2023 release of the RDF Mapping Language (RML) [15] in general. It also includes:
• Keynote. 
The workshop includes the keynote by Lionel Tailhardat (Orange): “Anomaly Detection For Telco Companies: Challenges And Opportunities In Knowledge Graph Construction”.
• The Second Knowledge Graph Construction Challenge. This year’s edition of the challenge has a double objective: benchmarking systems to find (i) which RDF graph construction system optimizes for metrics such as execution time, CPU and memory usage; and (ii) how compliant the systems are with the 2023 revision of RML and its new modules.

The final goal of the event is to provide a venue for scientific discourse, systematic analysis and rigorous evaluation of languages, techniques and tools, as well as practical and applied experiences and lessons learned from academia and industry in constructing knowledge graphs.

Eight papers were submitted. The reviews were open and public, and hosted at OpenReview⁶. Each paper received at least three reviews from reviewers with different backgrounds and status: one from a senior, one from a junior and one from an industry researcher. Five papers were accepted and one was conditionally accepted. Five of the accepted papers were long papers and one was a short paper. The following papers were accepted for publication and presented at the workshop:
• Not Everybody Speaks RDF: Knowledge Conversion between Different Data Representations [16].
• BURPing Through RML Test Cases [17].
• Propagating Ontology Changes to Declarative Mappings in Construction of Knowledge Graphs [18].
• RML-view-to-CSV: A Proof-of-Concept Implementation for RML Logical Views [19].
• R2[RML]-ChatGPT Framework [20].
• Towards Self-Configuring Knowledge Graph Construction Pipelines using LLMs - A Case Study with RML [21].

During the workshop, the second edition of the Knowledge Graph Construction Challenge was organized with two different tracks: (i) conformance with the new RML modules, and (ii) performance of engines on the same hardware.
The first track, on conformance with the new RML modules, encouraged developers of RML engines to support the specifications of the new RML modules by evaluating their engines against 365 test cases provided by the maintainers of each RML module. RML-Core (238 test cases), which focuses on the core parts of RDF generation, provides the largest number of test cases, followed by RML-IO (67 test cases), which covers access to various data sources and targets. Data transformations with FnO were also present through the RML-FNML module (13 test cases). Newer modules, e.g., RML-Star (18 test cases) for RDF-Star support and RML-CC (29 test cases) for generating RDF Collections and Containers, posed new challenges for existing engines as they impact the RDF generation process. We had 5 participating engines for the first track: RMLMapper [2], SDM-RDFizer [5], mapping-template [16], RPT/SANSA [22], and BURP [17].

⁵ http://w3id.org/kg-construct/workshop/2024
⁶ https://openreview.net/group?id=eswc-conferences.org/ESWC/2024/Workshop/KGCW

The second track, on performance, was similar to the previous edition, except that each participant now had access to a common hardware environment. This way, each engine had the same restrictions regarding CPU and RAM. Through this track, we wanted to focus not only on execution time but also on the resource consumption of each engine. This track consisted of 2 parts: (i) artificial data for analyzing specific parameters of the construction process, e.g., joins, data size and mappings, and (ii) real-life data from the GTFS Madrid Benchmark to evaluate approaches in real use cases. We had 6 participating engines for the second track: mapping-template [16], FlexRML [23], RMLWeaver-js [24], RPT/SANSA [22], RMLStreamer [25], and RML-view-to-CSV+RMLStreamer [19].

Several participants also submitted a report of their participation in one or both tracks.
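As a small illustration of the conformance track described above, the per-module test-case counts can be tallied and turned into a conformance ratio for an engine. This is only a sketch: the counts are taken from the text, while the function names and the shape of the `passed` argument are hypothetical and are not part of the challenge tooling.

```python
# Test-case counts per RML module, as stated in the text above.
TEST_CASES = {
    "RML-Core": 238,
    "RML-IO": 67,
    "RML-FNML": 13,
    "RML-Star": 18,
    "RML-CC": 29,
}


def total_test_cases(counts=TEST_CASES):
    """Return the total number of test cases over all modules."""
    return sum(counts.values())


def module_share(counts=TEST_CASES):
    """Return the fraction of all test cases contributed by each module."""
    total = total_test_cases(counts)
    return {module: n / total for module, n in counts.items()}


def conformance_ratio(passed, counts=TEST_CASES):
    """Return the fraction of all test cases an engine passes.

    `passed` is a hypothetical mapping from module name to the number of
    passed test cases; modules that are absent count as zero passed.
    """
    for module, n in passed.items():
        if module not in counts:
            raise KeyError(f"unknown module: {module}")
        if not 0 <= n <= counts[module]:
            raise ValueError(f"invalid pass count for {module}: {n}")
    return sum(passed.values()) / total_test_cases(counts)


if __name__ == "__main__":
    print(total_test_cases())
    for module, share in module_share().items():
        print(f"{module}: {share:.1%}")
```

Dividing by the total over all modules, rather than averaging per-module ratios, means that an engine supporting only RML-Core is still credited in proportion to the size of that module's test suite.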
The following reports are included in the proceedings:
• RMLStreamer supported by RML-view-to-CSV in the Performance Track of the KGCW Challenge 2024 [26].
• RMLWeaver-JS: An Algebraic Mapping Engine in the KGCW Challenge 2024 [24].
• Performance Results of FlexRML in the KGCW Challenge 2024 [27].
• Backwards or Forwards? [R2]RML Backwards Compatibility in RMLMapper [28].
• The Conformance of an RML Processor Built from Scratch to Validate RML Specifications and Test Cases [29].
• Results for Knowledge Graph Creation Challenge 2024: SDM-RDFizer [30].
• KGCW2024 Challenge Report: RDFProcessingToolkit [31].

Organizing Committee
• David Chaves-Fraga, Universidade de Santiago de Compostela
• Anastasia Dimou, KU Leuven, Flanders Make, Leuven.AI
• Dylan Van Assche, Ghent University – imec – IDLab
• Ana Iglesias-Molina, Universidad Politécnica de Madrid
• Umutcan Serles, University of Innsbruck

Program Committee
• Anelia Kurteva, Delft University of Technology
• Beatriz Esteves, Universidad Politécnica de Madrid
• Ben De Meester, Ghent University – imec – IDLab
• Bram Steenwinckel, Ghent University – imec – IDLab
• Christophe Debruyne, Liège University
• Claus Stadler, University of Leipzig
• Davide Lanti, Free University of Bozen
• Edna Ruckhaus Magnus, Universidad Politécnica de Madrid
• Els de Vleeschauwer, Ghent University
• Enrique Antonio Iglesias, Leibniz University of Hannover
• Ernesto Jimenez-Ruiz, City, University of London
• Femke Ongenae, Ghent University
• Franck Michel, CNRS
• Gertjan De Mulder, Ghent University – imec – IDLab
• Giorgos Flouris, FORTH-ICS
• Hannes Voigt, TU Dresden
• Herminio García-González, Kazerne Dossin
• Ibai Guillén-Pacho, Universidad Politécnica de Madrid
• Ioannis Dasoulas, KU Leuven
• Jakub Klímek, Charles University
• Juliette Opdenplatz, Universität Innsbruck
• Jürgen Umbrich, Vienna University of Economics and Business
• Manolis Koubarakis, National and Kapodistrian University of Athens
• Maria-Esther Vidal, Leibniz University 
of Hannover
• Mario Scrocca, Cefriel
• Markus Schröder, German Research Center for AI
• Michael Freund, Fraunhofer
• Oscar Corcho, Universidad Politécnica de Madrid
• Pano Maria, Skemu
• Samaneh Jozashoori, metaphacts GmbH
• Sergio José Rodríguez Méndez, Australian National University
• Sitt Min Oo, Ghent University – imec – IDLab
• Sven Lieber, Royal Library Of Belgium
• Tobias Schweizer, SWITCH
• Vladimir Alexiev, Ontotext

References
[1] X. L. Dong, X. He, A. Kan, X. Li, Y. Liang, J. Ma, Y. E. Xu, C. Zhang, T. Zhao, G. Blanco Saldana, S. Deshpande, A. Michetti Manduca, J. Ren, S. P. Singh, F. Xiao, H.-S. Chang, G. Karamanolakis, Y. Mao, Y. Wang, C. Faloutsos, A. McCallum, J. Han, AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types, KDD ’20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 2724–2734.
[2] A. Dimou, M. V. Sande, P. Colpaert, R. Verborgh, E. Mannens, R. V. de Walle, RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data, in: Proceedings of the 7th Workshop on Linked Data on the Web (LDOW), 2014.
[3] D. Chaves-Fraga, F. Priyatna, I. Perez-Santana, O. Corcho, Virtual Statistics Knowledge Graph Generation from CSV files, in: Emerging Topics in Semantic Technologies: ISWC 2018 Satellite Events, Studies on the Semantic Web, IOS Press, 2018.
[4] F. Michel, L. Djimenou, C. Faron-Zucker, J. Montagnat, xR2RML: Relational and Non-Relational Databases to RDF Mapping Language, Technical Report, 2017.
[5] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal, SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 3039–3046.
[6] U. Şimşek, E. Kärle, D. Fensel, RocketRML - A NodeJS implementation of a Use-Case Specific RML Mapper, in: Proceedings of the 1st Workshop on Knowledge Graph Building, 2019.
[7] S. Jozashoori, D. Chaves-Fraga, E. 
Iglesias, M.-E. Vidal, O. Corcho, FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation, in: International Semantic Web Conference, Springer, 2020, pp. 276–293.
[8] J. F. Sequeda, D. P. Miranker, Ultrawrap: SPARQL execution on relational data, Web Semantics: Science, Services and Agents on the WWW (2013).
[9] F. Priyatna, O. Corcho, J. Sequeda, Formalisation and Experiences of R2RML-based SPARQL to SQL Query Translation Using Morph, in: Proceedings of the 23rd International Conference on World Wide Web, 2014.
[10] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro, G. Xiao, Ontop: Answering SPARQL Queries over Relational Databases, Semantic Web Journal (2017).
[11] M. Lefrançois, A. Zimmermann, N. Bakerally, A SPARQL Extension for Generating RDF from Heterogeneous Formats, in: The Semantic Web: 14th International Conference, 2017.
[12] H. García-González, I. Boneva, S. Staworko, J. E. Labra-Gayo, J. M. C. Lovelle, ShExML: improving the usability of heterogeneous data mapping languages for first-time users, PeerJ Computer Science 6 (2020) e318.
[13] B. De Meester, A. Dimou, R. Verborgh, E. Mannens, An ontology to semantically declare and describe functions, in: European Semantic Web Conference, 2016, pp. 46–49.
[14] A. C. Junior, C. Debruyne, R. Brennan, D. O’Sullivan, FunUL: a method to incorporate functions into uplift mapping languages, in: Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services, 2016, pp. 267–275.
[15] A. Iglesias-Molina, D. Van Assche, J. Arenas-Guerrero, B. De Meester, C. Debruyne, S. Jozashoori, P. Maria, F. Michel, D. Chaves-Fraga, A. Dimou, The RML Ontology: A Community-Driven Modular Redesign After a Decade of Experience in Mapping Heterogeneous Data to RDF, in: The Semantic Web – ISWC 2023: 22nd International Semantic Web Conference, Athens, Greece, November 6–10, 2023, Proceedings, Springer, 2023.
[16] M. 
Scrocca, A. Carenini, M. Grassi, M. Comerio, I. Celino, Not Everybody Speaks RDF: Knowledge Conversion between Different Data Representations, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[17] D. Van Assche, C. Debruyne, BURPing Through RML Test Cases, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[18] D. C. Herreros, D. Chaves-Fraga, M. Poveda-Villalón, R. Pernisch, L. Stork, O. Corcho, Propagating Ontology Changes to Declarative Mappings in Construction of Knowledge Graphs, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[19] E. de Vleeschauwer, P. Maria, B. De Meester, P. Colpaert, RML-view-to-CSV: A Proof-of-Concept Implementation for RML Logical Views, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[20] A. Randles, D. O’Sullivan, R2[RML]-ChatGPT Framework, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[21] M. Hofer, J. Frey, E. Rahm, Towards Self-Configuring Knowledge Graph Construction Pipelines using LLMs - A Case Study with RML, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[22] C. Stadler, L. Bühmann, L.-P. Meyer, M. Martin, Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark, in: Proceedings of the 4th International Workshop on Knowledge Graph Construction (KGCW 2023), 2023.
[23] M. Freund, S. Schmid, R. Dorsch, A. Harth, FlexRML: A Flexible and Memory Efficient Knowledge Graph Materializer, in: The Semantic Web: 21st International Conference, ESWC 2024, Hersonissos, Crete, Greece, May 26–30, 2024, Proceedings, Part II, 2024.
[24] S. M. Oo, T. Verbeken, B. De Meester, RMLWeaver-JS: An algebraic mapping engine in the KGCW Challenge 2024, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[25] G. Haesendonck, W. Maroy, P. Heyvaert, R. 
Verborgh, A. Dimou, Parallel RDF Generation from Heterogeneous Big Data, in: Proceedings of the International Workshop on Semantic Big Data, 2019.
[26] E. de Vleeschauwer, B. De Meester, RMLStreamer supported by RML-view-to-CSV in the performance track of the KGCW Challenge 2024, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[27] M. Freund, S. Schmid, R. Dorsch, A. Harth, Performance Results of FlexRML in the KGCW Challenge 2024, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[28] D. Van Assche, J. Jankaj, B. De Meester, Backwards or Forwards? [R2]RML backwards compatibility in RMLMapper, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[29] C. Debruyne, D. Van Assche, The Conformance of an RML Processor Built from Scratch to Validate RML Specifications and Test Cases, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[30] E. Iglesias, M.-E. Vidal, Results for Knowledge Graph Creation Challenge 2024: SDM-RDFizer, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.
[31] C. Stadler, S. Bin, KGCW2024 Challenge Report: RDFProcessingToolkit, in: Proceedings of the 5th International Workshop on Knowledge Graph Construction, 2024.