PKGCubes: Personalizing Multidimensional Data Analytics through Personal Knowledge Graph Cubes

Fouad Zablith¹,*, Shadi Youssef¹

¹ Olayan School of Business, American University of Beirut, PO Box 11-0236, Riad El Solh, 1107 2020, Beirut, Lebanon

Abstract

While knowledge graphs are increasingly adopted to support data analysis over linked data cubes, it remains challenging for end-users to personalize, preserve, and share the cubes that are pertinent to their analytics objectives. Building on Personal Knowledge Graphs, this study introduces the notion of Personal Knowledge Graph Cubes (PKGCubes). PKGCubes serve as a mediator between the web of data cubes and data analysis platforms. A demo of PKGCubes Manager is presented, enabling data analysts to create, publish, and reuse PKGCubes in standalone data analysis tools. This work contributes to offering more personalized and self-service analytics on the growing web of data.

Keywords

Personal knowledge graph cubes, linked data, visual analytics, semantic web, OLAP, data cubes

1. Introduction

A growing body of research aims to leverage the expressive nature of knowledge graphs to facilitate data analytics tasks [1]. One popular type of data is multidimensional data, whose measures and dimensions form cubes for Online Analytical Processing (OLAP) [2]. In this context, related work ranges from studying the effective representation of data cubes through ontologies (e.g., the RDF Data Cube [3] and QB4OLAP [4]) to increasing the usability and value of graph data [5] through visual [6] and knowledge graph management features [7].

While such efforts provide greater data sharing and usability opportunities, manipulating knowledge graphs for data analytics still poses challenges to end-users [8]. With the plethora of published linked open datasets, end-users find it challenging to customize, preserve, and share knowledge graph cubes that are pertinent to their analytical objectives.
This demo paper focuses on the following research question: how can we better personalize data analysis over multidimensional web of data cubes?

Posters, Demos, and Industry Tracks at ISWC 2024, November 13–15, 2024, Baltimore, USA
* Corresponding author.
Email: fouad.zablith@aub.edu.lb (F. Zablith); say09@mail.aub.edu (S. Youssef)
URL: https://fouad.zablith.org/ (F. Zablith)
ORCID: 0000-0002-8978-9911 (F. Zablith)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2. Personal Knowledge Graph Cubes

Personal Knowledge Graphs (PKGs) allow the representation of knowledge graph entities that are relevant and of personal nature to a particular individual [9]. We see an opportunity to build on the notion of PKGs to enable a more personalized depiction of linked data cubes. We propose Personal Knowledge Graph Cubes (PKGCubes) to represent data cubes that fulfill an individual’s analytical needs and requirements. Building on the rich knowledge graph semantics, PKGCubes are meant to be stored, shared, and combined with other cubes.

Figure 1 illustrates how we envision the PKGCube. It acts as a mediator between published linked data cubes and data analytics tools. It serves as a personal and persistent snapshot view of graph data, representing entities that connect to linked data cube sources, and feeds the personal data of interest into analytics tools.

Ontologically, we design a PKGCube as an extension of the RDF Data Cube vocabulary [3]. While the RDF Data Cube vocabulary is well positioned to represent data cube entities (e.g., observations and slices), it lacks the entities needed to make cubes more personalized. This is the gap that PKGCubes aim to fill.
In its initial ontology version, a PKGCube represents the following entities: the person who published the cube; a version for tracking changes; a cube hash that encodes the content; an access specification that marks the cube as private or public; the source query that enables its recreation; a description of the data in the cube; a link to where the data is available; observations derived from linked data cube sources; and slice information that stores the filtering settings applied to the PKGCube.

Figure 1: Illustration of Personal Knowledge Graph Cube Entities and the Related Framework Steps. (Diagram labels: Linked Data Cube Datasets, Personal Knowledge Graph Cube, Data Analytics Tools; PKGCube entities: Version, Person, Observation, Cube Hash, Slice, Access, Link, Source Query, Description; framework steps: Create, Publish, Reuse.)

We envisage a framework to create, publish, and reuse PKGCubes. Creating a PKGCube involves giving individuals access to navigate linked data cube sources and select the entities to include in their PKGCube based on their individual analysis objectives. The created PKGCubes are stored and published on a triplestore. To maximize the reusability of PKGCubes in a variety of data analytics tools, they can be further processed and transformed into more manipulable data formats, such as tabular structures in the form of spreadsheets or Comma Separated Values (CSV) files.

3. Demo: PKGCubes Manager

We demonstrate the feasibility of PKGCubes through PKGCubes Manager, a Python Streamlit online app¹ that enables data analysts to create PKGCubes through a set of filters, publish the cubes to a triplestore, and reuse them in analytics apps such as Microsoft Power BI [10]. Figure 2 shows the main features of the PKGCubes Manager app.²

Figure 2: Screenshots of the PKGCubes Manager Features.

¹ PKGCubes Manager is accessible at: https://linked.aub.edu.lb:8502/.
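To make the entity list from Section 2 and the tabular reuse step more concrete, the following sketch models a PKGCube as a plain Python record and flattens its observations into CSV text of the kind that a tool such as Power BI can load. It is an illustration under stated assumptions rather than the PKGCubes ontology or the app's code: the class name `PKGCube`, the field names and types, the `to_csv` helper, and all sample values are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class PKGCube:
    """Hypothetical in-memory view of the PKGCube entities (names are assumptions)."""

    person: str        # URI of the person who published the cube
    version: int       # version number used for tracking changes
    cube_hash: str     # hash that encodes the cube content
    access: str        # "private" or "public"
    source_query: str  # query that enables recreating the cube
    description: str   # information on the data in the cube
    link: str          # where the cube data is available
    observations: list[dict[str, str]] = field(default_factory=list)
    slices: dict[str, str] = field(default_factory=dict)  # filter settings

    def to_csv(self) -> str:
        """Flatten the observations into CSV text, one row per observation."""
        # Use the sorted union of all keys so the column order is deterministic.
        columns = sorted({key for obs in self.observations for key in obs})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.observations)  # missing keys become empty cells
        return buffer.getvalue()


# Made-up placeholder values, for illustration only.
example = PKGCube(
    person="http://example.org/person/placeholder",
    version=1,
    cube_hash="0" * 32,
    access="public",
    source_query="SELECT * WHERE { ?s ?p ?o }",
    description="Placeholder cube",
    link="http://example.org/cube.csv",
    observations=[
        {"dimension_a": "x", "measure": "1"},
        {"dimension_a": "y", "measure": "2"},
    ],
)
print(example.to_csv())
```

Keeping the observations as plain dictionaries means the same record can be serialized to CSV for analytics tools or mapped back to graph entities, which reflects the mediator role described for the PKGCube.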
We test the app in the context of openly accessible statistical data in several domains (e.g., health care, tourism, and others) that were transformed from various distributed data sources (e.g., ministries) in Lebanon. PKGCubes Manager currently has two main components: the PKGCubes Publisher and the PKGCubes Explorer.

The PKGCubes Publisher enables data analysts to specify a SPARQL endpoint that contains RDF data cubes. Analysts can pick a predefined endpoint from a drop-down menu or provide a new one. The selected endpoints need to store data cubes with explicit datasets, measures, and dimensions following the RDF Data Cube vocabulary. The publisher app scans the data available in the endpoint using SPARQL templates designed to detect the available datasets and their related entities. The SPARQL results are then used to populate the “Cube Filters” available in the app. Users can then filter the cubes based on the domains, datasets, and the available dataset measures and dimensions. After the selection, the tool builds a SPARQL query in the background based on the filter selection and presents the SPARQL results in tabular format. Users can then check the cube data loaded in the table, fine-tune the filters if needed, and publish the cube.

To publish the cube, users need to provide their personal details, including their name and email. The app then executes (1) a Uniform Resource Identifier (URI) and linkage generation step, (2) a versioning check, and (3) a data publication phase. In the first step, the PKGCube URIs are generated based on the MD5 hash of the following combination: . This configuration enables a unique one-way identification of the PKGCubes while preserving the users’ data privacy. It also helps with storing the configuration that the user followed to generate the PKGCube, and with creating appropriate linkages among cube versions.

² A video demonstration is available at: https://youtu.be/e9NPsrVSXXM
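The URI generation in step (1) can be sketched as follows. This is a minimal illustration, not the app's actual code: the components that are hashed, the `|` separator, and the function names `md5_hex` and `pkgcube_uri` are assumptions made for the example. Only the use of MD5 and the `http://linked.aub.edu.lb/pkgcube/` prefix, which is the form of the published cube URI cited in this paper, come from the text.

```python
import hashlib
from typing import Sequence

# Prefix of the published PKGCube URI cited in the paper.
PKGCUBE_BASE = "http://linked.aub.edu.lb/pkgcube/"


def md5_hex(text: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def pkgcube_uri(
    parts: Sequence[str],
    base: str = PKGCUBE_BASE,
    separator: str = "|",  # assumption: how the components are joined
) -> str:
    """Build a cube URI from the MD5 hash of the joined components.

    Which components are combined is an assumption here; the function
    accepts any sequence of strings.
    """
    return base + md5_hex(separator.join(parts))


# Placeholder components, for illustration only.
uri = pkgcube_uri(["component-a", "component-b"])
print(uri)
```

Because the digest is one-way, the resulting URI does not directly expose the values it was derived from, while the same inputs always map to the same URI. That property is what allows a URI to double as a stable identifier across repeated publications of the same configuration.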
The publisher links the PKGCube URIs to the relevant entities (e.g., observations extracted from the endpoint, the source query, and the other elements shown in Figure 1) and to the personal user URI, which is generated from the MD5 hash of the email the user provided.

In the second step, a versioning functionality keeps track of the different versions of the same PKGCube. Versioning preserves snapshots of the data at various points in time. The publisher app checks the version of the PKGCube at two levels. At the first level, the app checks whether the PKGCube URI was already published. If it is a new PKGCube, the PKGCube entities and related files (i.e., CSV and RDF) are generated. If the cube already exists, the app checks the content extracted from the cube, compares it to the cube content hash of the latest version, and creates the appropriate version linkages, which users can explore. Finally, the generated PKGCube entities are published to a triplestore.

In the PKGCubes Explorer, data analysts can browse the details of the published PKGCubes and their linked versions, and reuse the related data in external applications. Another feature of the tool is a “refresh” functionality that updates the PKGCube with the latest data available in the initial endpoint. This is useful for handling cases where the content of the source RDF Data Cubes changes, allowing analysts to update their PKGCubes with the latest data, which can then be seamlessly reflected in their external applications. To illustrate the reuse of data, Figure 2 showcases how a PKGCube’s linked CSV file was used in Microsoft Power BI to generate a dashboard on the tourism index and guest houses around Lebanon.³ This demonstrates the potential of PKGCubes to create personalized, uniquely referenced, and preserved data cubes that analysts can reuse in their preferred data analysis environments.

³ The PKGCube used to generate the Power BI visualizations is accessible at: http://linked.aub.edu.lb/pkgcube/551015b5649368dd2612f795c2a9c2d8

4. Conclusion

We presented in this paper the notion of Personal Knowledge Graph Cubes, with a demonstration of its application through the PKGCubes Manager online app. As part of future research, this work can benefit from developing more robust management and access control features for PKGCubes. This is in line with the Personal Knowledge Graph ecosystem laid out by Skjæveland et al. [11]. Another interesting research direction would be to investigate additional social interactions around the cubes. One possible approach is alignment with the Social Linked Data (Solid) principles [12], which provide further privacy and user-control functionalities when publishing data [13]. We are planning to evaluate the impact of PKGCubes on performing data analytics tasks in projects and use cases. Use case data will help improve the ontology and interface design for managing PKGCubes. This research contributes to providing more personalized and self-service analytics [14, 15] on the growing web of data.

Acknowledgments

This work was partially supported by the Olayan School of Business (OSB) Research Initiative fund, and the American University of Beirut Research Board (URB).

References

[1] M. E. Papadaki, Y. Tzitzikas, M. Mountantonakis, A Brief Survey of Methods for Analytics over RDF Knowledge Graphs, Analytics 2 (2023) 55–74. doi:10.3390/analytics2010004. Publisher: MDPI.
[2] A. Abelló, O. Romero, T. B. Pedersen, R. Berlanga, V. Nebot, M. J. Aramburu, A. Simitsis, Using semantic web technologies for exploratory OLAP: a survey, IEEE Transactions on Knowledge and Data Engineering 27 (2014) 571–588. Publisher: IEEE.
[3] The RDF Data Cube Vocabulary, 2014. URL: https://www.w3.org/TR/vocab-data-cube/.
[4] L. Etcheverry, A. A. Vaisman, QB4OLAP: a new vocabulary for OLAP cubes on the semantic web, in: Proceedings of the Third International Conference on Consuming Linked Data, volume 905, CEUR-WS.org, 2012, pp. 27–38.
[5] P. Escobar, G. Candela, J. Trujillo, M. Marco-Such, J. Peral, Adding value to Linked Open Data using a multidimensional model approach based on the RDF Data Cube vocabulary, Computer Standards & Interfaces 68 (2020) 103378. doi:10.1016/j.csi.2019.103378.
[6] G. Tschinkel, E. E. Veas, B. Mutlu, V. Sabol, Using Semantics for Interactive Visual Analysis of Linked Open Data, in: ISWC (Posters & Demos), Citeseer, 2014, pp. 133–136.
[7] P. Haase, D. M. Herzig, A. Kozlov, A. Nikolov, J. Trame, metaphactory: A platform for knowledge graph management, Semantic Web 10 (2019) 1109–1125. Publisher: IOS Press.
[8] S. Ferré, Analytical Queries on Vanilla RDF Graphs with a Guided Query Builder Approach, in: T. Andreasen, G. De Tré, J. Kacprzyk, H. Legind Larsen, G. Bordogna, S. Zadrożny (Eds.), Flexible Query Answering Systems, Springer International Publishing, Cham, 2021, pp. 41–53. doi:10.1007/978-3-030-86967-0_4.
[9] K. Balog, T. Kenter, Personal knowledge graphs: A research agenda, in: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, 2019.
[10] Power BI - Data Visualization | Microsoft Power Platform, 2024. URL: https://www.microsoft.com/en-us/power-platform/products/power-bi.
[11] M. G. Skjæveland, K. Balog, N. Bernard, W. Łajewska, T. Linjordet, An ecosystem for personal knowledge graphs: A survey and research roadmap, AI Open 5 (2024) 55–69. doi:10.1016/j.aiopen.2024.01.003.
[12] A. V. Sambra, E. Mansour, S. Hawke, M. Zereba, N. Greco, A. Ghanem, D. Zagidulin, A. Aboulnaga, T. Berners-Lee, Solid: a platform for decentralized social applications based on linked data, Technical Report, MIT CSAIL & Qatar Computing Research Inst., 2016.
[13] S. Meckler, R. Dorsch, D. Henselmann, A. Harth, The Web and Linked Data as a Solid Foundation for Dataspaces, in: Companion Proceedings of the ACM Web Conference, WWW ’23 Companion, Association for Computing Machinery, New York, NY, USA, 2023, pp. 1440–1446. doi:10.1145/3543873.3587616.
[14] A. Abelló, J. Darmont, L. Etcheverry, M. Golfarelli, J.-N. Mazón, F. Naumann, T. B. Pedersen, S. Rizzi, J. Trujillo, P. Vassiliadis, Fusion cubes: Towards self-service business intelligence, International Journal of Data Warehousing and Mining (IJDWM) 9 (2013) 66–88. Publisher: IGI Global.
[15] J. Passlick, L. Grützner, M. Schulz, M. H. Breitner, Self-service business intelligence and analytics application scenarios: A taxonomy for differentiation, Information Systems and e-Business Management 21 (2023) 159–191. doi:10.1007/s10257-022-00574-3.