
Francesca De Luzi

Flavia Monti

Massimo Mecella


Department of Computer, Control and Management Engineering, Sapienza Università di Roma, Rome, Italy

2025

This contribution contains the accepted papers of the Research Projects Exhibition, held in conjunction with the 33rd Symposium on Advanced Database Systems (SEBD 2025). This edition of SEBD took place in Ischia (Italy) between June 16th and 19th. We were delighted to contribute to this year's symposium with the first edition of the Research Projects Exhibition (RPE@SEBD'25). The SEBD conference is renowned as the premier Italian venue for presenting innovative and rigorous research across the broad spectrum of advanced database systems. In line with this tradition, the Research Projects Exhibition was specifically organized to provide a dedicated track to showcase ongoing research projects (e.g., European PNRR or Horizon Europe projects, national or regional initiatives, or other funded research) in the context of advanced database systems and their applications. The main objective of this initiative was to create a forum where authors could disseminate intermediate results, present the objectives and achievements of their projects, and receive constructive feedback on project proposals under development. The exhibition also offered a friendly environment for finding potential research partners, strengthening existing collaborations, and stimulating new ideas. In this first edition of RPE@SEBD'25, we accepted 7 posters that were presented in a dedicated face-to-face session running throughout the symposium. The list of accepted papers is provided below, and each of them is presented in the following sections.

1. AI-Powered Industrial Anomaly Detection: A Dual Approach with LLMs and Machine Learning

Ala Arman1,*, Filippo Bianchini1, Marco Calamo1, Loredana Cristaldi2, Emilia Lenzi2, Matteo Marinacci1, Davide Martinenghi2, Luca Martiri2, Massimo Mecella1, Andrea Moschetti2, Jacopo Rossi1, Letizia Tanca2

1.1. The MICS Project

The MICS Project [1, 2] is a major national initiative uniting academia and industry to promote circularity in key sectors by developing data-driven models and methods that support the full lifecycle of industrial processes. See Table 1 for more details.

Table 1: MICS project details. Fields: Project Full Name, Duration, Participants, Funding Agency, Total Investment, Researchers Involved, Active Sub-Projects, Cascade Funding, Funded Organizations, Key Contributors, Official Website.

1.2. The Proposed Approach

Anomaly detection is critical for reliable and cost-effective manufacturing. To address the limitations of traditional inspections, we propose a hybrid intelligent system that combines semantic reasoning with advanced data analysis. Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) interpret technical documents to provide real-time, context-aware guidance to inspectors. Simultaneously, machine learning and deep learning techniques, including Random Forest (RF) and Convolutional Neural Networks (CNNs), analyze high-resolution images to detect and classify defects, with built-in resilience to noisy or imperfect data. This dual-track architecture integrates structured and unstructured data, enabling informed, efficient decision-making and reducing human error in complex inspection scenarios.
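The retrieval step of a RAG pipeline can be illustrated with a minimal sketch. It is not the project's implementation: no LLM is called, the bag-of-words cosine similarity stands in for whatever retrieval model the system actually uses, and the document snippets and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical snippets standing in for technical inspection documents.
DOCUMENTS = [
    "Weld seams must be free of cracks and porosity before coating.",
    "Surface scratches deeper than 0.2 mm require rework of the panel.",
    "Torque every flange bolt to the value listed in the assembly sheet.",
]

def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(docs, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=1):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

question = "How deep can surface scratches be?"
top_docs = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, DOCUMENTS)
```

In a full pipeline the resulting prompt would be passed to the language model, so that its answer is grounded in the retrieved passage rather than in the model's parameters alone.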

Acknowledgements.

This study was carried out within the MICS (Made in Italy – Circular and Sustainable) Extended Partnership and received funding from Next-Generation EU (Italian PNRR – M4 C2, Invest 1.3 – D.D. 1551.11-10-2022, PE00000004). CUP MICS D43C22003120001.

Declaration on Generative AI. The authors have not employed any Generative AI tools.

2. An Ontology-based Multidimensional Data Modeling

Domenico Lembo1, Maurizio Lenzerini1, Antonella Poggi1, Federico Maria Scafoglieri1,*, Jacopo Brunetti1, Roberta Radini2, Michele Riccio2, Valerio Santarelli3

1Sapienza Università di Roma, Rome, Italy

2ISTAT Italian National Institute of Statistics, Rome, Italy

3OBDA Systems, Rome, Italy

*Corresponding author. Email: scafoglieri@diag.uniroma1.it

The Problem. Aggregate data, also known as macro-data, are information produced in summarized form from individual-level data, also known as micro-data. Typically gathered from operational databases, and possibly integrated from various data sources, aggregate information is usually managed through Business Intelligence and Data Warehousing solutions [1, 2, 3, 4, 5, 6].

Data aggregation is usually carried out by referring to the so-called multidimensional model [4], where events of interest for the analysis are represented as logical cubes. These cubes are characterized by dimensions (e.g., time or space), which correspond to the aspects of the business along which one wants to perform aggregation, and by measures, which are properties of the event on which to make calculations (e.g., sums or averages) and that can be used as business performance indicators (e.g., income of a shop, number of enrollments in a school). Dimensions may be associated with hierarchies specifying different levels of aggregation (a.k.a. dimensional attributes [4]). Operations performed on data cubes (also called OLAP operations) include the increment or decrement of the level of aggregation, called roll-up and drill-down, respectively, and the selection of a portion of events in the multidimensional space, called slice-and-dice.
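As a minimal illustration of these operations, the following Python sketch stores the events as a list of fact records and implements aggregation, roll-up, and slice with plain dictionaries. The sample data, the day-to-month hierarchy, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical micro-level facts: one record per sale event.
FACTS = [
    {"day": "2024-01-05", "shop": "Rome", "income": 100.0},
    {"day": "2024-01-20", "shop": "Rome", "income": 50.0},
    {"day": "2024-02-03", "shop": "Rome", "income": 70.0},
    {"day": "2024-01-11", "shop": "Milan", "income": 30.0},
]

def month_of(day):
    """Map a day (YYYY-MM-DD) to its month (YYYY-MM): one step up the time hierarchy."""
    return day[:7]

def aggregate(facts, dims, measure, levels=None):
    """Sum `measure` grouped by `dims`; `levels` maps a dimension to a roll-up function."""
    levels = levels or {}
    cube = defaultdict(float)
    for fact in facts:
        key = tuple(levels.get(d, lambda v: v)(fact[d]) for d in dims)
        cube[key] += fact[measure]
    return dict(cube)

def slice_facts(facts, dim, value):
    """Slice: keep only the events whose `dim` coordinate equals `value`."""
    return [f for f in facts if f[dim] == value]

# Finest level of aggregation: income per (day, shop).
by_day = aggregate(FACTS, ["day", "shop"], "income")

# Roll-up: raise the time dimension from day to month.
by_month = aggregate(FACTS, ["day", "shop"], "income", levels={"day": month_of})

# Slice on shop = Rome, then roll up the remaining time dimension to month.
rome_by_month = aggregate(
    slice_facts(FACTS, "shop", "Rome"), ["day"], "income", levels={"day": month_of}
)
```

Drill-down is the inverse navigation: going from `by_month` back to `by_day` requires re-aggregating the underlying facts at the finer level, which is why the sketch keeps the micro-level records.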

Goal. In this project we devise a new approach for modeling and manipulating aggregate data, which is based on the use of OWL2 ontologies [7] that provide a rigorous formalization of both the application domain and the multidimensional model. The overall ontology that we devise makes explicit the way in which macro-data are obtained from micro-data, by exploiting views over the domain ontology [8], which are first-class citizens in our model. Data cubes and hierarchies are indeed seen as constructed from the (SPARQL) queries associated with the views, which allow cube dimensions, cube measures, and hierarchy levels to be instantiated from the answers to such queries. This is a distinguishing feature of our approach, considering that other models for multidimensional data (e.g., [9, 10]) do not formalize this aspect, and methodologies for data warehouse design do not provide declarative means to specify the connection between micro- and macro-data, which is usually hidden in ETL procedures and is thus difficult to understand and reconstruct, e.g., for data provenance and/or lineage.

Approach. Our work is currently focused on the development of services to support both design- and run-time activities related to the production, distribution and integration of aggregate data. Such services are defined according to a formal semantics that extends the Metamodeling Semantics proposed in [11]. This semantics allows us to reason over various representation layers, i.e., the meta-level formalizing the multidimensional model, the actual data cubes designed for the analysis of the business trends, the domain ontology, and the views bridging it to the cubes. A fundamental service in this scenario is query answering. This service is indeed at the basis of several more complex functionalities, such as integration of aggregate data sets [12], possibly linked to the ontology through mappings as in OBDM [13, 14], and publishing of linked open data. Interestingly, queries in our framework may smoothly combine elements belonging to the various mentioned levels. We finally remark that we are currently working on the implementation of software components, integrated in the OBDM tool Mastro [15, 16], that realize the multidimensional ontology-based approach devised in this project.

Lembo, Lenzerini, and Poggi received support from the PNRR MUR project PE0000013-FAIR; Lembo’s work was also funded by the POLAR project (Code 2022LA8XBH, CUP: B53D23013100006). Scafoglieri’s research was entirely and exclusively supported by the PNRR MUR project PE0000013-FAIR.

Declaration on Generative AI. The authors have not employed any Generative AI tools.

3. Building National Data lakehouse Ecosystems for Environmental and Public Health: AnTeA and IDEAH

Mario Cerroni2, Francesca De Luzi3, Tommaso Filippini1, Valentina Fuscoletti2, Marco Giustini2, Raffaele Landi4, Francesco Leotta3, Luca Lucentini2, Mattia Macrì3, Camilla Marchiafava2, Marco Marras4, Daniela Mattei2, Giampaolo Maugeri4, Massimo Mecella3, Alessio Pitidis2, Marco Vinceti1

3.1. Introduction

Environmental quality and human health are intrinsically linked. Recognizing this connection, both the World Health Organization and the European Union have prioritized efforts to advance public well-being and support innovation through environmental and health data. In Italy, the National Complementary Plan (PNC) has funded specific actions that integrate and enhance the National Recovery and Resilience Plan (PNRR), offering an opportunity to reform and innovate the management of environmental and public health data resources. In this context, two key digital platforms have been developed to support these goals: (i) AnTeA (Dynamic Territorial Registry of Drinking Water), a platform for the acquisition, management, and analysis of data on water quality and supply in Italy, ensuring compliance with EU Directive 2020/2184 and supporting transparent water governance, and (ii) IDEAH (Integrated Database for Environment And Health), a national data lakehouse [1] integrating environmental and health data to support research, epidemiology, and policy development. Both projects are coordinated by Sapienza University of Rome with institutional and scientific partners: the Italian National Institute of Health (ISS), the University of Modena and Reggio Emilia, and We-COM, cloud enabler of the National Strategic Hub (PSN), which provides the technological infrastructure for sound and scalable implementation. This collaborative network ensures the development of modern and interoperable digital infrastructures focused on improving public health and environmental monitoring, through cooperation among national institutions, academia, and regional authorities.

3.2. AnTeA – Dynamic Territorial Registry of Drinking Water

AnTeA is a digital platform created to ensure standardized, transparent, and cooperative management of drinking water data in Italy, in line with Legislative Decree no. 18/2023 and the EU Drinking Water Directive. The project addresses the fragmentation of Italy’s water sector (over 2,300 providers using disparate systems) by pursuing the following objectives:

• Data harmonization: AnTeA enables the integration of data on water sources, distribution systems, and water quality. It supports internal and external control reporting, incident tracking, risk assessment, and derogation management;
• Cooperative framework: AnTeA adopts a Request for Comments (RfC) process to engage institutional stakeholders (e.g., ISS–CeNSiA, ARERA, MASE, ISTAT, Regions, ASLs, EGATOs), ensuring shared governance and continuous improvement;
• Interoperability and scalability: built on the National Strategic Hub (PSN), AnTeA leverages a secure cloud infrastructure for data reliability, availability, and exchange with European bodies and international institutions;
• Public transparency: the platform enhances citizens’ right to information about water quality, contributing to public trust and informed environmental stewardship.

3.3. IDEAH – Integrated Database for Environment And Health

IDEAH is an initiative led by the ISS, developed within the framework of the SNPS (National System for the Prevention of Health from Environmental and Climate Risks), established by Legislative Decree no. 36/2022. It provides a centralized, scalable data lakehouse architecture that integrates heterogeneous environmental and health datasets across multiple territorial scales, from international to local. The platform enhances risk assessment, disease prevention, and policy-making through advanced analytics and interoperable data access, offering the following key features:

• Integrated data sources: IDEAH consolidates 40 environmental data sources, including terrestrial and satellite data (e.g., Copernicus missions Sentinel-2 and Sentinel-5P), and health data such as mortality, hospital discharge records, emergency room visits, and birth certificates;
• Privacy and security: compliance with national data protection regulations is ensured through anonymization, semi-anonymization techniques, and strong authentication mechanisms;
• User access and profiling: access is managed via SPID digital identity with role-based permissions. Researchers provide their professional background and research objectives, allowing IDEAH to tailor data access accordingly;
• Interactive dashboards: users can filter and explore datasets through dynamic graphs and maps, extracting specific geographic or temporal subsets;
• Flexible data analysis environment: a cloud-based JupyterLab-inspired interface allows users to work in R or Python, import custom containers, and load personal libraries or configuration files.

Declaration on Generative AI. The authors have not employed any Generative AI tools.

4. HEREDITARY: HetERogeneous sEmantic Data Integration for the guT-brAin inteRplaY

Gianmaria Silvello1,*

The Project

The HEREDITARY project is a European research initiative funded under the Horizon Europe programme. It aims to build an integrated digital infrastructure for precision medicine using multimodal data. The University of Padua (UNIPD) coordinates the project.

• Project Acronym: HEREDITARY
• Total Duration: 48 months
• EU Contribution: €9,988,833.75
• Funding Agency: European Commission, Horizon Europe programme
• Project Website: https://hereditary-project.eu/

Work Packages Overview. The HEREDITARY project is organized into nine Work Packages (WPs), each addressing a key dimension of the project and led by a dedicated partner. WP1, coordinated by Università degli Studi di Padova (UNIPD), ensures overall project management, including coordination, implementation, and timely delivery. WP2, led by Università degli Studi di Torino (UNITO), defines clinical use cases and provides curated clinical, research, and environmental data through a federated infrastructure. WP3, under Aalborg Universitet (AAU), develops the semantic integration platform supporting multimodal and multilingual data analysis.

Building on this, WP4 (Haute École Spécialisée de Suisse Occidentale – HESSO) implements an analytics and learning platform to extract insights from heterogeneous data sources using advanced AI techniques. WP5, led by Technische Universität Graz (TUGRAZ), focuses on visual analytics, enabling users to explore and interpret data interactively. WP6 (Observa Associazione) enhances societal impact through citizen engagement, public communication, and policy-oriented activities.

WP7, managed by Katholieke Universiteit Leuven (KU Leuven), ensures compliance with legal, ethical, and regulatory standards, particularly concerning data protection and AI governance. WP8, led by Fundación Empresa Universidad Gallega (FEUGA), defines strategies for exploitation, innovation, and dissemination to promote long-term sustainability. Finally, WP9, also led by UNIPD, oversees adherence to the ethical requirements of the project.

Consortium Partners and Roles. The HEREDITARY consortium brings together leading institutions across Europe and beyond. UNIPD acts as Project Coordinator and leads WP1 and WP9 (https://www.unipd.it). ONTOTEXT (ONTO) is Exploitation Manager (https://www.ontotext.com), while FEUGA coordinates WP8 and manages intellectual property (https://www.feuga.es). OBSERVA leads WP6 (https://www.observanet.it), and KU Leuven is responsible for WP7 (https://www.kuleuven.be).

UNITO and AAU lead WP2 and WP3, respectively (https://www.unito.it, https://www.aau.dk), with HESSO heading WP4 (https://www.hes-so.ch) and TUGRAZ in charge of WP5 (https://www.tugraz.at). Other key contributors include SURF BV (https://www.surf.nl), Radboud University Medical Centre (RUMC) (https://www.radboudumc.nl), CRG-CERCA (https://www.crg.eu), UNL (https://www.unl.pt), EUpALS (https://www.eupals.eu), European Brain Council (EBC) as Quality and Risk Manager (https://www.braincouncil.eu), University of Colorado (UCD) (https://www.colorado.edu), EMBL (https://www.embl.org), and CNAG (https://www.cnag.eu), each contributing specialized expertise across the scientific, technical, and societal aspects of the project.

The Project Coordinator is Gianmaria Silvello from UNIPD, and the Scientific and Technical Manager is Manfredo Atzori (UNIPD-HESSO).

Scientific vision and goals

The HEREDITARY project aims to develop a secure and distributed system for linking multimodal health data, such as electronic health records, genomic data, medical imaging, and environmental data. By leveraging secure supercomputing environments and federated learning, data remains localized, respecting privacy and regulatory standards like GDPR. This infrastructure facilitates collaborative analysis without compromising sensitive health information, advancing medical research and improving patient outcomes. In addition, the project focuses on developing semantics-aware learning methods to integrate multimodal and genomics data, enhancing health outcomes. Using advanced AI techniques and Ontology-Based Data Access (OBDA), HEREDITARY creates unified data representations for complex queries and predictive analytics. These methods aim to provide deeper insights into the gut-brain axis and neurodegenerative diseases, ultimately contributing to the development of personalized medicine and healthcare solutions. Furthermore, the project empowers decision-making and strengthens citizen trust through an interactive data-driven platform for visual analytics. This platform enables researchers, clinicians, and policymakers to analyze complex health data using advanced visualization tools. By integrating explainable AI and engaging the public in the research process, HEREDITARY promotes transparency, fosters trust, and enhances public awareness, supporting informed decision-making for better health outcomes.

Resources

All public deliverables released by the project are available on the website at https://hereditary-project.eu/deliverables/ and in Zenodo at https://zenodo.org/communities/hereditaryproject/records.

The scientific publications related to the project are available at https://hereditary-project.eu/publications/ and are updated on a monthly basis.

During the initial phase of the project (from month 1 to month 18), the HEREDITARY consortium generated more than 50 publications spanning the project topics, including Knowledge Graphs and Data Quality [1, 2, 3, 4, 5, 6, 7], Data Annotation [8, 9], Ontologies [10, 11], Deep Learning in biomedicine [12, 13, 14, 15, 16], Information Extraction and Evaluation [17, 18, 19, 20, 21], Data Integration [22, 23], Synthetic Data [24], Visualization [25, 26], Citizen Science [27], and Terminology [28, 29, 30, 31, 32, 33].

Acknowledgments.

This work is partially supported by the HEREDITARY Project, as part of the European Union’s Horizon Europe research and innovation programme under grant agreement No GA 101137074.

the clef 2024 simpletext task 2: Identify and explain difficult concepts, in: CEUR-WS, 2024. URL: https://ceur-ws.org/Vol-3740/#paper-306.
[19] A. Nentidis, G. Katsimpras, A. Krithara, M. Krallinger, M. Rodriguez Ortega, N. Loukachevitch, A. Sakhovskiy, E. Tutubalina, G. Tsoumakas, G. Giannakoulas, A. Bekiaridou, A. Samaras, G. M. Di Nunzio, N. Ferro, S. Marchesin, L. Menotti, G. Silvello, G. Paliouras, BioASQ at CLEF2025: The thirteenth edition of the large-scale biomedical semantic indexing and question answering challenge, in: Proc. of the 47th European Conference on Information Retrieval (ECIR 2025), 2025.
[20] M. Martinelli, G. Silvello, V. Bonato, G. M. Di Nunzio, N. Ferro, O. Irrera, S. Marchesin, L. Menotti, F. Vezzani, Overview of GutBrainIE@CLEF 2025: Gut-Brain Interplay Information Extraction, in: CLEF 2025 Working Notes, 2025.
[21] M. Martinelli, Advancing cross-document relation extraction with hybrid retrieval and knowledge-augmented reasoning, in: 33rd Symposium On Advanced Database Systems (SEBD 2025), 2025.
[22] M. Cazzaro, Design and development of a polystore system for heterogeneous biomedical data, in: 33rd Symposium On Advanced Database Systems (SEBD 2025), 2025.
[23] A. Zanola, F. D. Pup, C. Porcaro, M. Atzori, Bidsalign: a library for automatic merging and preprocessing of multiple eeg repositories, Journal of Neural Engineering (2024).
[24] F. M. Trudslev, M. Lissandrini, J. M. Rodriguez, M. Bøgsted, D. Dell’Aglio, Priveval: a tool for interactive evaluation of privacy metrics in synthetic data generation, in: Proc. VLDB Endow., 2025.
[25] S. Lengauer, P. Waldert, T. Schreck, Droplets: A marker design for visually enhancing local cluster association, in: IEEE VIS 2024 Bio+MedVis Challenge, 2024.
[26] B. Kantz, K. Innerebner, P. Waldert, S. Lengauer, E. Lex, T. Schreck, Onset: Ontology and semantic exploration toolkit, in: Proc. of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025.
[27] G. Pellegrini, C. Lovati, Stakeholders’ engagement for improved health outcomes. a research brief to design a tool for better communication and participation, Frontiers in Public Health 13 (2025). doi:10.3389/fpubh.2025.1536753.
[28] V. Bonato, G. M. Di Nunzio, F. Vezzani, A novel approach to semic analysis: Extraction of atoms of meaning to study polysemy and polyreferentiality, MDPI Languages (2024). URL: https://www.mdpi.com/2226-471X/9/4/121. doi:10.3390/languages9040121.
[29] F. Vezzani, G. M. Di Nunzio, A. Salgado, R. Costa, When LMF and TMF meet: towards a Unified Markup Framework (UMF), John Benjamins (2024).
[30] F. Vezzani, R. Costa, Variation in psychopathological terminology. a case study on body-dismorphic disorder, John Benjamins (2024). doi:10.1075/term.00078.vez.
[31] R. Costa, M. Ramos, M. Canelas, A. Mouro, Exploring terminological collocations in biomedical texts, in: 4th International Conference on Multilingual digital terminology today. Design, representation formats and management systems (MDTT) 2025, 2025.
[32] G. M. Di Nunzio, Consumer-centered technology-assisted review: Insights and challenges from the clef ehealth tasks, in: Semmelweis Medical Linguistics Conference 2025, 2025.
[33] A. Mouro, M. Canelas, M. Ramos, R. Costa, Big diagnoses, small humans: Making neuro concepts child’s play through plain language, in: CS4Health 2025 Conference, 2025.

5. PRIN 2022 HOMEY project: objectives and current results

Antonino Nocera1, Emanuele Storti2,*, Paolo Napoletano3

5.1. General information

HOMEY (A Human-centric IoE-based Framework for Supporting the Transition Towards Industry 5.0) is a PRIN 2022 collaborative project by Università di Pavia (UNIPV), Università Politecnica delle Marche (UNIVPM), and Università degli Studi di Milano-Bicocca (UNIMIB), funded by the European Union Next Generation EU, mission 4 component 1 (code: 2022NX7WKE, CUP: F53D23004340006). Duration: from 28/09/2023 to 28/02/2026.

Url: https://homey-prin22.unipv.it/. Repository: https://github.com/Homey-Prin22. Contributors from UNIPV: Antonino Nocera (PI), Marco Ferretti, Claudio Cusano, Tullio Facchinetti, Marco Arazzi; from UNIVPM: Emanuele Storti (sub-PI, research unit coordinator), Paola Pierleoni, Monica Marconi Sciarroni, Domenico Ursino; from UNIMIB: Paolo Napoletano (research unit coordinator), Simone Bianco, Raimondo Schettini, Gianluigi Ciocca, Gabriele Galimberti, Sergio Verga.

5.2. Project Objectives

Industry 5.0 is a novel paradigm identifying the transition from traditional industries towards smart, human-centric, and green-aware industrial ecosystems. In this context, the HOMEY project proposes a comprehensive framework that leverages the Internet of Everything (IoE), an evolution of the IoT interconnecting devices, people, data and processes, to create intelligent, adaptive, and human-centric industrial environments. The project is structured around three main objectives:

O.1) Design and implementation of an industrial IoE framework that enables seamless interaction among humans, machines, and data. Semantic-based approaches for data management, access and monitoring ensure interoperability, while context-aware mechanisms for data extraction aim to provide each worker with a personalized view of relevant information. Security is guaranteed by role-based access and privacy-preserving anomaly detection mechanisms.

O.2) Definition and design of a human-centric immersive digital working environment through Augmented Reality and zero-touch interfaces. Wearable devices with sensors and lightweight Machine Learning enable gesture control and real-time feedback for an intuitive, ergonomic experience.

O.3) Personalized recommendation for task execution and team building. The system will assess risk levels for tasks and perform optimal task assignment based on the monitored worker’s stress level, the effort of the task, and other organizational constraints.

5.3. Current status and intermediate results

The initial outputs for O.1 are related to data management and include the definition of the metadata model of the IoE network, represented as a Knowledge Graph (KG) based on the SemIoE ontology [1]. SemIoE (ontology specification available at https://w3id.org/semioe) is an OWL2 ontology built by integrating several modules (e.g., SSN/SOSA, BOT, ORG) designed to provide a structured and standardized description of entities (agents, roles, smart devices, locations, access rights, preferences) and their relations, thereby supporting semantic interoperability at IoE level. Built on top of it, a micro-service architecture delivers both basic functionalities, such as authentication and authorization, and advanced capabilities. The Data Gathering platform [2] is responsible for collecting heterogeneous data streams from a variety of sources, including traditional IoT sensors, computationally capable smart objects, wearable devices and IT modules such as BPM systems. It supports customized stream pre-processing (e.g., filtering, decryption, decompression), data stream collection, data post-processing (e.g., aggregation) and routing to appropriate DBMSs for persistent storage. Stream processing policies are governed by metadata stored in the KG, which describes the streams and their generators. Semantic-based Monitoring and Querying services allow users to access both real-time and historical data by formulating requests using the KG terminology. These services enforce context-aware and role-based access control to ensure secure access only to data relevant to the user’s location and assigned tasks. The Anomaly Detection module [3] is responsible for monitoring the behavior of devices deployed within the system and includes a privacy-preserving delegation mechanism. This enables self-monitoring and self-healing capabilities for the IoE. The solution employs a Gated Recurrent Unit (GRU) model to identify the most likely sequences of communication packets and detect anomalies based on the model’s prediction errors. Additionally, to support collaboration among IoE devices in collectively training behavioral models via Federated Learning, the module integrates a homomorphic encryption mechanism, ensuring secure synchronization among agents.
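The prediction-error principle behind the Anomaly Detection module can be illustrated without a neural network. The sketch below deliberately swaps the GRU for a first-order Markov model over packet types, so it is a simplification and not the module's actual model: it learns transition frequencies from normal traffic and flags a transition as anomalous when the model assigns it a probability below a threshold. The packet labels, the threshold value, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count how often each packet type follows another in normal traffic."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def transition_prob(counts, prev, nxt):
    """Estimated probability of observing `nxt` right after `prev` (0.0 if unseen)."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0

def detect_anomalies(counts, seq, threshold=0.1):
    """Return the positions whose observed packet is poorly predicted by the model."""
    flagged = []
    for i in range(1, len(seq)):
        if transition_prob(counts, seq[i - 1], seq[i]) < threshold:
            flagged.append(i)
    return flagged

# Hypothetical normal traffic used to fit the behavioral model.
NORMAL = [
    ["SYN", "ACK", "DATA", "DATA", "FIN"],
    ["SYN", "ACK", "DATA", "FIN"],
] * 5

model = fit_transitions(NORMAL)

# A device that restarts a handshake in the middle of a data exchange.
observed = ["SYN", "ACK", "DATA", "SYN", "DATA", "FIN"]
anomalies = detect_anomalies(model, observed)
```

A recurrent model such as a GRU plays the same role as `transition_prob` but conditions on the whole packet history rather than on the previous packet only.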

O.2 explores two complementary innovations in human-machine interaction within the context of Smart Industry 5.0. The first focuses on secure teleoperation of a robotic arm via IMUs worn by the user [4]. As a first output, a biometric authentication system based on logistic regression ensures access control, achieving an average Equal Error Rate of 8.89%, while task recognition with random forest reaches a macro F1-score of 75.60%. The second innovation adapts an arm gesture recognition system to run entirely on a consumer-grade Wear OS smartwatch, eliminating the need for cloud processing. The system runs efficiently on the edge, with only a slight drop in accuracy due to limited data.
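For readers unfamiliar with the two metrics reported above, the following sketch shows how they are commonly computed. It follows the standard definitions (macro F1 as the unweighted mean of per-class F1 scores, Equal Error Rate as the operating point where false acceptance and false rejection rates coincide) and is not the project's evaluation code; the toy labels and scores are made up.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def equal_error_rate(genuine, impostor):
    """Approximate EER: scan the observed scores as thresholds and return the
    error rate at the threshold where FAR and FRR are closest."""
    best_gap, eer = None, None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < thr for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy task-recognition labels (hypothetical).
f1 = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])

# Toy authentication scores: higher means "more likely the genuine user".
eer = equal_error_rate(genuine=[0.9, 0.8, 0.7, 0.4], impostor=[0.6, 0.3, 0.2, 0.1])
```

With a finite set of scores the two error curves rarely cross exactly, which is why the sketch returns the midpoint of FAR and FRR at the closest threshold.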

An outcome of O.3 consists in the definition of a task recommendation module, which dynamically reallocates activities among workers. The module balances efficiency and sustainability through a flexible and periodic negotiation process, allowing workers to refuse an activity if it exceeds a sustainable stress level, as monitored via wearable devices [5]. The system is modeled using Mixed Integer Linear Programming (MILP) with a hierarchical objective function, aimed at first maximizing the number of assignments and then minimizing the cost due to reassignments, levels of stress, and possible overtime.
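The hierarchical objective can be illustrated on a toy instance. The sketch below does not use a MILP solver: it enumerates all assignments by brute force, which is viable only for tiny inputs, and compares them with a tuple so that the number of assignments takes priority over the reassignment cost. Stress is modeled, as a simplifying assumption, by a per-worker effort budget, and overtime is omitted; the tasks, workers, and numbers are hypothetical.

```python
from itertools import product

TASKS = {"t1": 3, "t2": 2, "t3": 4}                    # task -> effort (hypothetical units)
CAPACITY = {"anna": 4, "luca": 4}                      # worker -> sustainable effort budget
CURRENT = {"t1": "anna", "t2": "anna", "t3": "luca"}   # assignment before renegotiation

def feasible(assignment):
    """An assignment is acceptable if no worker exceeds the sustainable budget."""
    load = {worker: 0 for worker in CAPACITY}
    for task, worker in assignment.items():
        if worker is not None:
            load[worker] += TASKS[task]
    return all(load[w] <= CAPACITY[w] for w in CAPACITY)

def objective(assignment):
    """Lexicographic objective: first maximize assignments, then minimize reassignments."""
    assigned = sum(w is not None for w in assignment.values())
    reassigned = sum(
        w is not None and w != CURRENT[t] for t, w in assignment.items()
    )
    return (-assigned, reassigned)  # tuples compare element by element

def best_assignment():
    """Enumerate every task-to-worker mapping (None = unassigned) and keep the best."""
    options = [None] + sorted(CAPACITY)
    candidates = (
        dict(zip(TASKS, choice)) for choice in product(options, repeat=len(TASKS))
    )
    return min((a for a in candidates if feasible(a)), key=objective)

plan = best_assignment()
```

In this instance the current allocation overloads one worker, so at most two tasks can be kept, and the selected plan keeps them with their current owners. A MILP solver typically obtains the same ordering either by solving the levels in sequence or by weighting the first level heavily enough to dominate the others.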

Declaration on Generative AI. The authors have not employed any Generative AI tools.

6. Supporting Energy Consumption Prediction: A Sustainable Approach

Zahra Ziran1,*, Massimo Mecella2, Francesco Muzi1, Giuseppe Piras1

Project Context and Scientific Goals

Within the scope of the Sapienza Research project “Automatic electrical microgrid management system through machine learning techniques”, the broader goal is to develop intelligent systems for the automatic management of electrical microgrids through the application of data-driven methods. These systems must support efficient energy distribution while maintaining low computational overhead, making them suitable for deployment in embedded or real-time operational contexts. Within this framework, our work focuses on identifying which predictive models are most appropriate for such constrained environments by analyzing both their forecasting performance and their consumption of computational resources.

This project builds upon and extends the methodological foundation and experimental analysis established in research on sustainability-aware energy consumption prediction models [1]. The referenced study conducted a comparative evaluation of various machine learning and deep learning techniques applied to real-world residential energy datasets, underscoring the critical trade-offs between predictive accuracy and computational resource demands. Expanding on these contributions, the present study introduces the Accuracy-Sustainability Trade-off Index (ASTI), a novel metric that formalizes this trade-off into a unified evaluative criterion. ASTI facilitates principled model selection by integrating accuracy, memory footprint, and power consumption into a single measure, thus aligning predictive performance with the practical requirements of resource-constrained environments such as smart microgrids and edge-computing infrastructures.

In addition to proposing the ASTI metric and validating it through extensive experimentation, the research sets the foundation for future directions in sustainable energy management. These include the incorporation of more advanced and hybrid predictive architectures, the evaluation of model behavior under different climatic conditions and building types, and the implementation of adaptive strategies capable of tuning themselves to changing operational constraints. Furthermore, we aim to investigate how the relative importance of accuracy, power, and memory affects the ranking of models under different scenarios by conducting sensitivity analyses on ASTI’s weighting parameters. Through this line of inquiry, the project contributes a principled and practical approach to integrating AI in the emerging field of smart and sustainable microgrid systems.

6.1. The Proposed Approach

To address the dual imperative of predictive accuracy and computational sustainability in microgrid environments, we propose a model selection framework grounded in the Accuracy-Sustainability Trade-off Index (ASTI). ASTI is a novel, multi-dimensional metric designed to evaluate forecasting models by jointly considering their accuracy, memory footprint, and power consumption. By consolidating these criteria into a single quantitative score, ASTI enables rigorous, sustainability-aware comparison across a diverse set of machine learning (ML) and deep learning (DL) architectures. Beyond standard evaluation metrics such as R² and MSE, our approach incorporates empirical assessments of resource utilization to reflect deployment conditions typical of edge-based or embedded systems.
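The text above does not state ASTI's formula, so the following sketch should be read as one plausible way to build such an index and not as the authors' definition. It assumes, purely for illustration, a weighted sum of min-max normalized accuracy (higher is better) and of normalized memory and power (lower is better). The weights, model names, and measurements are hypothetical and are not results of the project.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def asti_like_scores(models, w_acc=0.5, w_mem=0.25, w_pow=0.25):
    """Hypothetical ASTI-like score in [0, 1]; higher means a better trade-off."""
    names = list(models)
    acc = min_max([models[n]["r2"] for n in names])
    mem = min_max([models[n]["memory_mb"] for n in names])
    pwr = min_max([models[n]["power_w"] for n in names])
    return {
        n: w_acc * a + w_mem * (1 - m) + w_pow * (1 - p)
        for n, a, m, p in zip(names, acc, mem, pwr)
    }

# Made-up measurements, for illustration only.
MODELS = {
    "model_a": {"r2": 0.92, "memory_mb": 800.0, "power_w": 60.0},
    "model_b": {"r2": 0.90, "memory_mb": 120.0, "power_w": 15.0},
    "model_c": {"r2": 0.80, "memory_mb": 100.0, "power_w": 10.0},
}

scores = asti_like_scores(MODELS)
best_balanced = max(scores, key=scores.get)

# Sensitivity check: with all weight on accuracy, the ranking changes.
accuracy_only = asti_like_scores(MODELS, w_acc=1.0, w_mem=0.0, w_pow=0.0)
best_accuracy = max(accuracy_only, key=accuracy_only.get)
```

Under these assumed weights the slightly less accurate but much lighter model ranks first, while an accuracy-only weighting prefers the heaviest one, which is the kind of ranking shift a sensitivity analysis on the weights is meant to expose.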

The conceptual development of ASTI is informed by earlier research that advocates transparent, computationally efficient modeling practices in the context of energy management [2]. That work highlighted the viability of simple statistical models, showing that robust predictive performance can be achieved even with limited data and minimal computational burden. Building on this foundation, our approach provides a unified framework for assessing both traditional and advanced learning models under sustainability constraints. In doing so, we offer a means to formalize and operationalize the trade-offs often encountered in real-world applications, balancing accuracy with feasibility for long-term, resource-conscious deployment.

To empirically validate this methodology, we apply it to Energy4Rome, a rich dataset encompassing two years of detailed energy consumption data from four major residential complexes in Rome.³ The dataset includes not only granular consumption records but also auxiliary contextual information such as utility bills, occupancy behavior, and architectural specifications. Within this experimental setting, a range of ML (e.g., SVR, Random Forest, XGBoost) and DL (e.g., LSTM, GRU, TCN) models are trained, optimized, and evaluated. Results indicate that although DL models exhibit marginally superior predictive accuracy, tree-based ML models, particularly XGBoost, achieve the most favorable balance according to the ASTI score. These findings underscore the practical utility of the proposed framework in guiding model selection for sustainable, performance-conscious energy forecasting in smart microgrid contexts.

Acknowledgments.

This work has been supported by the Sapienza Research project “Automatic electrical microgrid management system through machine learning techniques” (CUP: B89J21032850001).

Declaration on Generative AI

The authors have not employed any Generative AI tools.

³ Energy4Rome dataset available at: https://github.com/zahraziran/Energy4Rome

Project Data

Acronym: S-PIC4CHU
Duration: February 2025 – January 2027 (24 months)
Funding agency: Ministero dell'Università e della Ricerca (MUR), Bando PRIN 2022 – Scorrimento
Project code: 2022XERWK9
Budget: € 210 694 (Funding: € 169 057)
Keywords: Data Science, data preparation, data quality, semantics, ontologies, inconsistency, incompleteness, knowledge graphs, provenance, explanation, bias

Project Summary

The effectiveness of data-driven solutions, in Data Science as well as in Machine Learning, clearly depends on the quality and interpretability of the underlying data. Unfortunately, real-world data is often incomplete, inconsistent, biased, or lacks adequate semantic description. Traditional data preparation workflows typically rely on ad hoc methods, limited automation, and minimal consideration of domain knowledge, resulting in inefficiencies and unreliable analytical outcomes.

The S-PIC4CHU project proposes embedding semantics at the core of each stage of the data preparation process. To this end, it introduces a novel architecture for data preparation, grounded in a semantics-based methodology that supports provenance tracking, integrity enforcement, and fairness assessment. This paradigm shift is based on the design and implementation of a semantically aware Data Preparation Pipeline (DPP), integrated with a corresponding Semantic Transformation Pipeline (STP): each data transformation step is semantically annotated through mappings to ontologies and knowledge graphs, enabling enhanced traceability, transparency, and reasoning over data.
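As a rough illustration of what semantically annotated preparation steps with provenance tracking might look like in practice, the Python sketch below attaches an ontology term to each transformation and logs a provenance record every time a step runs. It is not the project's DPP or STP design: the class names `AnnotatedStep` and `PreparationPipeline`, the `example.org` IRIs, the record fields, and the sample rows are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]  # a tiny stand-in for a tabular dataset (Python 3.9+)


@dataclass
class AnnotatedStep:
    name: str
    transform: Callable[[Rows], Rows]
    ontology_term: str  # hypothetical IRI describing what the step does


@dataclass
class PreparationPipeline:
    steps: list[AnnotatedStep]
    provenance: list[dict] = field(default_factory=list)

    def run(self, rows: Rows) -> Rows:
        """Apply each step in order and log one provenance record per step."""
        for step in self.steps:
            rows_in = len(rows)
            rows = step.transform(rows)
            self.provenance.append({
                "step": step.name,
                "term": step.ontology_term,
                "rows_in": rows_in,
                "rows_out": len(rows),
            })
        return rows


def drop_missing_age(rows: Rows) -> Rows:
    return [r for r in rows if r.get("age") is not None]


def add_age_group(rows: Rows) -> Rows:
    return [{**r, "age_group": "adult" if r["age"] >= 18 else "minor"} for r in rows]


pipeline = PreparationPipeline(steps=[
    AnnotatedStep("drop_missing_age", drop_missing_age,
                  "http://example.org/dpp#RemoveIncompleteRecords"),
    AnnotatedStep("add_age_group", add_age_group,
                  "http://example.org/dpp#DeriveCategoricalAttribute"),
])

# Invented sample rows for illustration only.
raw = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 51}]
cleaned = pipeline.run(raw)

for record in pipeline.provenance:
    print(record)
```

Because every record carries both the step name and its ontology term, the log can later be queried to explain how a given output was produced; a real system would presumably map such records to an established provenance vocabulary.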

A major contribution of the project is the formalization and implementation of semantic enrichment techniques that provide domain-aware annotations to both structured and multimedia data and support advanced operations like semantic imputation of missing values, preference-based resolution of inconsistencies, and detection and mitigation of bias in datasets. Importantly, the project tackles fairness not merely as a downstream property of algorithmic outputs, but as a core feature of input data, thus addressing societal concerns related to discrimination and ethical decision-making in AI systems.

The methodology will be validated through two concrete use cases from different domains: healthcare and sustainable urban development. In collaboration with the Policlinico Universitario A. Gemelli (Rome), the project will address challenges in preparing complex medical data for predictive modeling and decision support. Simultaneously, the project will work with the IMM Design Lab at Politecnico di Milano to support policy-making processes in urban planning through the integration of heterogeneous environmental and social datasets.

S-PIC4CHU is expected to yield several impactful outcomes: open-source software tools implementing the proposed methodologies, scientific publications targeting top-tier venues in data management and artificial intelligence, and educational resources for training the next generation of data scientists. The project also emphasizes outreach and engagement with public institutions and private stakeholders to foster the adoption of fairness-aware and semantically-grounded data processing pipelines.

In line with the objectives of the SEBD Research Projects Exhibition, S-PIC4CHU represents a forward-looking initiative with the potential to influence both theoretical research and practical applications in the field of data science. By addressing fundamental issues related to data quality, interpretability, and fairness, the project contributes to the development of trustworthy and socially responsible data-driven technologies.

This work is supported by the Italian Ministry of University and Research (MUR) PRIN 2022 grant 2022XERWK9 “S-PIC4CHU - Semantics-based Provenance, Integrity, and Curation for Consistent, High-quality, and Unbiased data science”.

Declaration on Generative AI

[1] Piras, Muzi, Ziran, Open tool for automated development of renewable energy communities: Artificial intelligence and machine learning techniques for methodological approach, Energies 17 (2024) 5726. doi:10.3390/en17225726.

[2] Ziran, Mecella, Leotta, A simplified and sustainable approach for energy prediction, in: Intelligent Environments 2024: Combined Proceedings of Workshops and Demos & Videos Session, IOS Press, 2024, pp. 104-113. doi:10.3233/AISE240022.

7. The S-PIC4CHU Project: Semantics-based Provenance, Integrity, and Curation for Consistent, High-quality, Unbiased Data Science

Gianvincenzo Alfano1, Ilaria Bartolini2, Diego Calvanese3, Paolo Ciaccia2, Sergio Greco1, Davide Lanti3, Emilia Lenzi4, Davide Martinenghi4, Christian Molinaro1, Marco Patella2, Letizia Tanca4, Riccardo Torlone5, Irina Trubitsyna1

1University of Calabria, DIMES, Rende (CS), Italy
2Alma Mater Studiorum University of Bologna, DISI, Bologna, Italy
3Free University of Bozen-Bolzano, Faculty of Engineering, Bolzano, Italy
4Politecnico di Milano, DEIB, Milano, Italy
5Roma Tre University, DICITA, Roma, Italy