<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards an AI-assisted T1DM Data Analysis Pipeline</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Khristodulo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giancarlo Mascetti</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgio Delzanno</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marta Bassi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Minuto</string-name>
        </contrib>
        <aff>Università degli Studi di Genova, Italy</aff>
        <aff>Istituto Giannina Gaslini Genova, Italy</aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>9</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>This paper presents a data-driven approach for analyzing diabetes-related data, with a particular focus on pediatric patients with Type 1 Diabetes Mellitus (T1DM). We propose a methodology that integrates data from multiple wearable devices to examine the impact of physical activity and the use of medical technologies on patient health. Our approach utilizes the JupyterHub framework to combine data analytics and artificial intelligence (AI) within a secure, privacy-preserving environment. We conducted a preliminary case study based on data collected during a pediatric diabetes camp held in September 2024. The analysis employs advanced visualization tools (Plotly and hvPlot) and cross-correlation techniques to uncover patterns between physiological and glycemic parameters. The proposed system enables healthcare professionals to receive graphical and textual insights through an AI assistant, ultimately supporting more informed clinical decision-making and enhancing the quality of care for diabetic children.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Background and Motivation</title>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Proposed Approach</title>
        <p>To address these challenges, we introduce a methodological framework that integrates physical
activity data and glycemic monitoring through a combination of traditional data analysis and artificial
intelligence, with a focus on locally deployable learning models.</p>
        <sec id="sec-1-2-1">
          <title>Methodology</title>
          <p>Specifically, the tasks to be accomplished are as follows:</p>
          <p>1. Data integration - combine and preprocess the collected data, ensuring consistency and compatibility for further analysis,
2. Data analysis - evaluate the relationship between physical activity levels and blood glucose
fluctuations to determine the impact of exercise on diabetic patients,
3. Support system development - develop an AI-driven support system that assists
medical specialists by providing real-time insights, graphical representations, and textual explanations
based on data analysis,
4. Evaluation of machine learning methods - compare the effectiveness of traditional data analysis
techniques with AI-driven approaches, assessing the accuracy, usability, and practical value of
the provided results.</p>
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Novel Contribution</title>
        <p>The novelty of our work lies in its focus on pediatric diabetes, where the unique challenges of managing
fluctuating glucose levels, growth-related metabolic changes, and physical activity patterns demand
tailored solutions. Furthermore, the integration of local AI-enabled platforms with wearable technologies
is intended not only to enhance glycemic control in diabetic patients, but also to assist healthcare professionals by
offering evidence-based recommendations through an intuitive and user-friendly chatbot interface.</p>
      </sec>
      <sec id="sec-1-4">
        <title>1.4. Plan of the paper.</title>
        <p>In Section 2 we discuss related work. In Section 3 we present our case study involving multi-modal
sensor data. In Section 4 we describe the data processing pipeline designed for our case study, which
involves data fusion, analysis, and visualization, and in Section 5 the results obtained with classical statistical
methods. In Section 6 we present preliminary ideas and experiments toward the integration of
an AI agent based on SLM and RAG customized for our domain. In Section 7 we address limitations,
conclusions, and future work.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Recent decades have seen substantial progress in both informatics and medicine.
The two fields have become closely intertwined, and one of the most important benefits of this alliance
is the significant growth of life expectancy. Artificial Intelligence (AI) is nowadays actively used to
support the research on Type 1 Diabetes. According to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], effective adoption of AI requires thorough
research, information security, collaboration across disciplines, and a commitment to patient-centered
approaches. AI has been identified as a transformative force across eight key domains in diabetes
care: 1) Diabetes Management and Treatment, 2) Diagnostic and Imaging Technologies, 3) Health
Monitoring Systems, 4) Developing Predictive Models, 5) Public Health Interventions, 6) Lifestyle
and Dietary Management, 7) Enhancing Clinical Decision-Making, and 8) Patient Engagement and
Self-Management. Each domain showcases AI’s potential to revolutionize care, from personalizing
treatment plans and improving diagnostic accuracy to enhancing patient engagement and predictive
healthcare. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] the authors provide a detailed review of 77 research papers showing how AI can be
applied to enhance and personalize diabetes treatment. Two key trends in the use of AI were identified:
therapy personalization and therapeutic algorithm optimization. The review also found a lack of interoperability
and multi-modal database analysis, which indicates that existing studies predominantly
focus on single data sources rather than integrating diverse datasets to provide a comprehensive
understanding of diabetes management. This limitation reduces the potential for AI models to capture
complex interactions between physiological, behavioral, and clinical factors, ultimately restricting
their ability to generate precise and personalized therapeutic recommendations. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] the authors
highlight the widespread adoption of supervised learning models, such as Random Forest and Support
Vector Machines (SVM), which consistently demonstrate high accuracy and reliability in predicting
diabetes risk. Ensemble learning methods, particularly Gradient Boosting, emerged as superior
techniques for predictive performance, while deep learning models, including Convolutional Neural
Networks (CNNs) and Recurrent Neural Networks (RNNs), proved effective in analyzing unstructured
data such as medical images and time-series glucose data. These advances have been showcased
in various conferences, including the Advanced Technologies and Treatments for Diabetes (ATTD)
2025 conference, where AI-driven platforms demonstrated remarkable potential in real-time diabetes
management. AI systems that analyze the vast amounts of data produced by wearable devices and
medical devices have also contributed to improved decision-making and better clinical outcomes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In
the medical domain, insulin pumps, such as those developed by Medtronic, have revolutionized the
treatment of diabetes. These devices provide continuous, controlled insulin delivery, reducing the need
for frequent injections and improving glycemic control. Medtronic’s insulin pumps, equipped with
advanced algorithms and sensors, enable dynamic adjustments to insulin delivery based on real-time
glucose measurements, significantly enhancing the precision of diabetes management [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Medtronic
employs both AI-based and traditional control algorithms to optimize insulin dosing. Non-AI-based
algorithms, such as proportional-integral-derivative (PID) controllers and model predictive control
(MPC), rely on mathematical models and predefined rules to regulate insulin delivery. In contrast,
AI-driven algorithms, including machine learning-based adaptation systems, leverage historical and
real-time data to predict glucose fluctuations and personalize insulin administration more effectively.
For instance, the MiniMed™ 780G system features an algorithm that automatically adjusts basal insulin
delivery and provides auto-correction boluses based on continuous glucose monitoring data, aiming to
maintain glucose levels within a target range [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Additionally, Medtronic has introduced a smart insulin
pen that integrates glucose sensor data, utilizing AI to assist patients with type 1 diabetes who rely
on multiple daily injections [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. On the other hand, wearable devices such as Comftech’s Howdy Senior
textile devices could expand the scope of diabetes care by incorporating physical activity monitoring
into routine management. These devices track key physiological parameters, providing real-time data
that can help assess the impact of physical activity on blood glucose levels. The integration of these
wearables with AI-based platforms offers a holistic approach to diabetes care, allowing healthcare
providers to tailor insulin delivery and lifestyle recommendations based on comprehensive data from
both medical and activity-monitoring devices [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Numerous studies have demonstrated the critical
role of physical activity in managing diabetes, particularly in children and adolescents. Exercise helps
improve insulin sensitivity, lowers blood glucose levels, and enhances cardiovascular health, reducing
the risk of diabetes-related complications. A systematic review by Sun Z. confirmed that regular,
multi-component physical activity improves glycemic control, reduces HbA1c levels, and helps
delay cognitive decline in individuals with diabetes [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Moreover, studies have highlighted the positive
effects of exercise on the mental health, quality of life, and general well-being of diabetic patients,
especially in the pediatric population [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The interaction between insulin therapy and physical activity
has been a subject of intense research, with recent studies showing that wearable devices can help
optimize exercise plans by providing personalized feedback based on real-time glucose monitoring.
These AI-driven technologies enable caregivers and patients to adjust insulin doses dynamically to
ensure safe participation in physical activities while avoiding hypoglycemia or hyperglycemia [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>This study focuses specifically on pediatric diabetes, an area where managing blood glucose levels can
be particularly challenging. Children and adolescents with diabetes experience unique physiological and
behavioral factors that complicate glycemic control, including varying levels of physical activity, growth
spurts, and psychosocial influences. The integration of advanced AI algorithms, insulin pumps, and
wearable devices offers a promising solution to these challenges, allowing for continuous monitoring
and personalized treatment plans tailored to the specific needs of pediatric patients.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Case Study: Glooko Platform and Howdy Senior Wearable Data</title>
      <sec id="sec-3-1">
        <title>3.1. Data Hub</title>
        <p>
          In our project, we leverage JupyterHub as a central platform for acquiring, processing, and analyzing
wearable data in a secure environment. This approach enables collaboration among researchers and
clinicians while ensuring that sensitive health data remains within institutional boundaries. The Jupyter
ecosystem (Hub, Lab and Notebook) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] provides an interactive computing environment that supports
the documentation, analysis, and visualization of complex datasets. Jupyter Notebooks are particularly
well-suited for implementing the Findable, Accessible, Interoperable, and Reusable (FAIR) principles
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. JupyterHub extends these capabilities to multiple users, allowing centralized management of
computational notebooks, authentication through OAuth, and customized virtual environments for
each user.
        </p>
        <p>Our JupyterHub instance is deployed on a virtual machine hosted by the joint laboratory of the
Gaslini Pediatric Hospital and our university department. The system integrates data collected from two
sources: insulin pumps and glucose monitors, and a textile wearable developed by ComfTech. These
datasets are initially stored on proprietary cloud platforms managed by the device manufacturers. We
redirect these data to the JupyterHub instance, ensuring local control and privacy. Figure 1 illustrates
the system architecture: JupyterHub provides pre-configured environments for data ingestion, fusion,
and analysis using the Python ecosystem. The platform allows the generation of analytical reports,
which serve as inputs for AI-based assistants. These assistants are designed to support diabetologists by
offering structured insights, thereby reducing the cognitive load of interpreting raw data. JupyterLab
also allows professionals to visualize both raw and aggregated data, giving them greater flexibility in
exploring patterns related to glycemic control and physical activity.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Collection</title>
        <p>
          The data used in this study were collected through a collaborative effort between the Gaslini
Pediatric Hospital and Comftech, a non-participatory spin-off of the Polytechnic University of Milan that
specializes in wearable monitoring devices [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. To facilitate data acquisition, a pediatric diabetes
summer camp was organized by Gaslini Hospital in Sarzana, Italy. The primary goal was to monitor
the physiological and metabolic responses of five pediatric patients with T1DM during daily physical
activities. Throughout the camp, all participants were supervised by medical staff and wore two types
of devices: insulin pumps with continuous glucose monitors (CGMs) and Comftech’s Howdy Senior
sensorized garments. These garments measure multiple physiological parameters such as heart rate
variability, ECG, respiration rate, stress index, and movement [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>Post-camp, the insulin-related data were downloaded from the Glooko platform, while physical
activity data were obtained from Comftech’s proprietary system. Glooko is a diabetes management
platform designed to facilitate the visualization, interpretation, and management of diabetes-related
data from medical devices. A screenshot of the Glooko interface is shown in Figure 3.</p>
        <p>The interface displays the data over a five-day period, including continuous glucose monitoring
(CGM), carbohydrate intake, and insulin levels.</p>
        <p>All data were anonymized and securely stored in a local repository for further analysis. For this case
study, we focus on the dataset from a single anonymized patient to demonstrate our methodology.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Howdy Senior Dataset</title>
        <p>Our dataset contains multi-modal time-series data captured over five days. Data formats included CSV,
Excel, and JSON, with each device contributing distinct physiological and metabolic metrics. These
datasets were organized for preprocessing and analysis within our JupyterHub environment. The
dataset includes time series for three physiological measurements, each recorded with
a timestamp: Heart Rate, Breath Rate, and Movement.
Each record contains the following fields: date, value, measure type, and timestamp. The dataset is
structured in a tabular format with 165,138 entries and 4 columns, with each entry corresponding to a
specific timestamp and its measurement value.</p>
        <p>Continuous heart rate variability (HRV). The continuous HRV data include the HRV score, stress
index, and high-frequency (HF) power. The data consist of date, value, measure type, and timestamp
fields, with a total of 447 records.</p>
        <p>Certain physiological signals were not directly available for download
from the platform and were instead provided as raw data by the Comftech administrator in .csv and
.json formats. These additional signals include the following.</p>
        <p>ECG trace (tachogram). The ECG data consist of single-lead ECG traces sampled at a rate of 128 Hz.
The values are expressed in millivolts (mV). This dataset provides continuous physiological data over a
substantial period, with 1,200,256 data points captured.</p>
        <p>Respiratory signal. The respiratory data were acquired at a sampling rate of 13 Hz, with the values
expressed in analog-to-digital converter (ADC) levels. These data capture respiratory activity over time,
represented by 129,766 entries and three columns (date, value, and measure type).</p>
        <p>Acceleration signals. The dataset includes triaxial accelerometer signals, capturing lateral (X-axis),
vertical (Y-axis), and antero-posterior (Z-axis) accelerations. These signals were sampled at 25 Hz, and
the values are expressed in gravitational units (g). There are 229,300 entries across three axes.</p>
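        <p>As a quick consistency check, the sample counts and sampling rates reported above imply a recording duration for each raw signal (duration = samples / rate). The sketch below performs this arithmetic in Python. It assumes continuous sampling at the nominal rate, and it assumes that the accelerometer count refers to time samples with one value per axis, which the description does not state explicitly.</p>
        <preformat>
```python
from datetime import timedelta

# (sample count, sampling rate in Hz) as reported in the dataset description.
# Assumption: the accelerometer count refers to time samples (rows), each
# holding one value per axis; the text does not state this explicitly.
SIGNALS = {
    "ecg": (1_200_256, 128),
    "respiration": (129_766, 13),
    "acceleration": (229_300, 25),
}


def recording_duration(n_samples: int, rate_hz: float) -> timedelta:
    """Return the time span covered by n_samples at a constant rate."""
    return timedelta(seconds=n_samples / rate_hz)


durations = {name: recording_duration(n, hz) for name, (n, hz) in SIGNALS.items()}
for name, span in durations.items():
    print(f"{name}: {span}")
```
        </preformat>
        <p>Under these assumptions, each raw signal covers between roughly two and a half and three hours of recording.</p>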
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Glooko Dataset.</title>
        <p>The data from the Glooko platform were downloaded in .csv format. The dataset includes various
parameters related to insulin delivery, blood glucose monitoring, and alarms/events. These data are
structured as follows.</p>
        <p>Insulin data. The insulin-related data include two main categories, basal and bolus insulin delivery,
as well as overall insulin usage. The dataset provides information about the type of insulin used,
the amount administered, and related temporal variables. Basal Insulin Data captures information
on basal insulin delivery, which controls overall blood glucose levels, including the date and time of
administration, type of insulin used, duration (in minutes), percentage of dosage, frequency, total insulin
administered, and serial number. The dataset consists of 466 records. Bolus Insulin Data includes
records of insulin bolus injections, which manage spikes caused by eating, with fields for the date and
time, type of insulin, pre-meal blood glucose level (in mg/dL), carbohydrate consumption (in grams),
carbohydrate-to-insulin ratio, total insulin administered, initial bolus delivery, extended bolus delivery,
and serial number. This dataset contains 37 entries. Total Insulin Data includes information on the total
bolus and basal insulin administered, as well as the overall insulin usage over time. It consists of six
records with date, time, and the corresponding insulin values.</p>
        <p>Blood glucose data. These data contain entries for manually recorded blood glucose levels. The
dataset includes the date and time of each measurement, the glucose value (in mg/dL), and whether the
reading was taken manually. This dataset contains 32 records.</p>
        <p>Continuous Glucose Monitoring (CGM) data. The CGM data record continuous glucose
measurements obtained through a CGM system. The dataset includes the date and time of each glucose reading,
the corresponding glucose value (in mg/dL), and the serial number of the device. This dataset comprises
1,067 entries.</p>
        <p>In summary, the dataset provides a wide and diverse range of both physiological and
diabetes-related measures, all of which were collected using various sensors worn by the subject. The
comprehensive nature of the dataset enables robust analysis and modeling of the subject’s physiological
states over time.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Data Processing Pipeline</title>
      <p>Raw datasets were not immediately compatible with analytical workflows in JupyterLab. To address
this, we employed a preprocessing pipeline using the “Amphi” tool, which supports interaction with
CSV files and allows for the construction of custom data workflows.</p>
      <p>First, the datasets were separated into nine measurement categories: breath frequency, step count,
heart rate, movement index, stress index, heart rate variability, basal insulin, bolus insulin, and CGM
data. Each category was processed independently to ensure consistency in formatting and timestamps.
Figure 4 illustrates the pipeline used to preprocess the basal and bolus data. The process begins by
importing the original .csv file. The first step in the pipeline involves converting the date field from a
string format to a Date/Time format. The “Select Columns” block is then employed to retain only the
relevant columns—those containing bolus/basal data and corresponding timestamps. In the “Rename
Columns” block, the selected columns are renamed for clarity and convenience. Finally, the modified
.csv file, containing the preprocessed measurement data, is saved. This pipeline is subsequently applied
to the CGM data. In contrast, the preprocessing algorithm for physical data derived from the Comftech
platform requires adjustments (Figure 5). For instance, when processing the stress index and heart rate
variability measures, the “Filter Rows” block isolates the required parameters from the full dataset.</p>
      <p>The “Python Transforms” block is then used to convert the date column from string format into
Date/Time format to ensure consistency in date formatting across the dataset. This procedure is repeated
for the remaining six parameters. Upon completion of these steps within the Amphi environment,
the nine preprocessed .csv files are transferred to an .ipynb notebook for additional processing. More
specifically, the columns in each file were reordered to ensure that the date field appears first. Duplicate
entries in the “Breath Frequency” file were removed after repeated values were detected. Furthermore,
in the “Bolus” and “Basal” files, decimal commas in the data were replaced with periods, and the data type was
converted to “float64” for consistency. This pipeline resulted in a clean, structured dataset composed of
nine CSV files, ready for detailed time-series analysis and machine learning tasks.</p>
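      <p>The preprocessing steps described above (date conversion, column selection and renaming, duplicate removal, and numeric conversion) can be sketched in plain Python as follows. This is a minimal illustration and not the Amphi pipeline itself: the column names, the semicolon delimiter, the date format, and the sample rows are hypothetical, and the example assumes an export that uses decimal commas.</p>
      <preformat>
```python
import csv
import io
from datetime import datetime

# Hypothetical column names, delimiter, and date format; the real export
# may differ. The rows are made-up values used only for illustration.
RAW_BOLUS_CSV = """Timestamp;Insulin Type;Insulin Delivered (U);Serial Number
2024-09-10 08:15;Normal;3,5;ABC123
2024-09-10 12:40;Normal;5,25;ABC123
2024-09-10 12:40;Normal;5,25;ABC123
"""


def preprocess(raw_text, keep, date_column="date", date_format="%Y-%m-%d %H:%M"):
    """Select and rename columns, parse dates, fix decimals, drop duplicates.

    `keep` maps original column names to new names; the date column is
    listed first so that it also comes first in the output rows.
    """
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=";")
    rows, seen = [], set()
    for record in reader:
        # "Select Columns" and "Rename Columns" in a single step.
        row = {new: record[old] for old, new in keep.items()}
        # String to Date/Time conversion.
        row[date_column] = datetime.strptime(row[date_column], date_format)
        # Decimal comma to decimal point, then conversion to float.
        for name in row:
            if name != date_column:
                row[name] = float(row[name].replace(",", "."))
        key = tuple(row.values())
        if key not in seen:  # drop exact duplicates
            seen.add(key)
            rows.append(row)
    return rows


bolus = preprocess(
    RAW_BOLUS_CSV,
    keep={"Timestamp": "date", "Insulin Delivered (U)": "bolus"},
)
for row in bolus:
    print(row["date"].isoformat(), row["bolus"])
```
      </preformat>
      <p>The same function could be reused for the other measurement files by changing the `keep` mapping, which mirrors the way one pipeline is applied to several categories in the text.</p>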
    </sec>
    <sec id="sec-5">
      <title>5. Data Analysis</title>
      <p>This section presents the outcomes of a detailed analysis of the processed dataset, conducted in the
Jupyter Notebook environment using Python. We employed a range of data visualization and statistical
tools to identify correlations and temporal trends within the physiological and glycemic data. Interactive
visualizations were created using Plotly and hvPlot, while cross-correlation and correlation matrix
techniques were used to explore interdependencies among variables.</p>
      <p>To begin, all nine preprocessed CSV files were loaded into the notebook. A multi-axis scatter plot
was generated using Plotly to visualize selected variables—such as glycemia, insulin delivery, heart rate
variability (HRV), and stress index—across the five-day camp period. Each data point on the graph is
interactive, displaying the associated value, date, and time. The plot includes four drop-down selectors
for the axes and a checkbox to toggle between viewing the entire dataset or specific time intervals.</p>
      <p>For example, in one configuration (Figure 6), glycemia, insulin levels, and movement index are
visualized together. Glycemia levels (blue points) display a cyclical pattern, likely reflecting daily
fluctuations associated with meals, activity, and insulin injections. The corresponding insulin doses
(orange points) indicate reactive administration in response to glycemic peaks. The movement index
(purple points) suggests a possible relationship between physical activity and blood glucose regulation.
Users also have the option to exclude axes if fewer parameters need to be visualized.</p>
      <p>As a further example, the plot can display glycemia alongside administered insulin and movement index
(Figure 7). The glycemia values, represented by blue points, exhibit a cyclic pattern with peaks and
troughs, suggesting diurnal variations in blood glucose levels. The values range between 0 and 261 units. Periods
of increased glycemic values are interspersed with lower readings, corresponding to physiological or
behavioral rhythms, such as meals or insulin administration. The delivered insulin (orange points)
appears as discrete doses given in response to elevated glycemia values. The movement index
(purple points) displays cyclical fluctuations, with periods of higher activity potentially coinciding with
glycemic trends. This could indicate a relationship between physical activity and glycemic control.</p>
      <p>To enable more flexible visual exploration, a merged dataset was created using timestamp alignment
(nearest match method with 1-second resolution). New categorical columns were added to this dataset
to represent different glycemia and movement levels (e.g., low, normal, high). The merged data were
visualized using hvPlot, which provides dynamic plot types tailored for time-series analysis. The
hvPlot scatter plot, visualizing all merged measures, is shown in Figure 8. This visualization method is
particularly advantageous due to its versatility, allowing data to be explored through various plot types.</p>
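      <p>A nearest-match alignment of two time series, followed by the addition of a categorical column, can be sketched as follows. The example is dependency-free; in a pandas workflow, `pandas.merge_asof` with `direction="nearest"` and a `tolerance` plays the same role. The sketch interprets the 1-second resolution as a matching tolerance, which is an assumption, and the glycemia thresholds and sample values are illustrative only.</p>
      <preformat>
```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_value(times, values, target, tolerance):
    """Return the value whose timestamp is closest to `target`.

    `times` must be sorted. None is returned when the closest timestamp
    is farther away than `tolerance`.
    """
    i = bisect_left(times, target)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if j in range(len(times))]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(times[j] - target))
    if abs(times[best] - target) > tolerance:
        return None
    return values[best]


def glycemia_level(mg_dl, low=70, high=180):
    """Map a glucose value to a category; thresholds are illustrative only."""
    if mg_dl > high:
        return "high"
    if mg_dl >= low:
        return "normal"
    return "low"


# Made-up sample data, not measurements from the study.
t0 = datetime(2024, 9, 10, 8, 0, 0)
cgm_times = [t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=10)]
cgm_values = [65, 110, 195]
hr_times = [t0 + timedelta(milliseconds=400), t0 + timedelta(minutes=5, seconds=3)]
hr_values = [88, 92]

merged = []
for t, glucose in zip(cgm_times, cgm_values):
    merged.append({
        "date": t,
        "glycemia": glucose,
        "glycemia_level": glycemia_level(glucose),
        "heart_rate": nearest_value(hr_times, hr_values, t, timedelta(seconds=1)),
    })
```
      </preformat>
      <p>Rows without a partner inside the tolerance keep a missing value (`None`) instead of borrowing a distant reading, which avoids pairing measurements that were taken minutes apart.</p>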
      <p>We then applied cross-correlation analysis to examine time-lagged relationships between variables.
The merged dataset was split by parameter, and the parameters were then visualized separately.</p>
      <p>The plot in Fig. 9 represents the cross-correlation between HeartRate and MovementIndex over a
range of time lags, with a maximum lag of 150. Cross-correlation measures the similarity between
two signals as one is shifted relative to the other, providing information on potential time-dependent
relationships between variables.</p>
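      <p>The idea of shifting one signal against the other can be sketched as follows. This is a minimal illustration under stated assumptions: both series are uniformly sampled and equally long, the correlation at each lag is a Pearson coefficient over the overlapping samples, and a positive lag means that the first series leads the second. The normalization and sign convention used for the figures are not specified in the text, and the sample signals are synthetic.</p>
      <preformat>
```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means x leads y."""
    if lag >= 0:
        return pearson(x[: len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[: len(y) + lag])


def cross_correlation(x, y, max_lag):
    """Return {lag: correlation} for lags from -max_lag to max_lag."""
    return {k: lagged_correlation(x, y, k) for k in range(-max_lag, max_lag + 1)}


# Synthetic example: heart rate follows movement with a delay of 3 samples.
movement = [0, 0, 1, 3, 5, 3, 1, 0, 0, 2, 4, 2, 0, 0, 0, 0]
delay = 3
heart_rate = [60] * delay + [60 + 10 * m for m in movement[:-delay]]

ccf = cross_correlation(movement, heart_rate, max_lag=5)
best_lag = max(ccf, key=ccf.get)
print("lag with the highest correlation:", best_lag)
```
      </preformat>
      <p>In this synthetic case the highest correlation occurs at the lag equal to the built-in delay. On real data, the lag unit is the sampling interval of the merged series, so a lag only translates into seconds or minutes once that interval is known.</p>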
      <p>Fig. 10 shows the cross-correlation between heart rate and glycemia over a range of time lags. It reveals
the highest positive correlation at a lag of 123, suggesting that an increase in heart rate is followed by
a corresponding increase in glycemia approximately 123 time units later. In contrast, the strongest
negative correlation occurs at a lag of 8, suggesting that an increase in heart rate precedes a
decrease in glycemia within this short time frame. The correlation pattern fluctuates over different
lags, with a significant negative correlation around lag 0, which may indicate an immediate inverse
relationship between changes in heart rate and glycemia.</p>
      <p>In contrast, the cross-correlation between movement index and glycemia, shown
in Fig. 11, exhibits a more erratic pattern. Unlike the structured periodicity observed in the heart
rate-glycemia relationship, the correlation between movement and glycemia appears less consistent, with
pronounced spikes particularly around lag 0. The presence of positive and negative correlations across
different time lags suggests that the effect of movement on glycemia is influenced by additional factors,
such as insulin administration, food intake, and individual physiological responses. The irregular nature
of these fluctuations implies that movement alone may be a less reliable predictor of glycemia changes
than heart rate.</p>
      <p>Returning to Fig. 9, the cross-correlation values are predominantly positive across all lags, indicating that HeartRate and
MovementIndex are generally positively correlated. This aligns with the physiological expectation that
higher levels of physical activity are associated with an increase in heart rate. The cross-correlation
peaks around positive lags (e.g., 50 to 100), suggesting that changes in the MovementIndex tend to
precede corresponding changes in HeartRate. This implies that heart rate increases with some delay
after physical activity intensifies.</p>
      <p>The correlation matrix in Fig. 12 represents the relationships between
all the variables. The values in the matrix range from -1 to 1. The strongest correlation is observed
between Glycemia and Basal Insulin (0.79), highlighting a strong direct relationship. The stress index
demonstrates a strong negative correlation with glycemia (-0.72), suggesting that increased stress
levels are associated with lower glycemia, potentially due to metabolic changes or increased energy
expenditure under stress. Interestingly, movement index has a moderate positive correlation with HRV
score (0.67), suggesting that increased movement is associated with improved heart rate variability,
which is often considered a marker of better autonomic function. Breathing frequency correlates
positively with insulin delivery (0.59), possibly reflecting an association between metabolic demands
and respiration. However, its correlation with stress is weak and negative (-0.17), suggesting that stress
levels do not directly influence respiratory rate in a significant way.</p>
      <p>There is a moderate positive relationship between HeartRate and MovementIndex (0.42), reflecting
the physiological response of increased heart rate during physical activity. These insights highlight the
potential of integrated multi-modal data to support more personalized, data-driven diabetes care.</p>
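      <p>A correlation matrix of this kind can be computed as sketched below. The example assumes Pearson coefficients, which the text does not state explicitly, and uses toy values that are not measurements from the study. With pandas, `DataFrame.corr()` gives the equivalent result on the merged dataset.</p>
      <preformat>
```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict m[a][b] with the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy values for illustration only; these are not measurements from the study.
data = {
    "HeartRate": [72, 85, 90, 110, 95, 80],
    "MovementIndex": [1, 3, 4, 8, 5, 2],
    "Glycemia": [150, 140, 135, 110, 125, 145],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```
      </preformat>
      <p>Each coefficient summarizes a linear association at lag 0 only, so it complements rather than replaces the lagged analysis described earlier in this section.</p>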
    </sec>
    <sec id="sec-6">
      <title>6. Preliminary Experiments Towards an AI assistant</title>
      <p>
        In our preliminary experiments towards the creation of an AI-based assistant, we followed an approach
based on the combination of Small Language Models (SLMs), specialized embeddings, and
Retrieval-Augmented Generation (RAG) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. SLMs can perform well on standard devices, without requiring
large-scale infrastructure and can be seen as a way to make machine intelligence accessible and afordable
to anyone. While model compression techniques have enabled the development of smaller models that
are more eficient and can maintain competitive performance, both Large Language Models (LLMs) and
Small Language Models (SLMs) struggle with answer reliability. Answer reliability refers to a model’s
ability to provide accurate, current, and verifiable responses that can be related to attested sources. RAG
has been proposed to solve the problem of traceability since they can trace the knowledge from which
the statement has been generated. Furthermore. RAG combine the generative capabilities of LLMs with
information retrieval techniques. This method enhances the model’s ability to provide accurate and
contextually relevant responses by retrieving information from external knowledge bases. Furthermore,
      </p>
      <sec id="sec-6-1">
        <p>SLMs and RAG can also be used to ensure data privacy when working with custom models. In our setting, we first created a document repository by automatically generating, via ad hoc Python scripts, notebooks with the data discussed in the previous section. More precisely, the data encompassed various physiological parameters, including:</p>
        <p>• Heart Rate: Number of heartbeats per minute over the dataset period (five days);
• Breath Frequency: Number of breaths per minute;
• Insulin Administration: Total insulin administered during the monitoring period;
• Blood Glucose Levels (Glycemia): Concentration of glucose in the blood;
• Physical Activity (Step Count): Number of steps taken during the monitoring period;
• Movement Index: Movement intensity during the monitoring period;
• Stress Index: Stress levels;
• HRV Score: Heart Rate Variability (HRV) score reflecting the variation in time intervals between
heartbeats;
• Basal Insulin Administration: Basal insulin administered.</p>
        <p>For this experiment, the knowledge base was built exclusively from the provided biometric data. During
system prompting, additional contextual information was given; more precisely, the glycemic values
were categorized as follows.</p>
        <p>• For a healthy person:</p>
        <p>The retrieval mechanism was designed to return relevant data for answering queries. Initially, data
was formatted as CSV, but this approach resulted in poor retrieval performance. Switching to a more
descriptive format, where each measurement was stated in natural language, e.g. “Heart Rate Max Heart
Rate 2024-09-09: 232.0, Min Heart Rate 2024-09-09: 42.0, Mean Heart Rate 2024-09-09: 90.37208218293951”,
improved the retrieval system’s accuracy. We include some examples of the PDFs generated by the resulting
notebooks, containing max, min and mean values for all parameters for all days of a given (anonymous)
user, in Fig. We then formulated the list of 8 questions shown in Fig. 14 based on the considered data,
with the answers taken as ground truth.</p>
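        <p>The switch from CSV rows to descriptive statements can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts used in the study: the CSV layout (columns <italic>timestamp</italic>, <italic>parameter</italic>, <italic>value</italic>), the function name <italic>daily_sentences</italic> and all values are hypothetical; only the shape of the generated sentence follows the example quoted above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical CSV layout (an assumption): one row per measurement.
RAW_CSV = """timestamp,parameter,value
2024-09-09T08:00:00,Heart Rate,88
2024-09-09T12:00:00,Heart Rate,132
2024-09-09T20:00:00,Heart Rate,71
2024-09-10T08:00:00,Heart Rate,90
2024-09-10T12:00:00,Heart Rate,104
"""


def daily_sentences(csv_text):
    """Group measurements by (parameter, day) and describe each group in words."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = row["timestamp"][:10]  # keep the YYYY-MM-DD part of the ISO timestamp
        groups[(row["parameter"], day)].append(float(row["value"]))

    sentences = []
    for (parameter, day), values in sorted(groups.items()):
        sentences.append(
            f"{parameter} Max {parameter} {day}: {max(values)}, "
            f"Min {parameter} {day}: {min(values)}, "
            f"Mean {parameter} {day}: {mean(values)}"
        )
    return sentences


for sentence in daily_sentences(RAW_CSV):
    print(sentence)
```
]]></preformat>
        <p>Repeating the parameter name and the date next to every value keeps each statement self-describing, which plausibly explains why such text is easier to match against a natural-language question than a bare CSV row.</p>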
        <p>With the help of the LangChain framework and Python libraries for extracting the knowledge
base from our documents, we created a ChromaDB vector database and derived a retriever from it (using the
“as_retriever” method of the ChromaDB vector database package) to generate the contexts associated with the questions in
our test list.</p>
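        <p>The idea behind a vector-database retriever can be illustrated with a self-contained toy. The sketch below deliberately replaces both the embedding model and ChromaDB with stand-ins: a bag-of-words count vector plays the role of the embedding, and an in-memory list plays the role of the vector database. The names <italic>embed</italic>, <italic>cosine</italic> and <italic>ToyVectorStore</italic> and the document contents are hypothetical, and the code does not use the LangChain or ChromaDB APIs.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory index: stores (vector, text) pairs and ranks them by similarity."""

    def __init__(self, documents):
        self._index = [(embed(doc), doc) for doc in documents]

    def retrieve(self, query, k=2):
        """Return the k stored documents most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(
            self._index,
            key=lambda item: cosine(query_vec, item[0]),
            reverse=True,
        )
        return [doc for _, doc in ranked[:k]]


# Hypothetical documents in the descriptive format (values are made up).
documents = [
    "Heart Rate Max Heart Rate 2024-09-09: 132.0, Min Heart Rate 2024-09-09: 71.0",
    "Glycemia Max Glycemia 2024-09-09: 210.0, Min Glycemia 2024-09-09: 65.0",
    "Step Count Total Step Count 2024-09-10: 12450.0",
]

store = ToyVectorStore(documents)
context = store.retrieve("What was the max heart rate on 2024-09-09?", k=1)
print(context[0])
```
]]></preformat>
        <p>A learned embedding differs from this toy in that it can match paraphrases that share no tokens with the stored text; the ranking-by-similarity step is the same.</p>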
        <p>The context associated with a given question can then be used to specialize the query submitted to
a given LLM/SLM. Finally, we combined the resulting retriever with a compressor that re-ranks
documents based on their relevance to a given query. The purpose of this combined structure is to
streamline the retrieval and ranking process. This two-step process improves the quality of the final
results, ensuring that the documents returned are not only relevant but also ordered by their
relevance to the user’s query.</p>
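        <p>The two-step retrieve-then-re-rank structure can be sketched as follows. This is a hedged illustration rather than the compressor used in the experiments: the second-stage scorer is a simple token-coverage measure standing in for a learned re-ranking model, and the names <italic>coverage_score</italic> and <italic>rerank</italic> and the candidate documents are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re


def tokens(text):
    """Lowercased word and number tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def coverage_score(query, document):
    """Toy relevance score: fraction of query tokens that occur in the document."""
    query_tokens = tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens.intersection(tokens(document))) / len(query_tokens)


def rerank(query, candidates, top_n=2, score=coverage_score):
    """Second stage: re-score first-stage candidates and keep the top_n best."""
    ordered = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ordered[:top_n]


# Hypothetical first-stage output: a broad candidate list in arbitrary order.
candidates = [
    "Stress Index Max Stress Index 2024-09-09: 80.0",
    "Heart Rate Max Heart Rate 2024-09-10: 140.0",
    "Heart Rate Max Heart Rate 2024-09-09: 132.0",
]

best = rerank("max heart rate 2024-09-09", candidates, top_n=1)
print(best[0])
```
]]></preformat>
        <p>Keeping the first stage broad and cheap while reserving the more careful scoring for a short candidate list is the usual motivation for this design.</p>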
        <p>We then performed a series of tests with different SLMs (of different sizes/numbers of parameters),
using an embedding model fine-tuned on medical data (more precisely, MedEmbed-small-v0.1). For the
considered list of questions, the context retrieved by our retriever pipeline from the vector database
turned out to contain the part relevant to the question. Concerning hallucinations, the
best results were obtained with the Phi-3.5-mini-instruct model (0.25% wrong answers). These
results are very preliminary, since our dataset is still under construction and we are currently collecting
additional data to perform more extensive training sessions and to create pipelines for redirecting
queries to custom AI agents depending on the form of the question (e.g. descriptive question,
query on tabular data, etc.). As stated in the previous section, the LLM component should be seen as a
conceptual demonstration rather than a final solution. A RAG system tailored specifically to biometric
data analysis could be expected to provide more accurate and reliable responses.</p>
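        <p>Scoring model answers against a ground-truth list can be sketched as below. The matching rule (numeric comparison within a tolerance, otherwise case-insensitive text equality) is an assumption made for illustration; the text does not specify how answers were judged. The function names and the question data are hypothetical and are not the question list used in the experiments.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def is_correct(answer, truth, tolerance=0.01):
    """Compare numerically when both parse as numbers, else as normalized text."""
    try:
        return abs(float(answer) - float(truth)) <= tolerance
    except ValueError:
        return answer.strip().lower() == truth.strip().lower()


def wrong_answer_rate(answers, ground_truth):
    """Fraction of questions whose answer does not match the ground truth."""
    if len(answers) != len(ground_truth) or not ground_truth:
        raise ValueError("need one answer per ground-truth entry")
    wrong = sum(not is_correct(a, t) for a, t in zip(answers, ground_truth))
    return wrong / len(ground_truth)


# Hypothetical ground truth and model answers (NOT the paper's question list).
ground_truth = ["132.0", "71.0", "97.0", "2024-09-09", "yes"]
model_answers = ["132", "70.0", "97.0", "2024-09-09", "Yes"]

rate = wrong_answer_rate(model_answers, ground_truth)
print(f"wrong answers: {rate:.0%}")
```
]]></preformat>
        <p>With free-text answers, a rule of this kind only works if the numeric value or label is first extracted from the generated sentence; that extraction step is omitted here.</p>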
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Future Directions and Conclusion</title>
      <p>This study demonstrates the feasibility of integrating wearable sensor data with AI-assisted analysis to
support pediatric diabetes management. Our framework, which combines traditional data processing
with machine learning techniques within a Jupyter-based environment, offers valuable insights into
the interplay between physiological signals and glycemic trends. While the current implementation
focuses on a single-patient case study, the architecture is designed to scale across larger cohorts and
additional device types.</p>
      <p>A key challenge remains in bridging structured data analysis with natural language interpretation.
Although our JupyterHub environment efficiently handles pre-processing and visualization, translating
these findings into actionable recommendations through AI-powered conversational agents is still
in early development. Future work will focus on the integration of small language models (SLMs)
trained on domain-specific data to facilitate interactive querying and personalized decision support.
Furthermore, the integration of the Jupyter ecosystem with data-centric architectures such as Spark,
Spark Streaming or Dask and data acquisition engines such as Kafka and RabbitMQ is currently under
consideration to support large-scale processing in real time.</p>
      <p>We also plan to evaluate the scalability, usability, and clinical value of the platform in a broader study,
incorporating feedback from diabetologists and healthcare providers. Additionally, integrating more
advanced time-series modeling techniques and privacy-preserving machine learning approaches will
be explored.</p>
      <p>In conclusion, this work presents a novel methodology for leveraging wearable technology and
AI to enhance pediatric diabetes care. The findings underscore the importance of multi-modal data
integration and local computation for privacy and clinical utility. With further refinement, this system
has the potential to evolve into a robust clinical decision support tool that empowers both healthcare
professionals and patients.</p>
      <p>Ultimately, integrating such platforms into hospital IT infrastructures and electronic health record
systems could bring AI-driven, personalized treatment planning into routine clinical workflows, supporting
more adaptive and responsive care for young patients with diabetes.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The paper has been written by the authors. Generative AI has been used for language corrections
(DeepL) and to support code generation using Copilot.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. A. M.</given-names>
            <surname>Khalifa</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence for diabetes: Enhancing prevention, diagnosis, and effective management</article-title>
          ,
          <source>Computer Methods and Programs in Biomedicine Update</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          . doi:10.1016/j.cmpbup.2024.100141.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Campanella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Paragliola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cherubini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pierleoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Palma</surname>
          </string-name>
          ,
          <article-title>Towards personalized AI-based diabetes therapy: A review</article-title>
          ,
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          <volume>28</volume>
          (
          <year>2024</year>
          )
          <fpage>6944</fpage>
          -
          <lpage>6957</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sohel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <article-title>Machine learning and artificial intelligence in diabetes prediction and management: A comprehensive review of models</article-title>
          ,
          <source>Journal of Next-Gen Engineering Systems</source>
          <year>2024</year>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Integration of wearable devices and artificial intelligence for continuous monitoring in diabetes management</article-title>
          ,
          <source>ATTD 2025 Conference Proceedings</source>
          (
          <year>2025</year>
          )
          <fpage>56</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Medtronic</surname>
          </string-name>
          ,
          <article-title>Insulin pumps: Revolutionizing diabetes care</article-title>
          .
          <source>medtronic technologies</source>
          ,
          <year>2025</year>
          . URL: https://europe.medtronic.com/xd-en/index.html, accessed: 2025-01-16.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Medtronic</surname>
          </string-name>
          ,
          <article-title>FDA approves Medtronic MiniMed™ 780G system - world's first insulin pump with meal detection technology featuring 5-minute auto corrections</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Medtronic</surname>
          </string-name>
          ,
          <article-title>AI is unlocking the future of health tech</article-title>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Comftech</surname>
          </string-name>
          ,
          <article-title>Wearable devices for monitoring physical activity and diabetes</article-title>
          ,
          <year>2025</year>
          . URL: https://comftech.com/en/projects/solutions/, accessed: 2025-01-16.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <article-title>The effect of multi-component exercise on cognition function in patients with diabetes: A systematic review and meta-analysis</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>19</volume>
          (
          <year>2024</year>
          ). doi:10.1371/journal.pone.0304795.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abhishek</surname>
          </string-name>
          ,
          <article-title>Impact of aerobic exercise on physical health, cardiorespiratory parameters, and health-related quality of life among children with diabetes mellitus: A narrative review</article-title>
          ,
          <source>Journal of Diabetology</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>325</fpage>
          -
          <lpage>334</lpage>
          . doi:10.4103/jod.jod_74_24.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aziz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abd-alrazaq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Farooq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sheikh</surname>
          </string-name>
          ,
          <article-title>Overview of artificial intelligence-driven wearable devices for diabetes: Scoping review</article-title>
          ,
          <source>J Med Internet Res</source>
          <volume>24</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] Jupyter ecosystem,
          <year>2025</year>
          . URL: https://jupyter.org/.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>FAIR</surname>
          </string-name>
          ,
          <article-title>FAIR principles</article-title>
          ,
          <year>2025</year>
          . URL: https://www.go-fair.org/fair-principles/, accessed: 2025-04-07.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>ComfTech</surname>
          </string-name>
          , About us,
          <year>2024</year>
          . URL: https://comftech.com/en/about-us/, accessed: 2025-01-16.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>ComfTech</surname>
          </string-name>
          , Sport,
          <year>2024</year>
          . URL: https://comftech.com/en/sport/, accessed: 2025-01-16.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for large language models: A survey</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2312.10997. arXiv:2312.10997.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>